WO2022134577A1 - Translation error identification method and apparatus, and computer device and readable storage medium - Google Patents

Info

Publication number
WO2022134577A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
error
grammatical
content
page
Prior art date
Application number
PCT/CN2021/109257
Other languages
French (fr)
Chinese (zh)
Inventor
刘丽珍
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2022134577A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/38 Creation or generation of source code for implementing user interfaces

Definitions

  • The present application relates to the technical field of computer development, and in particular to a translation error identification method, device, computer device and readable storage medium, which are applied in the natural language processing field of artificial intelligence.
  • The purpose of this application is to provide a translation error identification method, device, computer device and readable storage medium that solve the following problem in the prior art: page errors are identified one by one with a large amount of manpower, so checking is time-consuming and inefficient, and erroneous parts are easily missed.
  • The application provides a translation error identification method for checking errors in the second text obtained by translating the first text, including:
  • The application also provides a translation error identification device, comprising:
  • a text input module configured to obtain the second text of the first page, wherein the second text is obtained by translating the first text;
  • an error identification module configured to identify a grammatical error object with a grammatical error in the second text, and a content error object with a content error in the second text;
  • an error labeling module configured to identify the specified tags corresponding to the grammatical error object and the content error object in the DOM tree of the first page, set the specified tags as target tags, and replace the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
  • The present application also provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; the steps of the above translation error identification method are implemented when the processor of the computer device executes the computer program.
  • The present application also provides a computer-readable storage medium on which a computer program is stored; the steps of the above translation error identification method are implemented when the computer program stored in the readable storage medium is executed by a processor.
  • The translation error identification method, device, computer device and readable storage medium provided by the present application improve the speed and comprehensiveness of error identification, reduce manpower input, and avoid omissions.
  • FIG. 1 is a flowchart of Embodiment 1 of the translation error identification method of the present application;
  • FIG. 2 is a schematic diagram of the application environment of the translation error identification method in Embodiment 2 of the present application;
  • FIG. 3 is a detailed flowchart of the translation error identification method in Embodiment 2 of the present application;
  • FIG. 4 is a schematic diagram of the program modules of Embodiment 3 of the translation error identification device of the present application;
  • FIG. 5 is a schematic diagram of the hardware structure of the computer device in Embodiment 4 of the present application.
  • The translation error identification method, device, computer device and readable storage medium provided by this application are suitable for the technical field of computer-based UI design development, and provide a translation error identification method based on a text input module, an error identification module, an error labeling module, a trigger module, a configuration input module, a spelling format module, a spelling format recognition module, and a preloading module.
  • The present application identifies the grammatical error objects with grammatical errors in the second text and the content error objects with content errors in the second text; identifies the specified tags corresponding to the grammatical error object and/or the content error object in the DOM tree of the first page and sets them as target tags; and replaces the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
  • The translation error identification method of the present embodiment is used to check errors in the second text obtained by translating the first text, and includes:
  • S102 Obtain the second text of the first page, wherein the second text is obtained by translating the first text;
  • S105 Identify a grammatical error object with a grammatical error in the second text, and a content error object with a content error in the second text;
  • S107 Identify the specified tags corresponding to the grammatical error object and the content error object in the DOM tree of the first page, and set the specified tags as target tags; replace the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
  • The DOM tree is extracted from the first page, the text information corresponding to the preset specified tags is obtained from the DOM tree, and this text information is aggregated to form the second text.
  • This improves the speed and comprehensiveness of error identification, reduces manpower input, and avoids errors and omissions, thereby helping to avoid situations in which users whose mother tongue is the language of the second text cannot accurately grasp the information that the second text conveys.
  • Labeling the grammatical error object and the content error object by replacing tags not only locates the error positions of the second text in the first page accurately, but also avoids the DOM tree pollution, and the resulting interference with the positioning of elements in the DOM, that would occur if the grammatical error objects and content error objects were marked directly in the first page by highlighting and replacing them.
  • Embodiment 2:
  • This embodiment is a specific application scenario of Embodiment 1 above; through it, the method provided in this application can be described more clearly and specifically.
  • In the following, the method provided in this embodiment is described by taking as an example a server running the translation error identification method, which identifies the grammatical error objects with grammatical errors in the second text, identifies the content error objects with content errors in the second text, and labels the grammatical error object and/or the content error object. It should be noted that this embodiment is only exemplary and does not limit the protection scope of the embodiments of this application.
  • FIG. 2 schematically shows the application environment of the translation error identification method according to Embodiment 2 of the present application.
  • The server 2 on which the translation error identification method runs is connected to the client 4 through a network 3. The server 2 may provide services through one or more networks 3, and the network 3 may include various network devices, such as routers, switches, multiplexers, hubs, modems, bridges, repeaters, firewalls, proxy devices and/or the like.
  • The network 3 may include physical links such as coaxial cable links, twisted pair cable links, fiber optic links, combinations thereof, and/or the like.
  • The network 3 may include wireless links, such as cellular links, satellite links, Wi-Fi links and/or the like; the client 4 may be a computer device such as a smartphone, tablet, laptop or desktop computer.
  • FIG. 3 is a flowchart of a specific method of a translation error identification method provided by an embodiment of the present application, and the method specifically includes steps S201 to S208.
  • To avoid slowing down the browser's loading of the first page, the translation error identification method is triggered only when the browser has finished translating the first text to obtain the second text and has finished loading the first page with the second text.
  • Triggering the translation error identification method in this way avoids interfering with the browser's translation of the first text and its loading of the second text, thereby preserving the translation and loading speed of the first page.
  • When the administrator has corrected the errors in the second text by means of the translation error identification method and causes the browser to load the first page with the second text again, the translation error identification method is triggered again, so that the administrator can check the modified second text for errors again, which improves the efficiency of identifying and correcting errors in the second text.
  • The browser runs on the client and is used to display the first page, so that the developer can observe and use it.
  • S202 Obtain the second text of the first page, where the second text is obtained by translating the first text.
  • The DOM tree is extracted from the first page, the text information corresponding to the preset specified tags is obtained from the DOM tree, and this text information is aggregated to form the second text.
  • The DOM tree is extracted from the first page through document.body.innerHTML, which yields the BODY part of the DOM tree.
  • The specified tag is an HTML tag set as required. An HTML tag is a markup tag of the HyperText Markup Language (HTML), an application of the Standard Generalized Markup Language, and is the most basic unit of the HTML language.
  • The data attribute value of the specified tag can also be obtained by means of obj.dataset.xx, and the value attribute value of the specified tag by means of obj.value. The data attribute value and/or the value attribute value are associated with the specified text to form function text, which reflects the prescribed attributes of the specified text in the second text, so that management personnel can identify and manage the definition and specification of the specified text.
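  • The extraction described above can be sketched roughly as follows. This is a minimal illustration rather than the applicant's implementation: the function name collectSecondText, its parameters and the shape of the returned entries are assumptions. It walks childNodes from a root node (document.body in a browser), keeps the text of the specified tags, and records the dataset and value attributes alongside it.

```javascript
// Sketch only: collectSecondText and its parameters are hypothetical names.
const ELEMENT_NODE = 1; // numeric value of Node.ELEMENT_NODE

function collectSecondText(root, specifiedTags) {
  const wanted = new Set(specifiedTags.map((tag) => tag.toUpperCase()));
  const entries = [];

  function visit(node) {
    if (node.nodeType !== ELEMENT_NODE) return; // skip text and comment nodes
    if (wanted.has(node.tagName.toUpperCase())) {
      entries.push({
        tag: node.tagName.toLowerCase(),
        text: (node.textContent || "").trim(),
        data: { ...(node.dataset || {}) }, // data-* attributes, cf. obj.dataset.xx
        value: node.value, // value attribute, cf. obj.value
      });
    }
    for (const child of node.childNodes || []) visit(child);
  }

  visit(root);
  return {
    entries,
    // Aggregate the non-empty texts of the specified tags into the second text.
    secondText: entries.map((entry) => entry.text).filter(Boolean).join("\n"),
  };
}
```

  • In a browser this could be called as collectSecondText(document.body, ["p", "button", "input"]); the tag list is an example only, since the specified tags are set as required.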
  • The text displayed on a page is not limited to the page itself; it also includes the pop-up boxes generated by operating on the page (for example, clicking a button on the page). Translation errors in the text of a pop-up box may prevent the user from accurately obtaining the information that the pop-up box is meant to convey.
  • Therefore, the text information in the configuration file of the browser is extracted and loaded into the second text, so that the text information displayed in pop-up boxes is also included in the second text. This avoids the problem of the pop-up text being missed when the second text is checked, which would make the check of the second text incomplete.
  • S204 Identify misspelled words and incorrectly formatted characters in the second text, so as to obtain a misspelling object and a format error object, respectively.
  • By identifying misspelled words and incorrectly formatted characters in the second text to obtain misspelling objects and format error objects respectively, this step improves the speed and comprehensiveness of error identification, reduces manpower input, and avoids omissions.
  • The step of identifying misspelled words and incorrectly formatted characters in the second text to obtain misspelling objects and format error objects respectively includes:
  • A storage space can be opened in the memory module of the computer device to store the spelling error list and the format error list.
  • The spelling errors in the spelling error list are deduplicated to reduce the memory occupied by the spelling error list, and the format errors in the format error list are deduplicated to reduce the memory occupied by the format error list.
  • S42 Perform spelling proofreading on the words in the second text, take the words with spelling errors as misspelling objects, and save them in the spelling error list.
  • The words in the second text are used in turn as the variable of a preset regular expression to obtain a word expression, and the word expression is used to search the dictionary for the word, thereby proofreading the spelling of the words in the second text. If the word does not exist in the dictionary, it is determined that the word has a spelling error; if the word exists in the dictionary, it is determined that the word does not have a spelling error.
  • A vocabulary list corpus used for natural language processing is used as the dictionary.
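  • A minimal sketch of the dictionary lookup in S42, under stated assumptions: the tokenising pattern, the case-insensitive match and the names findSpellingErrors and escapeRegExp are illustrative choices, not details given in the application. Each word becomes the variable of a regular expression (the "word expression"), which is then tested against the dictionary entries; a Set removes duplicates from the resulting list.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the deduplicated list of words not found in the dictionary.
function findSpellingErrors(secondText, dictionary) {
  const words = secondText.match(/[A-Za-z']+/g) || []; // assumed tokenisation for Latin-script text
  const spellingErrors = new Set(); // a Set keeps the spelling error list free of duplicates

  for (const word of words) {
    // The word is the variable of the preset expression ^...$ (whole-entry match).
    const wordExpression = new RegExp(`^${escapeRegExp(word)}$`, "i");
    const inDictionary = dictionary.some((entry) => wordExpression.test(entry));
    if (!inDictionary) spellingErrors.add(word);
  }
  return [...spellingErrors];
}
```

  • For a large vocabulary corpus, a Set of lower-cased entries would give faster lookups than testing a regular expression against every entry; the regular-expression form is kept here only to mirror the description.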
  • S43 Determine whether the format of the second text complies with a preset format rule, take the words and/or symbols with format errors as format error objects, and save the format error objects in the format error list.
  • The format rules include case rules and punctuation rules. For example, the case rule is that after a carriage return (\r, namely: return), a line feed (\n, namely: newline), or a tab separator, the first letter of the word is capitalized, and the punctuation rule is that all punctuation marks are the punctuation marks of the language corresponding to the second text.
  • The case rules and punctuation rules can be set as required, and the set case rules and punctuation rules are saved to a preset configuration file, so that they can be invoked to determine whether the second text has a format error.
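  • The format check in S43 could look roughly like the sketch below. The rule object stands in for the preset configuration file, and both patterns are assumptions chosen for illustration: the case rule flags a lower-case word that directly follows \r, \n or a tab, and the punctuation rule assumes an English second text in which full-width punctuation marks are out of place.

```javascript
// Stand-in for the preset configuration file (both rules are illustrative assumptions).
const formatRules = {
  // Case rule: a word that follows \r, \n or \t must start with a capital letter.
  caseRule: /[\r\n\t]([a-z]\w*)/g,
  // Punctuation rule for an English second text: full-width marks are not allowed.
  punctuationRule: /[，。；：？！（）、]/g,
};

// Hypothetical helper: returns the format error objects found in the text.
function findFormatErrors(text, rules = formatRules) {
  const formatErrors = [];
  for (const match of text.matchAll(rules.caseRule)) {
    // match.index points at the separator, so the word itself starts one character later.
    formatErrors.push({ type: "case", value: match[1], index: match.index + 1 });
  }
  for (const match of text.matchAll(rules.punctuationRule)) {
    formatErrors.push({ type: "punctuation", value: match[0], index: match.index });
  }
  return formatErrors;
}
```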
  • S205 Identify a grammatical error object with a grammatical error in the second text, and a content error object with a content error in the second text.
  • By identifying, according to the part of speech of each word in the second text, the grammatical error objects with grammatical errors in the second text, and by matching each word in the second text against the words in the first text to identify the content error objects with content errors in the second text, this step accurately identifies the words with grammatical errors and content translation errors in the second text; it improves the speed and comprehensiveness of error identification, reduces manpower input, and avoids omissions.
  • the step of identifying a grammatical error object with a grammatical error in the second text, and identifying a content error object with a content error in the second text includes:
  • S51 Perform word segmentation on the second text to obtain text words, and mark the parts of speech of the text words to obtain a sequence of parts of speech arranged according to the order of the text words.
  • The paragraphs and sentences of the second text are segmented through a preset word segmentation library, so that the entire text is split into word segments that can be tagged, thereby obtaining the text words; the parts of speech of the text words are tagged through a preset dictionary, and the tagged text words are arranged in the order in which they appear in the second text to obtain the part-of-speech sequence.
  • ICTCLAS is used as the word segmentation library to perform word segmentation on the second text. ICTCLAS is the Institute of Computing Technology, Chinese Lexical Analysis System; its main functions include Chinese word segmentation, part-of-speech tagging, named entity recognition and new word recognition, and it supports user dictionaries.
  • The dictionary refers to a dictionary with part-of-speech tags corresponding to the second text, such as The Natural Language Processing Dictionary, The Prolog Dictionary, The Artificial Intelligence Dictionary, The Machine Learning Dictionary, etc.
  • S52 Determine, through a preset logic rule, whether there is an erroneous logical sequence in the part-of-speech sequence; if so, set the text words corresponding to the erroneous logical sequence as a grammatical error object; if not, determine that the second text has no grammatical errors.
  • A preset error grammar template is used as the logic rule, wherein the error grammar template is an erroneous part-of-speech sequence constructed from at least one part of speech, for example: verb, verb, verb.
  • The logic rules are stored by constructing a grammar library, in which they are kept in XML format. The error grammar templates in the grammar library are defined according to the pattern.xsd and/or rules.xsd templates and used as the logic rules, so that each logic rule becomes a regular expression with which the text words with grammatical errors are found in the part-of-speech sequence and set as grammatical error objects.
  • The logic rules can also be defined in other formats.
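  • As a rough illustration of S52, the sketch below matches error grammar templates against a part-of-speech sequence. It assumes the text words have already been segmented and tagged, represents each template as a plain array of part-of-speech names instead of an XML rule, and uses a sliding-window comparison in place of a compiled regular expression; the names errorTemplates and findGrammarErrors are hypothetical.

```javascript
// Assumed in-memory form of the grammar library: each template is an erroneous POS sequence.
const errorTemplates = [["verb", "verb", "verb"]];

// Hypothetical helper: taggedWords is the part-of-speech sequence, e.g. [{ word, pos }, ...].
function findGrammarErrors(taggedWords, templates = errorTemplates) {
  const grammarErrors = [];
  for (const template of templates) {
    // Slide a window of the template's length over the part-of-speech sequence.
    for (let start = 0; start + template.length <= taggedWords.length; start += 1) {
      const span = taggedWords.slice(start, start + template.length);
      const matches = span.every((item, offset) => item.pos === template[offset]);
      if (matches) {
        grammarErrors.push({ start, words: span.map((item) => item.word) });
      }
    }
  }
  return grammarErrors;
}
```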
  • S53 Translate the text words to obtain back-translated words, and determine whether the back-translated words appear in the first text; if so, set the text words corresponding to the back-translated words as content error objects; if not, determine that the second text has no grammatical errors.
  • A multi-thread processing method is used to translate the text words to obtain back-translated words, and each back-translated word is used as the variable of a preset regular expression to obtain a retrieval expression; the first text is searched with the retrieval expression to determine whether the back-translated word appears in the first text; if so, the text word corresponding to the back-translated word is set as a content error object; if not, it is determined that the second text has no grammatical errors.
  • Words that are related to each other in a text need to be translated by the same thread, so multi-thread processing cannot be used to translate text that has not been segmented. In this step, however, the text words are obtained by segmenting the second text, so there is no association between them; they can therefore be translated directly through multi-thread processing, which improves the efficiency of translating and checking the text words.
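  • The retrieval part of S53 can be sketched as follows. The translator is passed in as a function because the application does not name a translation service; backTranslate, checkBackTranslations and the case-insensitive search are assumptions, and the words are processed sequentially here rather than with multi-thread processing. The sketch only reports whether each back-translated word occurs in the first text, leaving the classification into content error objects to the rule stated in S53.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: backTranslate(word) is any function that returns the back-translated word.
function checkBackTranslations(textWords, firstText, backTranslate) {
  return textWords.map((textWord) => {
    const backTranslated = backTranslate(textWord);
    // The back-translated word is the variable of the retrieval expression.
    const retrievalExpression = new RegExp(escapeRegExp(backTranslated), "i");
    return {
      textWord,
      backTranslated,
      foundInFirstText: retrievalExpression.test(firstText),
    };
  });
}
```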
  • the method further includes:
  • The corresponding summary information is obtained based on the grammatical error object and/or the content error object.
  • The summary information is obtained by hashing the grammatical error object and/or the content error object, for example by using the SHA-256 algorithm.
  • Uploading the summary information to the blockchain ensures its security as well as its fairness and transparency to users.
  • The user equipment can download the summary information from the blockchain in order to verify whether the grammatical error object and/or the content error object has been tampered with.
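  • A minimal sketch of computing and checking the summary information, assuming a Node.js runtime and a JSON serialisation of the error objects (both assumptions; the application specifies only that a hash such as SHA-256 is used). The blockchain upload and download are not shown; summarize and isUntampered are hypothetical names.

```javascript
const { createHash } = require("crypto"); // Node.js built-in module

// Hypothetical helper: serialise the error objects and hash them with SHA-256.
function summarize(errorObjects) {
  // Assumption: JSON is the canonical form; key order must be stable for digests to match.
  const canonical = JSON.stringify(errorObjects);
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Recompute the digest and compare it with the summary information that was published.
function isUntampered(errorObjects, publishedDigest) {
  return summarize(errorObjects) === publishedDigest;
}
```

  • A verifier would recompute the digest from the error objects it received and compare it with the summary information downloaded from the blockchain.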
  • The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain, essentially a decentralized database, is a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • The blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.
  • S206 Identify the specified tags corresponding to the misspelling object and the format error object in the DOM tree of the first page, and set the specified tags as target tags; replace the target tags with spelling format annotation tags, so that the first page displays the second text according to the spelling format annotation tags.
  • By identifying the specified tags corresponding to the misspelling object and the format error object in the DOM tree of the first page and replacing them with spelling format annotation tags, this step labels the misspelling object and the format error object. This not only locates the error positions of the second text on the first page accurately, but also avoids the DOM tree pollution, and the resulting interference with the positioning of elements in the DOM tree, that would occur if the misspelling object and the format error object were marked directly in the first page by highlight replacement.
  • The content of the spelling format annotation tag can be set as required, for example as a highlight tag or a comment tag, so that the text corresponding to the spelling format annotation tag is displayed on the first page with highlights, comments, etc.
  • All child nodes in the DOM tree are obtained through parent.childNodes; the specified tag corresponding to the misspelling object and/or format error object is then identified among the child nodes and set as the target tag;
  • The square brackets [] and their content stand for the misspelling object and the format error object; combined with the square brackets, the expression matches any single character inside them.
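  • The following sketch shows one way the text of a target tag could be wrapped in a spelling format annotation tag. Because a bracket expression such as [abc] matches only a single character, the sketch instead joins the escaped error objects into an alternation so that whole words are matched; that choice, the mark tag, the class name and the function name annotateText are all assumptions made for illustration.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every occurrence of an error object in an annotation tag.
// The input is treated as plain text; HTML escaping is left out to keep the sketch short.
function annotateText(text, errorObjects, tagName = "mark", className = "spelling-format-error") {
  if (errorObjects.length === 0) return text;
  // Longer error objects first, so a short one cannot pre-empt a longer overlapping match.
  const alternatives = [...errorObjects]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  const pattern = new RegExp(alternatives.join("|"), "g");
  return text.replace(pattern, (hit) => `<${tagName} class="${className}">${hit}</${tagName}>`);
}
```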
  • S207 Identify the specified tags corresponding to the grammatical error object and the content error object in the DOM tree of the first page, and set the specified tags as target tags; replace the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
  • By identifying the specified tags corresponding to the grammatical error object and the content error object in the DOM tree of the first page and replacing them with grammatical content annotation tags, this step labels the grammatical error object and the content error object. This not only locates the error positions of the second text on the first page accurately, but also avoids the DOM tree pollution, and the resulting interference with the positioning of elements in the DOM tree, that would occur if the grammatical error object and the content error object were marked directly in the first page by highlight replacement.
  • The content of the grammatical content annotation tag can be set as required, for example as a highlight tag or a comment tag, so that the text corresponding to the grammatical content annotation tag is displayed on the first page with highlights, comments, etc.
  • All child nodes in the DOM tree are obtained through parent.childNodes; the specified tag corresponding to the grammatical error object and/or content error object is then identified among the child nodes and set as the target tag;
  • The square brackets [] and their content stand for the grammatical error object and the content error object; combined with the square brackets, the expression matches any single character inside them.
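  • The node replacement in S207 might be sketched as below. The traversal uses parent.childNodes as described, and parent.replaceChild to swap a target tag for an annotation element. The function name replaceTargetTags, the substring test used to decide that a tag corresponds to an error object, and the callback that builds the annotation element are assumptions for illustration.

```javascript
// Hypothetical helper: replace the deepest child elements whose text contains
// an error object with an annotation element built by createAnnotation.
function replaceTargetTags(parent, errorObjects, createAnnotation) {
  let replaced = 0;
  // Copy first: in a browser, childNodes is a live NodeList that changes during replacement.
  for (const child of Array.from(parent.childNodes || [])) {
    if (child.nodeType !== 1) continue; // only element nodes carry tags
    const inner = replaceTargetTags(child, errorObjects, createAnnotation);
    if (inner > 0) {
      replaced += inner; // a more specific descendant was already replaced
      continue;
    }
    const text = child.textContent || "";
    if (errorObjects.some((error) => text.includes(error))) {
      parent.replaceChild(createAnnotation(child), child); // child is the target tag
      replaced += 1;
    }
  }
  return replaced;
}

// Browser usage (assumed annotation tag: <mark class="grammar-content-error">):
// replaceTargetTags(document.body, ["go run eat"], (element) => {
//   const mark = document.createElement("mark");
//   mark.className = "grammar-content-error";
//   mark.textContent = element.textContent;
//   return mark;
// });
```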
  • S208 Acquire a first URL of the first page, identify a second URL according to the first URL, and send the second URL to the browser.
  • The first URL of the first page is acquired, the second URL is identified according to the first URL, and the second URL is sent to the browser, so that the browser can preload the second page with the second text, or load the second page directly after S207 has been executed, which improves the efficiency with which the administrator continues to check errors in the second text of the second page.
  • The first URL of the first page is obtained, the server where the first URL is located is accessed, the second URL located next to the first URL is obtained, and the second URL is sent to the browser.
  • For example, if the first URL is https://zhidao.baidu.com/question/3591601.html, the server where the first URL is located is accessed and the second URL located next to the first URL is obtained: https://zhidao.baidu.com/question/3591602.html.
  • A URL is a Uniform Resource Locator, that is, the address of a standard resource on the Internet. Every file on the Internet has a unique URL, which contains information indicating the location of the file.
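  • The application obtains the second URL by querying the server where the first URL is located. For sites whose pages are numbered consecutively, as in the example above, a client-side guess could look like the sketch below; treating "next" as "last number in the path plus one" is purely an assumption, and guessSecondUrl is a hypothetical name.

```javascript
// Hypothetical helper: guess the neighbouring URL by incrementing the last number in the path.
// Returns null when the path contains no digits.
function guessSecondUrl(firstUrl) {
  const url = new URL(firstUrl);
  const match = url.pathname.match(/^(.*?)(\d+)(\D*)$/);
  if (!match) return null;
  const [, prefix, digits, suffix] = match;
  // padStart keeps any leading zeros of the original identifier.
  const next = String(Number(digits) + 1).padStart(digits.length, "0");
  url.pathname = prefix + next + suffix;
  return url.toString();
}
```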
  • a translation error recognition device 1 of the present embodiment includes:
  • a text input module 12 configured to obtain the second text of the first page, wherein the second text is obtained by translating the first text
  • An error identification module 15 configured to identify a grammatical error object with a grammatical error in the second text, and a content error object with a content error in the second text;
  • the error labeling module 17 is used to identify the specified label corresponding to the grammatical error object and the content error object in the DOM tree of the first page, and set the specified label as a target label; replace the target label with A grammatical content annotation tag, so that the first page displays the second text according to the grammatical content annotation tag.
  • the translation error identification device 1 further includes:
  • a triggering module 11, configured to trigger the translation error identification device 1 in response to the browser completing the loading of the first page with the second text.
  • the translation error identification device 1 further includes:
  • the configuration input module 13 is configured to extract the text information in the configuration file of the browser and load it into the second text.
  • the translation error identification device 1 further includes:
  • the spelling format module 14 is configured to identify the misspelled words and the incorrectly formatted characters in the second text, so as to obtain a spelling error object and a format error object respectively.
  • the translation error identification device 1 further includes:
  • a spelling format recognition module 16, configured to identify the specified tags corresponding to the misspelling object and the format error object in the DOM tree of the first page, set the specified tags as target tags, and replace the target tags with spelling format annotation tags, so that the first page displays the second text according to the spelling format annotation tags.
  • the translation error identification device 1 further includes:
  • the preloading module 18 is configured to acquire the first URL of the first page, identify the second URL according to the first URL, and send the second URL to the browser.
  • The technical solution is applied in the field of computer-based UI design development: by identifying the grammatical error objects with grammatical errors in the second text, identifying the content error objects with content errors in the second text, and replacing the specified tags corresponding to the grammatical error object and/or the content error object with grammatical content annotation tags, the first page becomes an H5 page that displays the second text according to the grammatical content annotation tags.
  • Embodiment 4:
  • The present application also provides a computer device 5. The components of the translation error identification device 1 of Embodiment 3 can be distributed across different computer devices, and the computer device 5 can be a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server or cabinet server (including an independent server, or a server cluster composed of multiple application servers), etc.
  • the computer device in this embodiment at least includes but is not limited to: a memory 51 and a processor 52 that can be communicatively connected to each other through a system bus, as shown in FIG. 5 .
  • FIG. 5 shows only some components of the computer device; it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
  • the memory 51 (ie, a readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (eg, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disc, etc.
  • the memory 51 may be an internal storage unit of a computer device, such as a hard disk or a memory of the computer device.
  • the memory 51 may also be an external storage device of a computer device, such as a plug-in hard disk equipped on the computer device, a smart memory card (Smart Media Card, SMC), Secure Digital (SD) card, Flash Card (Flash Card), etc.
  • the memory 51 may also include both the internal storage unit of the computer device and its external storage device.
  • the memory 51 is generally used to store the operating system and various application software installed in the computer equipment, such as the program code of the translation error identification device of the third embodiment.
  • the memory 51 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 52 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 52 is typically used to control the overall operation of the computer device.
  • the processor 52 is configured to run the program code or process data stored in the memory 51 , for example, run the translation error identification device, so as to implement the translation error identification methods of the first and second embodiments.
  • Embodiment 5:
  • the present application also provides a computer-readable storage medium, which can be non-volatile or volatile, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic storage, magnetic disk, optical disk, server, app store, etc., on which computer programs are stored, and when the programs are executed by the processor 52, the corresponding functions are realized.
  • the computer-readable storage medium of this embodiment is used to store the translation error identification device, and when executed by the processor 52, implements the translation error identification methods of the first embodiment and the second embodiment.

Abstract

The present application relates to the field of computer development. Disclosed are a translation error identification method and apparatus, and a computer device and a readable storage medium. The translation error identification method comprises: acquiring second text of a first page, wherein the second text is obtained by translating first text; identifying a grammatical error object including a grammatical error in the second text and a content error object including a content error in the second text; identifying, in a DOM tree of the first page, a specified label corresponding to the grammatical error object and the content error object, and setting the specified label as a target label; and replacing the target label with a grammatical content marking label, such that the first page displays the second text according to the grammatical content marking label. In the present application, words and phrases including grammatical errors and content translation errors in second text are accurately identified, so as to automatically identify a part including an error in the second text, thereby improving error identification speed and identification comprehensiveness, reducing the manpower input, and avoiding the omission of errors.

Description

Translation error identification method, apparatus, computer device and readable storage medium
This application claims priority to Chinese patent application No. CN 202011528714.1, entitled "Translation error identification method, apparatus, computer device and readable storage medium" and filed on December 22, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of computer development, and in particular to a translation error identification method, apparatus, computer device and readable storage medium, applied in the technical field of natural language processing in artificial intelligence.
Background
When designing web pages, websites usually provide text in multiple languages (for example, first text: Chinese; second text: English). However, when testers test the pages containing the second text (English), the content of the pages is so extensive and varied that it is usually difficult to identify errors in the translation of each word and in the grammar of each paragraph.
The inventor found that even if an administrator devotes a large amount of manpower to identifying the errors in the pages of such a website one by one, the process is not only time-consuming and inefficient, but also prone to oversights that cause erroneous parts to be missed.
Summary of the Invention
The purpose of the present application is to provide a translation error identification method, apparatus, computer device and readable storage medium, to solve the problem in the prior art that identifying page errors one by one with a large amount of manpower makes checking time-consuming and inefficient, and is prone to oversights that cause erroneous parts to be missed.
To achieve the above purpose, the present application provides a translation error identification method for checking errors in second text obtained by translating first text, comprising:
acquiring the second text of a first page, wherein the second text is obtained by translating the first text;
identifying grammatical error objects containing grammatical errors in the second text, and content error objects containing content errors in the second text;
identifying, in the DOM tree of the first page, the specified tags corresponding to the grammatical error objects and the content error objects, and setting the specified tags as target tags; and replacing the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
To achieve the above purpose, the present application also provides a translation error identification apparatus, comprising:
a text input module, configured to acquire the second text of a first page, wherein the second text is obtained by translating the first text;
an error identification module, configured to identify grammatical error objects containing grammatical errors in the second text, and content error objects containing content errors in the second text;
an error annotation module, configured to identify, in the DOM tree of the first page, the specified tags corresponding to the grammatical error objects and the content error objects, and to set the specified tags as target tags; and to replace the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
To achieve the above purpose, the present application also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor of the computer device implements the steps of the above translation error identification method when executing the computer program.
To achieve the above purpose, the present application also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program stored in the readable storage medium implements the steps of the above translation error identification method when executed by a processor.
The translation error identification method, apparatus, computer device and readable storage medium provided by the present application improve the speed and comprehensiveness of error identification, reduce manpower input, and avoid missed errors.
Description of Drawings
FIG. 1 is a flowchart of Embodiment 1 of the translation error identification method of the present application;
FIG. 2 is a schematic diagram of the application environment of the translation error identification method in Embodiment 2 of the translation error identification method of the present application;
FIG. 3 is a detailed flowchart of the translation error identification method in Embodiment 2 of the translation error identification method of the present application;
FIG. 4 is a schematic diagram of the program modules of Embodiment 3 of the translation error identification apparatus of the present application;
FIG. 5 is a schematic diagram of the hardware structure of the computer device in Embodiment 4 of the computer device of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in the present application, without creative effort, fall within the scope of protection of the present application.
The translation error identification method, apparatus, computer device and readable storage medium provided by the present application are applicable to the technical field of UI design in computer development, and provide a translation error identification method based on a text input module, an error identification module, an error annotation module, a trigger module, a configuration input module, a spelling format module, a spelling format identification module and a preloading module. The present application identifies grammatical error objects containing grammatical errors in the second text and content error objects containing content errors in the second text; identifies, in the DOM tree of the first page, the specified tags corresponding to the grammatical error objects and/or the content error objects, and sets the specified tags as target tags; and replaces the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
Embodiment 1:
Referring to FIG. 1, the translation error identification method of this embodiment, used for checking errors in second text obtained by translating first text, comprises:
S102: acquiring the second text of a first page, wherein the second text is obtained by translating the first text;
S105: identifying grammatical error objects containing grammatical errors in the second text, and content error objects containing content errors in the second text;
S107: identifying, in the DOM tree of the first page, the specified tags corresponding to the grammatical error objects and the content error objects, and setting the specified tags as target tags; and replacing the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
In an exemplary embodiment, the DOM tree is extracted from the first page, the text information corresponding to the preset specified tags is obtained from the DOM tree, and this text information is aggregated to form the second text.
According to the part of speech of each word in the second text, grammatical error objects containing grammatical errors in the second text are identified, and the words in the second text are matched against the words in the first text to identify content error objects containing content errors in the second text. Words with grammatical errors and content translation errors in the second text are thereby accurately identified, so that the erroneous parts of the second text are identified automatically. This improves the speed and comprehensiveness of error identification, reduces manpower input, and avoids missed errors, which in turn helps prevent situations in which users whose native language is that of the second text cannot accurately grasp the information the second text conveys.
By identifying the specified tags corresponding to the grammatical error objects and content error objects in the DOM tree of the first page and replacing them with grammatical content annotation tags, the grammatical error objects and content error objects are annotated. This not only pinpoints the positions in the first page at which the second text contains errors, but also avoids annotating the grammatical error objects and content error objects in the first page directly by highlight replacement, which would pollute the DOM tree and interfere with locating Elements in the DOM tree.
Embodiment 2:
This embodiment is a specific application scenario of Embodiment 1 above, through which the method provided by the present application can be described more clearly and concretely.
Below, the method provided by this embodiment is described in detail, taking as an example a server running the translation error identification method that identifies grammatical error objects containing grammatical errors in the second text, identifies content error objects containing content errors in the second text, and annotates the grammatical error objects and/or the content error objects. It should be noted that this embodiment is only exemplary and does not limit the scope of protection of the embodiments of the present application.
FIG. 2 schematically shows the application environment of the translation error identification method according to Embodiment 2 of the present application.
In an exemplary embodiment, the server 2 on which the translation error identification method runs is connected to the clients 4 through the network 3. The server 2 may provide services through one or more networks 3, and the network 3 may include various network devices, such as routers, switches, multiplexers, hubs, modems, bridges, repeaters, firewalls, proxy devices and/or the like. The network 3 may include physical links, such as coaxial cable links, twisted-pair cable links, optical fiber links, combinations thereof and/or the like. The network 3 may include wireless links, such as cellular links, satellite links, Wi-Fi links and/or the like. The client 4 may be a computer device such as a smartphone, tablet computer, notebook computer or desktop computer.
FIG. 3 is a detailed flowchart of a translation error identification method provided by an embodiment of the present application; the method specifically includes steps S201 to S208.
S201: Respond to the browser finishing loading the first page containing the second text.
To avoid competing for the browser's computing power and thereby slowing the loading of the first page, the translation error identification method is triggered only once the browser has finished translating the first text into the second text and has finished loading the first page containing the second text. This avoids interfering with the browser's translation of the first text and its loading of the page containing the second text, and preserves the speed at which the first page is translated and loaded.
Moreover, after the administrator has corrected the errors in the second text with the help of the translation error identification method and causes the browser to load the first page containing the second text again, the translation error identification method is triggered again, so that the modified second text is checked for errors once more. This improves the efficiency of identifying and correcting errors in the second text.
The browser runs on the client and is used to display the first page for developers to observe and use.
S202: Acquire the second text of the first page, wherein the second text is obtained by translating the first text.
In this step, the DOM tree is extracted from the first page, the text information corresponding to the preset specified tags is obtained from the DOM tree, and this text information is aggregated to form the second text.
In an exemplary embodiment, the DOM tree is extracted from the first page through the document.body.innerHTML code, yielding the BODY part of the DOM tree.
The specified tags are HTML tags set according to requirements. An HTML tag is a markup tag of the HyperText Markup Language (abbreviated as HTML) and is the most basic unit of the HTML language. HTML tags are the most important component of HTML (an application of the Standard Generalized Markup Language) and are used to mark up the text, images, input boxes and so on in the first page. For example, the specified tags include <div><span><a><button><label><ui><li><input>.
Through the getElementsByTagName("tagName").innerHTML() code, the specified text corresponding to the specified tags is obtained from the DOM tree (for example, the BODY part of the DOM tree mentioned above), and the specified text is aggregated to obtain the second text.
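The extraction in S202 can be illustrated with a minimal sketch. It is not the applicant's implementation: it runs outside the browser, uses Python's standard `html.parser` instead of the DOM calls named above, and the tag set, the class name `SpecifiedTextCollector` and the function name `extract_second_text` are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Assumed tag set, taken from the example list in the text
# (void tags such as <input> carry no text content and are left out).
SPECIFIED_TAGS = {"div", "span", "a", "button", "label", "li"}


class SpecifiedTextCollector(HTMLParser):
    """Collects text that appears inside any of the specified tags."""

    def __init__(self, tags):
        super().__init__()
        self.tags = tags
        self.depth = 0   # number of specified tags currently open
        self.texts = []  # collected text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.texts.append(text)


def extract_second_text(html, tags=SPECIFIED_TAGS):
    """Return the text fragments found inside the specified tags."""
    collector = SpecifiedTextCollector(tags)
    collector.feed(html)
    collector.close()
    return collector.texts


page = ("<body><h1>Title</h1><div>Welcome <span>back</span></div>"
        "<button>Sign in</button></body>")
print(extract_second_text(page))
```

Text inside `<h1>` is skipped here because `h1` is not in the assumed tag set; which tags count as specified tags is a configuration choice.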
Further, the data attribute value of a specified tag can be obtained by means of bj.dataset.xx, and the value attribute value of a specified tag can be obtained by means of obj.value. The data attribute value and/or the value attribute value is associated with the specified text to form functional text, which reflects the prescribed attributes of each specified text in the second text, so that administrators can identify and manage the definition and specification of each specified text.
S203: Extract the text information in the configuration file of the browser and load it into the second text;
The text displayed on a page is not limited to the page itself; it also includes pop-up boxes generated by operating on the page (for example, clicking a button on the page). To prevent translation errors in the text of a pop-up box from keeping users from accurately obtaining the information the pop-up box is meant to convey, this step extracts the text information in the configuration file of the browser and loads it into the second text, so that the text shown in pop-up boxes is also included in the second text. This avoids overlooking the text in pop-up boxes when the second text is checked, which would make the check of the second text incomplete.
S204: Identify misspelled words and incorrectly formatted characters in the second text, to obtain spelling error objects and format error objects respectively.
To identify the erroneous parts of the second text automatically, speed up error identification and reduce manpower input, and thereby prevent situations in which users whose native language is that of the second text cannot accurately grasp the information it conveys, this step identifies misspelled words and incorrectly formatted characters in the second text to obtain spelling error objects and format error objects respectively. This improves the speed and comprehensiveness of error identification, reduces manpower input, and avoids missed errors.
In a preferred embodiment, the step of identifying misspelled words and incorrectly formatted characters in the second text, to obtain spelling error objects and format error objects respectively, comprises:
S41: Create a spelling error list and a format error list.
In this step, a spelling error list "SpellingErrorList" and a format error list "FormatErrorList" are created so that the spelling errors and format errors occurring in the second text are managed in a standardized way. This allows spelling errors and format errors to be annotated differently later on, making them easier for administrators to view and identify.
In this embodiment, storage space can be allocated in the memory module of the computer device to hold the spelling error list and the format error list.
Preferably, the spelling errors in the spelling error list are deduplicated to reduce the memory footprint of the spelling error list, and the format errors in the format error list are deduplicated to reduce the memory footprint of the format error list.
S42: Proofread the spelling of the words in the second text, take the misspelled words as spelling error objects, and save them to the spelling error list.
In this step, each word in the second text is used in turn as the variable of a preset regular expression to obtain a word expression, and the word expression is used to search the dictionary for the word, which achieves the spelling proofreading of the words in the second text. If the dictionary does not contain the word, the word is determined to be misspelled; if the dictionary contains the word, the word is determined not to be misspelled.
A word list corpus used for natural language processing serves as the dictionary.
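As a rough illustration of S41 and S42, the sketch below checks each word of the second text against a dictionary and keeps a deduplicated spelling error list. The tiny `DICTIONARY`, the word pattern and the function name `find_spelling_errors` are hypothetical, and plain set membership stands in for the regular-expression lookup described in the text.

```python
import re

# Hypothetical miniature dictionary; the text assumes a word list corpus
# used for natural language processing instead.
DICTIONARY = {"please", "enter", "your", "password", "to", "continue", "it"}

# Assumed definition of a "word": letters with an optional apostrophe part.
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def find_spelling_errors(second_text, dictionary=DICTIONARY):
    """Return the words of second_text that are absent from the dictionary.

    The result plays the role of the spelling error list: each misspelled
    word appears once, in order of first occurrence.
    """
    spelling_error_list = []
    for word in WORD_PATTERN.findall(second_text):
        if word.lower() not in dictionary and word not in spelling_error_list:
            spelling_error_list.append(word)
    return spelling_error_list


print(find_spelling_errors(
    "Please entre your pasword to continue. Please entre it."))
```

Deduplicating while appending keeps the list small, in the spirit of the preferred deduplication step, at the cost of a linear scan per word; for long texts an auxiliary set would be the usual choice.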
S43: Determine whether the format of the second text complies with preset format rules, take the words and/or symbols with format errors as format error objects, and save the format error objects to the format error list.
In this step, the format rules include capitalization rules and punctuation rules. For example, the capitalization rule is that the first letter of a word following a carriage return (\r, i.e. return), a line feed (\n, i.e. newline) or a tab separator is capitalized, and the punctuation rule is that all punctuation marks are punctuation marks of the language of the second text.
The capitalization rules and punctuation rules can be set as required, and the configured rules are saved to a preset configuration file so that they can be invoked to determine whether the second text contains format errors.
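The two example rules in S43 can be sketched as follows, assuming an English second text. The list of full-width punctuation marks, the tolerance for spaces after a line break and the function name `find_format_errors` are assumptions; a real deployment would read the rules from the configuration file mentioned above.

```python
import re

# Assumed punctuation rule for an English second text: full-width (Chinese)
# punctuation marks are treated as format errors. The escapes stand for
# the full-width comma, full stop, semicolon, colon, exclamation mark,
# question mark and enumeration comma.
FULL_WIDTH_PUNCTUATION = "\uff0c\u3002\uff1b\uff1a\uff01\uff1f\u3001"

# Capitalization rule from the text: the word that follows a carriage
# return, line feed or tab must start with a capital letter.
# Allowing spaces after the break character is an added assumption.
AFTER_BREAK = re.compile(r"[\r\n\t] *([A-Za-z]+)")


def find_format_errors(second_text):
    """Return words and symbols that violate the two example format rules."""
    format_error_list = []
    for match in AFTER_BREAK.finditer(second_text):
        word = match.group(1)
        if not word[0].isupper():
            format_error_list.append(word)
    for char in second_text:
        if char in FULL_WIDTH_PUNCTUATION:
            format_error_list.append(char)
    return format_error_list


sample = "Account settings\nchange password\tSave\uff0cok"
print(find_format_errors(sample))
```

In the sample, "change" follows a line feed without a capital letter and the full-width comma does not belong to English punctuation, so both are reported; "Save" follows a tab and is capitalized, so it passes.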
S205: Identify grammatical error objects containing grammatical errors in the second text, and content error objects containing content errors in the second text.
To identify the erroneous parts of the second text automatically, speed up error identification and reduce manpower input, and thereby prevent situations in which users whose native language is that of the second text cannot accurately grasp the information it conveys, this step identifies grammatical error objects containing grammatical errors in the second text according to the part of speech of each word in the second text, and matches the words in the second text against the words in the first text to identify content error objects containing content errors in the second text. Words with grammatical errors and content translation errors in the second text are thereby accurately identified, which improves the speed and comprehensiveness of error identification, reduces manpower input, and avoids missed errors.
In a preferred embodiment, the step of identifying grammatical error objects containing grammatical errors in the second text, and identifying content error objects containing content errors in the second text, comprises:
S51: Segment the second text into text words, and tag the parts of speech of the text words to obtain a part-of-speech sequence arranged in the order of the text words.
In this step, the paragraphs and sentences of the second text are segmented using a preset word segmentation library, splitting the whole text into word segments that can be tagged, so as to obtain the text words of the second text. The part of speech of each text word is tagged using a preset dictionary, and the tagged text words are arranged in the order in which they appear in the second text to obtain the part-of-speech sequence.
It should be noted that the third-party word segmentation library ICTCLAS is used as the word segmentation library to segment the second text. ICTCLAS refers to the Chinese lexical analysis system (Institute of Computing Technology, Chinese Lexical Analysis System), whose main functions include Chinese word segmentation, part-of-speech tagging, named entity recognition and new word recognition, and which also supports user dictionaries.
The dictionary refers to a dictionary with part-of-speech tags for the language of the second text, for example: The Natural Language Processing Dictionary, The Prolog Dictionary, The Artificial Intelligence Dictionary, The Machine Learning Dictionary, etc.
S52: Use preset logic rules to determine whether an erroneous logical sequence occurs in the part-of-speech sequence; if so, set the text words corresponding to the erroneous logical sequence as a grammatical error object; if not, determine that the second text has no grammatical errors.
In this step, preset erroneous-grammar templates serve as the logic rules. An erroneous-grammar template is an erroneous part-of-speech sequence built from at least one part of speech, for example: verb, verb, verb.
Determine whether the erroneous part-of-speech sequence occurs in the part-of-speech sequence. If so (for example, some segment of the part-of-speech sequence contains three verbs in a row), set the text words corresponding to the erroneous part-of-speech sequence as a grammatical error object; if not, determine that the second text has no grammatical errors.
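As a rough illustration of S52, the sketch below slides each erroneous-grammar template over the part-of-speech sequence and collects the words where it matches. The tag "v" for verbs, the `{ word, pos }` record shape, and the function name `findGrammarErrors` are assumptions, not part of the original description.

```javascript
// Assumed template set: three verbs in a row, as in the example above.
const ERROR_TEMPLATES = [["v", "v", "v"]];

// Return, for every template match, the words covered by that match.
function findGrammarErrors(taggedWords, templates = ERROR_TEMPLATES) {
  const hits = [];
  for (const template of templates) {
    for (let i = 0; i + template.length <= taggedWords.length; i++) {
      const matches = template.every((tag, k) => taggedWords[i + k].pos === tag);
      if (matches) {
        hits.push(taggedWords.slice(i, i + template.length).map((w) => w.word));
      }
    }
  }
  return hits;
}

const sample = [
  { word: "we", pos: "r" },
  { word: "go", pos: "v" },
  { word: "run", pos: "v" },
  { word: "eat", pos: "v" },
  { word: "now", pos: "d" },
];
console.log(findGrammarErrors(sample)); // [ [ 'go', 'run', 'eat' ] ]
```

An empty result corresponds to the "no grammatical errors" branch of the step.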
In this embodiment, a grammar library is built to store the logic rules. The rules are stored in the grammar library in XML format; the erroneous-grammar templates in the library are defined according to the pattern.xsd and/or rules.xsd templates and used as the logic rules, so that each logic rule acts as a regular expression for finding the text words in the part-of-speech sequence that contain a grammatical error and setting them as grammatical error objects. The logic rules may also be defined in other formats.
S53: Translate the text words to obtain back-translated words, and determine whether each back-translated word appears in the first text; if so, set the text word corresponding to the back-translated word as a content error object; if not, determine that the second text has no grammatical errors.
In this step, the text words are translated with multithreading to obtain the back-translated words, and each back-translated word is substituted as the variable of a preset regular expression to obtain a retrieval expression. The first text is searched with the retrieval expression to determine whether the back-translated word appears in it; if so, the text word corresponding to the back-translated word is set as a content error object; if not, the second text is determined to have no grammatical errors.
Note that when text whose words are interrelated is translated, the related words must be translated by the same thread, so multithreading cannot be applied to text that has not been segmented. The text words in this step were obtained by segmenting the second text, so they no longer depend on one another and can be translated directly with multithreading, which improves the efficiency of translating the text words and checking them.
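The independent, per-word back-translation and the retrieval expression can be sketched as follows. JavaScript is used here with Promise-based concurrency standing in for the multithreading the text describes; `backTranslate` is a hypothetical stub backed by a toy lookup table, not a real translation API, and `checkWords` only reports whether each back-translated word occurs in the first text, leaving the classification rule of S53 to the caller.

```javascript
// Hypothetical stand-in for a translation service (assumption): a toy lookup table.
const TOY_DICTIONARY = { hello: "你好", world: "世界", bank: "河岸" };

async function backTranslate(word) {
  return Object.prototype.hasOwnProperty.call(TOY_DICTIONARY, word)
    ? TOY_DICTIONARY[word]
    : null;
}

// Escape regex metacharacters so the back-translated word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Translate all words concurrently, then search the first text for each result.
async function checkWords(words, firstText) {
  const backTranslated = await Promise.all(words.map(backTranslate));
  return words.map((word, i) => {
    const back = backTranslated[i];
    const found = back !== null && new RegExp(escapeRegExp(back)).test(firstText);
    return { word, back, found };
  });
}

checkWords(["hello", "world", "bank"], "你好，世界").then((rows) => console.log(rows));
```

Because each word is translated on its own, the calls need no shared state, which is the property the passage above relies on.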
After identifying the grammatical error objects in the second text and identifying the content error objects in the second text, the method further includes:
Uploading the grammatical error object and/or the content error object to a blockchain.
Note that corresponding digest information is derived from the grammatical error object and/or the content error object; specifically, the digest is obtained by hashing the grammatical error object and/or the content error object, for example with the SHA-256 algorithm. Uploading the digest to the blockchain ensures its security and its fairness and transparency to users. A user device can download the digest from the blockchain to verify whether the grammatical error object and/or the content error object has been tampered with. The blockchain referred to in this example is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each containing the information of a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
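The digest step can be sketched with Node.js's built-in `crypto` module. Serializing the error objects with `JSON.stringify` and the function names `digestErrorObjects` and `isUntampered` are assumptions for illustration; the upload to a blockchain itself is outside this sketch.

```javascript
const { createHash } = require("crypto");

// Hash a canonical serialization of the error objects with SHA-256.
function digestErrorObjects(errorObjects) {
  const canonical = JSON.stringify(errorObjects);
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Recompute the digest and compare it with the one retrieved from the chain.
function isUntampered(errorObjects, storedDigest) {
  return digestErrorObjects(errorObjects) === storedDigest;
}

const errors = [{ type: "grammar", words: ["go", "run", "eat"] }];
const stored = digestErrorObjects(errors);
console.log(stored.length, isUntampered(errors, stored));
```

Any change to the stored error objects, however small, produces a different digest, which is what lets a user device detect tampering.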
S206: Identify the specified tags in the DOM tree of the first page that correspond to the misspelling objects and the format error objects, and set those specified tags as target tags; replace the target tags with spelling-format annotation tags, so that the first page displays the second text according to the spelling-format annotation tags.
To mark the misspelling objects and format error objects on the first page so that an administrator can identify them, this step identifies the specified tags in the DOM tree of the first page that correspond to those objects and replaces them with spelling-format annotation tags. This locates precisely where the second text is wrong on the first page. It also avoids annotating the objects by direct highlight replacement on the page, which would pollute the DOM tree and interfere with locating Elements in it.
In this embodiment, the content of the spelling-format annotation tag can be set as needed, for example a highlight tag or a comment tag, so that the text covered by the tag is shown on the first page as a highlight, a comment, and so on.
As an example, first use element.childNodes to clean up tags in the DOM tree of the first page that would interfere, so as to reduce interference;
Then use parent.childNodes to obtain all child nodes in the DOM tree, identify among them the specified tags that correspond to the misspelling objects and/or format error objects, and set them as target tags;
Finally, use innerText.replace(keyword, result) to replace the target tags in the child nodes with spelling-format annotation tags, producing a first page that shows the annotation effect of those tags, for example: <span id="child"><b>keyword</b> 2</span> (processed recursively: the replace operation is performed when a child node has no child nodes of its own). The regular-expression replacement used is as follows:
keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), which prefixes every regular-expression metacharacter in the keyword (any of -/\^$*+?.()|[]{}) with a backslash, so that the keyword can be matched literally and highlighted in each node as the nodes are looped over.
Here the square brackets [] form a character class: the expression matches any single one of the characters listed inside them. The keyword is the text of a misspelling object or format error object.
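The escape-then-replace idea and the leaf-only recursion can be sketched without a browser. The node shape `{ text, children }` is an assumed simplification of DOM nodes, and `highlight` and `annotate` are illustrative names; a real page would walk `childNodes` instead.

```javascript
// Escape regex metacharacters in the keyword (the expression quoted above).
function escapeRegExp(keyword) {
  return keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Wrap every literal occurrence of the keyword in <b>...</b>.
function highlight(text, keyword) {
  if (!keyword) return text; // nothing to annotate
  const pattern = new RegExp(escapeRegExp(keyword), "g");
  // A replacer function avoids "$" in the match being read as a replacement pattern.
  return text.replace(pattern, (match) => `<b>${match}</b>`);
}

// Recurse through the simplified tree; only nodes without children are rewritten.
function annotate(node, keyword) {
  if (node.children.length === 0) {
    return { ...node, text: highlight(node.text, keyword) };
  }
  return { ...node, children: node.children.map((child) => annotate(child, keyword)) };
}

const page = { text: "", children: [{ text: "teh cat costs $5 (approx.)", children: [] }] };
console.log(annotate(page, "$5 (approx.)").children[0].text);
```

Without the escaping step, a keyword such as "$5 (approx.)" would be interpreted as a pattern with an anchor and a group rather than as literal text.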
S207: Identify the specified tags in the DOM tree of the first page that correspond to the grammatical error objects and content error objects, and set those specified tags as target tags; replace the target tags with grammar-content annotation tags, so that the first page displays the second text according to the grammar-content annotation tags.
To mark the grammatical error objects and content error objects on the first page so that an administrator can identify them, this step identifies the specified tags in the DOM tree of the first page that correspond to those objects and replaces them with grammar-content annotation tags. This locates precisely where the second text is wrong on the first page. It also avoids annotating the objects by direct highlight replacement on the page, which would pollute the DOM tree and interfere with locating Elements in it.
In this embodiment, the content of the grammar-content annotation tag can be set as needed, for example a highlight tag or a comment tag, so that the text covered by the tag is shown on the first page as a highlight, a comment, and so on.
As an example, first use element.childNodes to clean up tags in the DOM tree of the first page that would interfere, so as to reduce interference;
Then use parent.childNodes to obtain all child nodes in the DOM tree, identify among them the specified tags that correspond to the grammatical error objects and/or content error objects, and set them as target tags;
Finally, use innerText.replace(keyword, result) to replace the target tags in the child nodes with grammar-content annotation tags, producing a first page that shows the annotation effect of those tags, for example: <span id="child"><b>keyword</b> 2</span> (processed recursively: the replace operation is performed when a child node has no child nodes of its own). The regular-expression replacement used is as follows:
keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), which prefixes every regular-expression metacharacter in the keyword (any of -/\^$*+?.()|[]{}) with a backslash, so that the keyword can be matched literally and highlighted in each node as the nodes are looped over.
Here the square brackets [] form a character class: the expression matches any single one of the characters listed inside them. The keyword is the text of a grammatical error object or content error object.
S208: Obtain the first URL of the first page, identify a second URL from the first URL, and send the second URL to the browser.
Because a website usually has many pages, having the administrator load them one by one would mean too many manual operations and low checking efficiency. To make checking the text in the pages more efficient, this step obtains the first URL of the first page, identifies a second URL from it, and sends the second URL to the browser, so that the browser can preload the second page containing second text, or load the second page directly once S207 has finished. This makes it quicker for the administrator to go on checking the second text of the second page for errors.
In this embodiment, the first URL of the first page is obtained, the server hosting the first URL is accessed, the second URL that comes next after the first URL is obtained, and the second URL is sent to the browser.
For example, if the first URL is https://zhidao.baidu.com/question/3591601.html, the server hosting the first URL is accessed and the second URL that comes next after it is obtained: https://zhidao.baidu.com/question/3591602.html.
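For the numbered-page case in the example above, deriving the second URL can be sketched as incrementing the trailing number in the path. This is an assumption about how "next" is defined; the embodiment obtains the second URL from the server, which may order pages differently. The function name `nextUrl` is illustrative.

```javascript
// Increment the trailing numeric ID in the URL path, keeping any file extension.
// Returns null when the path does not end in a number (no "next" can be inferred).
function nextUrl(firstUrl) {
  const url = new URL(firstUrl);
  const match = url.pathname.match(/^(.*?)(\d+)(\.[A-Za-z0-9]+)?$/);
  if (!match) return null;
  const [, prefix, digits, ext = ""] = match;
  // BigInt avoids precision loss on long IDs; padStart keeps leading zeros.
  const next = String(BigInt(digits) + 1n).padStart(digits.length, "0");
  url.pathname = prefix + next + ext;
  return url.toString();
}

console.log(nextUrl("https://zhidao.baidu.com/question/3591601.html"));
```

The returned string is what would then be handed to the browser for preloading.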
Note that a URL is a Uniform Resource Locator, the address of a standard resource on the Internet. Every file on the Internet has a unique URL, which contains information indicating where the file is located.
Embodiment 3:
Referring to FIG. 4, a translation error identification apparatus 1 of this embodiment includes:
a text input module 12, configured to obtain the second text of the first page, where the second text is obtained by translating the first text;
an error identification module 15, configured to identify grammatical error objects, where grammatical errors occur in the second text, and content error objects, where content errors occur in the second text;
an error annotation module 17, configured to identify the specified tags in the DOM tree of the first page that correspond to the grammatical error objects and content error objects, and set those specified tags as target tags; and to replace the target tags with grammar-content annotation tags, so that the first page displays the second text according to the grammar-content annotation tags.
Optionally, the translation error identification apparatus 1 further includes:
a trigger module 11, configured so that the translation error identification apparatus 1 responds to the browser finishing loading the first page that contains the second text.
Optionally, the translation error identification apparatus 1 further includes:
a configuration input module 13, configured to extract the text information in the browser's configuration file and load it into the second text.
Optionally, the translation error identification apparatus 1 further includes:
a spelling-format module 14, configured to identify misspelled words and wrongly formatted characters in the second text, to obtain misspelling objects and format error objects respectively.
Optionally, the translation error identification apparatus 1 further includes:
a spelling-format identification module 16, configured to identify the specified tags in the DOM tree of the first page that correspond to the misspelling objects and the format error objects, and set those specified tags as target tags; and to replace the target tags with spelling-format annotation tags, so that the first page displays the second text according to the spelling-format annotation tags.
Optionally, the translation error identification apparatus 1 further includes:
a preloading module 18, configured to obtain the first URL of the first page, identify a second URL from the first URL, and send the second URL to the browser.
This technical solution applies to UI design in computer software development. It identifies the grammatical error objects and the content error objects in the second text, and replaces the specified tags corresponding to the grammatical error objects and/or the content error objects with grammar-content annotation tags, so that the first page becomes an H5 page that displays the second text according to the grammar-content annotation tags.
Embodiment 4:
To achieve the above purpose, this application also provides a computer device 5. The components of the translation error identification apparatus 1 of Embodiment 3 may be distributed across different computer devices. The computer device 5 may be a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server, or cabinet server (including a standalone server or a server cluster composed of multiple application servers) that executes programs. The computer device of this embodiment includes at least, but is not limited to, a memory 51 and a processor 52 that can communicate with each other through a system bus, as shown in FIG. 5. Note that FIG. 5 shows only a computer device with the illustrated components; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 51 (that is, a readable storage medium) includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 51 may be an internal storage unit of the computer device, such as its hard disk or main memory. In other embodiments, the memory 51 may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device. The memory 51 may also include both the internal storage unit of the computer device and its external storage device. In this embodiment, the memory 51 is generally used to store the operating system and the application software installed on the computer device, such as the program code of the translation error identification apparatus of Embodiment 3. The memory 51 can also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 52 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 52 is generally used to control the overall operation of the computer device. In this embodiment, the processor 52 runs the program code or processes the data stored in the memory 51, for example runs the translation error identification apparatus, to implement the translation error identification methods of Embodiment 1 and Embodiment 2.
Embodiment 5:
To achieve the above purpose, this application also provides a computer-readable storage medium, which may be non-volatile or volatile, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, an app store, and so on. A computer program is stored on it, and the program implements the corresponding functions when executed by the processor 52. The computer-readable storage medium of this embodiment stores the translation error identification apparatus, which, when executed by the processor 52, implements the translation error identification methods of Embodiment 1 and Embodiment 2.
The numbering of the embodiments above is for description only and does not indicate that one embodiment is better or worse than another.
From the description of the embodiments above, those skilled in the art will clearly understand that the methods of the embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation.
The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, or any direct or indirect use in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (20)

  1. A translation error identification method for checking errors in a second text obtained by translating a first text, comprising:
    obtaining the second text of a first page, wherein the second text is obtained by translating the first text;
    identifying grammatical error objects, where grammatical errors occur in the second text, and content error objects, where content errors occur in the second text;
    identifying the specified tags in the DOM tree of the first page that correspond to the grammatical error objects and content error objects, and setting the specified tags as target tags; and replacing the target tags with grammar-content annotation tags, so that the first page displays the second text according to the grammar-content annotation tags.
  2. The translation error identification method according to claim 1, wherein before the obtaining of the second text of the first page, the method further comprises:
    the translation error identification method being configured to respond to a browser finishing loading the first page that contains the second text.
  3. The translation error identification method according to claim 1, wherein after the obtaining of the second text of the first page, the method further comprises:
    identifying misspelled words and wrongly formatted characters in the second text, to obtain misspelling objects and format error objects respectively.
  4. The translation error identification method according to claim 3, wherein the step of identifying misspelled words and wrongly formatted characters in the second text, to obtain misspelling objects and format error objects respectively, comprises:
    creating a misspelling list and a format error list;
    spell-checking the words in the second text, taking each misspelled word as a misspelling object, and saving it to the misspelling list;
    determining whether the format of the second text complies with preset format rules, taking each wrongly formatted word and/or symbol as a format error object, and saving the format error object to the format error list.
  5. The translation error identification method according to claim 1, wherein the step of identifying grammatical error objects, where grammatical errors occur in the second text, and content error objects, where content errors occur in the second text, comprises:
    segmenting the second text to obtain text words, and tagging the parts of speech of the text words to obtain a part-of-speech sequence arranged in the order of the text words;
    using preset logic rules to determine whether an erroneous logical sequence occurs in the part-of-speech sequence; if so, setting the text words corresponding to the erroneous logical sequence as a grammatical error object; if not, determining that the second text has no grammatical errors;
    translating the text words to obtain back-translated words, and determining whether each back-translated word appears in the first text; if so, setting the text word corresponding to the back-translated word as a content error object; if not, determining that the second text has no grammatical errors;
    after the identifying of the grammatical error objects in the second text and the identifying of the content error objects in the second text, the method further comprises:
    uploading the grammatical error object and/or the content error object to a blockchain.
  6. The translation error identification method according to claim 3, wherein after the identifying of misspelled words and wrongly formatted characters in the second text, to obtain misspelling objects and format error objects respectively, the method further comprises:
    identifying the specified tags in the DOM tree of the first page that correspond to the misspelling objects and the format error objects, and setting the specified tags as target tags; and replacing the target tags with spelling-format annotation tags, so that the first page displays the second text according to the spelling-format annotation tags.
  7. The translation error identification method according to claim 1, wherein after the replacing of the target tags with grammar-content annotation tags, so that the first page displays the second text according to the grammar-content annotation tags, the method further comprises:
    obtaining the first URL of the first page, identifying a second URL from the first URL, and sending the second URL to the browser.
  8. A translation error identification apparatus, comprising:
    a text input module, configured to obtain the second text of a first page, wherein the second text is obtained by translating a first text;
    an error identification module, configured to identify grammatical error objects, where grammatical errors occur in the second text, and content error objects, where content errors occur in the second text;
    an error annotation module, configured to identify the specified tags in the DOM tree of the first page that correspond to the grammatical error objects and content error objects, and set the specified tags as target tags; and to replace the target tags with grammar-content annotation tags, so that the first page displays the second text according to the grammar-content annotation tags.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor of the computer device implements a translation error identification method when executing the computer program, the steps of the translation error identification method comprising:
    obtaining the second text of a first page, wherein the second text is obtained by translating a first text;
    identifying grammatical error objects, where grammatical errors occur in the second text, and content error objects, where content errors occur in the second text;
    identifying the specified tags in the DOM tree of the first page that correspond to the grammatical error objects and content error objects, and setting the specified tags as target tags; and replacing the target tags with grammar-content annotation tags, so that the first page displays the second text according to the grammar-content annotation tags.
  10. The computer device according to claim 9, wherein before the acquiring of the second text of the first page, the method further comprises:
    the translation error identification method being configured to respond to the browser finishing loading the first page containing the second text.
  11. The computer device according to claim 9, wherein after the acquiring of the second text of the first page, the method further comprises:
    identifying misspelled words and incorrectly formatted characters in the second text, to obtain spelling error objects and format error objects respectively;
    wherein the step of identifying misspelled words and incorrectly formatted characters in the second text, to obtain spelling error objects and format error objects respectively, comprises:
    creating a spelling error list and a format error list;
    performing spelling proofreading on the words in the second text, taking the misspelled words as spelling error objects, and saving them to the spelling error list;
    determining whether the format of the second text complies with preset format rules, taking the incorrectly formatted words and/or symbols as format error objects, and saving the format error objects to the format error list.
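The two lists described in claim 11 can be sketched as follows. The word set `KNOWN_WORDS` and the single format rule (no whitespace before punctuation) are hypothetical stand-ins for the spelling proofreader and the "preset format rules", which the claims leave unspecified.

```python
import re

# Hypothetical dictionary standing in for a real spelling proofreader.
KNOWN_WORDS = {"please", "enter", "your", "password"}

# Hypothetical preset format rule: punctuation must not follow whitespace.
FORMAT_RULE = re.compile(r"\s+[,.!?;:]")


def find_spelling_and_format_errors(second_text: str) -> tuple[list[str], list[str]]:
    """Return (spelling error list, format error list) for the translated text."""
    spelling_errors: list[str] = []
    format_errors: list[str] = []

    # Spelling proofreading: any word missing from the dictionary is an error object.
    for word in re.findall(r"[A-Za-z]+", second_text):
        if word.lower() not in KNOWN_WORDS:
            spelling_errors.append(word)

    # Format check: every span violating the rule is a format error object.
    for match in FORMAT_RULE.finditer(second_text):
        format_errors.append(match.group())

    return spelling_errors, format_errors


if __name__ == "__main__":
    print(find_spelling_and_format_errors("Please entr your password ."))
```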
  12. The computer device according to claim 9, wherein the step of identifying, in the second text, grammatical error objects in which grammatical errors occur and content error objects in which content errors occur comprises:
    performing word segmentation on the second text to obtain text words, and tagging the parts of speech of the text words to obtain a part-of-speech sequence arranged in the order of the text words;
    determining, by means of preset logic rules, whether an erroneous logical sequence occurs in the part-of-speech sequence; if so, setting the text words corresponding to the erroneous logical sequence as grammatical error objects; if not, determining that the second text has no grammatical errors;
    translating the text words to obtain back-translated words, and determining whether the back-translated words appear in the first text; if so, setting the text words corresponding to the back-translated words as content error objects; if not, determining that the second text has no grammatical errors;
    wherein after the identifying of the grammatical error objects in which grammatical errors occur in the second text, and the identifying of the content error objects in which content errors occur in the second text, the method further comprises:
    uploading the grammatical error objects and/or the content error objects to a blockchain.
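The segmentation and part-of-speech rule check in claim 12 can be sketched as below. This covers only the grammatical branch; the back-translation check and the blockchain upload are omitted. The toy `LEXICON`, the whitespace tokenizer, and the `FORBIDDEN_BIGRAMS` rule set are all assumptions for illustration, since the claims do not state which tagger or logic rules are used.

```python
# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN",
    "sleeps": "VERB", "runs": "VERB",
}

# Hypothetical preset logic rules: adjacent tag pairs treated as erroneous.
FORBIDDEN_BIGRAMS = {("DET", "DET"), ("DET", "VERB"), ("VERB", "VERB")}


def tag_parts_of_speech(words: list[str]) -> list[str]:
    """Return the part-of-speech sequence in the same order as the words."""
    return [LEXICON.get(word.lower(), "UNK") for word in words]


def find_grammar_error_objects(second_text: str) -> list[tuple[str, str]]:
    """Return the word pairs whose tag pair matches an erroneous sequence."""
    words = second_text.split()              # simple whitespace segmentation
    pos_sequence = tag_parts_of_speech(words)
    errors = []
    for i in range(len(words) - 1):
        if (pos_sequence[i], pos_sequence[i + 1]) in FORBIDDEN_BIGRAMS:
            errors.append((words[i], words[i + 1]))
    return errors


if __name__ == "__main__":
    print(find_grammar_error_objects("the the cat sleeps"))
```

An empty result corresponds to the claim's "no grammatical errors" outcome; a non-empty result gives the text words to be set as grammatical error objects.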
  13. The computer device according to claim 11, wherein after the identifying of misspelled words and incorrectly formatted characters in the second text, to obtain spelling error objects and format error objects respectively, the method further comprises:
    identifying, in the DOM tree of the first page, the specified tags corresponding to the spelling error objects and the format error objects, and setting the specified tags as target tags; and replacing the target tags with spelling-format annotation tags, so that the first page displays the second text according to the spelling-format annotation tags.
  14. The computer device according to claim 9, wherein after the replacing of the target tags with grammar-content annotation tags, so that the first page displays the second text according to the grammar-content annotation tags, the method further comprises:
    acquiring a first URL of the first page, identifying a second URL according to the first URL, and sending the second URL to the browser.
  15. A computer-readable storage medium, on which a computer program is stored, wherein the computer program stored on the readable storage medium implements the translation error identification method when executed by a processor, the steps of the translation error identification method comprising:
    acquiring second text of a first page, wherein the second text is obtained by translating first text;
    identifying, in the second text, grammatical error objects in which grammatical errors occur and content error objects in which content errors occur;
    identifying, in the DOM tree of the first page, the specified tags corresponding to the grammatical error objects and the content error objects, and setting the specified tags as target tags; and replacing the target tags with grammar-content annotation tags, so that the first page displays the second text according to the grammar-content annotation tags.
  16. The readable storage medium according to claim 15, wherein before the acquiring of the second text of the first page, the method further comprises:
    the translation error identification method being configured to respond to the browser finishing loading the first page containing the second text.
  17. The readable storage medium according to claim 15, wherein after the acquiring of the second text of the first page, the method further comprises:
    identifying misspelled words and incorrectly formatted characters in the second text, to obtain spelling error objects and format error objects respectively;
    wherein the step of identifying misspelled words and incorrectly formatted characters in the second text, to obtain spelling error objects and format error objects respectively, comprises:
    creating a spelling error list and a format error list;
    performing spelling proofreading on the words in the second text, taking the misspelled words as spelling error objects, and saving them to the spelling error list;
    determining whether the format of the second text complies with preset format rules, taking the incorrectly formatted words and/or symbols as format error objects, and saving the format error objects to the format error list.
  18. The readable storage medium according to claim 15, wherein the step of identifying, in the second text, grammatical error objects in which grammatical errors occur and content error objects in which content errors occur comprises:
    performing word segmentation on the second text to obtain text words, and tagging the parts of speech of the text words to obtain a part-of-speech sequence arranged in the order of the text words;
    determining, by means of preset logic rules, whether an erroneous logical sequence occurs in the part-of-speech sequence; if so, setting the text words corresponding to the erroneous logical sequence as grammatical error objects; if not, determining that the second text has no grammatical errors;
    translating the text words to obtain back-translated words, and determining whether the back-translated words appear in the first text; if so, setting the text words corresponding to the back-translated words as content error objects; if not, determining that the second text has no grammatical errors;
    wherein after the identifying of the grammatical error objects in which grammatical errors occur in the second text, and the identifying of the content error objects in which content errors occur in the second text, the method further comprises:
    uploading the grammatical error objects and/or the content error objects to a blockchain.
  19. The computer device according to claim 17, wherein after the identifying of misspelled words and incorrectly formatted characters in the second text, to obtain spelling error objects and format error objects respectively, the method further comprises:
    identifying, in the DOM tree of the first page, the specified tags corresponding to the spelling error objects and the format error objects, and setting the specified tags as target tags; and replacing the target tags with spelling-format annotation tags, so that the first page displays the second text according to the spelling-format annotation tags.
  20. The computer device according to claim 15, wherein after the replacing of the target tags with grammar-content annotation tags, so that the first page displays the second text according to the grammar-content annotation tags, the method further comprises:
    acquiring a first URL of the first page, identifying a second URL according to the first URL, and sending the second URL to the browser.
PCT/CN2021/109257 2020-12-22 2021-07-29 Translation error identification method and apparatus, and computer device and readable storage medium WO2022134577A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011528714.1A CN112667208A (en) 2020-12-22 2020-12-22 Translation error recognition method and device, computer equipment and readable storage medium
CN202011528714.1 2020-12-22

Publications (1)

Publication Number Publication Date
WO2022134577A1 (en) 2022-06-30

Family

ID=75407584

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109257 WO2022134577A1 (en) 2020-12-22 2021-07-29 Translation error identification method and apparatus, and computer device and readable storage medium

Country Status (2)

Country Link
CN (1) CN112667208A (en)
WO (1) WO2022134577A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881011A (en) * 2022-07-12 2022-08-09 中国人民解放军国防科技大学 Multichannel Chinese text correction method, device, computer equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112667208A (en) * 2020-12-22 2021-04-16 深圳壹账通智能科技有限公司 Translation error recognition method and device, computer equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8296124B1 (en) * 2008-11-21 2012-10-23 Google Inc. Method and apparatus for detecting incorrectly translated text in a document
CN107329958A (en) * 2017-06-08 2017-11-07 努比亚技术有限公司 Language transfer method and device based on webpage
CN110083845A (en) * 2019-04-25 2019-08-02 数译(成都)信息技术有限公司 Web page translation method and system
CN112015430A (en) * 2020-09-07 2020-12-01 平安国际智慧城市科技股份有限公司 JavaScript code translation method and device, computer equipment and storage medium
CN112100063A (en) * 2020-08-31 2020-12-18 腾讯科技(深圳)有限公司 Interface language display test method and device, computer equipment and storage medium
CN112667208A (en) * 2020-12-22 2021-04-16 深圳壹账通智能科技有限公司 Translation error recognition method and device, computer equipment and readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662932B (en) * 2012-03-15 2014-05-14 中国科学院自动化研究所 Method for establishing tree structure and tree-structure-based machine translation system
CN103365838B (en) * 2013-07-24 2016-04-20 桂林电子科技大学 Based on the english composition grammar mistake method for automatically correcting of diverse characteristics
CN108519974A (en) * 2018-03-31 2018-09-11 华南理工大学 English composition automatic detection of syntax error and analysis method
CN108804428A (en) * 2018-06-12 2018-11-13 苏州大学 Correcting method, system and the relevant apparatus of term mistranslation in a kind of translation
CN111767709A (en) * 2019-03-27 2020-10-13 武汉慧人信息科技有限公司 Logic method for carrying out error correction and syntactic analysis on English text
CN111695343A (en) * 2020-06-23 2020-09-22 深圳壹账通智能科技有限公司 Wrong word correcting method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881011A (en) * 2022-07-12 2022-08-09 中国人民解放军国防科技大学 Multichannel Chinese text correction method, device, computer equipment and storage medium
CN114881011B (en) * 2022-07-12 2022-09-23 中国人民解放军国防科技大学 Multichannel Chinese text correction method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112667208A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
US10067931B2 (en) Analysis of documents using rules
WO2022022045A1 (en) Knowledge graph-based text comparison method and apparatus, device, and storage medium
CN113110988B (en) Testing applications with defined input formats
US11244203B2 (en) Automated generation of structured training data from unstructured documents
US11010673B2 (en) Method and system for entity relationship model generation
US9268749B2 (en) Incremental computation of repeats
CN114616572A (en) Cross-document intelligent writing and processing assistant
US9224103B1 (en) Automatic annotation for training and evaluation of semantic analysis engines
US9858268B2 (en) Chinese name transliteration
US20120158742A1 (en) Managing documents using weighted prevalence data for statements
WO2022134577A1 (en) Translation error identification method and apparatus, and computer device and readable storage medium
US10108590B2 (en) Comparing markup language files
CN113986864A (en) Log data processing method and device, electronic equipment and storage medium
US20210073257A1 (en) Logical document structure identification
Mundotiya et al. Development of a Dataset and a Deep Learning Baseline Named Entity Recognizer for Three Low Resource Languages: Bhojpuri, Maithili, and Magahi
US20200183813A1 (en) Automated test script generator
CN107862045B (en) Cross-language plagiarism detection method based on multiple features
KR20210013991A (en) Apparatus, method, computer program, computer-readable storage device, server and system for drafting patent document
CN114492419B (en) Text labeling method, system and device based on newly added key words in labeling
US10268674B2 (en) Linguistic intelligence using language validator
CN110618809B (en) Front-end webpage input constraint extraction method and device
US20210342521A1 (en) Learning device, extraction device, and learning method
CN116484223A (en) Model training method, standard format document generation method and device
CN114358000A (en) Extracting structured information from unstructured documents
CN114091435A (en) Text content checking method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21908582

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.10.2023)