CN114217901A - Translation management and evaluation method for Chinese Tibetan language data under domestic operating system - Google Patents


Info

Publication number
CN114217901A
Authority
CN
China
Prior art keywords
translation
data
source
operating system
software
Legal status
Granted
Application number
CN202210155863.0A
Other languages
Chinese (zh)
Other versions
CN114217901B (en)
Inventor
余杰
刘晓东
彭龙
马俊
谭郁松
吴庆波
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN202210155863.0A
Publication of CN114217901A
Application granted
Publication of CN114217901B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces
    • G06F9/454 Multi-language systems; Localisation; Internationalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system. The method can manage, as a whole, the tens of millions of Tibetan-language data entries spread across the large number of software packages in a Linux-based domestic operating system, and effectively reduces the cost of maintaining Chinese-Tibetan language support in that system. The method comprises the following steps: parsing the source code of all software in the Linux-based domestic operating system to obtain source entries and construct a source entry data set; in response to an update of the translation data of a source entry, constructing a translation mapping relation between the newly added translation data and the source entry; according to the translation mapping relation, checking all software in the domestic operating system for the source entry to be processed, and updating or adding translation data for each piece of software in which that source entry is detected; and evaluating the translation quality of the newly added translation data and outputting an evaluation result of translation correctness.

Description

Translation management and evaluation method for Chinese Tibetan language data under domestic operating system
Technical Field
The invention relates to the technical field of translation data management, and in particular to a translation management and evaluation method for Chinese and Tibetan language data under a domestic operating system.
Background
With the overall globalization and informatization of computers and computer networks, Linux-based domestic operating systems have penetrated deeply into different countries, regions and industries, and are widely used around the world. To provide a friendly and consistent user interface and interaction experience for Tibetan-speaking users of domestic operating systems in the Tibet region of China, adapting domestic operating systems to support the Tibetan language is an important research direction for modern domestic operating systems. However, a domestic operating system must support many languages and a large volume of data, it is difficult to keep constructing that data while maintaining its consistency, and the number and skill level of qualified translators are limited. Tibetan language support for domestic operating systems therefore faces great research and development difficulty and management pressure, and has become a hard problem in localizing domestic operating systems into Tibetan.
Domestic operating systems often contain thousands of software packages. Existing Tibetan language support frameworks are maintained mainly at the software level: each individual piece of software is the smallest unit of Tibetan localization maintenance. This approach suffers from fragmentation, poor data association and high management cost.
Disclosure of Invention
To address these problems, the invention provides a translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system, which can manage, as a whole, the tens of millions of Tibetan-language data entries across the large number of software packages in a Linux-based domestic operating system, and effectively reduces the cost of maintaining Chinese-Tibetan language support in the domestic operating system.
The technical solution is as follows: the translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system comprises the following steps:
parsing the source code of all software in the Linux-based domestic operating system to obtain source entries and construct a source entry data set;
in response to an update of the translation data of a source entry, constructing a translation mapping relation between the newly added translation data and the source entry;
according to the translation mapping relation, checking all software in the domestic operating system for the source entry to be processed, and updating or adding translation data for each piece of software in which the source entry to be processed is detected;
and evaluating the translation quality of the newly added translation data, and outputting an evaluation result of translation correctness.
Further, the update of the translation data of a source entry is obtained as follows: a translation service for the source entry data set is provided through an interactive web service, and the newly added translation data produced by translators is recorded.
Further, constructing the translation mapping relation between the newly added translation data and the source entry specifically includes the following steps:
performing a language validity check on the source entry and, for a valid source entry, checking whether it exists in the source entry data set; source entries that fail the validity check or are absent from the source entry data set are discarded as illegal data;
and for a source entry that passes the language validity check and exists in the source entry data set, further checking whether the source entry already has translation data; if it does, updating the associated translation mapping relation, and if it does not, constructing a new translation mapping relation.
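The mapping-construction rules above can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the in-memory dictionaries, the function name `apply_new_translation` and the caller-supplied `is_legal` predicate are hypothetical stand-ins for the validity check and storage the text describes.

```python
from typing import Callable, Dict, Set

def apply_new_translation(
    mapping: Dict[str, str],          # hypothetical store: source entry -> translation
    dataset: Set[str],                # source entry data set built in step 1
    source: str,
    translation: str,
    is_legal: Callable[[str], bool],  # stand-in for the language validity check
) -> str:
    """Apply one newly added translation; return "discarded", "updated" or "created"."""
    # Entries that fail the validity check or are missing from the data set
    # are treated as illegal data and discarded.
    if not is_legal(source) or source not in dataset:
        return "discarded"
    if source in mapping:
        # The entry already has translation data: update the associated mapping.
        mapping[source] = translation
        return "updated"
    # No translation data yet: construct a new mapping.
    mapping[source] = translation
    return "created"
```

Because newer translations are assumed to be better than older ones, an "updated" result simply overwrites the stored value rather than merging it.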
Further, if a source entry is not legitimate natural language in Chinese or Tibetan, an error or warning is reported in the interactive web service.
Further, during the language validity check of a Tibetan source entry, whether the entry is valid natural language is determined by identifying the ISO 639 language code of its Tibetan text data.
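One way to approximate this check is sketched below. It is an assumption, not the patent's method: instead of a trained language identifier, it measures how much of the text falls in the Unicode Tibetan block (U+0F00 to U+0FFF) or the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and maps the result to the ISO 639-1 codes `bo` (Tibetan) or `zh` (Chinese). The function names and the 0.6 ratio are hypothetical.

```python
from typing import Optional, Tuple

TIBETAN_BLOCK = (0x0F00, 0x0FFF)  # Unicode "Tibetan" block
CJK_BLOCK = (0x4E00, 0x9FFF)      # CJK Unified Ideographs (basic block only)

def guess_iso639(text: str, min_ratio: float = 0.6) -> Optional[str]:
    """Return "bo", "zh", or None if no script dominates the text."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return None

    def share(block: Tuple[int, int]) -> float:
        lo, hi = block
        return sum(lo <= ord(c) <= hi for c in chars) / len(chars)

    if share(TIBETAN_BLOCK) >= min_ratio:
        return "bo"
    if share(CJK_BLOCK) >= min_ratio:
        return "zh"
    return None

def is_legal_natural_language(text: str, expected: str) -> bool:
    """Hypothetical validity check: the guessed code must match the expected one."""
    return guess_iso639(text) == expected
```

A script-ratio heuristic cannot distinguish meaningful Tibetan from an arbitrary string of Tibetan letters; a production system would need a real language-identification model.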
Further, when a translation mapping relation is updated or newly constructed, a corresponding translation update event is generated; in response to the translation update event, the source entry to be processed is obtained by parsing the event, and all software in the domestic operating system is checked for that source entry.
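A minimal sketch of this event handling follows. The JSON payload format, its field names (`source_entry`, `translation`) and the per-package catalog dictionaries are hypothetical; the text only states that the source entry to be processed is obtained by parsing the event and that all software is then checked for it.

```python
import json
from typing import Dict, List, Tuple

def parse_update_event(payload: str) -> Tuple[str, str]:
    """Extract (source entry to be processed, new translation) from an event."""
    event = json.loads(payload)
    return event["source_entry"], event["translation"]

def packages_containing(catalogs: Dict[str, Dict[str, str]], source: str) -> List[str]:
    """Check every package catalog (msgid -> msgstr) for the source entry."""
    return sorted(pkg for pkg, entries in catalogs.items() if source in entries)
```

For example, with catalogs for three packages of which two use the entry "Open", `packages_containing` returns just those two package names, which are the candidates for the update step.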
Further, updating or adding translation data for the software in which the source entry to be processed is detected specifically includes the following steps:
for each piece of software in which the source entry to be processed is detected, checking whether the software already contains translation data for that source entry; if it does not, adding translation data for the source entry according to the translation mapping relation; if it does, comparing the original translation data in the software with the newly added translation data in the translation mapping relation, updating the software with the newly added translation data if they differ, and discarding the newly added translation data if they are identical;
and submitting the updated or newly added translation data to the source code of the corresponding software.
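The add/update/discard rule can be expressed as the short sketch below, assuming (hypothetically) that each package's translations are held as a gettext-style dictionary in which an empty string means "not yet translated". The returned list stands in for the changes that would be submitted to each package's source code; the submission mechanism itself is not shown.

```python
from typing import Dict, List, Tuple

def sync_entry(catalogs: Dict[str, Dict[str, str]], source: str,
               new_translation: str) -> List[Tuple[str, str]]:
    """Add or update `source` in every package that contains it.

    Returns (package, action) pairs for packages that actually changed.
    """
    changes = []
    for pkg in sorted(catalogs):
        entries = catalogs[pkg]
        if source not in entries:
            continue                            # entry not used by this package
        old = entries[source]
        if not old:
            entries[source] = new_translation   # no translation yet: add it
            changes.append((pkg, "added"))
        elif old != new_translation:
            entries[source] = new_translation   # inconsistent: overwrite with new data
            changes.append((pkg, "updated"))
        # identical translation: the newly added data is discarded (no change)
    return changes
```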
Further, evaluating the translation quality of the newly added translation data and outputting the evaluation result of translation correctness specifically includes the following steps:
performing semantic correlation analysis on the newly added translation data in the translation mapping relation; if the semantic similarity does not reach a set threshold, marking the translation correctness probability of the translation data under analysis as 0; if it reaches the threshold, proceeding to the translation data difference calculation;
analyzing the difference between the newly added translation data and a standard translation sample, and calculating the translation correctness probability of the newly added translation data from the difference value;
summarizing the translation correctness probabilities of all source entries in the domestic operating system, and outputting a software translation correctness report.
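The two-stage evaluation can be sketched as below. The stand-ins are assumptions, not the patent's natural-language-understanding model: syllable-set Jaccard overlap substitutes for semantic similarity, `difflib.SequenceMatcher.ratio()` over syllables substitutes for the difference-based probability, and the 0.3 threshold is arbitrary. Tibetan syllables are split on the tsheg (U+0F0B) and shad (U+0F0D) marks.

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Dict, List, Tuple

TSHEG, SHAD = "\u0f0b", "\u0f0d"  # Tibetan syllable and clause delimiters

def syllables(text: str) -> List[str]:
    return [s for s in text.replace(SHAD, TSHEG).split(TSHEG) if s.strip()]

def similarity(a: str, b: str) -> float:
    """Placeholder for semantic similarity: Jaccard overlap of syllable sets."""
    sa, sb = set(syllables(a)), set(syllables(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def correctness_probability(candidate: str, reference: str,
                            threshold: float = 0.3) -> float:
    if similarity(candidate, reference) < threshold:
        return 0.0  # semantic gate not reached: probability marked as 0
    # A smaller syllable-level difference yields a higher probability.
    return SequenceMatcher(None, syllables(candidate), syllables(reference)).ratio()

def correctness_report(pairs: Dict[str, Tuple[str, str]],
                       threshold: float = 0.3) -> Dict[str, object]:
    """Summarize per-entry probabilities for (candidate, reference) pairs."""
    probs = {entry: correctness_probability(c, r, threshold)
             for entry, (c, r) in pairs.items()}
    return {"entries": probs, "mean": mean(list(probs.values())) if probs else 0.0}
```

Keeping the gate and the difference measure as separate functions makes it easy to swap in a real semantic model later without touching the reporting code.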
Further, the software translation correctness report is fed back to the translator through the interactive web service for optimizing the translation quality.
Further, the method is implemented by a translation management and evaluation system for Chinese-Tibetan language data under a domestic operating system, which comprises the following communicatively connected modules:
a source entry acquisition module, configured to parse the source code of all software in the Linux-based domestic operating system, obtain source entries and construct a source entry data set;
a translation mapping relation construction module, configured to construct, upon an update of the translation data of a source entry, a translation mapping relation between the newly added translation data and the source entry;
an updating module, configured to check all software in the domestic operating system for the source entry to be processed according to the translation mapping relation, and to update or add translation data for each piece of software in which that source entry is detected;
and an evaluation module, configured to evaluate the translation quality of the newly added translation data and output an evaluation result of translation correctness.
Further, the method is implemented by a computer device comprising a processor, a memory and a program;
the program is stored in the memory, and the processor calls the program stored in the memory to execute the translation management and evaluation method for Chinese-Tibetan language data under the domestic operating system.
Further, the method is implemented by a computer-readable storage medium storing a program for executing the translation management and evaluation method for Chinese-Tibetan language data under the domestic operating system.
The invention addresses the fragmentation, poor data association and high management cost of translating the tens of millions of Tibetan-language data entries found in the large number of software packages of a domestic operating system. Source entries are obtained by parsing the source code of all software in the domestic operating system; translating these source entries makes it possible to record all Tibetan-language data in that software completely and to avoid translating the same data repeatedly.
Each time newly added translation data is produced for a source entry, a translation mapping relation between the new data and the source entry is constructed. The mapping relation tightly associates source entries with their translation data and enables fast cross-software lookup of Tibetan-language data. According to the mapping relation, all software in the domestic operating system is checked for the source entry to be processed, and translation data is updated or added for the software in which it is detected. This synchronizes Tibetan translation data across software and allows it to be managed as a whole, so Tibetan localization no longer has to be maintained by processing each piece of software individually, which effectively reduces the cost of maintaining Chinese-Tibetan language support in the domestic operating system.
New translation data for source entries can be obtained through an interactive web service that provides a translation service for the source entry data set, which facilitates the work of translators; the new translation data is produced by professional translators. In addition, the method provides a Tibetan translation quality evaluation scheme based on natural language understanding, which produces a translation correctness report that can be fed back to translators through the interactive web service, helping to improve Tibetan translation quality.
Drawings
FIG. 1 is a schematic diagram illustrating the steps of a translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating the detailed steps of step 1 in the method according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating the detailed steps of step 2 in the method according to an embodiment of the present invention;
FIG. 4 is a flow chart illustrating the specific steps of step 3 of the method according to an embodiment of the present invention;
FIG. 5 is a flow chart illustrating the detailed steps of step 4 in the method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a translation management and evaluation system for Chinese-Tibetan language data under a domestic operating system according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In the following description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific aspects of embodiments of the invention or by which embodiments of the invention may be practiced. It is to be understood that embodiments of the invention may be utilized in other respects, and include structural or logical changes not depicted in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
It should be understood that although the steps in the disclosed embodiments are numbered for ease of understanding, the numbers neither indicate the order in which the steps are performed nor require that consecutively numbered steps be performed together. One or several of the numbered steps may be performed individually to solve the corresponding technical problem and achieve a predetermined technical solution. Even where the figures list a plurality of steps together, this does not necessarily mean the steps must be performed together; the figures list them together only as an example, for ease of understanding.
Referring to FIG. 1, the translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of the present invention includes at least the following steps:
step 1: parsing the source code of all software in the Linux-based domestic operating system to obtain source entries and construct a source entry data set;
step 2: in response to an update of the translation data of a source entry, constructing a translation mapping relation between the newly added translation data and the source entry;
step 3: according to the translation mapping relation, checking all software in the domestic operating system for the source entry to be processed, and updating or adding translation data for each piece of software in which that source entry is detected;
step 4: evaluating the translation quality of the newly added translation data in the translation mapping relation, and outputting an evaluation result of translation correctness.
The method of this embodiment establishes associations between the Tibetan-language data of different software in the domestic operating system, manages the tens of millions of Tibetan-language data entries in the system as a whole, and effectively reduces the cost of maintaining Chinese-Tibetan language support in the domestic operating system.
Specifically, in an embodiment of the present invention, referring to fig. 2, in step 1, the following steps are included:
step 101: obtaining the source code of all software in the Linux-based domestic operating system through integration with software source code version control;
step 102: parsing the software source code to obtain all source entries and form a source entry data set;
step 103: analyzing the language category and the data existence type of the Chinese and Tibetan source entry data set.
In step 1 of this embodiment, source entries are obtained by parsing the source code of all software in the domestic operating system. Translating these source entries makes it possible to record all Tibetan-language data in the large amount of software in the domestic operating system completely, and avoids translating the same Tibetan-language data repeatedly.
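Step 1 can be sketched for gettext PO catalogs, one of the formats mentioned in this description. This minimal Python example rests on simplifying assumptions: it handles only single-line `msgid`/`msgstr` pairs (no plural forms, contexts or multi-line strings), and the function names and package names are hypothetical.

```python
import re
from typing import Dict, Set

# Single-line msgid/msgstr pairs only; a real PO parser handles much more.
_PAIR = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)

def extract_source_entries(po_text: str) -> Dict[str, str]:
    """Return {source entry: existing translation} for one PO catalog."""
    # The entry with an empty msgid is the PO header, so it is skipped.
    return {msgid: msgstr for msgid, msgstr in _PAIR.findall(po_text) if msgid}

def build_source_entry_dataset(catalogs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every source entry to the set of packages whose catalogs contain it."""
    dataset: Dict[str, Set[str]] = {}
    for package, po_text in catalogs.items():
        for msgid in extract_source_entries(po_text):
            dataset.setdefault(msgid, set()).add(package)
    return dataset
```

Recording the set of packages per entry is what lets an entry shared by many packages be translated once and then located everywhere it is used.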
Specifically, in an embodiment of the present invention, referring to fig. 3, in step 2, the following steps are included:
step 201: providing a translation service to translators through the interactive web service, and recording the newly added translation data produced by translators;
step 202: after translation, updating missing and outdated translation mapping relations in the software, and storing the mapping relations in memory;
step 203: refreshing the old source entries and the Tibetan-language data set through text matching.
In step 2 of this embodiment, new translation data for source entries is obtained through an interactive web service that provides a translation service for the source entry data set, which makes translation work convenient for translators. All new translation data produced by professional translators is handled under a default principle: new translation data is assumed to be of better quality than old translation data. The translation mapping relation tightly associates source entries with their translation data and enables fast cross-software lookup of Tibetan-language data.
In one embodiment of the present invention, see fig. 4, in step 3, the following steps are included:
step 301: checking target translation data across software according to the translation mapping relation;
step 302: for each piece of software within the operating system in which the source entry to be processed is detected, checking whether the software contains translation data for that source entry; if it does not, adding translation data for the source entry according to the translation mapping relation; if it does, comparing the original translation data in the software with the newly added translation data in the translation mapping relation, updating the software with the newly added translation data if they differ, and discarding the newly added translation data if they are identical;
step 303: submitting the updated or newly added translation data to the source code of the corresponding software.
In step 3 of this embodiment, all software in the domestic operating system is checked for the source entry to be processed according to the translation mapping relation, and translation data is updated or added for the software in which it is detected. This synchronizes Tibetan translation data across software and allows it to be managed as a whole, so Tibetan localization no longer has to be maintained by processing each piece of software individually, which effectively reduces the cost of maintaining Chinese-Tibetan language support in the domestic operating system.
In one embodiment of the present invention, see fig. 5, in step 4, the following steps are included:
step 401: obtaining the newly added translation data to be analyzed from static translation storage through the translation mapping relation;
step 402: performing semantic correlation analysis on the newly added translation data to be analyzed; if the semantic similarity does not reach a set threshold, marking the translation correctness probability of the translation data as 0; if it reaches the threshold, proceeding to the translation data difference calculation;
step 403: analyzing the difference between the newly added translation data and the standard translation sample, and calculating the translation correctness probability of the newly added translation data from the difference value;
step 404: summarizing the correctness probabilities of all source entries in the software into a software translation correctness report, and outputting the report.
In step 4 of this embodiment, a Tibetan translation quality evaluation scheme based on natural language understanding is provided: a translation correctness report is produced by analyzing semantic relevance and differences, and the report can be fed back to translators through the interactive web service, which helps to improve Tibetan translation quality.
By providing interactive construction and translation of Tibetan-language data, synchronizing Tibetan translation data across software, and evaluating translation quality based on natural language understanding, this embodiment provides a unified method for coordinating, managing, jointly constructing and evaluating Tibetan language support in a domestic operating system, and achieves large-scale, integrated management of the Tibetan-language data of the domestic operating system's software.
In another embodiment of the invention, in step 1, the source code of all software in the domestic operating system is obtained through integration with software source code version control, and Tibetan text data sets are provided in formats such as XML Spreadsheet, gettext MO/PO, CSV and Qt Linguist; these are used to construct the source entry data set and to analyze the language categories and data existence types of the Chinese and Tibetan source entry data sets;
in step 2, the software source code of the domestic operating system is imported into the interactive web service through protocols such as Git, the interactive web service provides a translation service to translators, and the new translation data produced by translators is collected;
for a source entry with newly added translation data, the ISO 639 language code of the source entry is identified to determine whether the Tibetan text of the source entry data set is legal natural language, and an error or warning is reported in the interactive web service if it is not legal Tibetan natural language;
for a source entry that passes the legality check, whether the target source entry already has a translation in the related software is checked;
if translation data exists, the associated translation mapping relation is updated and the Tibetan translation database is refreshed;
if not, a new translation mapping relation is constructed, thereby realizing interactive construction of the software's Tibetan-language data.
In step 3, when a translation mapping relation is updated or newly established, a related translation updating event is generated, the translation updating event is obtained through the connector, and the source entry to be processed is obtained through analysis from the translation updating event;
detecting whether translation data of a source entry to be processed exist in all software in a domestic operating system, and if the translation data do not exist, adding the translation data of the source entry according to a translation mapping relation; if the translation data exist, comparing whether the original translation data in the software is consistent with the newly added translation data in the translation mapping relation or not, if not, updating by adopting the newly added translation data, and if so, abandoning the newly added translation data;
and submitting the updated or newly added translation data to the corresponding software source code for updating, so that the synchronization of the Tibetan language data of the cross-software is realized.
In step 4, based on the natural language understanding technology, performing translation quality evaluation on newly added translation data in the translation mapping relationship, and outputting an evaluation result of translation correctness, specifically including:
acquiring newly added translation data to be analyzed from a static translation storage through a translation mapping relation;
performing semantic correlation analysis on the newly added translation data in the translation mapping relation; if the semantic similarity does not reach a set threshold, marking the translation correctness probability of the translation data under analysis as 0; if it reaches the threshold, proceeding to the translation data difference calculation;
the threshold for the semantic correlation analysis can be derived statistically from a standard translation data set, that is, a set of comparison samples drawn from published books and professional news. There are nearly fifty thousand Chinese characters, about fifty thousand words and about seventy thousand Tibetan syllables, and the ways in which words combine are even more complex; the standard translation data set is a small-scale sample and cannot cover the real-world range of natural language. Therefore, in this embodiment the translation mapping relation is constructed by other means, such as manual translation by professionals, and the standard translation data set is used only to provide the threshold for evaluating translation correctness.
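The description leaves the statistic open. As one hypothetical choice, the sketch below takes a low percentile of similarity scores that have already been computed on known-correct pairs from the standard translation data set, so that most correct translations pass the semantic gate. The 5th-percentile default and the function name are assumptions.

```python
from statistics import quantiles
from typing import Sequence

def threshold_from_standard_set(scores: Sequence[float], percentile: int = 5) -> float:
    """Return the given percentile of similarity scores on known-correct pairs."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99;
    # "inclusive" keeps every cut point within the observed score range.
    return quantiles(scores, n=100, method="inclusive")[percentile - 1]
```

A lower percentile makes the gate more permissive; because the standard set is small, the chosen value should be revisited as more reference pairs become available.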
Then analyzing the difference between the newly added translation data and the standard translation sample, and calculating according to the difference value to obtain the translation correctness probability of the newly added translation data;
In this embodiment, the syllable-level edit distance may be computed with methods such as the Levenshtein edit distance and the Hamming distance to analyze differences between sentence texts for the semantic correlation analysis; the difference between the newly added translation data and the standard translation sample may likewise be calculated from the syllable-level edit distance using methods such as the Levenshtein or Hamming distance. The translation correctness probability decreases as the difference value increases: the larger the difference, the lower the correctness probability.
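Syllable-level Levenshtein and Hamming distances can be computed as sketched below. Splitting on the tsheg (U+0F0B) and shad (U+0F0D) marks, the `1 - d / max_len` mapping from distance to probability, and all function names are assumptions chosen only to satisfy the stated rule that a larger difference gives a lower probability.

```python
from typing import List, Sequence

TSHEG, SHAD = "\u0f0b", "\u0f0d"

def syllables(text: str) -> List[str]:
    """Split Tibetan text into syllables on tsheg and shad."""
    return [s for s in text.replace(SHAD, TSHEG).split(TSHEG) if s.strip()]

def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of differing positions; defined only for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal lengths")
    return sum(x != y for x, y in zip(a, b))

def probability_from_distance(distance: int, a: Sequence[str], b: Sequence[str]) -> float:
    """Map a distance to [0, 1]; a larger difference gives a lower probability."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else max(0.0, 1.0 - distance / longest)
```

Because the Hamming distance is only defined for syllable sequences of equal length, the Levenshtein distance is the more general of the two for comparing translations.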
Summarizing the translation correctness probabilities of all source entries in the software and outputting a software translation correctness report;
the software translation correctness report is used for representing translation quality, and is fed back to the translator through the interactive web service to help the translator optimize the translation quality.
The method in the embodiment is applied to the desktop edition of the Galaxy kylin operating system, management, development and evaluation of 513 system components and 9.4 ten thousand Tibetan vocabulary entries in the current edition are completed in a short period, a Galaxy kylin operating system Tibetan desktop operating system product is constructed, large-scale deployment is carried out in all units of Tibetan areas such as Tibet, unified integrated management and quality evaluation are carried out on Chinese Tibetan language translation data of all software of the Galaxy kylin operating system desktop edition, a convenient and scientific Tibetan language supporting method of the Galaxy kylin operating system is provided through integrated construction support, unified management and natural language understanding, and the problems of difficult collaboration, management and integrated construction of software Tibetan language translation in the Galaxy kylin operating system are solved.
Referring to fig. 6, an embodiment of the present invention further provides a translation management and evaluation system for Chinese-Tibetan language data under a domestic operating system, including:
a source entry acquisition module 1, configured to analyze the source code of all software in a Linux-based domestic operating system, obtain source entries, and construct a source entry data set;
a translation mapping relation building module 2, configured to construct, in response to an obtained update of the translation data of a source entry, a translation mapping relation between the newly added translation data and the source entry;
an updating module 3, configured to check all software in the domestic operating system for the pending source entry according to the translation mapping relation, and to update or add translation data for each piece of software in which the pending source entry is detected;
and an evaluation module 4, configured to evaluate the translation quality of the newly added translation data and output an evaluation result of the translation correctness.
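The four modules can be sketched as one in-memory class. This is an illustrative sketch under simplifying assumptions, not the system of fig. 6: it assumes translatable source entries are marked with gettext-style `_("...")` calls, keeps each package's source text and translations in memory instead of in repositories and translation files, omits the language validity check, and treats evaluation as a pluggable scoring function. All class, method, and package names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable

# Assumption: translatable strings are marked as _("...") in the sources.
GETTEXT_CALL = re.compile(r'_\(\s*"((?:[^"\\]|\\.)*)"\s*\)')


@dataclass
class Software:
    name: str
    sources: str  # concatenated source text (simplified)
    translations: dict[str, str] = field(default_factory=dict)  # msgid -> msgstr


class TranslationManager:
    def __init__(self, packages: list[Software]) -> None:
        self.packages = packages
        self.dataset: dict[str, set[str]] = {}  # msgid -> packages using it
        self.mapping: dict[str, str] = {}       # msgid -> newest translation

    def collect_source_entries(self) -> None:
        """Module 1: build the source entry data set from every package."""
        for pkg in self.packages:
            for msgid in GETTEXT_CALL.findall(pkg.sources):
                self.dataset.setdefault(msgid, set()).add(pkg.name)

    def submit(self, msgid: str, translation: str) -> str:
        """Module 2: update or create a mapping; discard unknown entries."""
        if msgid not in self.dataset:
            return "discarded"
        status = "updated" if msgid in self.mapping else "created"
        self.mapping[msgid] = translation
        return status

    def propagate(self, msgid: str) -> dict[str, str]:
        """Module 3: check every package for the pending entry and sync it."""
        new = self.mapping[msgid]
        actions: dict[str, str] = {}
        for pkg in self.packages:
            if pkg.name not in self.dataset.get(msgid, set()):
                continue
            old = pkg.translations.get(msgid)
            if old is None:
                actions[pkg.name] = "added"
            elif old != new:
                actions[pkg.name] = "updated"
            else:
                actions[pkg.name] = "unchanged"  # newly added data is discarded
                continue
            pkg.translations[msgid] = new
        return actions

    def evaluate(self, score: Callable[[str, str], float],
                 references: dict[str, str]) -> dict[str, float]:
        """Module 4: score each mapped translation against a reference sample."""
        return {m: score(t, references[m])
                for m, t in self.mapping.items() if m in references}
```

Returning one action per package mirrors the add, update, or discard decision of the updating module, so the result could be turned into per-package commits against the corresponding source trees.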
An embodiment of the present invention further provides a computer apparatus, comprising a processor, a memory, and a program;
the program is stored in the memory, and the processor calls the program stored in the memory to execute the translation management and evaluation method for Chinese-Tibetan language data under the domestic operating system.
The computer apparatus may be a terminal whose internal structure may be as shown in fig. 7. The computer apparatus comprises a processor, a memory, a network interface, a display screen, and an input device connected through a bus. The processor provides computing and control capabilities. The memory comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides the environment in which the operating system and the computer program stored in the non-volatile storage medium run. The network interface connects to and communicates with external terminals over a network. When executed by the processor, the computer program implements the translation management and evaluation method for Chinese-Tibetan language data under the domestic operating system. The display screen may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball, or touchpad on the housing of the computer apparatus, or an external keyboard, touchpad, mouse, or the like.
The memory may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory stores programs, and the processor executes them upon receiving an execution instruction.
The processor may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. Such a processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.
Those skilled in the art will appreciate that the structure shown in fig. 7 is merely a block diagram of the parts of the structure related to the disclosed solution and does not limit the computer apparatus to which the solution applies; a particular computer apparatus may include more or fewer components than shown, combine certain components, or arrange the components differently.
An embodiment of the present invention further provides a computer-readable storage medium for storing a program, the program being used to execute the translation management and evaluation method for Chinese-Tibetan language data under the domestic operating system.
As will be appreciated by one of skill in the art, embodiments of the present invention may be provided as a method, computer apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowcharts of methods, computer apparatus, or computer program products according to embodiments of the invention. It should be understood that each flow in the flowcharts can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal create means for implementing the functions specified in the flowcharts.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart.
The translation management and evaluation method, system, computer apparatus, and computer-readable storage medium for Chinese-Tibetan language data under a domestic operating system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help understand the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (12)

1. A translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system, characterized by comprising the following steps:
analyzing the source code of all software in a Linux-based domestic operating system to obtain source entries, and constructing a source entry data set;
in response to an update of the translation data of a source entry, constructing a translation mapping relation between the newly added translation data and the source entry;
according to the translation mapping relation, checking all software in the domestic operating system for the pending source entry, and updating or adding translation data for each piece of software in which the pending source entry is detected;
and evaluating the translation quality of the newly added translation data, and outputting an evaluation result of the translation correctness.
2. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 1, wherein the update of the translation data of the source entry is obtained by providing a translation service for the source entry data set through an interactive web service and recording the newly added translation data produced by the translator.
3. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 1 or 2, wherein constructing the translation mapping relation between the newly added translation data and the source entry specifically comprises the following steps:
performing a language validity check on the source entry and, for a valid source entry, detecting whether it exists in the source entry data set; a source entry that fails the validity check or is absent from the source entry data set is discarded as illegal data;
and for a source entry that passes the language validity check and exists in the source entry data set, further detecting whether it already has translation data: if so, updating the associated translation mapping relation; if not, constructing a new translation mapping relation.
4. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 3, wherein, when the language validity check is performed on a source entry, an error report or warning is provided in the interactive web service if the source entry is not legal natural language in Chinese or Tibetan.
5. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 3, wherein, when the language validity check is performed on a Tibetan source entry, whether the entry is legal natural language is determined by identifying the ISO 639 language code of the Tibetan text data.
6. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 3, wherein, in response to the update or creation of a translation mapping relation, a corresponding translation update event is generated; in response to the generation of the translation update event, the event is parsed to obtain the pending source entry, and all software in the domestic operating system is checked for the pending source entry.
7. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 6, wherein updating or adding translation data for the software in which the pending source entry is detected specifically comprises the following steps:
for each piece of software in which the pending source entry is detected, detecting whether the software already contains translation data for that entry; if not, adding translation data for the source entry according to the translation mapping relation; if so, comparing the original translation data in the software with the newly added translation data in the translation mapping relation, updating with the newly added translation data if they differ, and discarding the newly added translation data if they are identical;
and submitting the updated or added translation data to the source code of the corresponding software.
8. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 2, wherein evaluating the translation quality of the newly added translation data and outputting the evaluation result of the translation correctness comprises the following steps:
performing semantic correlation analysis on the newly added translation data in the translation mapping relation; if the semantic similarity does not reach a set threshold, marking the translation correctness probability of the analyzed translation data as 0; if the similarity reaches the threshold, proceeding to the translation data difference calculation;
analyzing the difference between the newly added translation data and a standard translation sample, and calculating the translation correctness probability of the newly added translation data from the difference value;
and summarizing the translation correctness probabilities of all source entries in the domestic operating system, and outputting a software translation correctness report.
9. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 8, wherein the software translation correctness report is fed back to the translator through the interactive web service for improving translation quality.
10. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 1, wherein the method is implemented on a translation management and evaluation system for Chinese-Tibetan language data under a domestic operating system, the system comprising the following communicatively connected modules:
a source entry acquisition module, configured to analyze the source code of all software in a Linux-based domestic operating system, obtain source entries, and construct a source entry data set;
a translation mapping relation building module, configured to construct, in response to an obtained update of the translation data of a source entry, a translation mapping relation between the newly added translation data and the source entry;
an updating module, configured to check all software in the domestic operating system for the pending source entry according to the translation mapping relation, and to update or add translation data for each piece of software in which the pending source entry is detected;
and an evaluation module, configured to evaluate the translation quality of the newly added translation data and output an evaluation result of the translation correctness.
11. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 1, wherein the method is implemented on a computer apparatus comprising a processor, a memory, and a program;
the program is stored in the memory, and the processor calls the program stored in the memory to execute the translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 1.
12. The translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 1, wherein the method is implemented on a computer-readable storage medium storing a program for executing the translation management and evaluation method for Chinese-Tibetan language data under a domestic operating system of claim 1.
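The language validity check of claims 4 and 5 might look like the following sketch. It assumes the ISO 639 code is read from the `Language:` field of a gettext PO header, and that legality is approximated by the share of letters falling in the Tibetan block (U+0F00 to U+0FFF) or the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). The 0.5 ratio, the error/warning split, and all function names are illustrative assumptions, not values taken from the patent.

```python
from __future__ import annotations

import re
import unicodedata
from typing import Optional

TIBETAN_BLOCK = (0x0F00, 0x0FFF)
CJK_BLOCK = (0x4E00, 0x9FFF)  # CJK Unified Ideographs (basic block only)

# ISO 639-1 and ISO 639-2 (T and B) codes for Tibetan and Chinese.
SCRIPT_FOR_CODE = {
    "bo": TIBETAN_BLOCK, "bod": TIBETAN_BLOCK, "tib": TIBETAN_BLOCK,
    "zh": CJK_BLOCK, "zho": CJK_BLOCK, "chi": CJK_BLOCK,
}

# Accepts both raw header text and the quoted form used inside PO files.
LANGUAGE_HEADER = re.compile(r'^"?Language:\s*([A-Za-z]{2,3})', re.MULTILINE)


def language_code(po_header: str) -> Optional[str]:
    """Extract the ISO 639 code from a PO header, e.g. 'bo' from 'bo_CN'."""
    match = LANGUAGE_HEADER.search(po_header)
    return match.group(1).lower() if match else None


def check_entry(text: str, code: Optional[str], min_ratio: float = 0.5) -> tuple[str, str]:
    """Return ('ok' | 'warning' | 'error', message) for one entry."""
    block = SCRIPT_FOR_CODE.get((code or "").lower())
    if block is None:
        return "error", f"unsupported or missing ISO 639 code: {code!r}"
    # Count letters and combining marks only; Tibetan vowel signs are marks.
    letters = [ch for ch in text if unicodedata.category(ch)[0] in "LM"]
    if not letters:
        return "error", "entry contains no natural-language text"
    low, high = block
    matching = sum(low <= ord(ch) <= high for ch in letters)
    if matching / len(letters) < min_ratio:
        return "warning", f"{matching}/{len(letters)} letters match language {code!r}"
    return "ok", ""
```

Returning a warning rather than an error for mixed-script entries leaves room for legitimate cases such as Tibetan strings containing Latin product names, which a hard rejection would discard.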
CN202210155863.0A 2022-02-21 2022-02-21 Translation management and evaluation method for Chinese Tibetan language data under domestic operating system Active CN114217901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210155863.0A CN114217901B (en) 2022-02-21 2022-02-21 Translation management and evaluation method for Chinese Tibetan language data under domestic operating system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210155863.0A CN114217901B (en) 2022-02-21 2022-02-21 Translation management and evaluation method for Chinese Tibetan language data under domestic operating system

Publications (2)

Publication Number Publication Date
CN114217901A true CN114217901A (en) 2022-03-22
CN114217901B CN114217901B (en) 2022-07-29

Family

ID=80709065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210155863.0A Active CN114217901B (en) 2022-02-21 2022-02-21 Translation management and evaluation method for Chinese Tibetan language data under domestic operating system

Country Status (1)

Country Link
CN (1) CN114217901B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007056807A1 (en) * 2005-11-18 2007-05-24 Robert Arthur Crewdson Computer software development system and method
CN103793322A (en) * 2012-11-05 2014-05-14 深圳中兴网信科技有限公司 Test method and test system for translation problems in software localization testing
US20140180670A1 (en) * 2012-12-21 2014-06-26 Maria Osipova General Dictionary for All Languages
CN104346153A (en) * 2013-07-31 2015-02-11 国际商业机器公司 Method and system for translating text information of application programs
CN110187887A (en) * 2019-05-24 2019-08-30 广东飞企互联科技股份有限公司 Automatic translating method and system for software development
US20200073948A1 (en) * 2018-08-31 2020-03-05 Samsung Electronics Co., Ltd. Method and apparatus with sentence mapping
CN112115063A (en) * 2020-09-29 2020-12-22 腾讯科技(深圳)有限公司 Software localization test method, device, terminal and storage medium
CN114064185A (en) * 2021-11-26 2022-02-18 中国铁道科学研究院集团有限公司通信信号研究所 International language switching design method for dispatching centralized system

Also Published As

Publication number Publication date
CN114217901B (en) 2022-07-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant