US20200372218A1 - Data-driven automated selection of profiles of translation professionals for translation tasks - Google Patents


Info

Publication number
US20200372218A1
Authority
US
United States
Prior art keywords
profiles
translation
electronic document
previous translations
model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US16/989,818
Inventor
Artem Ukrainets
Vladimir Gusakov
Ivan Smolnikov
Elena TUZHILINA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Smartcat LLC
Original Assignee
Smartcat LLC
Application filed by Smartcat LLC
Priority to US16/989,818
Assigned to SMARTCAT LLC: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GUSAKOV, VLADIMIR; SMOLNIKOV, IVAN; TUZHILINA, ELENA; UKRAINETS, ARTEM
Publication of US20200372218A1
Abandoned (current legal status)



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/123 Storage facilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/47 Machine-assisted translation, e.g. using translation memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/51 Translation evaluation
    • G06K9/00456
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Definitions

  • This instant specification relates to data-driven automated selection of profiles of translation professionals for translation tasks.
  • Computer programs that translate automatically from one language to another (“machine translation programs”) can in principle meet this need, and such programs have been developed and are in continued development for a variety of languages.
  • For formal (as opposed to informal, idiomatic, or colloquial) content in well-studied languages (e.g., English, French, Spanish, German, and others), such machine translation programs work reasonably well.
  • FIG. 1 is a schematic diagram that shows an example of a system for data-driven automated selection of profiles of translation professionals for translation tasks.
  • FIGS. 2A-D are flow charts that show examples of processes for data-driven automated selection of profiles of translation professionals for translation tasks.
  • FIG. 3 is a schematic diagram that shows an example of a computing system.
  • This document describes systems and techniques for data-driven automated selection of profiles of translation professionals (e.g., translators, editors, proofreaders, or interpreters) for translation tasks. This may be achieved by one or more processors executing instructions, stored in one or more memories, that implement three processes: a first process for automated selection of translation professionals experienced in the subject area to which the content of an electronic document to be translated pertains, a second process for automated evaluation of translation qualities for the profiles associated with the translation professionals, and a third process for automated planning of translation resources and workflow of the translation professionals.
  • a system can match subject areas to translation professionals with higher accuracy than prior systems.
  • the system may provide fully automated matching of subject areas to translation professionals without manual or empirical adjustment of the parameters used for the matching.
  • the system may base the quality evaluation on machine learning using a model that is trained on editor evaluations of the translation work product to predict the quality.
  • the system may provide fully automated quality evaluation of a profile of a translation professional without manual or empirical adjustment of the parameters used in the quality evaluation.
  • FIG. 1 is a schematic diagram that shows an example of a system 100 for data-driven automated selection of profiles of translation professionals for translation tasks.
  • the system 100 includes a translation system 102 in communication with a client system 124 and multiple translator systems 104 a - c over a network 106 , such as a local area network, a wide area network, or one or more of the computing devices that make up the Internet.
  • the translator systems 104 a - c are used by multiple translation professionals 108 a - c to translate electronic documents at the direction of the translation system 102 .
  • the translation system 102 may receive a request to translate an electronic document from the client system 124 , for example, through at least one interface device 110 to the network 106 .
  • the interface device 110 provides communication between the translation system 102 and the network 106 or networks used to communicate with the client system 124 and the translator systems 104 a - c .
  • the request may include the electronic document (or an address that the translation system 102 or another system may use to retrieve the electronic document), an identification of a source language of the electronic document, and/or an identification of a target language to which content of the electronic document is to be translated.
  • the translation system 102 further includes at least one processor 112 , at least one memory 114 , and at least one data storage device 116 .
  • the memory 114 stores instructions for one or more modules, such as a selection module 118 , an evaluation module 120 , and a workflow module 122 .
  • the processor 112 executes the instructions of the modules to perform the operations described herein.
  • the translation professionals are each associated with a profile that may be stored, for example, at the translation system 102 in the data storage device 116 .
  • the processor 112 may execute the instructions of the selection module 118 to select ones of the profiles associated with the translation professionals to perform translation for the electronic document.
  • the processor 112 may execute the instructions of the evaluation module 120 to evaluate qualities of translations previously performed by the profiles associated with the translation professionals.
  • the processor 112 may execute the instructions of the workflow module 122 to make a final selection of ones of the profiles to translate the electronic document based on the translation qualities and resource and/or workflow parameters.
  • the translation system 102 may then assign the translation to be performed for the electronic document to the selected profiles and/or notify the selected profiles of it.
  • the translation system 102 may provide the electronic document, or at least a portion thereof, to ones of the translator systems 104 a - c for the selected ones of the profiles.
  • the ones of the translator systems 104 a - c receive the translations from the translation professionals and provide the translations to the translation system 102 .
  • the translation system 102 receives the translations and provides a final translation of the electronic document, based on the received translations, to the client system 124 .
  • FIGS. 2A-D are flow charts that show examples of processes for data-driven automated selection of profiles of translation professionals for translation tasks, in accordance with some aspects of the same disclosure.
  • the processes may be performed, for example, by a system such as the system 100 .
  • the description that follows uses the system 100 as an example for describing the processes.
  • however, another system or combination of systems may be used to perform the processes.
  • FIG. 2A is a flow chart that shows an example of an overall process 200 for data-driven automated selection from one or more profiles 204 a - c of translation professionals for translation tasks.
  • the overall process 200 may include one or more sub-processes 202 a - c .
  • the first sub-process 202 a may be performed, for example, by the selection module 118 and includes an automated selection of one or more of the profiles 204 a - b of the translation professionals experienced in a subject area to which content of an electronic document 206 to be translated pertains.
  • the second sub-process 202 b may be performed, for example, by the evaluation module 120 and includes an automated evaluation of one or more qualities of translations 208 a - b for the profiles 204 a - b that were selected.
  • the third sub-process 202 c may be performed, for example, by the workflow module 122 and includes an automated planning of translation resources and workflow of the translation professionals.
  • the sub-processes 202 a - c may be mutually interconnected.
  • the third sub-process 202 c may be based on the qualities of translations 208 a - b from the second sub-process 202 b , and only the professionals identified during the first sub-process 202 a may take part in the second sub-process 202 b .
  • the sub-processes 202 a - c may occur in another order, such as a reverse order.
  • a system may use completed translations to update the evaluations of the translation professionals who participated in a translation project, update the glossaries and corpora used in the project, and thereby improve the selection of relevant translation professionals for subsequent texts.
  • the resource and workflow planning of the third sub-process 202 c may include one or more factors 210 for the translation professionals, such as a cost charged by each translation professional for the translation, an estimated amount of time taken by each translation professional to perform the translation, and the qualities of translations 208 a - b associated with each of the translation professionals.
  • the translation system 102 may store the parameters for the cost, time, and the qualities of translations 208 a - b for each of the profiles in the data storage device 116 .
  • the workflow module 122 may calculate the cost for each translation professional for a translation project based on a rate indicated by the translation professional in the profile associated with the translation professional.
  • the workflow module 122 may calculate the amount of time taken by each translation professional via a sub-system that monitors the work of the translation professionals associated with each profile in real time with a cloud-based architecture.
  • the workflow module 122 may grade or evaluate the compliance of each translation professional with the assigned task using algorithms for textual analysis and machine learning.
  • the workflow module 122 uses the qualities of translations 208 a - b from the evaluation to further refine the list of the profiles 204 a - b to be used for the translation.
  • the workflow module 122 may use one or more client requirements 212 provided by the client system 124 when grading or evaluating the compliance of each translation professional with the assigned task, such as when the translation is due to the client system 124 or what levels of the qualities of translations 208 a - b are acceptable for the client system 124 .
  • the translation system 102 may then cause a translation process 214 to occur using the finally selected ones of the profiles 204 a - b and the planned workflow.
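The factors 210 and client requirements 212 described above can be combined into a simple feasibility filter. The following is a minimal Python sketch under stated assumptions: the `Profile` fields, the per-word rate, and the cheapest-first ranking with predicted quality as a tie-breaker are hypothetical choices for illustration, not the workflow module's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    rate_per_word: float   # hypothetical: rate indicated in the profile, per word
    words_per_hour: float  # hypothetical: speed observed by real-time monitoring
    quality: float         # hypothetical: predicted translation quality, 0..1


def plan_candidates(profiles, words, deadline_hours, min_quality, max_cost):
    """Keep profiles that satisfy the client requirements, cheapest first.

    Returns (name, cost, hours) tuples; ties on cost go to higher quality.
    """
    feasible = []
    for p in profiles:
        cost = p.rate_per_word * words      # cost from the profile's stated rate
        hours = words / p.words_per_hour    # estimated time to translate
        if p.quality >= min_quality and hours <= deadline_hours and cost <= max_cost:
            feasible.append((cost, -p.quality, p.name, hours))
    feasible.sort()
    return [(name, cost, hours) for cost, _, name, hours in feasible]
```

Ranking cheapest-first is only one possible policy; a planner could instead minimize time or maximize quality subject to the same constraints.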
  • FIG. 2B is a flow chart that shows an example of the first sub-process 202 a for automated selection of translation professionals experienced in a subject area to which content of an electronic document to be translated pertains.
  • the selection module 118 may perform the first sub-process 202 a to select profiles for translation professionals who are conversant in the subject area of the content of the electronic document to be translated (since, for example, a translation professional who works with legal texts may not be competent at handling technical documents).
  • the selection module 118 narrows down the pool of potential translation professionals to optimize the time needed for further selection and optimization during the second sub-process 202 b and the third sub-process 202 c .
  • the selection module 118 selects one or more of the profiles 204 a - c of the translation professionals 108 a - c based on content of one or more previous translations of electronic documents 224 that is in the same subject area as the content of the electronic document 206 to be translated.
  • a lack of subject-area knowledge and terminology by a translation professional may be a primary cause of translation errors and low quality of translations.
  • the first sub-process 202 a may apply one or more of the following stages to define a set of profiles of translation professionals from which the final profiles of the translation professionals for the translation project will be selected.
  • the first sub-process 202 a may include, at box 220 , pre-processing of text from the electronic document 206 to be translated and/or the previous translations of electronic documents 224 .
  • the selection module 118 may perform a syntactic and morphological filtering of the text of the previous translations of electronic documents 224 .
  • the filtering may include, for example, stripping of metadata, tags, and formatting from the text; marking up of parts of speech in the text; and/or extraction of root forms of words from the text.
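The pre-processing at box 220 can be sketched with the Python standard library alone. This is an illustration under assumptions: the tag-stripping regex, the tiny suffix list, and the function names are hypothetical, the suffix stripping is a crude stand-in for a real morphological analyzer, and part-of-speech marking is omitted because it requires a trained tagger.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")  # XML/HTML-style tags such as <b> or </b>
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")  # runs of letters, inner ' or - allowed

# Hypothetical, deliberately tiny suffix list standing in for a lemmatizer.
SUFFIXES = ("ings", "ing", "ies", "es", "s", "ed")


def strip_markup(text: str) -> str:
    """Remove tags and collapse the whitespace left behind by formatting."""
    return " ".join(TAG_RE.sub(" ", text).split())


def root_form(word: str) -> str:
    """Crude suffix stripping that keeps at least three characters of stem."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            if suffix == "ies":
                return w[:-3] + "y"  # policies -> policy
            return w[: -len(suffix)]
    return w


def preprocess(text: str) -> list[str]:
    """Strip markup, tokenize into words, and reduce each word to a root form."""
    return [root_form(token) for token in WORD_RE.findall(strip_markup(text))]
```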
  • the first sub-process 202 a may include, at box 226 , extraction of terminology from the electronic documents in the translation system 102 that have previously been translated by the profiles.
  • the extraction may include creation of a common glossary based on the extracted terms and individual glossaries for each of the profiles for the terms translated by each profile.
  • the common glossary and/or the individual glossaries reduce the amount of data to be analyzed and enable building criteria for selecting translators based on the translation professionals' knowledge of a specific set of terms.
  • the selection module 118 may perform the extraction of the terminology by performing a linguistic filtering.
  • the linguistic filtering may include an identification of candidate terms (e.g., potential glossary entries from the text) by searching for words and phrases that fit certain patterns, such as a noun pattern, an adjective and noun pattern, a gerund and noun pattern, and/or a noun and noun pattern, etc.
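Pattern-based identification of candidate terms can be sketched as a scan over part-of-speech-tagged tokens. The one-letter tag set (`N` noun, `A` adjective, `G` gerund) and the function name are hypothetical; a real system would take the tags from a part-of-speech tagger during pre-processing.

```python
# Hypothetical simplified tag set: "N" noun, "A" adjective, "G" gerund.
PATTERNS = [("N",), ("A", "N"), ("G", "N"), ("N", "N")]


def candidate_terms(tagged):
    """Return word sequences whose tags match one of PATTERNS.

    tagged: list of (word, tag) pairs for one sentence, in order.
    """
    found = []
    for i in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[i : i + len(pattern)]
            # the window must be complete and every tag must match the pattern
            if len(window) == len(pattern) and all(
                tag == expected for (_, tag), expected in zip(window, pattern)
            ):
                found.append(" ".join(word for word, _ in window))
    return found
```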
  • the selection module 118 may perform the extraction of the terminology by performing a calculation of quantitative characteristics (C-Value) for each candidate term from the text using, for example, the following calculation:
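Assuming the C-Value named above is the standard C-value measure of Frantzi, Ananiadou, and Mima (the exact variant used here is not confirmed), a candidate term $a$ of $|a|$ words with frequency $f(a)$ scores $\log_2|a| \cdot f(a)$ when it is not nested in a longer candidate, and $\log_2|a| \cdot \left(f(a) - \frac{1}{|T_a|}\sum_{b \in T_a} f(b)\right)$ otherwise, where $T_a$ is the set of longer candidates that contain $a$. A sketch of that standard measure, with hypothetical function names:

```python
import math


def _contains(longer: tuple, shorter: tuple) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n = len(shorter)
    return any(longer[i : i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq: dict[str, int]) -> dict[str, float]:
    """Standard C-value for each candidate term, given its corpus frequency."""
    words = {term: tuple(term.split()) for term in freq}
    scores = {}
    for a, a_words in words.items():
        # T_a: longer candidates in which `a` is nested
        nesting = [
            b for b, b_words in words.items()
            if len(b_words) > len(a_words) and _contains(b_words, a_words)
        ]
        weight = math.log2(len(a_words))
        if nesting:
            mean_nesting_freq = sum(freq[b] for b in nesting) / len(nesting)
            scores[a] = weight * (freq[a] - mean_nesting_freq)
        else:
            scores[a] = weight * freq[a]
    return scores
```

Because $\log_2 1 = 0$, single-word candidates score zero in this form.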
  • the selection module 118 may use two different approaches to select the profiles 204 a - b of the translation professionals 108 a - b , a simplified approach and a thematic approach.
  • the selection module 118 may select the approach to use based on the volume of the previous translations of electronic documents 224 associated with the profiles 204 a - c of the translation professionals 108 a - c and the electronic document 206 to be translated.
  • the selection module 118 may select the simplified approach for low volumes. For the simplified approach, the selection module 118 may select the profiles 204 a - b of the translation professionals 108 a - b using a term-by-term comparison of the terms extracted from the electronic document 206 to be translated with the terms extracted from the previous translations of electronic documents 224 . For each term extracted from the electronic document 206 , the selection module 118 may calculate how many times the term is found in the electronic document 206 to identify one or more terminology frequency vectors, $a_1, \ldots, a_k$.
  • the selection module 118 may, at box 232 , calculate a numerical value of a proximity of the terms in the electronic document 206 to the terms from the previous translations of electronic documents 224 using the following calculation:
  • where $w_1, \ldots, w_k$ are one or more terminology frequency vectors 234 , each giving the frequency of a particular term in the previous translations of electronic documents 224 by a profile of a translation professional, T.
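As a minimal sketch, assuming the proximity for the simplified approach is the same cosine similarity that the thematic approach uses, the frequency values $a_1, \ldots, a_k$ (document) and $w_1, \ldots, w_k$ (profile T) can be built over the $k$ terms of the document to be translated and compared as follows. All function names are hypothetical.

```python
import math
from collections import Counter


def frequency_vector(terms_in_text, vocabulary):
    """Count how often each vocabulary term occurs in a list of extracted terms."""
    counts = Counter(terms_in_text)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def term_proximity(document_terms, profile_terms):
    """Proximity of a profile's previous translations to the new document."""
    vocabulary = sorted(set(document_terms))              # the k document terms
    a = frequency_vector(document_terms, vocabulary)      # a_1 .. a_k
    w = frequency_vector(profile_terms, vocabulary)       # w_1 .. w_k for profile T
    return cosine(a, w)
```

Terms that the profile used but that do not occur in the new document are ignored, which keeps the comparison term-by-term against the document's own vocabulary.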
  • the selection module 118 may select the thematic (or subject) approach for high volumes.
  • the selection module 118 may classify, at box 230 , the terms from the electronic document 206 and/or the previous translations of electronic documents 224 into one or more classes.
  • the selection module 118 may determine the classes of the terms based on matching and/or comparing each of the terms to a term associated with a subject area, for example, at a particular level of a subject tree.
  • the selection module 118 may automatically classify the terms based on machine learning clustering that maximizes a distance between clusters of the terms.
  • the selection module 118 may assign an identifier to the clusters, such as a number, and each of the terms may be assigned the identifier of the cluster to which the term belongs. Each cluster may then be considered a quasi-subject area.
  • the selection module 118 may represent each electronic document in the corpus of the previous translations of electronic documents 224 by a subject vector. For each of the previous translations of electronic documents 224 in the corpus, the selection module 118 may calculate the frequency with which its terms appear in each of the clusters. The selection module 118 represents each previous translation of an electronic document (associated with a particular profile) by a subject vector whose dimension is the number of clusters and whose components are those per-cluster frequencies.
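Building a subject vector can be sketched as counting, per cluster, the occurrences of a document's terms. The sketch assumes a hypothetical `term_to_cluster` mapping produced by the clustering at box 230, with clusters numbered from 0; terms without a cluster are ignored. Raw counts are used because cosine similarity does not change when a vector is scaled, so dividing by the document's term count would give the same proximity.

```python
def subject_vector(terms, term_to_cluster, n_clusters):
    """Count, for each cluster, how many term occurrences in a document fall into it."""
    vector = [0] * n_clusters
    for term in terms:
        cluster = term_to_cluster.get(term)
        if cluster is not None:  # skip terms that were not assigned to any cluster
            vector[cluster] += 1
    return vector
```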
  • the selection module 118 may calculate, at box 232 , the proximity between the subject vector of the electronic document 206 and the subject vectors of all of the previous translations of electronic documents 224 by the profiles 204 a - c .
  • the selection module 118 may determine the proximity or similarity between the subject vector of the electronic document 206 and each of the subject vectors of the previous translations of electronic documents 224 using the following calculation for cosine similarity between two vectors: $\text{similarity}(A, B) = \cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$
  • where A may be the subject vector of the electronic document 206 , B may be each of the subject vectors of the previous translations of electronic documents 224 , $A_i$ and $B_i$ are the components of the vectors A and B, respectively, and n is the number of components (clusters).
  • the selection module 118 may exclude from further processing ones of the profiles 204 a - c whose subject vectors are located far from the subject vector of the electronic document 206 (e.g., have a low proximity value), in order to reduce the number of the selected ones of the profiles 204 a - c and to reduce the computational load on the translation system 102 .
  • the selection module 118 may select the simplified approach for remaining ones of the profiles 204 a - c that do not have high volumes.
  • the selection module 118 may, at box 228 , re-build the terminology space of the terminology frequency vectors 234 as translations of additional electronic documents are associated with the profiles 204 a - c .
  • the selection module 118 may also update the glossaries with new terms from the additional electronic documents.
  • the selection module 118 may then select, at box 236 , ones of the profiles 204 a - c based on the proximities of the terms for the profiles 204 a - c to the terms from the electronic document 206 for the simplified approach, or based on the proximities of the subject vectors for the profiles 204 a - c to the subject vector from the electronic document 206 for the thematic approach. For example, the selection module 118 may select a particular number of the profiles 204 a - b that have the highest proximities and/or that meet a threshold level of proximity.
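The exclusion of distant profiles and the final selection at box 236 can be combined in one small function. The threshold of 0.3 and the default of three profiles are arbitrary illustrative values, and the function name is hypothetical.

```python
def select_profiles(proximities, threshold=0.3, top_n=3):
    """Exclude profiles below `threshold`, then keep the `top_n` closest ones.

    proximities: profile id -> proximity to the document to be translated.
    """
    kept = [(profile, score) for profile, score in proximities.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)  # closest profiles first
    return [profile for profile, _ in kept[:top_n]]
```

Applying only the threshold (a large `top_n`) or only the count (`threshold=0.0`) reproduces either of the two selection rules on its own.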
  • FIG. 2C is a flow chart that shows an example of the second sub-process 202 b for automated evaluation of translation qualities for the profiles associated with the translation professionals.
  • the second sub-process 202 b formally characterizes and quantifies the qualities of translations 208 a - b for the profiles 204 a - c of the translation professionals 108 a - c .
  • the previous translations of electronic documents 224 may, for example, contain errors of different types, such as typos, grammatical errors, and/or incorrect terminology.
  • the evaluation module 120 may use information regarding the errors to identify the qualities of translations 208 a - b .
  • the translation system 102 may then use the qualities of translations 208 a - b for future translations to select from the profiles 204 a - c of the translation professionals 108 a - c .
  • quantitative characteristics associated with a profile of a particular translation professional may affect a client requirement, such as a due date for a translation or a cost of a translation (since correcting mistakes may take additional time and is often comparable to re-translating the electronic document 206 ).
  • the evaluation module 120 may use this information to predict the qualities of translations 208 a - b for the profiles 204 a - b and to select the profiles 204 a - b of the most qualified ones of the translation professionals 108 a - c to translate the electronic document 206 .
  • the evaluation module 120 may evaluate multiple aspects of ones of the previous translations of electronic documents 224 for each of the profiles 204 a - c to calculate a corresponding one of the qualities of translations 208 a - b as well as a predicted quality level for future translations.
  • the aspects may include an analysis, at box 240 a , of low-level data for each segment of a translation.
  • the evaluation module 120 may analyze time spent by the profile of the translation professional working on the translation of the segment, a number of actions taken by the profile of the translation professional to translate the segment, and a type of each correction made at each stage of the translation by the profile of the translation professional (e.g., corrections by an editor for the translation system 102 after the translation professional or corrections by the client system 124 after the editor).
  • the aspects may include an analysis, at box 240 b , of compliance between the translated terms and the project glossary and/or automatically generated terms based on a subject analysis.
  • the evaluation module 120 may determine whether the number or rate of a profile's translated terms that do not appear in project glossaries and/or automatically generated terms for the electronic documents being translated exceeds a threshold.
  • the evaluation module 120 may compare the translated terms to terms in the project glossary for the translation project to determine how many of the translated terms do not appear in the glossary and to check the consistency of the translated terms.
  • the evaluation module 120 may add, to the reference terms used for this comparison, extracted terms together with their commonly used translations that surpass a particular threshold frequency.
  • the evaluation module 120 may use a lower weight for the commonly used translations than for the other translated terms.
  • the evaluation module 120 may only use the extracted terms, for example, if there is no project glossary.
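The glossary compliance check can be sketched as a weighted error rate. In this sketch, which is an assumption rather than the evaluation module's specified behavior, a term counts with weight 1.0 when the project glossary covers it and with a lower weight (0.5, chosen arbitrarily) when only an automatically extracted term covers it; terms covered by neither are skipped, and with no glossary the extracted terms alone serve as the reference.

```python
def glossary_noncompliance(pairs, glossary, extracted=None, extracted_weight=0.5):
    """Weighted share of translated terms that disagree with the reference terms.

    pairs: (source_term, target_term) pairs as used by the translator.
    glossary: project glossary, source -> approved target (weight 1.0).
    extracted: hypothetical auto-extracted source -> commonly used target
               (lower weight; the only reference when there is no glossary).
    """
    extracted = extracted or {}
    checked = errors = 0.0
    for source, target in pairs:
        if source in glossary:
            reference, weight = glossary[source], 1.0
        elif source in extracted:
            reference, weight = extracted[source], extracted_weight
        else:
            continue  # no reference translation for this term
        checked += weight
        if target != reference:
            errors += weight
    return errors / checked if checked else 0.0
```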
  • the aspects may include an analysis, at box 240 c , of a set of linguistic descriptors.
  • the evaluation module 120 may analyze an average length of sentences in the translations, a variety and/or variability of a vocabulary in the translations, or a complexity of text in the translations, etc.
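A few of the linguistic descriptors can be computed with regular expressions. The sketch is illustrative: the naive sentence splitting on `.`, `!`, and `?`, the type-token ratio as a measure of vocabulary variety, and average word length as a rough proxy for text complexity are all assumptions, and the function name is hypothetical.

```python
import re


def linguistic_descriptors(text):
    """Return a few simple descriptors of a translated text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    n_words = len(words)
    return {
        # average number of words per sentence
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # vocabulary variety: distinct words divided by all words
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        # rough complexity proxy: average word length in characters
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
    }
```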
  • the aspects may include an analysis, at box 240 d , of results of automatic quality assurance (QA) checks.
  • the evaluation module 120 may analyze results of automatic checks for spelling, grammar, punctuation, tag structure and order, consistency of placeholders, extra and/or double spaces, contextual matches control, correct transfer of dates and numerical parameters, case control, multi-source and multi-target checks, or repeating words, etc.
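A handful of the listed QA checks (extra spaces, repeated words, transfer of numbers, and tag structure and order) can be sketched as follows. The check names and regular expressions are hypothetical and deliberately simple.

```python
import re

NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
TAG_RE = re.compile(r"<[^>]+>")


def qa_checks(source, target):
    """Return the names of the automatic checks that the target segment fails."""
    issues = []
    if "  " in target:
        issues.append("double_space")
    # the same word twice in a row, ignoring case ("the The")
    if re.search(r"\b(\w+)\s+\1\b", target, flags=re.IGNORECASE):
        issues.append("repeated_word")
    # numbers must be transferred unchanged; word order may differ, so compare sorted
    if sorted(NUMBER_RE.findall(source)) != sorted(NUMBER_RE.findall(target)):
        issues.append("number_mismatch")
    # tags must keep both their set and their order
    if TAG_RE.findall(source) != TAG_RE.findall(target):
        issues.append("tag_mismatch")
    return issues
```

The number check compares digit strings literally, so a correct locale change such as 2.5 to 2,5 is flagged; a production check would normalize number formats per target language.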
  • the aspects may include an analysis, at box 240 e , of reviewer corrections.
  • the evaluation module 120 may analyze reviewer corrections as detailed ratings with classifications by error types.
  • the aspects may include an analysis, at box 240 f , of reviewer evaluations.
  • the evaluation module 120 may analyze reviewer evaluations as a composite evaluation as per a predefined quality rating.
  • the aspects may include an analysis, at box 240 g , of translation tests passed by the translation professionals 108 a - c associated with the profiles 204 a - c in different subject areas, which may be a manual evaluation.
  • the translation tests may be performed over a constant set of texts, so that the evaluation method and test samples do not vary and the evaluation module 120 may compare the test results for the profiles 204 a - c to one another.
  • the translation system 102 may store translation data for each of the profiles 204 a - c , for example, in the data storage device 116 .
  • the translation data may include, for each of the profiles 204 a - c and for each electronic document translated by the translation professional associated with the profile, a source text to be translated and a corresponding translated text that are split into segments, the low-level data, the results of the automatic QA checks, and/or the set of linguistic descriptors.
  • the translation system 102 may store results of corrections by reviewers (e.g., the number of corrections) at the next stage of the translation workflow.
  • the translation system 102 may store results of evaluation ratings by reviewers in one or more aspects (e.g., precision, language, and/or style) according to a particular rating scale.
  • the translation system 102 may store evaluations of the translations for the profiles 204 a - c according to the formal Language Quality Assurance (LQA) procedure with definitions of the types of mistakes found.
  • the second sub-process 202 b includes generating multiple machine-learning models 242 a - c .
  • the evaluation module 120 may compile the first machine-learning model 242 a to evaluate a correlation between the automatically measured parameters, at boxes 240 a - d , and human corrections done by reviewers at an editing stage, at box 240 e .
  • the evaluation module 120 may compile the second machine-learning model 242 b to evaluate a correlation between the human corrections, at box 240 e , and the human quality evaluation, at box 240 f .
  • the evaluation module 120 may build the third machine-learning model 242 c for a correlation between the results of the translation tests, at box 240 g , and the human quality evaluation, at box 240 f .
  • the correlations in the first machine-learning model 242 a are used, at box 244 , to project or predict a number of corrections for each of the profiles 204 a - c for each translation.
  • the correlations in the second machine-learning model 242 b are used, at box 246 , to project or predict evaluations for each of the profiles 204 a - c for each translation.
  • the second machine-learning model 242 b may be used to project or predict evaluations for a profile even if the profile is not associated with any corrections at the editing stage.
  • the correlations in the third machine-learning model 242 c are used, at box 248 , to determine a final evaluation and quality projection for a profile.
  • the machine-learning models 242 a - c may be validated and improved iteratively based on the results of the formal LQA procedure, the translation tests, and other new data (e.g., evaluations, or corrections, etc.).
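The chaining of the first two models can be sketched with NumPy. Linear least squares is used here purely as a stand-in model type; the classifications above also list ensemble learning, which a real implementation might use instead. The sketch fits one model like 242 a (automatic measurements to corrections) and one like 242 b (corrections to evaluation), then chains them to predict an evaluation for a profile that has no editing-stage corrections yet. Function names are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply coefficients from fit_linear to new inputs."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def predict_evaluation(coef_a, coef_b, auto_features):
    """Automatic measurements -> predicted corrections -> predicted evaluation."""
    predicted_corrections = predict_linear(coef_a, auto_features)
    return predict_linear(coef_b, predicted_corrections)
```

Fitting `fit_linear(auto_features, corrections)` gives coefficients for the first model and `fit_linear(corrections, evaluations)` for the second; `predict_evaluation` then covers profiles without corrections, and both fits can be repeated as new LQA results, tests, and corrections arrive.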
  • FIG. 2D is a flow chart that shows an example of the third sub-process 202 c for automated planning of translation resources and workflow of the profiles associated with the translation professionals.
  • the third sub-process 202 c for automated planning of translation resources and workflow incorporates the results obtained in the first sub-process 202 a (automated selection of the profiles 204 a - b of the translation professionals 108 a - c by subject area) and the second sub-process 202 b (evaluation of the qualities of translations 208 a - b ).
  • the workflow module 122 may prepare, at box 250 , a plan for implementation of the project to translate the electronic document 206 based on the client requirements 212 (e.g., translation materials, deadline for the translation, required quality of the translation, allowed cost of the translation, etc.) in a way that optimizes existing resources (e.g., the translation professionals 108 a - c , who have certain limitations of their own, such as possible translation speed, availability, and language knowledge in the subject area of the electronic document 206 ).
  • the workflow module 122 may split the translation project into multiple separate parts (if the workflow module 122 determines that this is optimal) and distribute the parts to multiple ones of the translation professionals 108 a-c (e.g., translators, editors, and/or proofreaders). For example, the workflow module 122 may take into account, at box 254, the workload of each of the translation professionals, both in real time and as predicted for the time frame of the translation project, based on the current work-in-progress projects assigned to the profiles 204 a-c of the translation professionals 108 a-c and the translation speed of each of the translation professionals 108 a-c as identified in the profiles 204 a-c.
  • the workflow module 122 may allow for work to occur in parallel (e.g., by multiple ones of the translation professionals 108 a-c) at multiple stages (e.g., translation, editing, and/or proofreading).
  • the workflow module 122 selects, at box 260, an optimal choice from the profiles 204 a-c of the translation professionals 108 a-c for each of the workflow stages (e.g., translation, editing, and proofreading).
  • the workflow module 122 may select a two-stage translation-editing (TE) workflow or a three-stage translation-editing-proofreading (TEP) workflow to ensure high quality of the translation.
  • the workflow module 122 may assign a single one of the profiles 204 a-c to perform all of the stages of the translation.
  • the workflow module 122 may select one of the profiles 204 a-c to which to assign the translation based on the profiles 204 a-b selected by the selection module 118 and the qualities of translations 208 a-b provided by the evaluation module 120.
  • the workflow module 122 may select one of the profiles 204 a-c to which to assign the editing (e.g., a profile that has greater qualifications than the profile assigned to the translation), for comparing the source text of the electronic document 206 to the translation generated by the selected translator.
  • the workflow module 122 may select one of the profiles 204 a-c to which to assign the proofreading (e.g., a proofreader who reviews only the translation and corrects small errors of style, typos, formatting, etc.).
  • the workflow module 122 suggests, at box 258, possible variants for the translation workflow, such as the number of stages and the number of the profiles 204 a-c, based on the client requirements 212.
  • the client requirements 212 may explicitly indicate the number of translation stages or the specific stages to be used, or a user of the client system 124 may select one of the options offered by the translation system 102, in which case the workflow module 122 selects the corresponding project workflow.
  • the workflow module 122 may remove, or suggest removal of, one of the subsequent stages from the workflow (e.g., editing or proofreading).
  • the workflow module 122 may compensate for removal of the stage by replacing the translation professional assigned to the first (translation) stage with a profile of a translation professional that has a higher quality of translation than the originally assigned profile.
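  • The stage assignment and compensation described above can be sketched as follows. This is a minimal, hypothetical heuristic rather than the method of the specification: it assumes each shortlisted profile carries a single numeric quality score (the illustrative `Profile` class and `STAGE_PRIORITY` tuple are not from the source), gives editing to the strongest profile so the editor outranks the translator, and, when editing is dropped, hands the translation stage to that strongest profile instead.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    name: str
    quality: float  # evaluated quality of previous translations (higher is better)

# Hypothetical priority: the strongest profile goes to the first stage listed here
# that is actually part of the requested workflow.
STAGE_PRIORITY = ("editing", "translation", "proofreading")

def assign_stages(profiles, stages):
    """Assign one distinct profile to each requested stage.

    With editing present, the top-ranked profile edits and the runner-up translates.
    If editing is removed, the top-ranked profile translates instead, which
    compensates for the missing stage.
    """
    ranked = sorted(profiles, key=lambda p: p.quality, reverse=True)
    if len(ranked) < len(stages):
        raise ValueError("need at least one distinct profile per stage")
    ordered = [s for s in STAGE_PRIORITY if s in stages]
    return {stage: p.name for stage, p in zip(ordered, ranked)}
```

For example, with three profiles ranked A > B > C, a TEP workflow yields editing by A, translation by B, and proofreading by C, while a workflow without editing yields translation by A and proofreading by B.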
  • the workflow module 122 may determine, at box 254, the timing of delivering the translation project to the client system 124.
  • the timing of the delivery may be based on the number of the profiles 204 a-c that have been assigned to the translation of the electronic document 206.
  • the workflow module 122 may divide the translation into multiple segments and separately assign the segments to multiple ones of the profiles 204 a-c (e.g., segments of the translation may be performed in parallel) to reduce the amount of time needed to complete the translation.
  • the workflow module 122 may have an effective lower limit of approximately 250 words (e.g., one translation page) assigned to a single translation professional. In some implementations, this lower limit may be the number of words that can typically be translated by the average translation professional in one hour. In some implementations, the client requirements 212 for urgent projects typically require no less than one hour for completion.
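  • The division into parallel segments under the roughly 250-word lower limit can be illustrated with a small sketch. It works on word counts only (a real split would follow sentence or segment boundaries), and the function name `split_words` is hypothetical.

```python
MIN_WORDS_PER_PROFESSIONAL = 250  # approximately one translation page, per the text

def split_words(total_words, available, min_words=MIN_WORDS_PER_PROFESSIONAL):
    """Split a word count into near-equal segments, one per translation professional.

    No segment falls below min_words unless the whole document is shorter than that.
    """
    if total_words <= 0 or available <= 0:
        return []
    max_segments = max(1, total_words // min_words)  # respect the per-person lower limit
    n = min(available, max_segments)
    base, extra = divmod(total_words, n)
    # Spread the remainder one word at a time over the first `extra` segments.
    return [base + 1 if i < extra else base for i in range(n)]
```

With 10 available professionals, a 1,000-word document is split into only four 250-word segments, because a fifth segment would drop below the lower limit.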
  • the workflow module 122 may create, at box 254, a work calendar to take current translator availability into account during the selection process.
  • the workflow module 122 may allocate time in a work calendar for each of the profiles 204 a-c.
  • the workflow module 122 may estimate the amount of time each of the profiles 204 a-c may work and the amount of time each translation task may take for the translation professional associated with the profile.
  • the workflow module 122 may identify the difference between the amount of time a profile may work and the amount of work assigned to the profile as an available workload.
  • the workflow module 122 may find “hidden reserves” of underutilized downtime during which the workflow module 122 may assign more translation tasks to the translation professionals 108 a - c.
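  • The available-workload calculation and the search for "hidden reserves" can be sketched as follows, assuming that committed time is derived from assigned words and the profile's translation speed. The `CalendarEntry` class, its fields, and the one-hour threshold in `hidden_reserves` are illustrative assumptions, not details from the source.

```python
from dataclasses import dataclass

@dataclass
class CalendarEntry:
    profile: str
    capacity_hours: float  # hours the professional may work in the planning window
    assigned_words: int    # words already assigned in work-in-progress projects
    words_per_hour: float  # translation speed identified in the profile

    def assigned_hours(self):
        return self.assigned_words / self.words_per_hour

    def available_hours(self):
        # Available workload: time the profile may work minus time already committed.
        return max(0.0, self.capacity_hours - self.assigned_hours())

def hidden_reserves(entries, min_hours=1.0):
    """Profiles with at least min_hours of unused time, most underutilized first."""
    free = [(e.profile, e.available_hours()) for e in entries
            if e.available_hours() >= min_hours]
    return sorted(free, key=lambda item: item[1], reverse=True)
```

A one-hour threshold roughly matches one 250-word page at average speed, so any profile returned could take at least one more minimal segment.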
  • the translation system 102 may provide a user interface to the client system 124 with three workflow options that correspond to different project completion times.
  • under a first option, the completion time may be maximized to a reasonable extent.
  • the workflow module 122 may calculate the maximum completion time (in days) using an average daily output (e.g., 2,000, 4,000, and 12,000 words for translators, editors, and proofreaders, respectively) and the selected number of stages.
  • under a second option, the completion time may be minimized. If this option is selected, the workflow module 122 may minimize the completion time by assigning more of the translation professionals 108 a-c to each translation stage and/or by performing fewer translation stages.
  • the workflow module 122 may calculate the maximum number of translation professionals, N_max, as:
  • $N_{\max} = 2 \ln(W)$
  • under a third option, the workflow module 122 uses an average of the maximum completion time under the first option and the minimum completion time under the second option.
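  • The three completion-time options can be worked through in a short sketch. It rests on several assumptions not stated in the source: W is taken to be the document's word count, the stages run one after another, N_max is rounded down and additionally capped so that nobody receives fewer than about 250 words, and the second option is modeled only through parallelism (not through dropping stages). All function names are hypothetical.

```python
import math

# Average daily output per role, in words per day (example figures from the text).
DAILY_OUTPUT = {"translation": 2000, "editing": 4000, "proofreading": 12000}

def max_completion_days(words, stages):
    # First option: one professional per stage, stages run one after another (assumed).
    return sum(words / DAILY_OUTPUT[s] for s in stages)

def max_professionals(words, min_words=250):
    # N_max = 2 ln(W), with W assumed to be the word count; the cap by
    # words // min_words is an added assumption tied to the per-person lower limit.
    return max(1, min(math.floor(2 * math.log(words)), words // min_words))

def min_completion_days(words, stages):
    # Second option: each stage is shared by N_max professionals working in parallel.
    n = max_professionals(words)
    return sum(words / (n * DAILY_OUTPUT[s]) for s in stages)

def completion_options(words, stages):
    slow = max_completion_days(words, stages)
    fast = min_completion_days(words, stages)
    # Third option: the average of the first two.
    return {"max": slow, "min": fast, "balanced": (slow + fast) / 2}
```

For a 12,000-word document with a TEP workflow, the first option gives 6 + 3 + 1 = 10 days, N_max = floor(2 ln 12000) = 18, the second option gives 10/18 of a day, and the third option lies halfway between them.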
  • the result of the sub-processes 202 a-c is a set of parameters for completing the translation of the electronic document 206, including which stages of the workflow will be performed, a selected set of the profiles 204 a-c of the translation professionals 108 a-c to be assigned to specific stages and segments of the text of the electronic document 206, a volume of work and work plan for each of the selected ones of the profiles 204 a-c, and a time of completion/project delivery schedule.
  • the evaluation module 120 may reevaluate and store the qualities of translations 208 a-b and productivities for each of the profiles 204 a-c in the data storage device 116.
  • FIG. 3 is a schematic diagram that shows an example of a machine in the form of a computer system 300.
  • the computer system 300 executes one or more sets of instructions 326 that cause the machine to perform any one or more of the methodologies discussed herein.
  • the machine may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • the machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile telephone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • the computer system 300 includes a processor 302, a main memory 304 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 306 (e.g., flash memory, static random access memory (SRAM), etc.), and a data storage device 316, which communicate with each other via a bus 308.
  • the processor 302 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, the processor 302 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets.
  • the processor 302 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like.
  • the processor 302 is configured to execute instructions of the selection module 118, the evaluation module 120, and/or the workflow module 122 for performing the operations and steps discussed herein.
  • the computer system 300 may further include a network interface device 322 that provides communication with other machines over a network 318, such as a local area network (LAN), an intranet, an extranet, or the Internet.
  • the computer system 300 also may include a display device 310 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 312 (e.g., a keyboard), a cursor control device 314 (e.g., a mouse), and a signal generation device 320 (e.g., a speaker).
  • the data storage device 316 may include a computer-readable storage medium 324 on which are stored the sets of instructions 326 of the selection module 118, the evaluation module 120, and/or the workflow module 122 embodying any one or more of the methodologies or functions described herein.
  • the sets of instructions 326 of the selection module 118, the evaluation module 120, and/or the workflow module 122 may also reside, completely or at least partially, within the main memory 304 and/or within the processor 302 during execution thereof by the computer system 300, the main memory 304 and the processor 302 also constituting computer-readable storage media.
  • the sets of instructions 326 may further be transmitted or received over the network 318 via the network interface device 322.
  • While the example of the computer-readable storage medium 324 is shown as a single medium, the term “computer-readable storage medium” can include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the sets of instructions 326.
  • the term “computer-readable storage medium” can include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure.
  • the term “computer-readable storage medium” can include, but is not limited to, solid-state memories, optical media, and magnetic media.
  • the present disclosure also relates to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including a floppy disk, an optical disk, a compact disc read-only memory (CD-ROM), a magnetic-optical disk, a read-only memory (ROM), a random access memory (RAM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a magnetic or optical card, or any type of media suitable for storing electronic instructions.
  • The words “example” or “exemplary” are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the words “example” or “exemplary” is intended to present concepts in a concrete fashion.
  • the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless specified otherwise, or clear from context, “X includes A or B” is intended to mean any of the natural inclusive permutations.

Abstract

The subject matter of this specification can be implemented in, among other things, a method that includes storing previous translations of electronic documents for profiles of translation professionals. The method includes receiving a request to translate an electronic document. The method includes selecting ones of the profiles as being experienced in at least one subject area of the electronic document based on a proximity of terms or subject areas in the electronic documents translated by the ones of the profiles to terms or the subject area of the electronic document. The method includes evaluating qualities of the previous translations for each of the selected ones of the profiles. The method includes planning a workflow for translation of the electronic document based on the selected ones of the profiles and the qualities of the previous translations. The method includes causing the electronic document to be translated according to the planned workflow.

Description

    RELATED APPLICATIONS
  • This patent application is a continuation of U.S. patent application Ser. No. 15/782,004, filed Jun. 6, 2018, which claims the benefit under 35 U.S.C. § 371 of International Patent Application No. PCT/US2017/049771, filed Aug. 31, 2017, wherein the entire contents of each are hereby incorporated by reference.
  • TECHNICAL FIELD
  • This instant specification relates to data-driven automated selection of profiles of translation professionals for translation tasks.
  • BACKGROUND
  • Information gathering and exchange for any scientific, commercial, political or social purpose often requires fast and easy translation of content in order to make the universe of knowledge and ideas useful on a global scale. Computer programs that translate automatically from one language to another (“machine translation programs”) can in principle meet this need and such programs have been developed and are in continued development for a variety of languages. For formal (as opposed to informal, idiomatic, colloquial) content in well-studied languages (e.g., English, French, Spanish, German, and others), such machine translation programs work reasonably well.
  • However, for more-difficult or less-studied languages (e.g., Arabic), existing machine translation programs do not work well, even for formal communications (e.g., Modern Standard Arabic), and they are particularly weak in the case of informal, colloquial, and idiomatic communications. Similarly, where specificity is needed, machine translation by itself is insufficient even for well-studied languages (e.g., English, French, Spanish, German, and others). Human translators can in principle provide accurate translations for difficult languages and informal communications, but Internet applications require constant availability and quick response, which cannot be assured in the case of existing methods that use human translators.
  • DESCRIPTION OF DRAWINGS
  • FIG. 1 is a schematic diagram that shows an example of a system for data-driven automated selection of profiles of translation professionals for translation tasks.
  • FIGS. 2A-D are flow charts that show examples of processes for data-driven automated selection of profiles of translation professionals for translation tasks.
  • FIG. 3 is a schematic diagram that shows an example of a computing system.
  • DETAILED DESCRIPTION
  • This document describes systems and techniques for data-driven automated selection of profiles of translation professionals (e.g., translators, editors, proofreaders, or interpreters) for translation tasks. This may be achieved by one or more processors executing instructions, stored in one or more memories, for a first process for automated selection of translation professionals experienced in a subject area to which content of an electronic document to be translated pertains, a second process for automated evaluation of translation qualities for the profiles associated with the translation professionals, and a third process for automated planning of translation resources and workflow of the translation professionals.
  • The systems and techniques described here may provide one or more of the following advantages. First, a system can match subject areas to translation professionals with higher accuracy than prior systems. The system may provide fully automated matching of subject areas to translation professionals without manual or empirical adjustment of the parameters used for the matching. Rather than basing a quality evaluation of a profile of a translation professional on a rate of corrections by editors of translation work associated with the profile, the system may base the quality evaluation on machine learning using a model that is trained on editor evaluations of the translation work product to predict the quality. The system may provide fully automated quality evaluation of a profile of a translation professional without manual or empirical adjustment of parameters used in the quality evaluation.
  • The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.
  • FIG. 1 is a schematic diagram that shows an example of a system 100 for data-driven automated selection of profiles of translation professionals for translation tasks. The system 100 includes a translation system 102 in communication with a client system 124 and multiple translator systems 104 a-c over a network 106, such as a local area network, a wide area network, or one or more of the computing devices that make up the Internet. The translator systems 104 a-c are used by multiple translation professionals 108 a-c to translate electronic documents at the direction of the translation system 102.
  • The translation system 102 may receive a request to translate an electronic document from the client system 124, for example, through at least one interface device 110 to the network 106. The interface device 110 provides communication between the translation system 102 and the network 106 or networks used to communicate with the client system 124 and the translator systems 104 a-c. The request may include the electronic document (or an address that the translation system 102 or another system may use to retrieve the electronic document), an identification of a source language of the electronic document, and/or an identification of a target language to which content of the electronic document is to be translated.
  • The translation system 102 further includes at least one processor 112, at least one memory 114, and at least one data storage device 116. The memory 114 stores instructions for one or more modules, such as a selection module 118, an evaluation module 120, and a workflow module 122. The processor 112 executes the instructions of the modules to perform the operations described herein.
  • The translation professionals are each associated with a profile that may be stored, for example, at the translation system 102 in the data storage device 116. The processor 112 may execute the instructions of the selection module 118 to select ones of the profiles associated with the translation professionals to perform translation for the electronic document. The processor 112 may execute the instructions of the evaluation module 120 to evaluate qualities of translations previously performed by the profiles associated with the translation professionals. The processor 112 may execute the instructions of the workflow module 122 to make a final selection of ones of the profiles to translate the electronic document based on the translation qualities and resource and/or workflow parameters.
  • The translation system 102 may then assign and/or notify the selected profiles of the translation to be performed for the electronic document. The translation system 102 may provide the electronic document, or at least a portion thereof, to ones of the translator systems 104 a-c for the selected ones of the profiles. The ones of the translator systems 104 a-c receive the translations from the translation professionals and provide the translations to the translation system 102. The translation system 102 receives the translations and provides a final translation of the electronic document, based on the received translations, to the client system 124.
  • FIGS. 2A-D are flow charts that show examples of processes for data-driven automated selection of profiles of translation professionals for translation tasks, in accordance with some aspects of the present disclosure. The processes may be performed, for example, by a system such as the system 100. For clarity of presentation, the description that follows uses the system 100 as an example for describing the processes. However, another system, or combination of systems, may be used to perform the processes.
  • FIG. 2A is a flow chart that shows an example of an overall process 200 for data-driven automated selection from one or more profiles 204 a-c of translation professionals for translation tasks. The overall process 200 may include one or more sub-processes 202 a-c. The first sub-process 202 a may be performed, for example, by the selection module 118 and includes an automated selection of one or more of the profiles 204 a-b of the translation professionals experienced in a subject area to which content of an electronic document 206 to be translated pertains. The second sub-process 202 b may be performed, for example, by the evaluation module 120 and includes an automated evaluation of one or more qualities of translations 208 a-b for the profiles 204 a-b that were selected. The third sub-process 202 c may be performed, for example, by the workflow module 122 and includes an automated planning of translation resources and workflow of the translation professionals.
  • The sub-processes 202 a-c may be mutually interconnected. For example, the third sub-process 202 c may be based on the qualities of translations 208 a-b from the second sub-process 202 b, and only the professionals identified during the first sub-process 202 a may take part in the second sub-process 202 b. In some implementations, the sub-processes 202 a-c may occur in another order, such as a reverse order. For example, a system may use completed translations to evaluate and update the evaluations of the translation professionals who participated in a translation project, glossaries and corpora used in the project may be updated, and selection of relevant translation professionals for subsequent texts may be improved.
  • The resource and workflow planning of the third sub-process 202 c may include one or more factors 210 for the translation professionals, such as a cost charged by each translation professional for the translation, an estimated amount of time taken by each translation professional to perform the translation, and the qualities of translations 208 a-b associated with each of the translation professionals. The translation system 102 may store the parameters for the cost, time, and the qualities of translations 208 a-b for each of the profiles in the data storage device 116. The workflow module 122 may calculate the cost for each translation professional for a translation project based on a rate indicated by the translation professional in the profile associated with the translation professional. The workflow module 122 may calculate the amount of time taken by each translation professional via a sub-system that monitors the work of the translation professionals associated with each profile in real time with a cloud-based architecture.
  • The workflow module 122 may grade or evaluate the compliance of each translation professional with the assigned task using algorithms for textual analysis and machine learning. The workflow module 122 uses the qualities of translations 208 a-b from the evaluation to further refine the list of the profiles 204 a-b to be used for the translation. The workflow module 122 may use one or more client requirements 212 provided by the client system 124 when grading or evaluating the compliance of each translation professional with the assigned task, such as when the translation is due to the client system 124 or what levels of the qualities of translations 208 a-b are acceptable for the client system 124. The translation system 102 may then cause a translation process 214 to occur using the finally selected ones of the profiles 204 a-b and the planned workflow.
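  • The cost, time, and quality factors and their use against client requirements can be sketched as a simple filter. The sketch assumes a per-word rate, a words-per-hour speed, and hard thresholds for budget, deadline, and minimum quality; the `ProfileFactors` class and both functions are hypothetical, and the source does not specify how these factors are combined.

```python
from dataclasses import dataclass

@dataclass
class ProfileFactors:
    name: str
    rate_per_word: float   # rate indicated by the professional in the profile
    words_per_hour: float  # speed observed by monitoring the professional's work
    quality: float         # evaluated quality of previous translations

def factors_for_job(profile, words):
    """Return (cost, hours, quality) for assigning `words` words to one profile."""
    return (profile.rate_per_word * words, words / profile.words_per_hour, profile.quality)

def feasible_profiles(profiles, words, max_cost, max_hours, min_quality):
    """Names of profiles whose cost, time, and quality all meet the client requirements."""
    names = []
    for p in profiles:
        cost, hours, quality = factors_for_job(p, words)
        if cost <= max_cost and hours <= max_hours and quality >= min_quality:
            names.append(p.name)
    return names
```

A weighted score could replace the hard thresholds when a client trades cost against quality; the filter form simply makes each requirement explicit.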
  • FIG. 2B is a flow chart that shows an example of the first sub-process 202 a for automated selection of translation professionals experienced in a subject area to which content of an electronic document to be translated pertains. The selection module 118 may perform the first sub-process 202 a to select profiles for translation professionals who are conversant in the subject area of the content of the electronic document to be translated (since, for example, a translation professional who works with legal texts may not be competent at handling technical documents). The selection module 118 narrows down the pool of potential translation professionals to optimize the time needed for further selection and optimization during the second sub-process 202 b and the third sub-process 202 c. The selection module 118 selects one or more of the profiles 204 a-c of the translation professionals 108 a-c based on content of one or more previous translations of electronic documents 224 that is in a same subject area as the content of the electronic document 206 to be translated. In some implementations, a lack of subject-area knowledge and terminology by a translation professional may be a primary cause of translation errors and low quality of translations. The first sub-process 202 a may apply one or more of the following stages to define a set of profiles of translation professionals from which the final profiles of the translation professionals for the translation project will be selected.
  • The first sub-process 202 a may include, at box 220, pre-processing of text from the electronic document 206 to be translated and/or the previous translations of electronic documents 224. For example, the selection module 118 may perform a syntactic and morphological filtering of the text of the previous translations of electronic documents 224. The filtering may include, for example, stripping of metadata, tags, and formatting from the text; marking up of parts of speech in the text; and/or extraction of root forms of words from the text.
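  • A crude version of this pre-processing can be sketched with regular expressions. It strips markup tags and approximates root-form extraction with naive English suffix stripping; part-of-speech markup is omitted because it needs a real tagger. The suffix list and function names are illustrative assumptions, and a production system would use a proper morphological analyzer.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")  # markup tags such as <b> or <seg id="3">
SUFFIXES = ("ings", "ing", "ies", "es", "s", "ed")  # crude, English-only stand-in

def strip_markup(text):
    """Remove tags and collapse the whitespace left behind by formatting."""
    return re.sub(r"\s+", " ", TAG_RE.sub(" ", text)).strip()

def crude_root(word):
    """Very rough root extraction: drop the first matching suffix, keep 3+ letters."""
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w

def preprocess(text):
    words = re.findall(r"[A-Za-z]+", strip_markup(text))
    return [crude_root(w) for w in words]
```

For example, `preprocess("<p>Translating <b>contracts</b></p>")` returns `["translat", "contract"]`.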
  • The first sub-process 202 a may include, at box 226, extraction of terminology from the electronic documents in the translation system 102 that have previously been translated by the profiles. The extraction may include creation of a common glossary based on the extracted terms and individual glossaries for each of the profiles for the terms translated by each profile. In some implementations, the common glossary and/or the individual glossaries reduce an amount of data to be analyzed and enable building criteria for selecting translators based on the knowledge of the translation professionals of a specific set of terms.
  • For example, the selection module 118 may perform the extraction of the terminology by performing a linguistic filtering. The linguistic filtering may include an identification of candidate terms (e.g., potential glossary entries from the text) by searching for words and phrases that fit certain patterns, such as a noun pattern, an adjective and noun pattern, a gerund and noun pattern, and/or a noun and noun pattern, etc.
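  • The pattern-based identification of candidate terms can be sketched over text that has already been tagged with parts of speech. The tag names (`NOUN`, `ADJ`, `GERUND`) and the `candidate_terms` function are hypothetical; any tagger whose output is mapped to these labels would do.

```python
# Patterns from the text: noun; adjective + noun; gerund + noun; noun + noun.
PATTERNS = {("NOUN",), ("ADJ", "NOUN"), ("GERUND", "NOUN"), ("NOUN", "NOUN")}

def candidate_terms(tagged):
    """tagged: list of (word, tag) pairs. Returns every word sequence matching a pattern."""
    max_len = max(len(p) for p in PATTERNS)
    found = []
    for i in range(len(tagged)):
        for n in range(1, max_len + 1):
            window = tagged[i:i + n]
            # Skip windows cut short at the end of the text.
            if len(window) == n and tuple(t for _, t in window) in PATTERNS:
                found.append(" ".join(w for w, _ in window))
    return found
```

For the tagged phrase "binding/ADJ arbitration/NOUN clause/NOUN applies/VERB", this yields "binding arbitration", "arbitration", "arbitration clause", and "clause"; nested candidates like these are what the C-Value calculation then weighs against each other.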
  • The selection module 118 may perform the extraction of the terminology by performing a calculation of quantitative characteristics (C-Value) for each candidate term from the text using, for example, the following calculation:
  • $\text{C-Value}^{*}(a) = \log_2\left(|a| + \text{const}\right) \cdot \left( f(a) - \frac{1}{|T_a|} \sum_{b \in T_a} f(b) \right)$,
  • where |a| is the number of words in candidate term a, T_a is the set of candidate terms containing candidate term a, f(a) is the frequency of candidate term a, and |T_a| is the number of candidate terms containing candidate term a. A high C-Value indicates a high likelihood that the candidate term is significant enough to be added to the common glossary and/or an individual glossary.
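  • The C-Value calculation can be sketched directly from the formula, under two assumptions the text leaves open: const = 1 (so single-word terms do not score zero), and when no longer candidate contains a (T_a is empty) the averaged term is treated as zero, so the score reduces to log₂(|a| + const) · f(a). The `c_value` function is hypothetical.

```python
import math

def c_value(freq, const=1.0):
    """freq: dict mapping a candidate term (space-separated words) to its frequency f(a)."""
    scores = {}
    for a, f_a in freq.items():
        # T_a: longer candidate terms that contain a as a contiguous word sequence.
        nests = [b for b in freq if b != a and f" {a} " in f" {b} "]
        # Average frequency of the containing terms; zero when T_a is empty (assumed).
        penalty = sum(freq[b] for b in nests) / len(nests) if nests else 0.0
        scores[a] = math.log2(len(a.split()) + const) * (f_a - penalty)
    return scores
```

For frequencies {"arbitration clause": 3, "binding arbitration clause": 1, "clause": 5}, "clause" scores log₂(2) · (5 − (3 + 1)/2) = 3, because its frequency is discounted by the longer terms in which it is nested.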
  • The selection module 118 may use two different approaches to select the profiles 204 a-b of the translation professionals 108 a-b, a simplified approach and a thematic approach. The selection module 118 may select the approach to use based on the volume of the previous translations of electronic documents 224 associated with the profiles 204 a-c of the translation professionals 108 a-c and the electronic document 206 to be translated.
  • The selection module 118 may select the simplified approach for low volumes. For the simplified approach, the selection module 118 may select the profiles 204 a-b of the translation professionals 108 a-b using a term-by-term comparison of the terms extracted from the electronic document 206 to be translated with the terms extracted from the previous translations of electronic documents 224. For each term extracted from the electronic document 206, the selection module 118 may calculate how many times the term is found in the electronic document 206 to identify one or more terminology frequency vectors, a_1, ..., a_k.
  • For each of the profiles 204 a-c of the translation professionals 108 a-c, the selection module 118 may, at box 232, calculate a numerical value of a proximity of the terms in the electronic document 206 to the terms from the previous translations of electronic documents 224 using the following calculation:

  • $Q_T = \sum_{i=1}^{k} \ln(a_i + 1) \cdot \ln(w_i + 1)$,
  • where w_1, ..., w_k are one or more terminology frequency vectors 234, each of a particular term in the previous translations of electronic documents 224 by a profile of a translation professional, T.
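  • The proximity Q_T of the simplified approach can be computed as below, assuming terms have already been extracted as lists of strings. A term absent from a profile's history has w_i = 0, so ln(w_i + 1) = 0 and it contributes nothing. The function names and the ranking helper are illustrative.

```python
import math
from collections import Counter

def proximity(doc_terms, profile_terms):
    """Q_T = sum over the document's terms of ln(a_i + 1) * ln(w_i + 1)."""
    a = Counter(doc_terms)      # a_i: frequency of term i in the document to translate
    w = Counter(profile_terms)  # w_i: frequency of term i in the profile's previous translations
    return sum(math.log(a[t] + 1) * math.log(w[t] + 1) for t in a)

def rank_profiles(doc_terms, history):
    """history: dict mapping profile -> terms from its previous translations."""
    scores = {p: proximity(doc_terms, terms) for p, terms in history.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

The logarithms damp raw counts, so a profile that has seen a term 100 times does not dwarf one that has seen it 10 times.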
  • Alternatively or in addition, the selection module 118 may select the thematic (or subject) approach for high volumes. For the thematic approach, the selection module 118 may classify, at box 230, the terms from the electronic document 206 and/or the previous translations of electronic documents 224 into one or more classes. The selection module 118 may determine the classes of the terms based on matching and/or comparing each of the terms to a term associated with a subject area, for example, at a particular level of a subject tree. Alternatively or in addition, the selection module 118 may automatically classify the terms based on machine learning clustering that maximizes a distance between clusters of the terms. Once the terms have been clustered, the selection module 118 may assign an identifier to the clusters, such as a number, and each of the terms may be assigned the identifier of the cluster to which the term belongs. Each cluster may then be considered a quasi-subject area.
  • The selection module 118 may represent each electronic document in the corpus of the previous translations of electronic documents 224 by a subject vector. For each of the previous translations of electronic documents 224 in the corpus, the selection module 118 may calculate how frequently the terms of the translation appear in each of the clusters. Each previous translation of an electronic document (associated with a particular profile) is thus represented by a subject vector with one component per cluster, so that the length of the vector equals the number of clusters.
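  • A minimal sketch of building such a subject vector, assuming the clustering step has already produced a mapping from each term to a cluster identifier numbered 0 to n-1. Normalizing the counts to relative frequencies, and ignoring terms that were never clustered, are assumptions of this sketch; the function name is hypothetical.

```python
from collections import Counter


def subject_vector(terms, term_to_cluster, n_clusters):
    """One component per cluster: share of the document's clustered terms in that cluster."""
    # Terms without a cluster assignment are skipped (an assumption of this sketch).
    counts = Counter(term_to_cluster[t] for t in terms if t in term_to_cluster)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_clusters
    return [counts[c] / total for c in range(n_clusters)]


# Hypothetical clusters: 0 ~ mechanical engineering, 1 ~ legal, 2 ~ finance.
term_to_cluster = {"valve": 0, "pump": 0, "clause": 1, "tariff": 2}
vec = subject_vector(["valve", "pump", "clause", "unknown"], term_to_cluster, 3)
```

Because every vector has the same length (the number of clusters), documents with completely different vocabularies can still be compared component by component.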
  • The selection module 118 may calculate, at box 232, the proximity between the subject vector of the electronic document 206 and the subject vectors of all of the previous translations of electronic documents 224 by the profiles 204 a-c. The selection module 118 may determine the proximity or similarity between the subject vector of the electronic document 206 and each of the subject vectors of the previous translations of electronic documents 224 using the following calculation for cosine similarity between two vectors:
  • $$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}},$$
  • where $A$ is the subject vector of the electronic document 206, $B$ is the subject vector of one of the previous translations of electronic documents 224, and $A_i$ and $B_i$ are the components of the vectors $A$ and $B$, respectively. To reduce the number of the selected ones of the profiles 204 a-c and the computational load on the translation system 102, the selection module 118 may exclude from further processing those of the profiles 204 a-c whose subject vectors are far from the subject vector of the electronic document 206 (e.g., have a low proximity value). The selection module 118 may apply the simplified approach to the remaining ones of the profiles 204 a-c that do not have high volumes.
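  • The cosine similarity and the exclusion of distant profiles can be sketched as below. The description does not say how the similarities of a profile's several previous translations are combined, so taking the best match per profile is an assumption, as is the hypothetical cut-off parameter `min_similarity`.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (|A| |B|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_profiles(doc_vector, profile_vectors, min_similarity):
    """Keep profiles whose best-matching previous translation reaches min_similarity.

    profile_vectors maps a profile id to the subject vectors of its previous translations.
    """
    kept = {}
    for profile_id, vectors in profile_vectors.items():
        best = max((cosine_similarity(doc_vector, v) for v in vectors), default=0.0)
        if best >= min_similarity:
            kept[profile_id] = best
    return kept


doc_vec = [1.0, 0.0, 0.0]
history = {"p1": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "p2": [[0.0, 0.0, 1.0]]}
remaining = prune_profiles(doc_vec, history, min_similarity=0.5)
```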
  • The selection module 118 may, at box 228, re-build the terminology space of the terminology frequency vectors 234 as translations of additional electronic documents are associated with the profiles 204 a-c. The selection module 118 may also update the glossaries with new terms from the additional electronic documents.
  • Once proximities are determined under either the simplified approach or the thematic approach, the selection module 118 may select, at box 236, ones of the profiles 204 a-c based on the proximities of the terms for the profiles 204 a-c to the terms from the electronic document 206 (simplified approach) or based on the proximities of the subject vectors for the profiles 204 a-c to the subject vector of the electronic document 206 (thematic approach). For example, the selection module 118 may select a particular number of the profiles 204 a-b with the highest proximities and/or the profiles whose proximity meets a threshold level.
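  • The final selection rule (a fixed number of best profiles, a proximity threshold, or both) might look like the following sketch. The parameter names `top_n` and `threshold` are hypothetical, and the proximities are assumed to have been computed by either approach above.

```python
def select_profiles(proximities, top_n=None, threshold=None):
    """Return profile ids ordered by descending proximity.

    threshold: drop profiles whose proximity is below this value (if given).
    top_n: keep at most this many profiles (if given).
    """
    ranked = sorted(proximities.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(pid, score) for pid, score in ranked if score >= threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [pid for pid, _ in ranked]


proximities = {"a": 0.9, "b": 0.4, "c": 0.7}
best_two = select_profiles(proximities, top_n=2)
above_half = select_profiles(proximities, threshold=0.5)
```

Applying the threshold before the count means a strict threshold can return fewer than `top_n` profiles, or none at all, which signals that no translator in the pool is a good subject match.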
  • FIG. 2C is a flow chart that shows an example of the second sub-process 202 b for automated evaluation of translation qualities for the profiles associated with the translation professionals. The second sub-process 202 b formally characterizes and quantifies the qualities of translations 208 a-b for the profiles 204 a-c of the translation professionals 108 a-c. The previous translations of electronic documents 224 may, for example, contain errors of different types, such as typos, grammatical errors, and/or incorrect terminology. The evaluation module 120, for example, may use information regarding the errors to identify the qualities of translations 208 a-b. The translation system 102 may then use the qualities of translations 208 a-b for future translations to select from the profiles 204 a-c of the translation professionals 108 a-c. In some implementations, quantitative characteristics associated with a profile of a particular translation professional may affect a client requirement, such as a due date for a translation or a cost of a translation (since correcting mistakes may take additional time and is often comparable to re-translating the electronic document 206). The evaluation module 120 may use this information to predict the qualities of translations 208 a-b for the profiles 204 a-b and to select the profiles 204 a-b of the most qualified ones of the translation professionals 108 a-c to translate the electronic document 206. The evaluation module 120 may evaluate multiple aspects of ones of the previous translations of electronic documents 224 for each of the profiles 204 a-c to calculate a corresponding one of the qualities of translations 208 a-b as well as a predicted quality level for future translations.
  • The aspects may include an analysis, at box 240 a, of low-level data for each segment of a translation. For example, the evaluation module 120 may analyze time spent by the profile of the translation professional working on the translation of the segment, a number of actions taken by the profile of the translation professional to translate the segment, and a type of each correction made at each stage of the translation by the profile of the translation professional (e.g., corrections by an editor for the translation system 102 after the translation professional or corrections by the client system 124 after the editor).
  • The aspects may include an analysis, at box 240 b, of compliance between the translated terms and the project glossary and/or terms generated automatically by a subject analysis. For example, the evaluation module 120 may determine whether a threshold number or rate of the translated terms for a profile do not appear in the project glossaries and/or the automatically generated terms for the electronic documents being translated. The evaluation module 120 may compare the translated terms with the terms in the project glossary for the translation project to determine how many of the translated terms do not appear in the glossary and to check the consistency of the translated terms. In addition, the evaluation module 120 may add extracted terms together with their commonly used translations whose frequency surpasses a particular threshold. In some implementations, the evaluation module 120 may give the commonly used translations a lower weight than the other translated terms. In some implementations, the evaluation module 120 may use only the extracted terms, for example, if there is no project glossary.
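  • A simplified sketch of the glossary-compliance and consistency checks. It assumes that source terms have already been aligned with the terms used in the translation, producing (source term, translated term) pairs; that alignment step, the function name, and the German sample glossary are hypothetical. Weighting of the commonly used translations is omitted.

```python
from collections import defaultdict


def glossary_compliance(term_pairs, glossary):
    """term_pairs: (source_term, translated_term) tuples found in one translation.
    glossary: maps a source term to its approved translation.

    Returns (mismatch_rate, inconsistent_terms):
      mismatch_rate: share of glossary-covered terms not translated as the glossary says.
      inconsistent_terms: source terms translated in more than one way.
    """
    checked = [(s, t) for s, t in term_pairs if s in glossary]
    mismatches = sum(1 for s, t in checked if glossary[s] != t)
    mismatch_rate = mismatches / len(checked) if checked else 0.0

    variants = defaultdict(set)
    for s, t in term_pairs:
        variants[s].add(t)
    inconsistent = sorted(s for s, ts in variants.items() if len(ts) > 1)
    return mismatch_rate, inconsistent


glossary = {"valve": "Ventil", "pump": "Pumpe"}
pairs = [("valve", "Ventil"), ("valve", "Klappe"), ("pump", "Pumpe"), ("seal", "Dichtung")]
rate, inconsistent = glossary_compliance(pairs, glossary)
```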
  • The aspects may include an analysis, at box 240 c, of a set of linguistic descriptors. For example, the evaluation module 120 may analyze an average length of sentences in the translations, a variety and/or variability of a vocabulary in the translations, or a complexity of text in the translations, etc.
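  • Two of the linguistic descriptors can be approximated as in this sketch: average sentence length in words, and the type-token ratio (distinct words divided by total words) as one possible stand-in for vocabulary variety. The regular-expression sentence splitting is deliberately crude and the function name is hypothetical; a production system would more likely use a language-aware tokenizer.

```python
import re


def linguistic_descriptors(text):
    """Average sentence length (in words) and type-token ratio of a translated text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    avg_sentence_length = len(words) / len(sentences) if sentences else 0.0
    type_token_ratio = len(set(words)) / len(words) if words else 0.0
    return {
        "avg_sentence_length": avg_sentence_length,
        "type_token_ratio": type_token_ratio,
    }


stats = linguistic_descriptors("The pump failed. The valve held!")
```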
  • The aspects may include an analysis, at box 240 d, of results of automatic quality assurance (QA) checks. For example, the evaluation module 120 may analyze results of automatic checks for spelling, grammar, punctuation, tag structure and order, consistency of placeholders, extra and/or double spaces, contextual matches control, correct transfer of dates and numerical parameters, case control, multi-source and multi-target checks, or repeating words, etc.
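  • A few of the listed automatic QA checks (extra spaces, repeated words, transfer of numbers, and consistency of placeholders) are sketched below. The placeholder syntax `{name}` and the digit-only number comparison, which ignores decimal and thousands separators, are assumptions chosen for brevity; the function name is hypothetical.

```python
import re


def qa_checks(source, target):
    """Return a list of issue labels found in a translated segment."""
    issues = []
    if "  " in target:
        issues.append("double space")
    # A word immediately repeated, e.g. "die die" (case-insensitive).
    if re.search(r"\b(\w+)\s+\1\b", target, flags=re.IGNORECASE):
        issues.append("repeated word")
    # Compare digit runs only, so "3.5" and "3,5" are treated as equal.
    if sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target)):
        issues.append("number mismatch")
    if sorted(re.findall(r"\{\w+\}", source)) != sorted(re.findall(r"\{\w+\}", target)):
        issues.append("placeholder mismatch")
    return issues


segment_issues = qa_checks("Set {count} items to 5 bar.", "Setzen Sie {count} Elemente auf 6  bar.")
```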
  • The aspects may include an analysis, at box 240 e, of reviewer corrections. For example, the evaluation module 120 may analyze reviewer corrections as detailed ratings with classifications by error types.
  • The aspects may include an analysis, at box 240 f, of reviewer evaluations. For example, the evaluation module 120 may analyze reviewer evaluations as a composite evaluation as per a predefined quality rating.
  • The aspects may include an analysis, at box 240 g, of translation tests passed by the translation professionals 108 a-c associated with the profiles 204 a-c in different subject areas, which may be evaluated manually. In some implementations, the translation tests may be performed over a constant set of texts so that the evaluation method and the test samples do not vary, which allows the evaluation module 120 to compare the test results for the profiles 204 a-c with one another.
  • The translation system 102 may store translation data for each of the profiles 204 a-c, for example, in the data storage device 116. The translation data may include, for each of the profiles 204 a-c and for each electronic document translated by the translation professional associated with the profile, a source text to be translated and a corresponding translated text that are split into segments, the low-level data, the results of the automatic QA checks, and/or the set of linguistic descriptors. The translation system 102 may store results of corrections by reviewers (e.g., the number of corrections) at the next stage of the translation workflow. The translation system 102 may store results of evaluation ratings by reviewers in one or more aspects (e.g., precision, language, and/or style) according to a particular rating scale. In some implementations, the translation system 102 may store evaluations of the translations for the profiles 204 a-c according to the formal Language Quality Assurance (LQA) procedure, with definitions of the types of mistakes found.
  • To automatically evaluate the qualities of translations 208 a-b, the second sub-process 202 b includes generating multiple machine-learning models 242 a-c. For example, the evaluation module 120 may compile the first machine-learning model 242 a to evaluate a correlation between the automatically measured parameters, at boxes 240 a-d, and the human corrections made by reviewers at an editing stage, at box 240 e. The evaluation module 120 may compile the second machine-learning model 242 b to evaluate a correlation between the human corrections, at box 240 e, and the human quality evaluation, at box 240 f. The evaluation module 120 may build the third machine-learning model 242 c to evaluate a correlation between the automatically measured parameters, at box 240 g, and the human quality evaluation, at box 240 f.
  • The correlations in the first machine-learning model 242 a are used, at box 244, to project or predict a number of corrections for each of the profiles 204 a-c for each translation. The correlations in the second machine-learning model 242 b are used, at box 246, to project or predict evaluations for each of the profiles 204 a-c for each translation. In some implementations, the second machine-learning model 242 b may be used to project or predict evaluations for a profile even if the profile is not associated with any corrections at the editing stage. The correlations in the third machine-learning model 242 c are used, at box 248, to determine a final evaluation and quality projection for a profile. The machine-learning models 242 a-c may be validated and improved iteratively based on the results of the formal LQA procedure, the translation tests, and other new data (e.g., evaluations or corrections).
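  • The description does not name a model family, so the following sketch uses ordinary least-squares linear regression (via NumPy) purely as a stand-in for the three machine-learning models 242 a-c. Model 242 a maps measured parameters to corrections, model 242 b maps corrections to a human evaluation, and model 242 c maps measured parameters directly to the evaluation; all function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def fit_quality_models(params, corrections, evaluations):
    m1 = fit_linear(params, corrections)       # stand-in for 242a: parameters -> corrections
    m2 = fit_linear(corrections, evaluations)  # stand-in for 242b: corrections -> evaluation
    m3 = fit_linear(params, evaluations)       # stand-in for 242c: parameters -> evaluation
    return m1, m2, m3


def project_quality(models, params):
    m1, m2, m3 = models
    corrections = predict_linear(m1, params)      # box 244: predicted corrections
    evaluation = predict_linear(m2, corrections)  # box 246: predicted evaluation
    final = predict_linear(m3, params)            # box 248: final quality projection
    return corrections, evaluation, final


# Synthetic history: two measured parameters per translation (e.g., QA hits, edit time).
params = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 2.0]])
corrections = params @ np.array([2.0, 1.0])
evaluations = 10.0 - 0.5 * corrections
models = fit_quality_models(params, corrections, evaluations)
pred_corrections, pred_eval, pred_final = project_quality(models, np.array([[1.0, 2.0]]))
```

Note that model 242 b is applied to *predicted* corrections, which is what lets it produce an evaluation for a profile that has no reviewer corrections on record.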
  • FIG. 2D is a flow chart that shows an example of the third sub-process 202 c for automated planning of translation resources and workflow of the profiles associated with the translation professionals. The third sub-process 202 c for automated planning of translation resources and workflow incorporates the results obtained in the first sub-process 202 a (automated selection of the profiles 204 a-b of the translation professionals 108 a-c by subject area) and the second sub-process 202 b (evaluation of the qualities of translations 208 a-b). For example, the workflow module 122 may prepare, at box 250, a plan for implementing the project to translate the electronic document 206 based on the client requirements 212 (e.g., translation materials, deadline for the translation, required quality of the translation, allowed cost of the translation, etc.) in a way that optimizes the use of existing resources (e.g., the translation professionals 108 a-c, who have limitations of their own, such as possible translation speed, availability, and language knowledge in the subject area of the electronic document 206).
  • The workflow module 122 may split the translation project into multiple separate parts (if the workflow module 122 determines that this is optimal) and distribute the parts to multiple ones of the translation professionals 108 a-c (e.g., translators, editors, and/or proofreaders). For example, the workflow module 122 may take into account, at box 254, the workload of each of the translation professionals, both in real time and as predicted for the time frame of the translation project, based on the current work-in-progress projects assigned to the profiles 204 a-c of the translation professionals 108 a-c and the translation speed of each of the translation professionals 108 a-c as identified in the profiles 204 a-c. The workflow module 122 may allow work to occur in parallel (e.g., by multiple ones of the translation professionals 108 a-c) at multiple stages (e.g., translation, editing, and/or proofreading).
  • The workflow module 122 selects, at box 260, an optimal choice from the profiles 204 a-c of the translation professionals 108 a-c for each of the workflow stages (e.g., translation, editing, and proofreading). In some implementations, the workflow module 122 may select a two-stage translation-editing (TE) workflow or a three-stage translation-editing-proofreading (TEP) workflow to ensure a high quality of translation. In some implementations, the workflow module 122 may assign the translation to a single one of the profiles 204 a-c, which then performs all of the stages of the translation. The workflow module 122 may select the one of the profiles 204 a-c to which the translation is assigned based on the profiles 204 a-b selected by the selection module 118 and the qualities of translations 208 a-b provided by the evaluation module 120. The workflow module 122 may select the one of the profiles 204 a-c to which the editing is assigned (e.g., a profile with greater qualifications than the profile assigned to the translation), for comparing the source text of the electronic document 206 with the translation generated by the selected translator. The workflow module 122 may select the one of the profiles 204 a-c to which the proofreading is assigned (e.g., a proofreader who reviews only the translation and corrects minor errors of style, typos, formatting, etc.).
  • In some implementations, the workflow module 122 suggests, at box 258, possible variants of the translation workflow, such as the number of stages and the number of the profiles 204 a-c, based on the client requirements 212. For example, the client requirements 212 may explicitly indicate the number of translation stages or the specific stages to be used, or a user of the client system 124 may select one of the options offered by the translation system 102, in which case the workflow module 122 selects the corresponding project workflow. If the workflow module 122 determines that the translation project cannot be completed within a particular timeframe (e.g., as specified in the client requirements 212) with the indicated number of stages, then the workflow module 122 may remove, or suggest removing, one of the subsequent stages from the workflow (e.g., editing or proofreading). The workflow module 122 may compensate for the removal of the stage by replacing the translation professional assigned to the first (translation) stage with a profile of a translation professional that has a higher quality of translation than the profile originally assigned.
  • The workflow module 122 may determine, at box 254, the timing of delivering the translation project to the client system 124. The timing of the delivery may be based on the number of the profiles 204 a-c that have been assigned to the translation of the electronic document 206. The workflow module 122 may divide the translation into multiple segments and assign the segments separately to multiple ones of the profiles 204 a-c (e.g., so that segments of the translation are performed in parallel) to reduce the amount of time needed to complete the translation. However, spreading the translation tasks among a larger number of the profiles 204 a-c may, in some implementations, incur other risks, such as refusal by translation professionals who are not interested in translating short texts, and a potential for inconsistency in the translation, since different translation professionals may use varying terms, phrasing, and stylistic constructions. In some implementations, the workflow module 122 may have an effective lower limit of approximately 250 words (e.g., one translation page) assigned to a single translation professional. In some implementations, this lower limit may be the number of words that the average translation professional can typically translate in one hour. In some implementations, the client requirements 212 for urgent projects typically specify a completion time of no less than one hour.
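  • One way to respect the roughly 250-word lower limit when splitting a project across parallel translators is sketched below. It splits by word count only and ignores sentence and segment boundaries, which a real implementation would need to respect; the function and constant names are hypothetical.

```python
MIN_WORDS_PER_PROFESSIONAL = 250  # about one translation page, per the description


def split_project(total_words, available_professionals, min_words=MIN_WORDS_PER_PROFESSIONAL):
    """Return word counts per assigned professional.

    No part is smaller than min_words unless the whole project is smaller than that.
    """
    if total_words <= 0 or available_professionals <= 0:
        return []
    max_parts = max(1, total_words // min_words)
    parts = min(available_professionals, max_parts)
    base, remainder = divmod(total_words, parts)
    # Spread the remainder one word at a time over the first parts.
    return [base + 1 if i < remainder else base for i in range(parts)]


plan = split_project(1100, 3)
```

Capping the number of parts at `total_words // min_words` guarantees every part reaches the minimum, since the integer division never produces more parts than the word count can fill.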
  • In some implementations, one or more of the translation professionals may not be willing or able to accept a particular task, because the highest-quality translation professionals may be the busiest and least available. Accordingly, the workflow module 122 may create, at box 254, a work calendar to take current translator availability into account during the selection process. The workflow module 122 may allocate time in a work calendar for each of the profiles 204 a-c. The workflow module 122 may estimate the amount of time each of the profiles 204 a-c may work and the amount of time each translation task may take for the translation professional associated with the profile. The workflow module 122 may identify the difference between the amount of time a profile may work and the amount of work assigned to the profile as an available workload. In some implementations, by checking the work calendar to take availability into account, the workflow module 122 may find "hidden reserves" of underutilized downtime during which the workflow module 122 may assign more translation tasks to the translation professionals 108 a-c.
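  • The available-workload calculation can be sketched with a small per-profile calendar: available time is the time a professional can work minus the time already assigned, converted to words with the professional's speed. The class and field names, the per-day granularity, and the 250 words-per-hour figure (taken from the one-page-per-hour estimate above) are assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileCalendar:
    """Hypothetical work calendar for one translator profile."""

    profile_id: str
    words_per_hour: float
    hours_available: dict  # day -> hours the professional can work
    hours_assigned: dict = field(default_factory=dict)  # day -> hours already booked

    def available_hours(self, day):
        """Free capacity on a day; never negative even if overbooked."""
        return max(0.0, self.hours_available.get(day, 0.0) - self.hours_assigned.get(day, 0.0))

    def available_words(self, days):
        """Words this professional could still take on over the given days."""
        return sum(self.available_hours(d) for d in days) * self.words_per_hour


cal = ProfileCalendar("p1", 250.0, {"mon": 8.0, "tue": 6.0}, {"mon": 5.0, "tue": 7.0})
```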
  • If the client requirements 212 do not explicitly indicate a completion time for the translation of the electronic document 206, then the translation system 102 may provide a user interface to the client system 124 with three workflow options that correspond to different project completion times. In the first option, the completion time may be maximized to a reasonable extent. If this option is selected, the workflow module 122 may calculate the maximum completion time (in days) using an average daily output (e.g., 2,000, 4,000, and 12,000 words for translators, editors, and proofreaders, respectively) and the selected number of stages. In the second option, the completion time may be minimized. If this option is selected, the workflow module 122 may minimize the completion time by assigning more of the translation professionals 108 a-c to each translation stage and/or by performing fewer translation stages. The workflow module 122 may calculate the maximum number of translation professionals, $N_{\max}$, as:

  • $$N_{\max} = 2 \ln(W),$$
  • where $W$ is the number of words in the project. In the third option, if selected by the client system 124, the workflow module 122 uses the average of the maximum completion time under the first option and the minimum completion time under the second option.
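  • The three completion-time options can be sketched as follows, using the daily outputs and $N_{\max} = 2\ln(W)$ given above. Several details are assumptions of this sketch rather than statements of the source: stages run one after another in whole days, $N_{\max}$ is rounded down and applied per stage, the second option only adds parallel professionals (it does not drop stages), and the 250-word lower limit caps the number of professionals.

```python
import math

# Average daily output per stage (words/day), from the description above.
DAILY_OUTPUT = {"translation": 2000, "editing": 4000, "proofreading": 12000}
MIN_WORDS_PER_PROFESSIONAL = 250


def professionals_per_stage(words):
    """N_max = 2 ln(W), rounded down, capped by the 250-word limit, at least 1."""
    n_max = math.floor(2 * math.log(words))
    return max(1, min(n_max, words // MIN_WORDS_PER_PROFESSIONAL))


def max_completion_days(words, stages):
    """Option 1: one professional per stage, stages in sequence."""
    return sum(math.ceil(words / DAILY_OUTPUT[s]) for s in stages)


def min_completion_days(words, stages):
    """Option 2: each stage split across several professionals working in parallel."""
    n = professionals_per_stage(words)
    return sum(math.ceil(words / (DAILY_OUTPUT[s] * n)) for s in stages)


def balanced_completion_days(words, stages):
    """Option 3: average of the two extremes."""
    return (max_completion_days(words, stages) + min_completion_days(words, stages)) / 2


tep = ["translation", "editing", "proofreading"]
slow, fast, middle = (f(20000, tep) for f in (max_completion_days, min_completion_days, balanced_completion_days))
```

For a 20,000-word TEP project this gives 10 + 5 + 2 = 17 days under the first option, and with 19 professionals per stage each stage fits in one day under the second option.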
  • The result of the sub-processes 202 a-c is a set of parameters for completing the translation of the electronic document 206 including which stages of the workflow will be performed, a selected set of the profiles 204 a-c of the translation professionals 108 a-c to be assigned to specific stages and segments of the text of the electronic document 206, a volume of work and work plan for each of the selected ones of the profiles 204 a-c, and a time of completion/project delivery schedule.
  • When the translation of the electronic document 206 is complete, the source text of the electronic document 206 in the source language and the translated text in the target language are added to the corpus of the previous translations of electronic documents 224. In addition, the evaluation module 120 may reevaluate and store the qualities of translations 208 a-b and productivities for each of the profiles 204 a-c in the data storage device 116.
  • For simplicity of explanation, the processes of this disclosure are depicted and described as a series of acts. However, acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, not all illustrated acts may be required to implement the processes in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that the processes could alternatively be represented as a series of interrelated states via a state diagram or events. Additionally, it should be appreciated that the processes disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such processes to computing devices. The term “article of manufacture,” as used herein, is intended to encompass a computer program accessible from a computer-readable device or storage media.
  • FIG. 3 is a schematic diagram that shows an example of a machine in the form of a computer system 300. The computer system 300 executes one or more sets of instructions 326 that cause the machine to perform any one or more of the methodologies discussed herein. The machine may operate in the capacity of a server or a client machine in client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile telephone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute the sets of instructions 326 to perform any one or more of the methodologies discussed herein.
  • The computer system 300 includes a processor 302, a main memory 304 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 306 (e.g., flash memory, static random access memory (SRAM), etc.), and a data storage device 316, which communicate with each other via a bus 308.
  • The processor 302 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, the processor 302 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processor 302 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The processor 302 is configured to execute instructions of the selection module 118, the evaluation module 120, and/or the workflow module 122 for performing the operations and steps discussed herein.
  • The computer system 300 may further include a network interface device 322 that provides communication with other machines over a network 318, such as a local area network (LAN), an intranet, an extranet, or the Internet. The computer system 300 also may include a display device 310 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 312 (e.g., a keyboard), a cursor control device 314 (e.g., a mouse), and a signal generation device 320 (e.g., a speaker).
  • The data storage device 316 may include a computer-readable storage medium 324 on which is stored the sets of instructions 326 of the selection module 118, the evaluation module 120, and/or the workflow module 122 embodying any one or more of the methodologies or functions described herein. The sets of instructions 326 of the selection module 118, the evaluation module 120, and/or the workflow module 122 may also reside, completely or at least partially, within the main memory 304 and/or within the processor 302 during execution thereof by the computer system 300, the main memory 304 and the processor 302 also constituting computer-readable storage media. The sets of instructions 326 may further be transmitted or received over the network 318 via the network interface device 322.
  • While the example of the computer-readable storage medium 324 is shown as a single medium, the term “computer-readable storage medium” can include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the sets of instructions 326. The term “computer-readable storage medium” can include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term “computer-readable storage medium” can include, but not be limited to, solid-state memories, optical media, and magnetic media.
  • In the foregoing description, numerous details are set forth. It will be apparent, however, to one of ordinary skill in the art having the benefit of this disclosure, that the present disclosure may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present disclosure.
  • Some portions of the detailed description have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
  • It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, it is appreciated that throughout the description, discussions utilizing terms such as “identifying”, “providing”, “enabling”, “finding”, “selecting” or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system memories or registers into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
  • The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including a floppy disk, an optical disk, a compact disc read-only memory (CD-ROM), a magnetic-optical disk, a read-only memory (ROM), a random access memory (RAM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a magnetic or optical card, or any type of media suitable for storing electronic instructions.
  • The words “example” or “exemplary” are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the words “example” or “exemplary” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless specified otherwise, or clear from context, “X includes A or B” is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then “X includes A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form. Moreover, use of the term “an embodiment” or “one embodiment” or “an implementation” or “one implementation” throughout is not intended to mean the same embodiment or implementation unless described as such. The terms “first,” “second,” “third,” “fourth,” etc. as used herein are meant as labels to distinguish among different elements and may not necessarily have an ordinal meaning according to their numerical designation.
  • It is to be understood that the above description is intended to be illustrative, and not restrictive. Other implementations will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (20)

What is claimed is:
1. A method comprising:
storing, in a data storage device, a plurality of previous translations of prior electronic documents for a plurality of profiles of translation professionals;
extracting, by at least one processor, a plurality of terms comprising words and patterns of words from the prior electronic documents;
receiving, from a client system, a request to translate a current electronic document from a source language to a target language;
selecting, by the processor, one or more of the plurality of profiles based on a proximity of the plurality of terms extracted from text in the prior electronic documents translated by the one or more of the profiles to extracted terms of the current electronic document;
evaluating, by the processor, qualities of the previous translations of the prior electronic documents for each of the selected one or more of the profiles;
planning, by the processor, a workflow for translation of the current electronic document based on the selected one or more of the profiles and the qualities of the previous translations; and
causing the current electronic document to be translated according to the planned workflow.
2. The method of claim 1, further comprising:
determining, by the processor, a predicted translation time or a predicted translation accuracy of the current electronic document using the qualities of the previous translations for each of the selected one or more of the profiles.
3. The method of claim 2, further comprising:
using the current electronic document and an input profile of the selected one or more of the profiles as input to a machine learning (ML) model; and
obtaining one or more outputs of the ML model, the one or more outputs of the ML model indicating the predicted translation time or the predicted translation accuracy of the current electronic document by the input profile.
4. The method of claim 3, wherein the ML model is trained using training data with training inputs comprising the plurality of previous translations and target outputs for respective training inputs, the target outputs comprising a plurality of corrections associated with the plurality of previous translations.
5. The method of claim 1, further comprising:
training a first machine learning (ML) model to evaluate a correlation between automatically measured parameters and reviewer corrections of the previous translations such that the first trained ML model receives the electronic document as input and outputs data indicating a predicted number of corrections for each of the selected one or more of the profiles to translate the current electronic document;
training a second ML model to evaluate a correlation between the reviewer corrections and human quality evaluations of the previous translations such that the second trained ML model receives the predicted number of corrections as input and outputs data indicating a predicted quality evaluation for each of the selected one or more of the profiles to translate the current electronic document; and
training a third ML model to evaluate a correlation between the automatically measured parameters and the human quality evaluation of the previous translations such that the third trained ML model receives the predicted quality evaluation as input and outputs data indicating a final evaluation and quality projection for each of the selected one or more of the profiles to translate the current electronic document.
6. The method of claim 5, wherein the automatically measured parameters comprise at least one of:
time spent translating segments of the plurality of previous translations by the selected one or more of the profiles;
a number of actions taken by the selected one or more of the profiles to translate the segments of the plurality of the previous translations; or
a type of correction incurred by the selected one or more of the profiles while translating the plurality of previous translations.
7. The method of claim 1, wherein the qualities of the previous translations comprise one or more scores of translation tests associated with the selected one or more of the profiles, wherein the translation tests and the current electronic document both comprise the same subject.
8. A system comprising:
a memory; and
a processing device, communicatively coupled to the memory, to:
store, in a data storage device, a plurality of previous translations of prior electronic documents for a plurality of profiles of translation professionals;
extract a plurality of terms comprising words and patterns of words from the prior electronic documents;
receive, from a client system, a request to translate a current electronic document from a source language to a target language;
select one or more of the plurality of profiles based on a proximity of the plurality of terms extracted from text in the prior electronic documents translated by the one or more of the profiles to extracted terms of the current electronic document;
evaluate qualities of the previous translations of the prior electronic documents for each of the selected one or more of the profiles;
plan a workflow for translation of the current electronic document based on the selected one or more of the profiles and the qualities of the previous translations; and
cause the current electronic document to be translated according to the planned workflow.
9. The system of claim 8, wherein the processing device is further to:
determine a predicted translation time or a predicted translation accuracy of the current electronic document using the qualities of the previous translations for each of the selected one or more of the profiles.
10. The system of claim 9, wherein the processing device is further to:
use the current electronic document and an input profile of the selected one or more of the profiles as input to a machine learning (ML) model; and
obtain one or more outputs of the ML model, the one or more outputs of the ML model indicating the predicted translation time or the predicted translation accuracy of the current electronic document by the input profile.
11. The system of claim 10, wherein the ML model is trained using training data with training inputs comprising the plurality of previous translations and target outputs for respective inputs, the target outputs comprising a plurality of corrections associated with the plurality of previous translations.
12. The system of claim 8, wherein the processing device is further to:
train a first machine learning (ML) model to evaluate a correlation between automatically measured parameters and reviewer corrections of the previous translations such that the first trained ML model receives the electronic document as input and outputs data indicating a predicted number of corrections for each of the selected one or more of the profiles to translate the current electronic document;
train a second ML model to evaluate a correlation between the reviewer corrections and human quality evaluations of the previous translations such that the second trained ML model receives the predicted number of corrections as input and outputs data indicating a predicted quality evaluation for each of the selected one or more of the profiles to translate the current electronic document; and
train a third ML model to evaluate a correlation between the automatically measured parameters and the human quality evaluation of the previous translations such that the third trained ML model receives the predicted quality evaluation as input and outputs data indicating a final evaluation and quality projection for each of the selected one or more of the profiles to translate the current electronic document.
13. The system of claim 12, wherein the automatically measured parameters comprise at least one of:
time spent translating segments of the plurality of previous translations by the selected one or more of the profiles;
a number of actions taken by the selected one or more of the profiles to translate the segments of the plurality of the previous translations; or
a type of correction incurred by the selected one or more of the profiles while translating the plurality of previous translations.
14. The system of claim 8, wherein the qualities of the previous translations comprise one or more scores of translation tests associated with the selected one or more of the profiles, wherein the translation tests and the current electronic document both comprise the same subject.
15. A non-transitory machine-readable storage medium comprising instructions that, when executed by a processing device, cause the processing device to:
store, in a data storage device, a plurality of previous translations of prior electronic documents for a plurality of profiles of translation professionals;
extract a plurality of terms comprising words and patterns of words from the prior electronic documents;
receive, from a client system, a request to translate a current electronic document from a source language to a target language;
select one or more of the plurality of profiles based on a proximity of the plurality of terms extracted from text in the prior electronic documents translated by the one or more of the profiles to extracted terms of the current electronic document;
evaluate qualities of the previous translations of the prior electronic documents for each of the selected one or more of the profiles;
plan a workflow for translation of the current electronic document based on the selected one or more of the profiles and the qualities of the previous translations; and
cause the current electronic document to be translated according to the planned workflow.
16. The non-transitory machine-readable storage medium of claim 15, wherein the processing device is further to:
determine a predicted translation time or a predicted translation accuracy of the current electronic document using the qualities of the previous translations for each of the selected one or more of the profiles.
17. The non-transitory machine-readable storage medium of claim 16, wherein the processing device is further to:
use the current electronic document and an input profile of the selected one or more of the profiles as input to a machine learning (ML) model; and
obtain one or more outputs of the ML model, the one or more outputs of the ML model indicating the predicted translation time or the predicted translation accuracy of the current electronic document by the input profile.
18. The non-transitory machine-readable storage medium of claim 17, wherein the ML model is trained using training data with training inputs comprising the plurality of previous translations and target outputs for respective training inputs, the target outputs comprising a plurality of corrections associated with the plurality of previous translations.
19. The non-transitory machine-readable storage medium of claim 15, wherein the processing device is further to:
train a first machine learning (ML) model to evaluate a correlation between automatically measured parameters and reviewer corrections of the previous translations such that the first trained ML model receives the electronic document as input and outputs data indicating a predicted number of corrections for each of the selected one or more of the profiles to translate the current electronic document;
train a second ML model to evaluate a correlation between the reviewer corrections and human quality evaluations of the previous translations such that the second trained ML model receives the predicted number of corrections as input and outputs data indicating a predicted quality evaluation for each of the selected one or more of the profiles to translate the current electronic document; and
train a third ML model to evaluate a correlation between the automatically measured parameters and the human quality evaluation of the previous translations such that the third trained ML model receives the predicted quality evaluation as input and outputs data indicating a final evaluation and quality projection for each of the selected one or more of the profiles to translate the current electronic document.
20. The non-transitory machine-readable storage medium of claim 19, wherein the automatically measured parameters comprise at least one of:
time spent translating segments of the plurality of previous translations by the selected one or more of the profiles;
a number of actions taken by the selected one or more of the profiles to translate the segments of the plurality of the previous translations; or
a type of correction incurred by the selected one or more of the profiles while translating the plurality of previous translations.
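Claim 1, together with its system and medium counterparts in claims 8 and 15, recites four steps. Terms are extracted from previously translated documents. Profiles are selected by how close those terms are to the terms of the current document. The quality of past translations is evaluated. A workflow is then planned. The claims do not fix a term definition or a proximity measure, so the Python sketch below makes its own assumptions, none of which come from the patent. "Terms" are lowercase words and two-word sequences. Proximity is cosine similarity between term-count vectors. A profile's quality is the mean of per-translation scores. The "workflow" is a toy assignment of one translator and one reviewer. The names `extract_terms`, `cosine`, `select_profiles`, and `plan_workflow` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical history format: profile id -> [(prior source text, quality score in [0, 1]), ...]
History = Dict[str, List[Tuple[str, float]]]


def extract_terms(text: str, max_n: int = 2) -> Counter:
    """Count words and contiguous word n-grams (an assumed stand-in for 'patterns of words')."""
    words = re.findall(r"\w+", text.lower())
    terms: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            terms[" ".join(words[i:i + n])] += 1
    return terms


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_profiles(history: History, current_text: str, top_k: int = 2) -> List[Tuple[str, float, float]]:
    """Rank profiles by term proximity to the current document, breaking ties by mean past quality."""
    current_terms = extract_terms(current_text)
    ranked = []
    for profile_id, translations in history.items():
        # Pool the terms of every document this profile has translated before.
        profile_terms: Counter = Counter()
        for source_text, _ in translations:
            profile_terms.update(extract_terms(source_text))
        proximity = cosine(profile_terms, current_terms)
        quality = sum(q for _, q in translations) / len(translations) if translations else 0.0
        ranked.append((profile_id, proximity, quality))
    ranked.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return ranked[:top_k]


def plan_workflow(ranked: Sequence[Tuple[str, float, float]]) -> Dict[str, Optional[str]]:
    """Toy plan (assumption): the best-ranked profile translates, the runner-up (if any) reviews."""
    if not ranked:
        raise ValueError("no candidate profiles")
    return {
        "translate": ranked[0][0],
        "review": ranked[1][0] if len(ranked) > 1 else None,
    }
```

Consider a current document about patent claims. A profile whose earlier work includes "patent claims" and "translation" ranks above a profile whose history is cooking recipes, because only the first profile shares terms with the document. A real system would probably weight terms, for example with TF-IDF, so that function words such as "a" contribute little to proximity.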
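Claims 5 and 6, mirrored in claims 12-13 and 19-20, describe three chained models. The first maps measured parameters to reviewer corrections. The second maps corrections to a human quality evaluation. The third turns that predicted evaluation into a final quality projection. The claims do not name a model family, so the Python sketch below makes these assumptions:

* Every stage is an ordinary least-squares linear regression.
* Only two of the claim 6 parameters are used: time per segment and number of actions per segment.
* The first model is claimed to receive the document itself. Here it instead receives the profile's average parameters.

A chain of linear models is itself linear, so in this sketch the third stage reduces to calibrating the stage-2 output against the human evaluations. `PastTranslation`, `train_chain`, and `project_quality` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def fit_linear(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [bias, w1, w2, ...]."""
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("singular system: collinear features or too few samples")
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


@dataclass
class PastTranslation:
    minutes_per_segment: float   # automatically measured parameter (claim 6)
    actions_per_segment: float   # automatically measured parameter (claim 6)
    reviewer_corrections: float  # number of reviewer corrections
    human_quality: float         # human quality evaluation


Models = Tuple[List[float], List[float], List[float]]


def train_chain(history: Sequence[PastTranslation]) -> Models:
    params = [[t.minutes_per_segment, t.actions_per_segment] for t in history]
    corrections = [t.reviewer_corrections for t in history]
    quality = [t.human_quality for t in history]
    m1 = fit_linear(params, corrections)                  # stage 1: parameters -> corrections
    m2 = fit_linear([[c] for c in corrections], quality)  # stage 2: corrections -> quality
    # Stage 3: calibrate the chained stage-2 output (a function of the measured
    # parameters) against the human quality evaluations.
    stage2 = [predict(m2, [predict(m1, p)]) for p in params]
    m3 = fit_linear([[q] for q in stage2], quality)
    return m1, m2, m3


def project_quality(models: Models, minutes: float, actions: float) -> dict:
    """Chain the three stages for one profile's average measured parameters."""
    m1, m2, m3 = models
    corrections = predict(m1, [minutes, actions])
    quality = predict(m2, [corrections])
    return {"corrections": corrections, "quality": quality, "final": predict(m3, [quality])}
```

When the training data follows exact linear relationships, the chain reproduces them and the third stage learns an identity calibration (intercept near 0, slope near 1). With noisy data or non-linear stage models, the third stage becomes more than a simple rescaling.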
US16/989,818 2017-08-31 2020-08-10 Data-driven automated selection of profiles of translation professionals for translation tasks Abandoned US20200372218A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/989,818 US20200372218A1 (en) 2017-08-31 2020-08-10 Data-driven automated selection of profiles of translation professionals for translation tasks

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
PCT/US2017/049771 WO2019045746A1 (en) 2017-08-31 2017-08-31 Data-driven automated selection of profiles of translation professionals for translation tasks
US201815782004A 2018-06-06 2018-06-06
US16/989,818 US20200372218A1 (en) 2017-08-31 2020-08-10 Data-driven automated selection of profiles of translation professionals for translation tasks

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US15/782,004 Continuation US10740558B2 (en) 2017-08-31 2017-08-31 Translating a current document using a planned workflow associated with a profile of a translator automatically selected by comparing terms in previously translated documents with terms in the current document
PCT/US2017/049771 Continuation WO2019045746A1 (en) 2017-08-31 2017-08-31 Data-driven automated selection of profiles of translation professionals for translation tasks

Publications (1)

Publication Number Publication Date
US20200372218A1 true US20200372218A1 (en) 2020-11-26

Family

ID=65436031

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/782,004 Active US10740558B2 (en) 2017-08-31 2017-08-31 Translating a current document using a planned workflow associated with a profile of a translator automatically selected by comparing terms in previously translated documents with terms in the current document
US16/989,818 Abandoned US20200372218A1 (en) 2017-08-31 2020-08-10 Data-driven automated selection of profiles of translation professionals for translation tasks

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US15/782,004 Active US10740558B2 (en) 2017-08-31 2017-08-31 Translating a current document using a planned workflow associated with a profile of a translator automatically selected by comparing terms in previously translated documents with terms in the current document

Country Status (2)

Country Link
US (2) US10740558B2 (en)
WO (1) WO2019045746A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230196027A1 (en) * 2020-08-24 2023-06-22 Unlikely Artificial Intelligence Limited Computer implemented method for the automated analysis or use of data

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
US11200336B2 (en) * 2018-12-13 2021-12-14 Comcast Cable Communications, Llc User identification system and method for fraud detection
GR20200100429A (en) * 2020-07-21 2022-02-11 Δημητρης Ιωαννη Λιαλιαρης Automated internet translation service

Family Cites Families (20)

Publication number Priority date Publication date Assignee Title
US6526426B1 (en) * 1998-02-23 2003-02-25 David Lakritz Translation management system
US6885985B2 (en) * 2000-12-18 2005-04-26 Xerox Corporation Terminology translation for unaligned comparable corpora using category based translation probabilities
IT1315160B1 (en) * 2000-12-28 2003-02-03 Agostini Organizzazione Srl D SYSTEM AND METHOD OF AUTOMATIC OR SEMI-AUTOMATIC TRANSLATION WITH PREEDITATION FOR THE CORRECTION OF ERRORS.
US20060282256A1 (en) * 2005-06-13 2006-12-14 Werner Anna F Translation method utilizing core ancient roots
US8145472B2 (en) * 2005-12-12 2012-03-27 John Shore Language translation using a hybrid network of human and machine translators
US8548795B2 (en) * 2006-10-10 2013-10-01 Abbyy Software Ltd. Method for translating documents from one language into another using a database of translations, a terminology dictionary, a translation dictionary, and a machine translation system
WO2009014005A1 (en) 2007-07-03 2009-01-29 Sadafumi Toyoda Translation order managing system
US20110282795A1 (en) 2009-09-15 2011-11-17 Albert Kadosh Method and system for intelligent job assignment through an electronic communications network
US8731901B2 (en) * 2009-12-02 2014-05-20 Content Savvy, Inc. Context aware back-transliteration and translation of names and common phrases using web resources
US20110282647A1 (en) * 2010-05-12 2011-11-17 IQTRANSLATE.COM S.r.l. Translation System and Method
US8386235B2 (en) 2010-05-20 2013-02-26 Acosys Limited Collaborative translation system and method
WO2013102052A1 (en) * 2011-12-28 2013-07-04 Bloomberg Finance L.P. System and method for interactive automatic translation
US8990066B2 (en) * 2012-01-31 2015-03-24 Microsoft Corporation Resolving out-of-vocabulary words during machine translation
WO2014062941A1 (en) 2012-10-17 2014-04-24 Proz.Com Method and apparatus to facilitate high-quality translation of texts by multiple translators
US9342505B2 (en) * 2013-06-02 2016-05-17 Jianqing Wu Translation protocol for large discovery projects
US9619464B2 (en) * 2013-10-28 2017-04-11 Smartcat Ltd. Networked language translation system and method
US9444773B2 (en) 2014-07-31 2016-09-13 Mimecast North America, Inc. Automatic translator identification
US10949904B2 (en) 2014-10-04 2021-03-16 Proz.Com Knowledgebase with work products of service providers and processing thereof
US10248653B2 (en) * 2014-11-25 2019-04-02 Lionbridge Technologies, Inc. Information technology platform for language translation and task management
RU2604984C1 (en) 2015-05-25 2016-12-20 Общество с ограниченной ответственностью "Аби Девелопмент" Translating service based on electronic community

Cited By (3)

Publication number Priority date Publication date Assignee Title
US20230196027A1 (en) * 2020-08-24 2023-06-22 Unlikely Artificial Intelligence Limited Computer implemented method for the automated analysis or use of data
US11763096B2 (en) 2020-08-24 2023-09-19 Unlikely Artificial Intelligence Limited Computer implemented method for the automated analysis or use of data
US11829725B2 (en) 2020-08-24 2023-11-28 Unlikely Artificial Intelligence Limited Computer implemented method for the automated analysis or use of data

Also Published As

Publication number Publication date
US10740558B2 (en) 2020-08-11
WO2019045746A1 (en) 2019-03-07
US20190065463A1 (en) 2019-02-28

Similar Documents

Publication Publication Date Title
US20200372218A1 (en) Data-driven automated selection of profiles of translation professionals for translation tasks
Eiselen et al. Developing Text Resources for Ten South African Languages.
US20190114320A1 (en) System and method for quality evaluation of collaborative text inputs
US9619464B2 (en) Networked language translation system and method
Zhang et al. AMBERT: A pre-trained language model with multi-grained tokenization
CN110717039A (en) Text classification method and device, electronic equipment and computer-readable storage medium
US11016740B2 (en) Systems and methods for virtual programming by artificial intelligence
Lind et al. Greasing the wheels for comparative communication research: Supervised text classification for multilingual corpora
CN115526171A (en) Intention identification method, device, equipment and computer readable storage medium
RU2546064C1 (en) Distributed system and method of language translation
CN110991193A (en) Translation matrix model selection system based on OpenKiwi
CN112836525A (en) Human-computer interaction based machine translation system and automatic optimization method thereof
CN116560631A (en) Method and device for generating machine learning model code
Alves et al. Evaluating language tools for fifteen EU-official under-resourced languages
Rhouati et al. Sentiment Analysis of French Tweets based on Subjective Lexicon Approach: Evaluation of the use of OpenNLP and CoreNLP Tools.
De Luzi et al. Cicero: An AI-Based Writing Assistant for Legal Users
CN113642337B (en) Data processing method and device, translation method, electronic device, and computer-readable storage medium
Bonnell et al. Rule-based Adornment of Modern Historical Japanese Corpora using Accurate Universal Dependencies.
Burchardt et al. Machine translation at work
Murthy et al. Hiner: A large hindi named entity recognition dataset
Potapova et al. Logistics Translator. Concept Vision on Future Interlanguage Computer Assisted Translation
US20240127617A1 (en) Systems and methods for automated text labeling
RU2667030C1 (en) System and method of intellectual automatic selection of perfomers of translation
Kostis et al. Towards an integrated retrieval system to semantically match cvs, job descriptions and curricula
CN115204182B (en) Method and system for identifying e-book data to be corrected

Legal Events

Date Code Title Description
AS Assignment

Owner name: SMARTCAT LLC, RUSSIAN FEDERATION

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:UKRAINETS, ARTEM;TUZHILINA, ELENA;GUSAKOV, VLADIMIR;AND OTHERS;REEL/FRAME:054171/0739

Effective date: 20171120

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION