WO2014062905A1 - Systems and methods for controlling the work progress of content transformation based on natural language processing and/or machine learning - Google Patents

Systems and methods for controlling the work progress of content transformation based on natural language processing and/or machine learning

Info

Publication number
WO2014062905A1
Authority
WO
WIPO (PCT)
Prior art keywords
workers
job
works
work
text
Prior art date
Application number
PCT/US2013/065406
Other languages
English (en)
Inventor
Matthew M.I. ROMAINE
Matthrew James SKYRM
Original Assignee
Gengo Inc.
Priority date
Filing date
Publication date
Priority claimed from US14/054,292 (published as US20140108103A1)
Application filed by Gengo Inc.
Publication of WO2014062905A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/06: Buying, selling or leasing transactions
    • G06Q30/08: Auctions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations

Definitions

  • At least some embodiments of the present disclosure relate to systems and methods configured to accept out-sourced jobs from customers, present the jobs to workers, accept completed job output, and allow job output retrieval by customers.
  • the Internet provides a communication channel to reach people globally and thus provides access to a pool of diverse workers for labor and expertise.
  • a job outsourcing paradigm termed "crowd-sourcing" typically involves three major parties: the customer as the job originator, the worker who performs the job submitted by the customer, and a rendezvous point for the customer and the worker.
  • the worker pool accessed via Internet includes workers of different skill sets and different skill levels.
  • the job output quality varies and is generally unpredictable.
  • unscrupulous workers may try to game the system (e.g., by claiming to possess a skill set that they do not possess and then performing poorly on the job assigned to them on that basis).
  • systems and methods are configured to quantify job output expectations with respect to quality, turnaround time, and transaction cost, and use a just-in-time and best-in-time (JIT-BIT) worker selection process and an iterative two-phase work/evaluation process to ensure that the expectations are met.
  • JIT-BIT just-in-time and best-in-time
  • systems and methods are provided to compute indicators of the completeness of the work output of a transformation of text-based content, of worker capacity in performing the transformation, and/or of the degree of matching between a unit of work and a worker, based on information collected about the complexity of works, the times and throughput of workers, and the ratings of work outputs, and using natural language processing and machine learning techniques such as language detection, longest common substring, length ratio, and document similarity.
  • the indicators are utilized to optimize job pickup and output submission for online crowdsourcing tasks related to transformation of text-based content, such as transcription, translation, proofreading, etc.
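Three of the text-comparison indicators named above (longest common substring, length ratio, document similarity) can be illustrated with the Python standard library. This is a minimal sketch, not the disclosed implementation: the function names, the `looks_untranslated` heuristic and its 0.8 threshold are hypothetical, "document similarity" is approximated here by cosine similarity over word counts, and language detection is not shown.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def length_ratio(source: str, output: str) -> float:
    """Characters in the output divided by characters in the source."""
    return len(output) / len(source) if source else 0.0


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous run of characters shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors (a stand-in for document similarity)."""
    ca = Counter(re.findall(r"\w+", a.lower()))
    cb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def looks_untranslated(source: str, output: str, threshold: float = 0.8) -> bool:
    """Hypothetical check: flag a translation that mostly copies the source text."""
    if not source:
        return False
    return len(longest_common_substring(source, output)) / len(source) >= threshold
```

For example, a translation whose longest common substring with the source covers most of the source may have been copied rather than translated, and a length ratio far from what is typical for a language pair may point to an incomplete job; how such signals are weighted is left open here.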
  • the disclosure includes methods and apparatuses which perform these methods, including data processing systems which perform these methods, and computer readable media containing instructions which when executed on data processing systems cause the systems to perform these methods.
  • Figure 1 illustrates a system configured to manage workers according to one embodiment.
  • Figure 2 illustrates a system to control quality for translation jobs according to one embodiment.
  • Figure 3 illustrates a system to control expectation for outsource jobs according to one embodiment.
  • Figure 4 illustrates a system configured to provide services according to one embodiment.
  • Figure 5 illustrates a data processing system according to one embodiment.
  • Figure 6 illustrates a system configured to control work progress of text-based content transformation according to one embodiment.
  • Figure 7 shows a method to control work progress according to one embodiment.
  • a system and method are configured to provide translation services.
  • workers perform works or jobs as translators.
  • the systems and methods disclosed herein can be used to provide other services, such as answering questions, providing advice, etc.
  • the present disclosure includes a job managing system that is agnostic to job type, rendezvous system, and worker type.
  • a job may be a request for writing software that meets a specification, or a request for translation of a piece of text, etc.
  • a rendezvous system may accept job requests, disseminate job requests, and accept job outputs (completed jobs) using a web service or a mixture of virtual and real-world components, such as a physical bulletin board for job postings with an email/physical address to which job outputs (completed jobs) can be sent for inspection.
  • in response to a job request from a customer, the job managing system is configured to present to the customer one or more
  • based on the answer(s) selected by the customer, the job managing system is configured to assign predetermined Expectation Metrics (EM) to the job. The job managing system then prompts the customer to accept the Expectation Metrics (EM) assigned to the job. If the customer does not agree with the Expectation Metrics (EM) assigned according to the selected answer(s), the customer is prompted to modify his/her choice(s) for the one or more multiple-choice questions.
  • EM Expectation Metrics
  • after the customer accepts the Expectation Metrics (EM) selected according to the customer's answers to the one or more multiple-choice questions, the job managing system uses a JIT-BIT scheme to determine to whom and in which order to present (show) the job in the work phase, taking into consideration job properties, worker properties, the current Expectation Index (EI), turnaround time, cost, etc. A qualified and available worker interested in performing the job can pick it up from the job managing system and start working on it.
  • the worker who performed the evaluation is assigned to work on the job during the new work phase to rectify the job.
  • the JIT-BIT scheme is used to determine to whom and in which order to present the job for further work aimed at raising the Expectation Index (EI) of the job.
  • the customer may be provided with the option to request a full refund without receiving the job output, or to make a partial payment for the below-par job output.
  • customers, workers, and system operators who have sufficient privileges associated with their roles can opt to view job status through a user interface (UI), query it through an Application Programming Interface (API), or receive push notifications about job status and alerts.
  • UI user interface
  • API Application Programming Interface
  • One embodiment of a job submission system includes an electronic database configured to store submitted jobs, which may be submitted via an electronic system having web/application servers configured to accept jobs over Internet protocols, or via a brick-and-mortar system configured to accept jobs by snail mail.
  • One embodiment of the disclosure includes methods for describing customer job output expectations, where job output expectations (Expectation Metrics, or EM) include quantifiable requirements on quality, turnaround time, and transaction cost.
  • quality requirements are pre-defined for different job types.
  • Each job type has a set of pre-defined quality metrics. After the type of a job is identified by a customer (e.g., via a multiple-choice question), the quality requirements associated with that job type are used for the job submitted by the customer.
  • Quality metrics are configured to be either quantified automatically or evaluated objectively by workers using a computer-assisted user interface.
  • One embodiment of the disclosure includes methods for assigning output Expectation Metrics (EM) to jobs. For example, an answer to a single multiple-choice question regarding the intended use of the job output is collected from the customer and used to assign EM to the job submitted by the customer.
  • each selectable answer for the multiple-choice question is associated with a pre-determined set of Expectation Metrics (EM), which can be assigned by a domain expert for the job type who has access to all data on previous similar jobs and uses it to derive the set.
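A minimal sketch of assigning EM from a single intended-use answer, assuming the pre-determined sets are kept in a lookup table maintained by a domain expert. The answer strings, metric names, and numeric limits below are hypothetical placeholders, not values from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ExpectationMetrics:
    quality_metrics: Tuple[str, ...]  # quantifiable quality requirements
    max_turnaround_hours: float       # turnaround-time requirement
    max_cost: float                   # transaction-cost requirement


# Hypothetical table a domain expert might derive from data on past similar jobs.
EM_BY_INTENDED_USE: Dict[str, ExpectationMetrics] = {
    "internal reference": ExpectationMetrics(("meaning preserved",), 48.0, 20.0),
    "publication": ExpectationMetrics(
        ("meaning preserved", "grammar", "style guide followed", "glossary terms used"),
        72.0, 60.0),
}


def assign_em(answer: str) -> ExpectationMetrics:
    """Map the customer's multiple-choice answer to its pre-determined EM."""
    try:
        return EM_BY_INTENDED_USE[answer]
    except KeyError:
        raise ValueError(f"not one of the offered answers: {answer!r}") from None
```

Because the customer can only pick from fixed answers, every accepted answer resolves to a known EM; a customer who rejects the assigned EM picks a different answer rather than typing in free-form requirements.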
  • One embodiment of the disclosure includes methods for calculating
  • the Expectation Index (EI) of a job can be calculated as the ratio of the quality metrics met by the job in its current state to the quality metrics specified in the Expectation Metrics (EM) assigned to the job.
  • in some embodiments, Expectation Index (EI) calculation is performed in a fully automated way. Methods to calculate the Expectation Index (EI) can be implemented programmatically.
  • in other embodiments, Expectation Index (EI) calculation is not fully automatable, and the system provides a user interface that guides a worker on how to perform the evaluation objectively. To achieve objectivity, the evaluation user interface is configured to restrict the input the worker is allowed to provide. The input provided by the worker is used to calculate the Expectation Index (EI) based on a published standard or documentation.
  • the restrictive user interface (UI) configured to receive input for the evaluation of the Expectation Index (EI) is implemented through a highlighter with which a worker can amend a portion of the job output and select the category of quality issue in the amended portion. The categories selectable from pull-down menus are ordered so that, if an evaluated part of the job output could be categorized ambiguously, it defaults to the first category into which it can be classified.
  • Expectation Index (EI) calculation may include multiple sub-calculations; to speed up the calculation, independent sub-calculations can be processed concurrently.
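Read as a formula, the ratio described above is EI = (number of EM quality metrics currently met) / (number of quality metrics specified in the EM). The sketch below assumes each quality metric can be expressed as an automated pass/fail check (the hypothetical `QualityCheck` type) and runs the independent checks concurrently with `concurrent.futures.ThreadPoolExecutor`; the example checks are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical: an automated quality check returns True when its metric is met.
QualityCheck = Callable[[str], bool]


def expectation_index(output: str, checks: Dict[str, QualityCheck]) -> float:
    """EI = metrics met / metrics specified, with independent checks run concurrently."""
    if not checks:
        raise ValueError("the EM must specify at least one quality metric")
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda check: check(output), checks.values()))
    return sum(results) / len(results)


EXAMPLE_CHECKS: Dict[str, QualityCheck] = {
    "non-empty": lambda text: bool(text.strip()),
    "no double spaces": lambda text: "  " not in text,
    "ends with a period": lambda text: text.endswith("."),
}
```

Threads suit checks that wait on I/O (for example, calls to an external NLP service); for CPU-heavy checks a `ProcessPoolExecutor` would be the more natural choice.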
  • One embodiment of the disclosure includes methods for assigning job properties.
  • the customer submitting a job can tag the job with property tags.
  • an artificial intelligence (AI) program can be used to analyze a job and tag it with property tags in an automated way.
  • workers viewing a job may tag jobs with property tags.
  • One embodiment of the disclosure includes methods for determining the order to present a job to workers.
  • jobs submitted into the system can be presented to workers using a JIT-BIT scheme.
  • the JIT part of JIT-BIT scheme advocates presenting jobs to qualified workers who are immediately available.
  • the BIT part of JIT-BIT scheme advocates presenting jobs to the best worker who can fulfill the job by matching job properties and current job Expectation Index (EI) (initially 0) with worker properties.
  • EI current job Expectation Index
  • One embodiment of the disclosure includes methods for determining BIT workers to whom a job is to be presented and the order of the BIT workers to whom the job is to be presented.
  • the system considers the compatibility between the job properties and the worker properties to identify BIT workers and to determine the order of the BIT workers for the job.
  • the system is configured to match skill-set requirements and skill-level requirements as closely as possible.
  • the system is configured to consider worker timeliness for that skill-set to ensure requirements on turnaround time can be met.
  • the system is configured to consider worker compensation to ensure the limit on transaction cost is not exceeded.
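The JIT-BIT ordering above can be sketched as a filter followed by a sort. Everything here is an assumption for illustration: the `Worker` and `Job` fields, the integer skill-level scale, treating "as closely as possible" as the smallest non-negative skill-level surplus, and breaking ties by speed and then by rate. The job's current EI, which the BIT part also considers, is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Worker:
    name: str
    skill_levels: Dict[str, int]     # skill set -> skill level (hypothetical 1-5 scale)
    hours_per_job: Dict[str, float]  # skill set -> typical turnaround (timeliness)
    rate: float                      # compensation charged per job
    available: bool                  # JIT: can start immediately


@dataclass
class Job:
    skill_set: str
    required_level: int
    hours_left: float    # remaining turnaround-time allowance
    budget_left: float   # remaining transaction-cost allowance


def jit_bit_order(job: Job, workers: List[Worker]) -> List[str]:
    """Return names of workers to present the job to, best match first."""
    candidates = [
        w for w in workers
        if w.available                                                          # JIT
        and w.skill_levels.get(job.skill_set, 0) >= job.required_level          # qualified
        and w.hours_per_job.get(job.skill_set, float("inf")) <= job.hours_left  # timeliness
        and w.rate <= job.budget_left                                           # cost limit
    ]
    # BIT: closest skill-level match first, then faster, then cheaper.
    candidates.sort(key=lambda w: (w.skill_levels[job.skill_set] - job.required_level,
                                   w.hours_per_job[job.skill_set], w.rate))
    return [w.name for w in candidates]
```

Preferring the smallest surplus keeps highly skilled (and often scarce) workers free for jobs that actually need them, which fits the aim of not over-relying on workers whose skills are in high demand.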
  • One embodiment of the disclosure includes methods for iteratively processing jobs until terminal conditions are met. For example, a job is processed iteratively between a work phase and an evaluation phase. A worker is assigned to work on the job during the work phase, and a different worker is then assigned to evaluate the output of the worker who worked on the job during the work phase. In each of the work phase and the evaluation phase, a JIT-BIT scheme is applied to determine the candidates to whom the job will be presented to be worked on or evaluated and the order in which it is presented to them. The job is processed iteratively through the work-evaluation cycle until a terminal condition is met. Examples of terminal conditions are: the Expectation Index (EI) of the job is above a threshold, a limit on turnaround time is reached, a limit on transaction cost is reached, or the customer or a system operator intervenes manually to stop the iteration.
  • One embodiment of the disclosure includes methods for handling failure to raise the Expectation Index (EI) to a predetermined threshold (e.g., 1). For example, if the system fails to raise the Expectation Index (EI) of a job to the predetermined threshold before another terminal condition is satisfied, the system may attempt to 'rectify' the failed expectation by granting the customer an option to accept the job output for a pre-determined fraction of the transaction cost or an option to request a full refund.
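The work-evaluation iteration and its terminal conditions can be simulated as a loop. This is a sketch under assumptions: each cycle is modeled by a hypothetical `cycle` callable that returns the EI after one work phase and one evaluation phase, every cycle is assumed to cost a fixed amount of money and time, and `accept_fraction` stands in for the pre-determined fraction of the transaction cost, whose value the text does not give.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Outcome:
    reason: str
    ei: float
    cost: float
    hours: float
    cycles: int


def run_job(cycle: Callable[[float], float], cycle_cost: float, cycle_hours: float,
            max_cost: float, max_hours: float, threshold: float = 1.0,
            stop_requested: Callable[[], bool] = lambda: False) -> Outcome:
    """Repeat work + evaluation until a terminal condition is met."""
    ei, cost, hours, cycles = 0.0, 0.0, 0.0, 0  # EI starts at 0
    while True:
        if ei >= threshold:
            return Outcome("expectation met", ei, cost, hours, cycles)
        if stop_requested():  # manual intervention by customer or operator
            return Outcome("manual stop", ei, cost, hours, cycles)
        if cost + cycle_cost > max_cost:  # another cycle would exceed the cost limit
            return Outcome("cost limit", ei, cost, hours, cycles)
        if hours + cycle_hours > max_hours:  # another cycle would miss the turnaround limit
            return Outcome("turnaround limit", ei, cost, hours, cycles)
        ei = cycle(ei)  # one work phase followed by one evaluation phase
        cost += cycle_cost
        hours += cycle_hours
        cycles += 1


def catch_all_options(outcome: Outcome, price: float,
                      accept_fraction: float) -> Optional[Dict[str, float]]:
    """Offer reduced-price acceptance or a full refund when the EI fell short."""
    if outcome.reason == "expectation met":
        return None
    return {"accept_at": price * accept_fraction, "full_refund": price}
```

For instance, with a cycle that raises EI by 0.4 at a cost of 10, a 100 budget ends after three cycles with the expectation met, while a 25 budget stops after two cycles at EI 0.8 and triggers the catch-all offer.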
  • One embodiment of the disclosure includes methods for assigning worker properties. For example, workers may voluntarily provide inputs to specify their skill-set, skill-level, and compensation rate as profiles of the workers. Each skill-set, skill-level pair has implicit quality and timeliness metrics associated with the pair. Default values (e.g., null) are used if the worker does not provide the input. For example, a worker ratings attribution system updates quality metrics and timeliness metrics of a worker skill-set associated with the job the worker was involved in.
  • One embodiment of the disclosure includes methods to attribute positive ratings to a worker. For example, the quality and timeliness of each job
  • Positive rating attribution is carried out after the customer has approved the job output. Positive rating attribution is awarded to all those involved in a
  • Expectation Index (EI) will receive positive rating attributes equivalent to the highest positive attribute rating assigned to the worker within the entire pool of qualified workers.
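One way to keep per-(skill set, skill level) quality and timeliness metrics that start as null and are updated after each job is a running mean. The running mean, the `WorkerProfile` class, and the 0-to-1 metric scale are assumptions for illustration; the text does not say how the ratings attribution system combines successive ratings.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (skill set, skill level)


class WorkerProfile:
    def __init__(self) -> None:
        # key -> [jobs counted, mean quality, mean timeliness]
        self._stats: Dict[Key, List[float]] = {}

    def metrics(self, skill_set: str, skill_level: int) -> Tuple[Optional[float], Optional[float]]:
        """Return (quality, timeliness); (None, None) is the null default."""
        stats = self._stats.get((skill_set, skill_level))
        return (None, None) if stats is None else (stats[1], stats[2])

    def record(self, skill_set: str, skill_level: int, quality: float, timeliness: float) -> None:
        """Fold one job's ratings into the running means for this pair."""
        stats = self._stats.setdefault((skill_set, skill_level), [0, 0.0, 0.0])
        stats[0] += 1
        n = stats[0]
        stats[1] += (quality - stats[1]) / n
        stats[2] += (timeliness - stats[2]) / n
```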
  • One embodiment of the disclosure includes methods to attribute negative ratings to workers. For example, when a job output is disputed by a customer, a trusted worker is compensated to investigate the job history and determine how negative rating attribution should be apportioned to those involved in the job of the customer. In one embodiment, the judgment of the trusted worker on the apportionment is final.
  • One embodiment of the disclosure includes methods to collect job payment, where the customer pays before collecting the job output.
  • One embodiment of the disclosure includes methods to visualize job states and alerts.
  • the current state of a job can be obtained for visualization via a user interface, an application programming interface (API), and/or a push notification mechanism, such as email.
  • Examples of job states include job submitted, work phase started, work phase ended, evaluation phase started, evaluation phase ended, and iteration number.
  • Examples of alerts include the job states discussed above and other conditions, such as the job not being picked up after a predetermined time period (e.g., X seconds), the job not being picked up after a predetermined number of views (e.g., X views), the job being abandoned after pick-up, and the job having a poor Expectation Index (EI) after a predetermined number of
  • the conditions for triggering the alerts are customizable.
  • the number X in the examples discussed above can be customized for requesting customized alerts for a specific job.
  • filters on job properties can be applied to customize the presentation of jobs displayed via the user interface (UI), query results returned via the application programming interface (API), or notifications sent by push mechanisms.
  • Push notification can be turned on or off.
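The customizable alert conditions can be sketched as threshold checks over a job's status, with push delivery behind an on/off switch. The field names and the default values standing in for X are hypothetical; the poor-EI condition is omitted because the text describing it is truncated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobStatus:
    seconds_since_posted: float
    views: int
    picked_up: bool
    abandoned: bool


@dataclass
class AlertConfig:
    max_unpicked_seconds: float = 3600.0  # "X seconds" (hypothetical default)
    max_unpicked_views: int = 50          # "X views" (hypothetical default)
    push_enabled: bool = True             # push notification on/off


def alerts_for(status: JobStatus, config: AlertConfig) -> List[str]:
    """Return the alert conditions currently triggered for one job."""
    triggered = []
    if not status.picked_up and status.seconds_since_posted > config.max_unpicked_seconds:
        triggered.append("not picked up within time limit")
    if not status.picked_up and status.views > config.max_unpicked_views:
        triggered.append("not picked up after view limit")
    if status.abandoned:
        triggered.append("abandoned after pick-up")
    return triggered


def push_alerts(status: JobStatus, config: AlertConfig) -> List[str]:
    """Alerts to push, honoring the on/off switch."""
    return alerts_for(status, config) if config.push_enabled else []
```

A per-job `AlertConfig` is one way to let a customer request customized alerts for a specific job by changing X.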
  • One embodiment of the disclosure includes methods for providing feedback on the completion of work. For example, once a job has exited the system, each worker who has worked and/or evaluated the job is allowed to see his/her worker rating attribution and the job rectifications. However, workers are not provided with access to the identities of workers who did rectification.
  • One embodiment of the disclosure includes methods for flagging issues. For example, workers can flag any rectification in job re-work that is visible to them. When the number of flags on a rectification or on a worker exceeds a threshold, a trusted worker in the corresponding job type is enlisted to review the worker or rectification and assign appropriate negative ratings to the worker.
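The flag threshold can be sketched as a counter that escalates a rectification or worker once, when its flag count first exceeds the threshold. The `FlagTracker` name, the string identifiers, and the escalate-once behavior are assumptions.

```python
from collections import Counter
from typing import Set


class FlagTracker:
    """Counts flags per rectification or worker ID and escalates once past a threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts = Counter()
        self.escalated: Set[str] = set()

    def flag(self, target_id: str) -> bool:
        """Record one flag; return True when a trusted-worker review should be enlisted."""
        self.counts[target_id] += 1
        if self.counts[target_id] > self.threshold and target_id not in self.escalated:
            self.escalated.add(target_id)
            return True
        return False
```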
  • One embodiment of the disclosure includes methods for detecting
  • data on the jobs completed by the top X workers, in terms of payout, is cross-referenced over Y days, where X and Y are integers.
  • Data for pools of workers in jobs will be analyzed for patterns using publicly available algorithms.
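The selection step can be sketched as follows, assuming completed-job records are hypothetical (worker, payout, completion time) tuples and reading "cross-referenced over Y days" as restricting attention to jobs completed in the last Y days. The pattern analysis that would follow is not shown.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (worker ID, payout, completion time).
Record = Tuple[str, float, datetime]


def top_workers_by_payout(records: Iterable[Record], x: int, y_days: int,
                          now: datetime) -> List[str]:
    """IDs of the X workers with the highest total payout in the last Y days."""
    cutoff = now - timedelta(days=y_days)
    totals: Dict[str, float] = defaultdict(float)
    for worker, payout, completed_at in records:
        if completed_at >= cutoff:
            totals[worker] += payout
    # Highest payout first; ties broken by worker ID for a stable result.
    return sorted(totals, key=lambda w: (-totals[w], w))[:x]
```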
  • One embodiment of the disclosure includes methods for preventing job holding: each worker is allowed to undertake only a single job at a time.
  • jobs are still shown to BIT workers, but they cannot pick the jobs up (until they become available to work on the jobs).
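A minimal sketch of the one-job-at-a-time rule, with hypothetical names: a job stays visible to BIT workers, but pick-up is refused while the worker already holds a job or the job is already held by someone else.

```python
from typing import Dict


class JobBoard:
    """Tracks which worker holds which job; at most one job per worker."""

    def __init__(self) -> None:
        self.job_of: Dict[str, str] = {}     # worker ID -> job ID currently held
        self.holder_of: Dict[str, str] = {}  # job ID -> worker ID holding it

    def can_pick_up(self, worker_id: str, job_id: str) -> bool:
        # The job may still be shown to the worker; only pick-up is blocked.
        return worker_id not in self.job_of and job_id not in self.holder_of

    def pick_up(self, worker_id: str, job_id: str) -> bool:
        if not self.can_pick_up(worker_id, job_id):
            return False
        self.job_of[worker_id] = job_id
        self.holder_of[job_id] = worker_id
        return True

    def release(self, worker_id: str) -> None:
        """Call on submission or abandonment so the worker can take another job."""
        job_id = self.job_of.pop(worker_id, None)
        if job_id is not None:
            del self.holder_of[job_id]
```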
  • the two-phase work-evaluation process becomes a three-phase process that includes a work phase, an evaluation phase, and a rectification phase.
  • Figure 1 illustrates a system configured to manage workers according to one embodiment.
  • in FIG. 1, the processing of a translation job submitted by a customer involves processing stages such as order, translation, proofread, quality check, delivery, and feedback.
  • the system relies on the work of a hierarchical system of lay workers (e.g., standard workers in Figure 1), professional workers (e.g., pro workers in Figure 1), and trusted experts (e.g., ultra workers in Figure 1).
  • Higher-tiered workers manage lower-tiered workers and make data-based decisions about improving quality and efficiency.
  • jobs are classified by various criteria, including type and difficulty. Once the order of a job is placed, the job is made available to a pool of pre-tested workers to work on.
  • the system is configured to allow customers and workers to have open and monitored communication.
  • the privacy of customers and workers is protected with a unique identification number assigned to both. By disabling the ability to view email addresses, communications between customers and workers remain on the system.
  • prior to committing to a job, a worker is provided with access to preview the job, including notes and instructions given by the customer, and to view the system-determined deadline.
  • the system provides the customer with access to preview the completed job without access to copy or receive it until the customer has approved the completed job.
  • the system allows the customer to ask for clarification, request amendments and corrections, and offer feedback and ratings.
  • Workers are alerted about jobs available for them to work on through algorithmic instant job notifications, hourly email notifications, RSS feeds, or by viewing the system dashboard.
  • the available jobs are identified based on the qualifications of the workers.
  • the system requires that workers undergo a series of screening and testing processes before receiving access to the system.
  • a minimum two-stage testing process is used at the onset of the qualification process: machine graded test for screening unskilled and under-qualified applicants, and human graded test for determining the skill-set and skill-level of qualified applicants.
  • test results are based at least in part on the ability of an applicant (e.g., potential worker) to follow the directions outlined prior to the screening and testing process.
  • tests and system entry are turned on/off and opened/closed depending on job pickup times and the number of qualified workers available to complete all available tasks.
  • the outputs of workers undergo a series of checks and random assessments, machine and/or human-powered, to ensure output is consistent and of high quality. Workers showing signs of underperformance may receive warnings, demotions, or removal from the system. A worker who has scored poorly in previous customer assessments is reviewed more frequently than those who consistently perform well. Data regarding each job ordered via the system (ratings, acceptance, rejection/revision rates, and internal quality ratings) is tracked, analyzed, and used for improving overall system performance.
  • the system is configured to offer services at scale through crowd-sourcing. This structure makes it possible to simplify complex and lengthy jobs, making them shorter and more manageable, which results in faster delivery.
  • the system benefits a worker by providing the worker the freedom to choose from jobs the worker is qualified to complete during any given time, which removes the need for administration and allows workers to have access to a constant job flow.
  • the system provides customers with a number of different tools when ordering.
  • a predetermined time period e.g., one hour
  • the system notifies the customer to solicit more information from the customer. The notification lets the customer know in a timely manner whether there is an issue with the job he/she submitted.
  • customers are provided with the option to invite the previous worker(s) to complete recurring jobs ordered at a later date.
  • Such previous workers are considered preferred workers.
  • the use of preferred workers allows the jobs of the customers to be completed in the most consistent manner.
  • the preferred-worker approach also compensates and motivates the worker to maintain high-quality output, by providing the worker with access to more work.
  • the system provides customers an interface to submit a glossary of terms when ordering translation jobs.
  • the glossary ensures that important words and phrases appearing in the text are translated consistently in the desired manner.
  • the system provides customers with access to cancel a job order, should the customer place an order and then decide to cancel.
  • the customer can cancel a job for a full refund, before a worker completes the job.
  • the system provides workers with a variety of tools and resources.
  • the system is configured to provide a style guide that stipulates language-specific rules that workers are required to follow unless customers specify otherwise. The rules focus on points of debate to ensure consistent usage within each language.
  • the system is configured to provide learning resources, including a series of lessons for beginner workers to help them fine-tune their skills and approach.
  • the system provides translator forums that serve as a platform for workers to seek information.
  • the translator forums provide a central place for information.
  • At least one embodiment of the disclosure provides a system and methods to exploit the round-the-clock availability and vast skill set of the pool of workers while ensuring that quality, turnaround time, and transaction cost meet the job requester's expectations.
  • methods are configured to accept job requests from customers over the Internet using a server system.
  • job requests can be accepted via alternative systems such as a 3G cellular communication network, a brick-and-mortar office accepting jobs through snail mail, etc.
  • Expectation Metrics pre-determined for the answers that are selected by the customer are associated with the job as properties.
  • the Expectation Metrics include job quality metrics and requirements on turnaround time and transaction cost.
  • the multiple-choice questions are simple questions in layman language that are re-worded from complex job-specific quality questions into easier ones. For example, instead of asking a customer who is requesting a
  • the multiple-choice questions serve three purposes: 1) to reduce the number of questions asked, 2) to spare the customer from having to articulate the complex required job output quality, by unambiguously defining the quality requirements on behalf of the customer, and 3) to prevent customers from keying in unrealistic expectations that the system would need to reject.
  • the multiple-choice questions are designed to be mapped to job quality metrics that are quantifiable.
  • the quantifiable quality metrics enable the system to prove that the quality has been met and reduce disputes.
  • the Expectation Metrics (EM) are presented to the customer for his/her perusal or reference, which is needed if the customer wishes to raise a dispute.
  • once the customer agrees that he/she has reviewed the Expectation Metrics (EM), the job undergoes a two-phase work-evaluation iteration.
  • the system employs a JIT-BIT scheme to determine to whom and in which order the jobs will be presented for pick-up.
  • the JIT part of JIT-BIT advocates presenting jobs to workers in order of availability to achieve faster job pick-up.
  • the BIT part of JIT-BIT advocates presenting jobs to the best worker who can fulfill the job, by matching job properties and the current job EI with worker properties, in order to maximize the quality part of the Expectation Metrics (EM) without exceeding the transaction cost.
  • upon completion, the job enters the evaluation phase, in which the system calculates the Expectation Index (EI) of the job: a measure of the ratio of the quality Expectation Metrics (EM) that have been met.
  • Expectation Index (EI) evaluation can be automatic or semi-automatic.
  • a job can be split into different parts or evaluated as a whole by the
  • a worker to evaluate the job is sought by employing the JIT-BIT scheme.
  • the worker who evaluates the Expectation Index (EI) of the job is provided with a user interface for assistance.
  • the user interface is designed to guide the worker to perform evaluation objectively; and the user interface is designed to restrict worker input during evaluation.
  • the restrictive user interface for evaluation has a highlighter and pull-down options. A worker can use the highlighter to highlight part of the job output that has poor quality and select from the pull-down a quality category that best describes the issue. In the case of multiple possible categorizations, the topmost category is always chosen.
  • the restrictiveness of the user interface for evaluation is designed to: (1) standardize the categorization of quality issues to ensure evaluation consistency, (2) require the worker to perform only the simple task of highlight-and-categorize, which reduces overly subjective thinking and makes the evaluation more objective, and (3) concentrate the worker's attention on a small part of the job output at a time, to keep his/her judgment from being clouded or influenced by previous or overall quality issues, which can lead to more lenient judgment as evaluation progresses.
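The topmost-category rule can be sketched as a scan over an ordered category list. The category names and their order are hypothetical; in practice they would come from the published standard the evaluation follows.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical categories, listed in the order the pull-down presents them.
CATEGORIES = ("mistranslation", "omission", "grammar", "spelling", "style")


@dataclass(frozen=True)
class Highlight:
    start: int                  # character offset where the highlight begins
    end: int                    # character offset where it ends (exclusive)
    applicable: FrozenSet[str]  # every category the issue could fall under


def categorize(highlight: Highlight) -> str:
    """Resolve ambiguity by choosing the topmost applicable category."""
    for category in CATEGORIES:
        if category in highlight.applicable:
            return category
    raise ValueError("the issue must fit at least one predefined category")
```

Because the order is fixed, two evaluators who see the same ambiguous issue record the same category, which is the consistency goal stated above.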
  • a terminal condition (e.g., the Expectation Index (EI) reaches 1, the turnaround time is reached, or the transaction cost is exceeded)
  • a negative terminal condition (e.g., the turnaround time is reached or the transaction cost is exceeded, but the Expectation Index (EI) is not close to 1)
  • a catch-all fulfillment process is triggered, which allows the customer to accept the job output at a reduced cost or request a full refund.
  • a catch-all fulfillment process is necessary as a last resort.
  • the iterative process enables the system to utilize the worker pool without over-relying on specific workers whose skill set and skill level may be in high demand.
  • chaining multiple workers to re-work the same job can increase synergy and ideally raise the job output quality to meet the Expectation Metrics (EM) before the turnaround time and transaction cost are exceeded.
  • the JIT-BIT scheme allows lower-skilled workers to work on jobs, the output of which will potentially be corrected in the next cycle; and the feedback system enables workers to learn from the re-work performed on their output.
  • the feedback system does not disclose the identities of workers who did the re-working, to prevent workers from holding grudges against people who evaluated their job output negatively and then re-worked (rectified) it.
  • positive rating attribution to a worker is performed automatically and is proportional to how much work/re-work was performed by the worker and inversely proportional to the number of re-work cycles it takes to meet the
  • for jobs that exit according to negative terminal conditions, a domain expert is enlisted together with a system expert to study the job re-work history and job workflow to determine the cause. If the cause is not system-related, the domain expert determines how negative ratings are attributed to the workers involved in the jobs.
  • the attribution system is designed to encourage workers to perform
  • the benefits are that workers receive more positive attribution (with fewer workers in the worker pool) and face less risk of jobs exiting according to negative terminal conditions, which result in negative attribution.
  • the attribution system is also designed to discourage cheating to obtain positive attribution: (1) making minor modifications earns only a small compensation and can be detected by pattern recognition algorithms, and (2) making major modifications by merely re-jigging the job output makes little sense, since evaluating the job as meeting the EI earns high positive attribution with less work.
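The positive attribution rule stated above (proportional to each worker's share of the work/re-work, inversely proportional to the number of re-work cycles) can be written as a one-line formula. This is a minimal sketch under assumptions: `positive_attribution` and `scale` are hypothetical names, and measuring "how much work" as a number such as edited characters is an assumption the source leaves open.

```python
def positive_attribution(work_by_worker: dict[str, float], cycles: int,
                         scale: float = 100.0) -> dict[str, float]:
    """Hypothetical attribution: scale * (worker's share of work) / cycles.

    work_by_worker maps worker id -> amount of work/re-work (assumed unit,
    e.g. edited characters); cycles is the number of re-work cycles needed.
    """
    total = sum(work_by_worker.values())
    if total <= 0 or cycles < 1:
        raise ValueError("need positive total work and at least one cycle")
    return {worker: scale * (amount / total) / cycles
            for worker, amount in work_by_worker.items()}
```

For example, two workers contributing 300 and 100 units over 2 cycles would receive 37.5 and 12.5; the same split over a single cycle would double both values.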
  • Figure 2 illustrates a system to control quality for translation jobs according to one embodiment.
  • a job submitted by a customer is configured to receive an answer related to the intended use; and the answer is configured to be selected by the customer from a set of predetermined choices.
  • the answer selected by the customer is pre-associated with job quality metrics, which can be measured during the translation service to determine the Expectation Index (EI) of the job output.
  • the job exits the translation service after a terminal condition is satisfied.
  • Figure 3 illustrates a system to control expectation for outsource jobs according to one embodiment.
  • a unit of work to be performed on the crowdsourcing platform includes transformation performed on a text, including but not limited to transcription, translation, or proofreading.
  • a component, hereafter called Zurich, is configured to provide the workers participating in the crowdsourcing platform with automated tools that combine various technologies, such as Natural Language Processing (NLP), Machine Learning (ML), and statistics that the platform accumulates through the process of transforming text.
  • Zurich improves the throughput of the platform and the quality of the transformations it performs by helping workers improve both their speed and their output quality, and by automating task assignment and management, so that the transformation of text can be carried out by the workers participating in the crowdsourcing platform on a large scale while minimizing management effort.
  • Zurich provides and relies on data handling: data collection, data processing, and data application. Zurich is configured to: better classify the types and features of the transformation; automate the detection of incomplete or bad transformations; and assign transformations to the "best fit" worker at a given time.
  • the crowdsourcing platform usually does not have control over whether or when workers take and actually do work. Workers are not bound by any contract to take specific pieces of work. Workers may choose freely.
  • Zurich uses pre-computed statistical data as well as real-time computed data to compute several metrics and scores, such as:
  • CS complexity score
  • data collection is performed partly through events being triggered from the platform (auto-saving, submission of a unit of work) as well as batch processing and events triggered by certain transaction types directly in the data store.
  • Zurich is configured to use the collected data to generate scores and classifications, which are recalculated at sufficient intervals.
  • a completeness score (CS) for translation transformations can be computed using several techniques, such as Language Detection (LD), Longest Common Substring (LCS), Length Ratio (LR), and/or Document Similarity (DS).
  • to determine whether a translation transformation is complete, or how close it is to being complete, Zurich combines the aforementioned techniques. Using statistical data generated by analyzing prior translation transformations, Zurich can score a new translation transformation with respect to the statistical data. Examples of scores include:
  • a score based on Language Detection (LD), configured to indicate the language in which the given text is written (possibly including a reliability score for the prediction), where the score is normalized against the sample data;
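Two of the listed techniques, Length Ratio (LR) and Longest Common Substring (LCS), are easy to compute directly. The sketch below is illustrative, not the platform's code: `length_ratio` and `lcs_ratio` are hypothetical names, and normalizing the LCS by the shorter text is an assumed choice. The intuition is that a "translation" that still shares a long verbatim substring with the source is likely incomplete. Language Detection would need a separate language-identification library and is not shown.

```python
from difflib import SequenceMatcher


def length_ratio(source: str, target: str) -> float:
    """Target length divided by source length (0.0 for an empty source)."""
    return len(target) / len(source) if source else 0.0


def lcs_ratio(source: str, target: str) -> float:
    """Longest common substring length, relative to the shorter text (assumed normalization)."""
    if not source or not target:
        return 0.0
    # autojunk=False keeps frequent characters, so find_longest_match
    # returns the true longest common substring of the two texts.
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    match = matcher.find_longest_match(0, len(source), 0, len(target))
    return match.size / min(len(source), len(target))
```

A raw LR or LCS value means little on its own; it becomes useful once compared with the statistics of prior transformations for the same language pair.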
  • the statistical data is computed on a per language pair basis.
  • the averages, standard deviations, etc. are calculated for each language pair separately. Texts are also classified into different classes according to their lengths. The averages, standard deviations, and scores are computed separately for the different length classes within each language pair.
  • a threshold can be chosen for each of the above scores to determine whether a translation transformation is to be considered incomplete. When a transformation is considered incomplete, staff check the classification, and whether or not the classification was correct is recorded.
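Computing statistics per language pair and per length class can be sketched as grouping prior scores by a `(language pair, length class)` key and reporting a new score as a z-score within its group. Everything here is an assumption for illustration: the class names `ScoreNormalizer`, the helper `length_class`, and the bucket boundaries of 100 and 1000 characters are not from the source.

```python
from collections import defaultdict
from statistics import mean, pstdev


def length_class(n_chars: int) -> str:
    """Hypothetical length buckets; the source does not give boundaries."""
    if n_chars < 100:
        return "short"
    if n_chars < 1000:
        return "medium"
    return "long"


class ScoreNormalizer:
    """Keeps prior scores per (language pair, length class) and normalizes new ones."""

    def __init__(self) -> None:
        self._samples: dict[tuple[str, str], list[float]] = defaultdict(list)

    def add(self, lang_pair: str, source_len: int, score: float) -> None:
        self._samples[(lang_pair, length_class(source_len))].append(score)

    def z_score(self, lang_pair: str, source_len: int, score: float) -> float:
        samples = self._samples.get((lang_pair, length_class(source_len)))
        if not samples:
            raise ValueError("no statistics for this language pair and length class")
        mu, sigma = mean(samples), pstdev(samples)
        # A group with no spread gives no information about how unusual the score is.
        return 0.0 if sigma == 0 else (score - mu) / sigma
```

Keeping the groups separate matters because, for instance, a length ratio that is typical for a short English-to-Japanese text may be an outlier for a long English-to-German one.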
  • the data can be used to construct a hypothesis function that weights the different scores differently.
  • the weights are calculated by applying multivariate linear or polynomial regression to the datasets.
  • the weighted sum of the scores is used as a completeness measurement for transformations.
  • the coefficients can be adjusted either in a batch process that runs at sufficient intervals or by a neural network that classifies whether a translation is complete and uses back-propagation to adjust its weights directly in each classification process.
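The batch variant described above (fit weights by multivariate linear regression on staff-verified outcomes, then use the weighted sum of scores as a completeness measurement) can be sketched with plain gradient descent on the mean squared error. This is a minimal sketch under assumptions: the function names, learning rate, epoch count, and the 0.5 decision threshold are hypothetical, and with 0/1 labels a logistic model would be a common alternative to the linear one named in the source.

```python
def fit_linear(features: list[list[float]], labels: list[float],
               lr: float = 0.3, epochs: int = 5000) -> tuple[list[float], float]:
    """Multivariate linear regression fitted by batch gradient descent on MSE."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(d):
                grad_w[j] += 2 * err * x[j] / n
            grad_b += 2 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def completeness(scores: list[float], w: list[float], b: float) -> float:
    """Weighted sum of the individual scores (e.g. LD, LCS, LR, DS)."""
    return sum(wi * si for wi, si in zip(w, scores)) + b


def is_incomplete(scores: list[float], w: list[float], b: float,
                  threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: below the threshold means 'incomplete'."""
    return completeness(scores, w, b) < threshold
```

Refitting `fit_linear` on the growing set of staff-checked classifications corresponds to the periodic batch adjustment; the neural-network alternative would instead update weights on every classification.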
  • Zurich is further configured to determine one or more of: Worker Transformation Capacity (WTC), Language Pair Service (LPS) capacity, and Content based Best-Possible-Fit (BPF) Worker Score.
  • WTC profiles, or fingerprints, the working-time habits of a worker for each hour of each day of the week.
  • the profile includes not only whether or not a worker statistically works during a particular hour on a particular day of the week (e.g., Monday at 7:00), but also the average amount of work done during that hour on that day.
  • WTC gives a bias towards recent work activity to gradually phase out workers that had a high throughput prior to a predetermined time period (e.g., 6 months ago) but have not logged into the platform since then.
  • the Language Pair Service (LPS) capacity is determined using the capacity scores for each worker.
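A WTC-style profile can be sketched as a 7 x 24 table indexed by weekday and hour, filled with the amount of work done in each slot and weighted toward recent activity so that workers inactive for months gradually fade out. LPS capacity for a slot is then the sum over workers serving that language pair. All names are hypothetical; the exponential decay with a configurable half-life is an assumed way to implement the recency bias, and the table holds decay-weighted totals rather than true per-week averages, as a simplification.

```python
from datetime import datetime


def capacity_profile(events: list[tuple[datetime, float]], now: datetime,
                     half_life_days: float = 90.0) -> list[list[float]]:
    """Decay-weighted work per (weekday, hour); events are (timestamp, amount of work)."""
    profile = [[0.0] * 24 for _ in range(7)]
    for ts, amount in events:
        age_days = (now - ts).total_seconds() / 86400
        weight = 0.5 ** (age_days / half_life_days)  # assumed recency bias
        profile[ts.weekday()][ts.hour] += amount * weight
    return profile


def language_pair_capacity(profiles: dict[str, list[list[float]]],
                           workers_by_pair: dict[str, list[str]],
                           pair: str, weekday: int, hour: int) -> float:
    """Sum of the capacities of all workers who serve the given language pair."""
    return sum(profiles[w][weekday][hour] for w in workers_by_pair.get(pair, []))
```

With a half-life of 91 days, 100 words done 182 days ago count as 25, so a formerly high-throughput worker who has not logged in recently contributes little to the current capacity estimate.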
  • Zurich is configured to use Document Similarity (DS) and the complexity score (CS) to determine a job preference score indicating whether a worker prefers certain types of content or complexity.
  • the job preference score can be augmented to form a worker profile by adding manual tagging by customers and workers, as well as automated Natural Language Processing (NLP) analytics, such as n-gram-based collocation extraction and the extraction of hapax legomena (hapaxes).
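One simple way to realize a Document Similarity-based preference score is cosine similarity between bag-of-words vectors of a new job and the jobs a worker has taken before; hapax legomena (words occurring exactly once in a text) fall out of the same word counts. The sketch below is an assumption-laden illustration: the tokenizer, the averaging over past jobs, and the names `preference_score` and `hapax_legomena` are hypothetical, and the complexity score (CS) component is not modeled.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words term-frequency vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def preference_score(new_job: str, past_jobs: list[str]) -> float:
    """Mean similarity between a new job and the jobs a worker chose before."""
    if not past_jobs:
        return 0.0
    return sum(cosine_similarity(new_job, p) for p in past_jobs) / len(past_jobs)


def hapax_legomena(text: str) -> set[str]:
    """Words that occur exactly once in the text."""
    return {t for t, c in Counter(tokens(text)).items() if c == 1}
```

A production system would more likely use TF-IDF weighting or embeddings for similarity; plain term frequency keeps the sketch dependency-free.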
  • Figure 6 illustrates a system configured to control work progress of text-based content transformation according to one embodiment.
  • the crowdsourcing platform usually has two primary points of interaction with a worker: a pickup interface and a submission interface.
  • the pickup interface allows a free worker (one who currently has no assigned unit of work from the crowdsourcing platform) to pick up a unit of work. While the worker holds the unit of work, the worker is not considered free to pick up another unit of work until the worker submits the output for that unit.
  • the submission interface allows a worker to submit the output for the unit of work performed by worker.
  • the pickup process starts when a unit of work becomes available on the platform and ends when the unit has been picked up by a worker.
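The pickup and submission rules above (a worker holding a unit is not free until he/she submits its output) can be modeled as a tiny in-memory state machine. This is a hypothetical sketch: the `WorkPool` class and its method names are illustrative and do not describe the platform's actual interfaces.

```python
class WorkPool:
    """In-memory model of the pickup and submission interfaces."""

    def __init__(self) -> None:
        self.available: list[str] = []       # unit ids waiting to be picked up
        self.assigned: dict[str, str] = {}   # worker id -> unit id currently held

    def publish(self, unit_id: str) -> None:
        # The pickup process starts when a unit becomes available.
        self.available.append(unit_id)

    def is_free(self, worker_id: str) -> bool:
        return worker_id not in self.assigned

    def pick_up(self, worker_id: str, unit_id: str) -> None:
        if not self.is_free(worker_id):
            raise RuntimeError("worker already holds a unit of work")
        if unit_id not in self.available:
            raise KeyError(unit_id)
        self.available.remove(unit_id)  # the pickup process ends here
        self.assigned[worker_id] = unit_id

    def submit(self, worker_id: str, output: str) -> tuple[str, str]:
        # Raises KeyError if the worker holds no unit; afterwards the worker is free again.
        unit_id = self.assigned.pop(worker_id)
        return unit_id, output
```

Because workers choose freely, a scheduler cannot push work onto them; the most it can do is shape what each free worker sees at pickup time, for example by ordering `available` by a best-fit score.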
  • the submission process roughly spans the period from, e.g., one hour before submission (or before a timeout caused by hitting a working time limit imposed by the platform) until approval by the customer after submission.
  • Zurich is configured to provide the following optimizations of the submission process:
  • Figure 7 shows a method to control work progress according to one embodiment.
  • the method shown in Figure 7 can be implemented on the translation service platform illustrated in Figure 2, using data processing systems and devices illustrated in Figures 4 and 5.
  • a computing apparatus is configured to: collect (731) data about works to transform text-based content; collect (733) information about workers performing the works; collect (735) ratings of work outputs provided by the workers; generate (737) indicators of capacity of workers and indicators of degrees of matching between works and workers using the collected data, information and ratings; assign (739) works to workers based on the indicators of capacity of workers and the indicators of degrees of matching between works and workers; generate (741) indicators of completeness of work outputs using natural language processing and machine learning; and automate (743) quality assurance check and work time limit management using the indicators of completeness.
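Steps 739 and 743 can be illustrated with a few small functions: rank candidate workers by combining the capacity and matching indicators, then use a completeness indicator to decide what happens on submission and on hitting the work time limit. All of this is hypothetical: combining the two indicators by multiplication, the 0.8 and 0.5 thresholds, and the "extend"/"release" policy are assumptions, since the source only states that the indicators are used for these purposes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    worker_id: str
    capacity: float  # capacity indicator from step 737 (assumed in [0, 1])
    match: float     # work/worker matching indicator from step 737 (assumed in [0, 1])


def assign(candidates: list[Candidate]) -> Optional[str]:
    """Step 739 sketch: pick the worker with the highest capacity * match."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.capacity * c.match).worker_id


def on_submit(completeness: float, qa_threshold: float = 0.8) -> str:
    """Step 743 sketch (QA): accept likely-complete output, otherwise send to review."""
    return "accept" if completeness >= qa_threshold else "review"


def on_timeout(completeness: float, partial_threshold: float = 0.5) -> str:
    """Step 743 sketch (time limit): extend nearly finished work, release the rest."""
    return "extend" if completeness >= partial_threshold else "release"
```

A product is only one choice; a weighted sum would let the platform trade off availability against fit explicitly.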
  • the operations as illustrated in Figures 2 and 3 are configured to be performed on a computing apparatus, such as the server device (303) illustrated in Figure 4.
  • Figure 4 illustrates a system configured to provide services according to one embodiment.
  • the operations discussed above are implemented at least in part in a service device (303), which can be implemented using one or more data processing systems as illustrated in Figure 5.
  • a plurality of user devices (e.g., 305, 305, 309) are coupled to the service device (303) via the network (301), which includes a local area network, a wireless communications network, a wide area network, an intranet, and/or the Internet, etc.
  • the user device can be one of various endpoints of the network (301), such as a personal computer, a mobile computing device, a notebook computer, a netbook, a personal media player, a personal digital assistant, a tablet computer, a mobile phone, a smart phone, a cellular phone, etc.
  • the user device (e.g., 305) can be implemented as a data processing system as illustrated in Figure 5, with more or fewer components.
  • At least some of the components of the system disclosed herein can be implemented as a computer system, such as a data processing system illustrated in Figure 5, with more or fewer components. Some of the components may share hardware or be combined on a computer system. In one embodiment, a network of computers can be used to implement one or more of the components.
  • data discussed in the present disclosure can be stored in storage devices of one or more computers accessible to the components discussed herein.
  • the storage devices can be implemented as a data processing system illustrated in Figure 5, with more or fewer components.
  • Figure 5 illustrates a data processing system according to one embodiment. While Figure 5 illustrates various parts of a computer system, it is not intended to represent any particular architecture or manner of interconnecting the parts. One embodiment may use other systems that have fewer or more components than those shown in Figure 5.
  • the data processing system (310) includes an inter-connect (311) (e.g., bus and system core logic), which interconnects a microprocessor(s) (313) and memory (314).
  • the microprocessor (313) is coupled to cache memory (319) in the example of Figure 5.
  • the inter-connect (311) interconnects the
  • I/O devices (315) may include a display device and/or peripheral devices, such as mice, keyboards, modems, network interfaces, printers, scanners, video cameras and other devices known in the art.
  • When the data processing system is a server system, some of the I/O devices (315), such as printers, scanners, mice, and/or keyboards, are optional.
  • the inter-connect (311) includes one or more buses connected to one another through various bridges, controllers and/or adapters.
  • the I/O controllers (317) include a USB (Universal Serial Bus) adapter for controlling USB peripherals, and/or an IEEE-1394 bus adapter for controlling IEEE-1394 peripherals.
  • the memory (314) includes one or more of: ROM (Read Only Memory), volatile RAM (Random Access Memory), and non-volatile memory, such as hard drive, flash memory, etc.
  • Volatile RAM is typically implemented as dynamic RAM (DRAM) which requires power continually in order to refresh or maintain the data in the memory.
  • Non-volatile memory is typically a magnetic hard drive, a magnetic optical drive, an optical drive (e.g., a DVD RAM), or other type of memory system which maintains data even after power is removed from the system.
  • the non-volatile memory may also be a random access memory.
  • the non-volatile memory can be a local device coupled directly to the rest of the components in the data processing system.
  • a non-volatile memory that is remote from the system, such as a network storage device coupled to the data processing system through a network interface such as a modem or Ethernet interface, can also be used.
  • the functions and operations as described here can be implemented using special purpose circuitry, with or without software instructions, such as using Application-Specific Integrated Circuit (ASIC) or Field-Programmable Gate Array (FPGA).
  • Embodiments can be implemented using hardwired circuitry without software instructions, or in combination with software instructions. Thus, the techniques are limited neither to any specific combination of hardware circuitry and software, nor to any particular source for the instructions executed by the data processing system.
  • At least some aspects disclosed can be embodied, at least in part, in software. That is, the techniques may be carried out in a computer system or other data processing system in response to its processor, such as a microprocessor, executing sequences of instructions contained in a memory, such as ROM, volatile RAM, non-volatile memory, cache or a remote storage device.
  • Routines executed to implement the embodiments may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as “computer programs.”
  • the computer programs typically include one or more instructions set at various times in various memory and storage devices in a computer, and that, when read and executed by one or more processors in a computer, cause the computer to perform operations necessary to execute elements involving the various aspects.
  • a machine readable medium can be used to store software and data which when executed by a data processing system causes the system to perform various methods.
  • the executable software and data may be stored in various places including for example ROM, volatile RAM, non-volatile memory and/or cache. Portions of this software and/or data may be stored in any one of these storage devices.
  • the data and instructions can be obtained from centralized servers or peer to peer networks. Different portions of the data and instructions can be obtained from different centralized servers and/or peer to peer networks at different times and in different communication sessions or in a same communication session.
  • the data and instructions can be obtained in entirety prior to the execution of the applications. Alternatively, portions of the data and instructions can be obtained dynamically, just in time, when needed for execution. Thus, it is not required that the data and instructions be on a machine readable medium in entirety at a particular instance of time.
  • Examples of tangible, non-transitory computer-readable media include but are not limited to recordable and non-recordable type media such as volatile and non-volatile memory devices, read only memory (ROM), random access memory (RAM), flash memory devices, floppy and other removable disks, magnetic disk storage media, optical storage media (e.g., Compact Disk Read-Only Memory (CD ROMS), Digital Versatile Disks (DVDs), etc.), among others.
  • the computer-readable media may store the instructions.
  • the instructions may also be embodied in digital and analog communication links for electrical, optical, acoustical or other forms of propagated signals, such as carrier waves, infrared signals, digital signals, etc.
  • propagated signals such as carrier waves, infrared signals, digital signals, etc. are not tangible machine readable medium and are not configured to store instructions.
  • a machine readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant, manufacturing tool, any device with a set of one or more processors, etc.).
  • hardwired circuitry may be used in combination with software instructions to implement the techniques.
  • the techniques are neither limited to any specific combination of hardware circuitry and software nor to any particular source for the instructions executed by the data processing system.
  • references to “one embodiment” or “an embodiment” mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure.
  • the appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, and are not necessarily all referring to separate or alternative embodiments mutually exclusive of other embodiments.
  • various features are described which may be exhibited by one embodiment and not by others.
  • various requirements are described which may be requirements for one embodiment but not for other embodiments. Unless excluded by explicit description and/or apparent incompatibility, any combination of various features described in this description is also included here. For example, the features described above in connection with “in one embodiment” or “in some embodiments” can be all optionally included in one

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Systems and methods are provided to compute indicators of the completeness of text-based content transformation works, of a worker's capacity to perform the transformation, and/or of the degree of matching between a unit of work and a worker, based on information collected about the complexity of the works, the working hours and throughput of the workers, and the ratings of the work outputs, and using natural language processing and machine learning techniques such as language detection, longest common substring, length ratio, document similarity, etc. The indicators are used to optimize work pickup and output submission for online crowdsourcing tasks related to text-based content transformation, such as transcription, translation, proofreading, etc.
PCT/US2013/065406 2012-10-17 2013-10-17 Systems and methods to control work progress for content transformation based on natural language processing and/or machine learning WO2014062905A1 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201261715207P 2012-10-17 2012-10-17
US201261715204P 2012-10-17 2012-10-17
US61/715,207 2012-10-17
US61/715,204 2012-10-17
US14/054,292 2013-10-15
US14/054,292 US20140108103A1 (en) 2012-10-17 2013-10-15 Systems and methods to control work progress for content transformation based on natural language processing and/or machine learning

Publications (1)

Publication Number Publication Date
WO2014062905A1 (fr) 2014-04-24

Family

ID=50488745

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/065406 WO2014062905A1 (fr) 2012-10-17 2013-10-17 Systems and methods to control work progress for content transformation based on natural language processing and/or machine learning

Country Status (1)

Country Link
WO (1) WO2014062905A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020026338A1 (en) * 1999-06-03 2002-02-28 Hans Max Theodore Bukow Method and apparatus for matching projects and workers
EP1489523A2 (fr) * 2003-06-20 2004-12-22 Microsoft Corporation Adaptive machine translation
US6859523B1 (en) * 2001-11-14 2005-02-22 Qgenisys, Inc. Universal task management system, method and product for automatically managing remote workers, including assessing the work product and workers
US20120072253A1 (en) * 2010-09-21 2012-03-22 Servio, Inc. Outsourcing tasks via a network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
OMAR F. ZAIDAN ET AL.: "Crowdsourcing Translation: Professional Quality from Non-Professionals", PROCEEDINGS OF THE 49TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 19 June 2011 (2011-06-19), pages 1220 - 1229 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13846601

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13846601

Country of ref document: EP

Kind code of ref document: A1