KR102244699B1

KR102244699B1 - Method for labeling emotion using sentence similarity of crowdsourcing based project for artificial intelligence training data generation

Info

Publication number: KR102244699B1
Application number: KR1020200072034A
Authority: KR
Inventors: 박민우; 조창희
Original assignee: 주식회사 크라우드웍스
Priority date: 2020-06-15
Filing date: 2020-06-15
Publication date: 2021-04-27

Abstract

The present invention relates to a method for labeling an emotion using sentence similarity of a crowdsourcing-based project for artificial intelligence training data generation. The method includes: a step of requesting labeling work execution with respect to an emotion of a source text by assigning labeling works of a crowdsourcing-based project (hereinafter, project) to workers; a step of receiving emotion label input as labeling work results from the workers; a step of reconfirming an existing labeling work result by determining the presence or absence of a change in labeling criteria with regard to a worker who has executed at least a certain amount of labeling work (hereinafter, target worker); a step of requesting inspection execution by assigning labeling work results including reconfirmation results to inspectors; and a step of receiving the input of inspection results regarding the labeling work results as inspection pass or return from the inspectors.

Description

Emotion labeling method using sentence similarity of crowdsourcing-based project for generating artificial intelligence learning data {METHOD FOR LABELING EMOTION USING SENTENCE SIMILARITY OF CROWDSOURCING BASED PROJECT FOR ARTIFICIAL INTELLIGENCE TRAINING DATA GENERATION}

The present invention relates to an emotion labeling method using sentence similarity in a crowdsourcing-based project for generating artificial intelligence training data.

Recently, a growing number of companies collect and process large amounts of data through crowdsourcing, which involves the general public in part of a company's activities. That is, a company opens a project and lets members of the public, i.e., workers, participate in it, and then collects the information it needs from the results of the work those workers complete.

Specifically, when a project is opened, a plurality of tasks is assigned to each of a plurality of workers. Each worker performs the assigned tasks and provides the work results. Thereafter, a plurality of inspection tasks for the work results is assigned to each of a plurality of inspectors, and each inspector performs the assigned inspection tasks.

Crowdsourcing-based projects come in various types, and each type requires workers to perform different tasks. Some projects require a worker's subjective judgment, for example, projects that require emotion labeling.

In emotion labeling, a worker reads a text and labels the emotion it expresses according to subjective criteria. As the worker repeats the task, however, those criteria may shift. As a result, similar texts are labeled with different emotions, and the accuracy of the emotion labeling results declines.

However, there is currently no way to determine automatically that a worker's criteria have changed while a project is in progress; only inspection by an inspector can confirm whether the worker is labeling emotions correctly with appropriate criteria.

That is, a change in a worker's criteria cannot be known until the work results are inspected, so the inspector must also inspect work results labeled with the wrong emotion, which lowers inspection efficiency.

Korean Unexamined Patent Publication No. 10-2014-0095956, 2014.08.04.

The problem to be solved by the present invention is to provide an emotion labeling method using sentence similarity in a crowdsourcing-based project for generating artificial intelligence training data.

However, the problems to be solved by the present invention are not limited to the one described above, and other problems may exist.

To solve the above problem, an emotion labeling method using sentence similarity of a crowdsourcing-based project according to one aspect of the present invention includes: assigning a plurality of labeling tasks of a crowdsourcing-based project (hereinafter, project) to a plurality of workers and requesting them to label the emotion of source texts; receiving emotion labels from the plurality of workers as a plurality of labeling results; for a worker who has performed at least a predetermined quantity of labeling tasks (hereinafter, target worker), determining whether the labeling criterion has changed and carrying out a reconfirmation process for existing labeling results; assigning a plurality of labeling results, including reconfirmation results, to a plurality of inspectors and requesting inspection; and receiving, from the plurality of inspectors, a plurality of inspection results for the plurality of labeling results as either inspection pass or rejection. The reconfirmation process for the existing labeling results includes: extracting, from the target worker's plurality of existing labeling results, at least one existing labeling result to which the same emotion label as the target worker's new labeling result was input; measuring the sentence similarity between the source text of the new labeling result and the source text of the extracted at least one existing labeling result; determining, according to the measured sentence similarity, whether the target worker's labeling criterion has changed; requesting, according to that determination, that the target worker reconfirm the extracted at least one existing labeling result; and receiving from the target worker a reconfirmation result for the extracted at least one existing labeling result.
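The extraction step of the reconfirmation process can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `LabelingResult` record, its field names, and the function `extract_same_label_results` are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelingResult:
    """One labeling result (hypothetical record layout)."""
    task_id: str
    worker_id: str
    source_text: str
    emotion_label: str


def extract_same_label_results(
    new_result: LabelingResult, history: List[LabelingResult]
) -> List[LabelingResult]:
    """Return the same worker's earlier results that carry the same
    emotion label as the new result."""
    return [
        r for r in history
        if r.worker_id == new_result.worker_id
        and r.emotion_label == new_result.emotion_label
        and r.task_id != new_result.task_id  # never compare a result with itself
    ]
```

Only results by the same worker are compared, because the method checks whether one worker's own criterion has drifted rather than whether workers disagree with each other.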

In some embodiments of the present invention, determining whether the target worker's labeling criterion has changed may include: calculating the average sentence similarity between the source text of the new labeling result and the source text of the extracted at least one existing labeling result; and determining that the target worker's labeling criterion has changed when the average sentence similarity is less than or equal to a predetermined reference value.
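The averaging rule of this embodiment can be sketched as follows. The sketch assumes cosine similarity over sentence vectors as the similarity measure, which the source does not specify at this point; the function names and the `threshold` parameter are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def criterion_changed(
    new_vec: Sequence[float],
    prior_vecs: Sequence[Sequence[float]],
    threshold: float,
) -> bool:
    """True when the average similarity between the new source text and the
    same-label earlier source texts is at or below the reference value."""
    if not prior_vecs:
        # Assumption: with nothing to compare against, no change is flagged.
        return False
    avg = sum(cosine_similarity(new_vec, p) for p in prior_vecs) / len(prior_vecs)
    return avg <= threshold
```

The intuition is that texts given the same label by a consistent worker should resemble each other; a low average suggests the label is now being applied to a different kind of text.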

In some embodiments of the present invention, receiving the reconfirmation result for the extracted at least one existing labeling result from the target worker may consist of receiving a modified emotion label from the target worker as the reconfirmation result for the extracted existing labeling result.

In some embodiments of the present invention, the method further includes providing a predetermined reward, according to the inspection result for the reconfirmation result, to the worker who reconfirmed the extracted existing labeling result. The reward may be provided to the worker when the inspection result for a reconfirmation result in which a modified emotion label was input is an inspection pass.

In some embodiments of the present invention, receiving the reconfirmation result for the extracted at least one existing labeling result from the target worker may consist of receiving the same emotion label from the target worker as the reconfirmation result for the extracted existing labeling result.

In some embodiments of the present invention, the method further includes imposing a predetermined penalty, according to the inspection result for the reconfirmation result, on the worker who reconfirmed the extracted existing labeling result. The penalty may be imposed on the worker when the inspection result for a reconfirmation result in which the same emotion label was input is a rejection.
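The reward and penalty conditions in these embodiments reduce to a small decision rule, sketched below. The function name and the string return values are hypothetical; the source only says that a "predetermined" reward or penalty is applied and does not describe the remaining two combinations.

```python
from typing import Optional


def reconfirmation_outcome(
    original_label: str, reconfirmed_label: str, inspection_passed: bool
) -> Optional[str]:
    """Return "reward", "penalty", or None for one reconfirmed labeling result."""
    label_modified = reconfirmed_label != original_label
    if label_modified and inspection_passed:
        return "reward"   # the corrected label passed inspection
    if not label_modified and not inspection_passed:
        return "penalty"  # the worker kept the label and inspection rejected it
    return None           # cases for which the source states no reward or penalty
```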

In some embodiments of the present invention, the method further includes, before the project is opened, embedding a plurality of source texts corresponding to the plurality of labeling tasks of the project into a plurality of vectors. In measuring the sentence similarity, the sentence similarity may then be measured using the vector of the source text of the new labeling result and the vector of the source text of the extracted at least one existing labeling result.

In some embodiments of the present invention, the embedding of the plurality of source texts into a plurality of vectors may use the Sent2Vec algorithm.
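Embedding all source texts once, before the project opens, means that each later similarity check only needs a vector lookup. The sketch below illustrates that precomputation. It deliberately does not call Sent2Vec: `toy_embed` is a hashed bag-of-words stand-in used only so the example runs without external dependencies, and every name in the block is hypothetical. In practice a real sentence embedder would be passed in as `embed`.

```python
import zlib
from typing import Callable, Dict, List


def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Stand-in embedder (hashed bag of words); NOT Sent2Vec."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def precompute_embeddings(
    source_texts: Dict[str, str],
    embed: Callable[[str], List[float]] = toy_embed,
) -> Dict[str, List[float]]:
    """Embed every task's source text once, keyed by task id."""
    return {task_id: embed(text) for task_id, text in source_texts.items()}
```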

In some embodiments of the present invention, the predetermined quantity may be a quantity corresponding to a predetermined ratio of the maximum number of tasks each worker of the project is allowed to perform.
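As a worked illustration of this threshold, the sketch below derives the predetermined quantity from a per-worker task cap and a ratio. Rounding up with `math.ceil` is an assumption, since the source does not say how fractional quantities are handled, and the function and parameter names are hypothetical.

```python
import math


def reconfirmation_threshold(max_tasks_per_worker: int, ratio: float) -> int:
    """Predetermined quantity = ratio of the per-worker task cap (rounded up)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return math.ceil(max_tasks_per_worker * ratio)


def is_target_worker(completed: int, max_tasks_per_worker: int, ratio: float) -> bool:
    """A worker becomes a target worker once the threshold is reached."""
    return completed >= reconfirmation_threshold(max_tasks_per_worker, ratio)
```

For example, with a cap of 200 tasks and a ratio of 0.25, a worker becomes a target worker after completing 50 tasks.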

A computer program according to another aspect of the present invention for solving the above problem is combined with a computer, which is hardware, to execute the emotion labeling method using sentence similarity of the crowdsourcing-based project, and is stored in a computer-readable recording medium.

Other specific details of the present invention are included in the detailed description and drawings.

According to the present invention described above, each time a new labeling result is received from a worker, the existing labeling results to which the same emotion label was input are extracted, the average sentence similarity between the source texts of the new labeling result and the existing labeling results is calculated, and the average is checked against a reference value. A change in the worker's labeling criterion can thereby be determined automatically.

Because a change in the worker's labeling criterion is determined automatically while the project is in progress, the worker can be asked to reconfirm incorrectly performed existing labeling results before they are inspected. Inspectors therefore do not need to inspect labeling results that were produced under an inappropriate labeling criterion and would be rejected anyway, which raises inspection efficiency.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a conceptual diagram of a crowdsourcing service according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating the progress of a crowdsourcing-based project according to an embodiment of the present invention.
FIG. 3 is a flowchart of an emotion labeling method using sentence similarity of a crowdsourcing-based project according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram illustrating a rework process applied to a predetermined quantity or more of labeling results according to an embodiment of the present invention.
FIG. 5 is a detailed flowchart of step S130 of FIG. 3.
FIG. 6 is an exemplary diagram illustrating the extraction of existing labeling results to which the same emotion label as a new labeling result was input, according to an embodiment of the present invention.
FIG. 7 is an exemplary diagram illustrating the measurement of sentence similarity between the source texts of a new labeling result and existing labeling results according to an embodiment of the present invention.
FIG. 8 is an exemplary diagram illustrating the assignment of labeling results for inspection according to an embodiment of the present invention.
FIG. 9 is an exemplary diagram illustrating a case in which a modified emotion label is input as a reconfirmation result according to an embodiment of the present invention.
FIG. 10 is an exemplary diagram illustrating a case in which the same emotion label is input as a reconfirmation result according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating an emotion labeling apparatus using sentence similarity of a crowdsourcing-based project according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in a variety of different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that a person of ordinary skill in the art to which the present invention belongs is fully informed of its scope; the present invention is defined only by the scope of the claims.

The terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the phrase specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more elements other than those mentioned. Throughout the specification, the same reference numerals refer to the same elements, and "and/or" includes each of the mentioned elements and every combination of one or more of them. Although "first", "second", and the like are used to describe various elements, these elements are of course not limited by these terms. The terms are used only to distinguish one element from another. Therefore, a first element mentioned below may of course be a second element within the technical idea of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are explicitly and specifically defined.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a conceptual diagram of a crowdsourcing service according to an embodiment of the present invention.

Referring to FIG. 1, the crowdsourcing service is carried out by a client 10, a service provider 20, and the public 30.

The client 10 is a company or individual that commissions a crowdsourcing-based project (hereinafter, project).

The client 10 commissions a project for purposes such as collecting source data or annotating data to generate artificial intelligence training data. Data generated through the project can be used as training data for any kind of machine learning, such as supervised learning, unsupervised learning, or reinforcement learning. Collecting source data means gathering raw, unprocessed data, such as recorded speech or photographs. Data annotation means attaching relevant annotation data to source data such as text, photographs, or video. For example, data annotation may include, but is not limited to, finding entities in a given passage or finding similar sentences. The project types described above are only examples, and various projects may be handled in the present invention depending on the client's design.

The service provider 20 is a company that provides the crowdsourcing service.

When the client 10 commissions a project for a product or service, the service provider 20 assigns the project's tasks to the general public 30 and receives work results from the public 30. It then provides the client 10 with the final deliverable extracted from the work results.

The service provider 20 provides the crowdsourcing service to the client 10 and the public 30 through a crowdsourcing platform (hereinafter, platform). That is, when the client 10 commissions a project, the service provider 20 opens the project on the platform. Once it has received work results for the open project from the public 30, it can close the project on the platform, extract the final deliverable, and provide it to the client 10.

The public 30 refers to members of the general public who participate in projects opened on the platform. The public 30 may participate in such projects through an application or website provided by the service provider 20.

The public 30 consists of workers 32 and inspectors 34.

A worker 32 decides to participate in a specific project among the plurality of projects open on the platform. The worker 32 then performs tasks such as collecting source data or annotating data and transmits the results to the platform.

An inspector 34 decides to participate in a specific project among the plurality of projects open on the platform. The inspector 34 then inspects the results of the work performed by the workers 32. As the outcome of an inspection, the inspector 34 may pass or reject the result and, when rejecting, may enter a reason for the rejection. Because a passed result requires neither rework nor the re-inspection that rework entails, passing inspection has the same meaning as completing inspection.

FIG. 2 is a flowchart illustrating the progress of a crowdsourcing-based project according to an embodiment of the present invention.

First, the client 10 commissions one or more projects from the service provider 20 (S11).

The service provider 20 then opens the commissioned project on the platform (S12). Before opening the project, the service provider 20 may determine its grade in consideration of factors such as its difficulty. That is, depending on the difficulty, it can decide the minimum grade of the members of the public 30 to whom the project will be exposed. This increases the reliability of the project's work results.

The service provider 20 then assigns tasks to workers 32 whose grade is at or above the project's grade and requests the work (S13).
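The grade-based assignment in steps S12 and S13 amounts to a simple eligibility filter, sketched below. Representing grades as integers where a larger number means a higher grade is an assumption made for illustration; the source does not describe how grades are encoded, and `eligible_workers` is a hypothetical name.

```python
from typing import Dict, List


def eligible_workers(worker_grades: Dict[str, int], project_grade: int) -> List[str]:
    """Workers whose grade is at or above the project's grade (sorted by id)."""
    return sorted(w for w, g in worker_grades.items() if g >= project_grade)
```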

The worker 32 then performs the assigned tasks (S14). For a task that cannot be performed at all for some reason, the worker 32 may enter the reason instead of performing the task.

The service provider 20 then receives the work results from the worker 32 (S15), assigns inspection of those work results to an inspector 34, and requests the inspection (S16).

The inspector 34 then performs the assigned inspection (S17). If the inspector 34 judges that the work was performed properly, the inspector passes it; if the inspector judges that the work was performed incorrectly, the inspector rejects it. When rejecting, the inspector 34 enters the reason the work was judged to be incorrect.

The service provider 20 then receives the inspection result from the inspector 34 (S18).

If the inspection result is a pass, the service provider 20 treats the work result as valid data and uses it to extract the final deliverable when the project ends.

If the inspection result is a rejection, the service provider 20 may perform the inspection again internally, or may reassign the task to a worker 32 for rework. Reworked results must be inspected again by an inspector.
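How an inspection result is routed can be sketched as a small function. The enum, the `rework` flag, and the returned step names are hypothetical labels for the options the text describes; how the service provider chooses between internal re-inspection and rework is not specified in the source.

```python
from enum import Enum


class InspectionResult(Enum):
    PASS = "pass"
    REJECT = "reject"


def next_step(result: InspectionResult, rework: bool = True) -> str:
    """Route a work result after inspection (S18)."""
    if result is InspectionResult.PASS:
        return "use_as_valid_data"  # counted toward the final deliverable
    if rework:
        return "reassign_to_worker_then_reinspect"  # rework needs re-inspection
    return "reinspect_internally"
```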

When the project period ends or enough valid data has been secured, the service provider 20 closes the project (S19), produces the final deliverable from the secured valid data, and provides it to the client 10 (S20).

Before the project is closed, the service provider 20 evaluates the performance of the workers 32 and inspectors 34, calculates the work fees and inspection fees according to the evaluation, and pays them to the workers 32 and inspectors 34.

Although FIGS. 1 and 2 refer simply to the client 10, the service provider 20, the worker 32, and the inspector 34, these denote the computer devices or telecommunication devices, such as smartphones, tablets, PDAs, laptops, desktops, and servers, operated by each participant.

도 3은 본 발명의 일 실시예에 따른 크라우드소싱 기반 프로젝트의 문장 유사도를 이용한 감정 라벨링 방법의 순서도이다. 도 4는 본 발명의 일 실시예에 따른 소정 수량 이상의 라벨링 작업 결과를 대상으로 하여 재작업 프로세스를 진행하는 것을 설명하기 위한 예시도이다. 도 5는 도 3의 단계 S130의 구체적인 순서도이다. 도 6은 본 발명의 일 실시예에 따른 신규 라벨링 작업 결과와 동일한 감정 라벨이 입력된 기존 라벨링 작업 결과를 추출하는 것을 설명하기 위한 예시도이다. 도 7은 본 발명의 일 실시예에 따른 신규 라벨링 작업 결과와 기존 라벨링 작업 결과의 소스 텍스트 간의 문장 유사도를 측정하는 것을 설명하기 위한 예시도이다. 도 8은 본 발명의 일 실시예에 따른 라벨링 작업 결과의 검수 배정을 설명하기 위한 예시도이다. 도 9는 본 발명의 일 실시예에 따른 재확인 결과로서 수정된 감정 라벨이 입력되는 경우를 설명하기 위한 예시도이다. 도 10은 본 발명의 일 실시예에 따른 재확인 결과로서 동일한 감정 라벨이 입력되는 경우를 설명하기 위한 예시도이다.3 is a flowchart of an emotion labeling method using sentence similarity of a crowdsourcing-based project according to an embodiment of the present invention. 4 is an exemplary diagram for explaining performing a rework process targeting a labeling operation result of a predetermined quantity or more according to an embodiment of the present invention. 5 is a detailed flowchart of step S130 of FIG. 3. 6 is an exemplary view for explaining extracting an existing labeling operation result inputted with the same emotion label as the new labeling operation result according to an embodiment of the present invention. 7 is an exemplary diagram for explaining measuring a sentence similarity between a result of a new labeling operation and a source text of a result of an existing labeling operation according to an embodiment of the present invention. 8 is an exemplary view for explaining the inspection and assignment of labeling work results according to an embodiment of the present invention. 9 is an exemplary view for explaining a case where a modified emotion label is input as a result of reconfirmation according to an embodiment of the present invention. 10 is an exemplary diagram for explaining a case in which the same emotion label is input as a result of reconfirmation according to an embodiment of the present invention.

Meanwhile, the steps shown in FIGS. 3 and 5 may be understood as being performed by a platform server (hereinafter, the server) operated by the service provider 20, but are not limited thereto.

In addition, the plurality of workers 32 and the plurality of inspectors 34 perform their work using terminal devices. The terminal device of a worker 32 or an inspector 34 may be a computer device or telecommunication device such as a smartphone, tablet, PDA, laptop, or desktop, but is not limited thereto.

First, referring to FIG. 3, the server assigns a plurality of labeling tasks of a crowdsourcing-based project (hereinafter, the project) to a plurality of workers, requesting that they label the emotion of a source text (S110), and receives emotion labels from the plurality of workers as the results of the labeling tasks (S120).

A labeling task is a task of classifying the emotion of a source text consisting of one or more sentences.

That is, the worker 32 reads the source text, judges its emotion according to his or her own subjective criteria, and inputs an emotion label for the source text as the labeling result.

Here, the source text is text data in which emotion is not expressed directly. Examples include the content of a customer consultation in customer service (CS) or a customer review.

As the worker 32 performs labeling tasks repeatedly and learns the work pattern, the criterion by which the worker 32 assigns emotion labels to source texts (hereinafter, the labeling criterion) may change partway through the project.

Conventionally, a change in the labeling criterion of the worker 32 partway through a project could not be detected automatically. As a result, different emotion labels were assigned to source texts composed of similar sentences, which lowered the accuracy of the labeling results.

To solve this problem, an embodiment of the present invention measures the sentence similarity between the source texts of the labeling results produced by the worker 32, so that it can be determined automatically whether the labeling criterion of the worker 32 has changed partway through the project.

Next, for a worker who has performed at least a predetermined quantity of labeling tasks (hereinafter, the target worker), the server determines whether the labeling criterion has changed and carries out a reconfirmation process on the existing labeling results (S130).

Here, the predetermined quantity is the quantity corresponding to a predetermined ratio of the maximum number of tasks that each worker in the project can perform.

The maximum number of tasks may differ for each worker 32. For example, a worker with a higher grade may be allotted a larger maximum number of tasks.

That is, from the point at which each worker 32 has performed the number of labeling tasks corresponding to the predetermined ratio of that worker's maximum number of tasks, the server determines whether the worker's labeling criterion has changed and carries out the reconfirmation process.

Referring to FIG. 4, when the maximum number of tasks for a particular worker is 50 and the predetermined ratio is 40 percent, the server may begin checking whether the labeling criterion has changed, and carry out the reconfirmation process, from the point at which that worker inputs the 20th labeling result.

That is, for a worker who has performed at least the predetermined quantity of labeling tasks, the server determines whether the worker's labeling criterion has changed each time the worker inputs a new labeling result, and carries out the reconfirmation process depending on whether a change is found.

Referring to FIG. 4, starting with the 20th labeling result, which corresponds to the predetermined quantity for the particular worker, and continuing through the 50th labeling result, which corresponds to the maximum number of tasks, the server determines each time a labeling result is input whether the worker's labeling criterion has changed.
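The threshold arithmetic of FIG. 4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the ratio is passed as an integer percentage to avoid floating-point rounding, and rounding fractional thresholds up is an assumption, since the description does not state a rounding rule.

```python
def review_threshold(max_tasks: int, ratio_percent: int) -> int:
    """Number of completed tasks from which criterion checks begin.

    Rounding up (ceiling) is an assumption; the description gives no rule
    for ratios that do not divide the maximum evenly.
    """
    return (max_tasks * ratio_percent + 99) // 100


def should_check(completed: int, max_tasks: int, ratio_percent: int) -> bool:
    """True once the worker has reached the predetermined quantity."""
    return completed >= review_threshold(max_tasks, ratio_percent)


# FIG. 4 example: a maximum of 50 tasks and a ratio of 40 percent.
print(review_threshold(50, 40))   # 20
print(should_check(19, 50, 40))   # False
print(should_check(20, 50, 40))   # True
```

With these values the check runs on every result from the 20th through the 50th, matching the range described for FIG. 4.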

The following describes in detail how the server determines whether the labeling criterion of the target worker has changed and how it carries out the reconfirmation process accordingly.

Referring to FIG. 5, the server extracts, from among the target worker's existing labeling results, at least one existing labeling result for which the same emotion label as that of the target worker's new labeling result was input (S131).

As described above, the worker 32 inputs an emotion label for a source text as the labeling result. From among the existing labeling results produced by the target worker, the server extracts those that were given the same emotion label as the one input for the new labeling result.

Referring to FIG. 6A, when the emotion label of the new labeling result (10) is input as 'annoyance', the server may extract, from among the existing labeling results (1 to 9), only the existing labeling results (1 to 3) labeled 'annoyance'.

Thereafter, when another new labeling result is received from the target worker, the server again extracts the existing labeling results that have the same emotion label as that new labeling result.

Referring to FIG. 6B, when the emotion label of the new labeling result (11) is input as 'embarrassment', the server may extract, from among the existing labeling results (1 to 10), only the existing labeling results (6, 7, 9) labeled 'embarrassment'.
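The extraction of step S131 amounts to a filter on the emotion label. The sketch below reproduces the cases of FIGS. 6A and 6B under stated assumptions: the `LabelResult` type and `extract_same_label` function are hypothetical names, and the label 'joy' for results 4, 5, and 8 is a placeholder, since the description does not say which labels those results carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelResult:
    task_id: int
    label: str


def extract_same_label(existing, new):
    """Return the existing results that carry the same emotion label as `new`."""
    return [r for r in existing if r.label == new.label]


# Results 1-3 are 'annoyance' and 6, 7, 9 are 'embarrassment' per FIGS. 6A/6B.
# The 'joy' label on results 4, 5, 8 is a placeholder assumption.
labels = {1: "annoyance", 2: "annoyance", 3: "annoyance", 4: "joy", 5: "joy",
          6: "embarrassment", 7: "embarrassment", 8: "joy", 9: "embarrassment"}
history = [LabelResult(i, lab) for i, lab in labels.items()]

new_10 = LabelResult(10, "annoyance")
print([r.task_id for r in extract_same_label(history, new_10)])  # [1, 2, 3]

history.append(new_10)  # result 10 becomes an existing result
new_11 = LabelResult(11, "embarrassment")
print([r.task_id for r in extract_same_label(history, new_11)])  # [6, 7, 9]
```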

Next, referring again to FIG. 5, the server measures the sentence similarity between the source text of the new labeling result and the source text of each of the extracted existing labeling results (S132).

Before the project is opened, the server embeds the plurality of source texts corresponding to the project's labeling tasks into a plurality of vectors. The Sent2Vec algorithm may be used for this embedding, but the embedding is not limited thereto.

That is, the server runs a text embedding process using a text embedding model so that the source text of every labeling task in the project is represented as a vector.

Then, to measure the sentence similarity between the source text of the target worker's new labeling result and the source texts of the extracted existing labeling results, the server uses the vectors into which the source texts were embedded before the project was opened. Any of the various methods and algorithms well known in the field of natural language processing may be used to measure sentence similarity from vectors.

Specifically, the server measures the sentence similarity using the vector of the source text of the new labeling result and the vectors of the source texts of the extracted existing labeling results.

That is, the server compares the vector of the source text of the new labeling result with the vector of the source text of each extracted existing labeling result, thereby measuring the sentence similarity between the new source text and each existing source text. The server thus obtains, for each existing labeling result, the similarity of its source text to the source text of the new labeling result.
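One common way to compare embedding vectors is cosine similarity. The description leaves the similarity measure open, so the choice of cosine similarity here is an assumption, as are the function names and the toy two-dimensional vectors; real sentence embeddings would have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def similarities_to_new(new_vec, existing_vecs):
    """Map each existing task id to its similarity with the new source text."""
    return {task_id: cosine_similarity(new_vec, vec)
            for task_id, vec in existing_vecs.items()}


# Toy vectors standing in for precomputed embeddings (hypothetical values).
existing = {1: [1.0, 0.0], 2: [1.0, 1.0], 3: [0.0, 1.0]}
print(similarities_to_new([1.0, 0.0], existing))
```

Because the vectors are computed before the project opens, each new labeling result costs only one similarity computation per extracted existing result.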

Referring to FIG. 7, the sentence similarity may be measured between the source text of the new labeling result and the source text of each of three existing labeling results, all of which were given the same emotion label, 'annoyance'. The sentence similarity between the source text of the first existing labeling result and that of the new labeling result is measured as 50 percent, that of the second as 30 percent, and that of the third as 10 percent.

Next, the server determines, based on the measured sentence similarity, whether the target worker's labeling criterion has changed (S133).

Specifically, the server calculates the average sentence similarity between the source text of the new labeling result and the source texts of the extracted existing labeling results.

As described above, once the sentence similarity to the new labeling result has been measured for the source text of each extracted existing labeling result, the server calculates the average of those sentence similarities.

Referring to FIG. 7, since the sentence similarities of the source texts of the three existing labeling results are measured as 50 percent, 30 percent, and 10 percent, respectively, the average sentence similarity of the existing labeling results is calculated as 30 percent.

When the calculated average sentence similarity is less than or equal to a predetermined reference value, the server determines that the target worker's labeling criterion has changed.

That is, when the average sentence similarity between the source text of the new labeling result and the source texts of the existing labeling results is less than or equal to the predetermined reference value, the server may determine that the target worker's labeling criterion has changed with respect to the emotion label shared by the new labeling result and the existing labeling results.

Referring to FIG. 7, when the predetermined reference value is 60 percent and the average sentence similarity between the source text of the new labeling result and the source texts of the existing labeling results is calculated as 30 percent, the server may determine that the target worker's labeling criterion has changed. Specifically, the server can tell that the target worker's labeling criterion has changed with respect to 'annoyance', the emotion label shared by the new labeling result and the three existing labeling results.
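The decision of step S133 reduces to an average and a comparison. The following sketch works through the FIG. 7 numbers; the function names are hypothetical, similarities are expressed in percent, and returning `False` when no existing result was extracted is an assumption, since the description only covers the case of at least one extracted result.

```python
def average_similarity(similarities):
    """Arithmetic mean of the per-result sentence similarities."""
    return sum(similarities) / len(similarities)


def criterion_changed(similarities, reference_value):
    """True when the average similarity is at or below the reference value."""
    if not similarities:
        return False  # assumption: nothing to compare against, so no change is flagged
    return average_similarity(similarities) <= reference_value


# FIG. 7 example: similarities of 50, 30, and 10 percent; reference value 60 percent.
print(average_similarity([50, 30, 10]))     # 30.0
print(criterion_changed([50, 30, 10], 60))  # True
```

Note that the comparison is inclusive: an average exactly equal to the reference value also counts as a change, as stated in the description.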

A low sentence similarity between the source text of the new labeling result and the source texts of the existing labeling results, despite the same emotion label having been assigned, means that the target worker is now labeling by a different criterion than before. This is what allows the server to determine whether the target worker's labeling criterion has changed.

Next, depending on the result of that determination, the server requests the target worker to reconfirm the extracted existing labeling results (S134).

When it is determined that the target worker's labeling criterion has changed, the server requests the target worker to reconfirm the extracted existing labeling results.

That is, if the labeling criterion for a particular emotion has changed, the existing labeling results to which the target worker assigned that emotion label before the change were likely labeled by an inappropriate criterion. The server therefore has the target worker reconfirm those existing labeling results.

Next, the server receives from the target worker the reconfirmation results for the extracted existing labeling results (S135).

In response to the reconfirmation request, the target worker judges once more, by his or her current labeling criterion (the criterion after learning), whether the emotion label assigned to the source text of each existing labeling result under the earlier labeling criterion (the criterion before learning) is appropriate.

If the target worker judges that the emotion label assigned to the source text of an extracted existing labeling result is not appropriate, the target worker changes it to a different emotion label and inputs the new label. That is, the server receives a modified emotion label from the target worker as the reconfirmation result for the extracted existing labeling result.

For example, as the reconfirmation result for an existing labeling result, the target worker may input the emotion label 'embarrassment', modified from the existing emotion label 'annoyance'.

If the target worker judges that the emotion label assigned to the source text of an extracted existing labeling result is appropriate, the target worker inputs the same emotion label without modification. That is, the server receives the same emotion label from the target worker as the reconfirmation result for the extracted existing labeling result.

For example, as the reconfirmation result for an existing labeling result, the target worker may input the unmodified emotion label 'annoyance', identical to the existing emotion label 'annoyance'.

By having the target worker reconfirm existing labeling results that may have been labeled incorrectly, the accuracy of each target worker's labeling results can be increased. Because accurate results are easier to inspect and more likely to pass inspection, the inspection burden on the inspector 34 is reduced as well.

Next, the server assigns the plurality of labeling results, including the reconfirmation results, to a plurality of inspectors and requests inspection (S140), and receives from the plurality of inspectors an inspection result for each labeling result as either an inspection pass or a rejection (S150).

Labeling results that were not reconfirmed by the target worker are assigned sequentially to the plurality of inspectors 34, as shown in FIG. 8A. For labeling results that were reconfirmed by the target worker, the server instead takes the reconfirmation result as the item to be inspected and assigns it sequentially to the plurality of inspectors 34, as shown in FIG. 8B.
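The inspection assignment can be sketched as follows. The description says only that results are assigned "sequentially"; interpreting this as a round-robin over the inspectors in task order is an assumption, as are the function name and the data shapes (dictionaries keyed by task id).

```python
from itertools import cycle


def assign_for_inspection(results, rechecks, inspectors):
    """Build the inspection queue as (inspector, task_id, label) tuples.

    results:  task_id -> originally input emotion label
    rechecks: task_id -> label input as the reconfirmation result
    A reconfirmed task is inspected with its reconfirmation result;
    every other task is inspected with its original label.
    """
    queue = []
    for (task_id, label), inspector in zip(sorted(results.items()), cycle(inspectors)):
        queue.append((inspector, task_id, rechecks.get(task_id, label)))
    return queue


results = {1: "annoyance", 2: "annoyance", 3: "joy"}   # hypothetical data
rechecks = {2: "embarrassment"}                        # task 2 was reconfirmed
print(assign_for_inspection(results, rechecks, ["A", "B"]))
# [('A', 1, 'annoyance'), ('B', 2, 'embarrassment'), ('A', 3, 'joy')]
```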

For each assigned labeling result, the inspector 34 inputs an inspection pass if the emotion label is judged to have been assigned appropriately, and inputs a rejection if it is judged to have been assigned inappropriately.

Meanwhile, although not explicitly shown in FIG. 3, the emotion labeling method using sentence similarity of a crowdsourcing-based project may further include a step of providing a predetermined reward, depending on the inspection result for the reconfirmation result, to the worker who reconfirmed the extracted existing labeling results.

Specifically, the server provides the predetermined reward to the worker when the inspection result for a reconfirmation result in which a modified emotion label was input is an inspection pass.

Referring to FIG. 9, the target worker has input modified emotion labels as the reconfirmation results for three existing labeling results, and all three reconfirmation results have passed inspection, so the server provides the target worker with the predetermined reward for the three existing labeling results that passed. Although FIG. 9 is described with modified emotion labels input for all of the existing labeling results, the target worker may, at his or her own discretion, input modified emotion labels for only some of them.

In this case, when a labeling result that the target worker reconfirmed (by assigning a modified emotion label) passes inspection, the target worker receives an additional reward on top of the compensation normally paid for a result that passes inspection (for example, the unit price of the task). This is because the target worker's diligent reconfirmation allowed the result to pass inspection directly, without going through the cycle of rejection, rework, and re-inspection, and thus helped the project proceed smoothly.

In addition, although not explicitly shown in FIG. 3, the emotion labeling method using sentence similarity of a crowdsourcing-based project may further include a step of imposing a predetermined penalty, depending on the inspection result for the reconfirmation result, on the worker who reconfirmed the extracted existing labeling results.

Specifically, the server imposes the predetermined penalty on the worker when the inspection result for a reconfirmation result in which the same emotion label was input is a rejection.

Referring to FIG. 10, the target worker has input the same emotion labels as the reconfirmation results for three existing labeling results, and all three reconfirmation results have been rejected, so the server imposes the predetermined penalty for the three rejected existing labeling results. Although FIG. 10 is described with the same emotion labels input for all of the existing labeling results, the target worker may, at his or her own discretion, input the same emotion label for only some of them.

A penalty is imposed when a labeling result that the target worker reconfirmed (by assigning the same emotion label) is rejected because the server regards that target worker as not having worked diligently. That is, although the target worker had the opportunity to correct the labeling result, the target worker did not do so properly and ultimately submitted an inaccurate result, which can cause an inspection bottleneck.
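The reward and penalty rules for reconfirmation results can be summarized as a small decision function. This is a sketch under stated assumptions: the function name and the string outcomes are hypothetical, and returning "none" for the two combinations the description does not address (a modified label that is rejected, an unchanged label that passes) is an assumption.

```python
def settle_reconfirmation(original_label, reconfirmed_label, inspection):
    """Decide the outcome for one reconfirmed labeling result.

    inspection is "pass" or "reject".
    Returns "bonus", "penalty", or "none".
    """
    modified = reconfirmed_label != original_label
    if modified and inspection == "pass":
        return "bonus"    # additional reward on top of the basic compensation
    if not modified and inspection == "reject":
        return "penalty"  # worker kept a label the inspector found inappropriate
    return "none"         # assumption: the description is silent on these cases


print(settle_reconfirmation("annoyance", "embarrassment", "pass"))  # bonus
print(settle_reconfirmation("annoyance", "annoyance", "reject"))    # penalty
```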

Meanwhile, in the above description, steps S110 to S150 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Some steps may be omitted as necessary, and the order of the steps may be changed. In addition, the description of FIG. 11 below, even where details are omitted here, also applies to the emotion labeling method using sentence similarity of a crowdsourcing-based project of FIGS. 1 to 10.

Hereinafter, an emotion labeling apparatus 200 using sentence similarity of a crowdsourcing-based project according to an embodiment of the present invention is described with reference to FIG. 11.

FIG. 11 is a diagram illustrating an emotion labeling apparatus using sentence similarity of a crowdsourcing-based project according to an embodiment of the present invention.

Referring to FIG. 11, the emotion labeling apparatus 200 using sentence similarity of a crowdsourcing-based project (hereinafter, the emotion labeling apparatus) includes a communication module 210, a memory 220, and a processor 230.

The communication module 210 transmits the crowdsourcing-based tasks of a project to the plurality of workers 32 to request that the tasks be performed, and receives the task results from the plurality of workers 32. It also transmits the task results received from the plurality of workers 32 to the plurality of inspectors 34 to request inspection, and receives the inspection results from the plurality of inspectors 34.

In addition, the communication module 210 requests a worker who has performed at least the predetermined quantity of labeling tasks (hereinafter, the target worker) to reconfirm existing labeling results, and receives the reconfirmation results for those existing labeling results from the target worker.

The memory 220 stores a program for carrying out, based on the data received through the communication module 210, the reconfirmation process on the existing labeling results of a target worker whose labeling criterion is determined to have changed.

The processor 230 executes the program stored in the memory 220. By executing the program, the processor 230 determines, for a worker who has performed at least the predetermined quantity of labeling tasks (hereinafter, the target worker), whether the labeling criterion has changed, and carries out the reconfirmation process on the existing labeling results.

Specifically, the processor 230 may extract, from among the target worker's existing labeling results, at least one existing labeling result for which the same emotion label as that of the target worker's new labeling result was input; measure the sentence similarity between the source text of the new labeling result and the source text of each extracted existing labeling result; determine, based on the measured sentence similarity, whether the target worker's labeling criterion has changed; and, depending on the result of that determination, request the target worker to reconfirm the extracted existing labeling results.

In addition, when the processor 230 receives a modified emotion label from the target worker as the reconfirmation result for an extracted existing labeling result, and the inspection result for that reconfirmation result is an inspection pass, the processor 230 provides the target worker with the predetermined reward.

Conversely, when the processor 230 receives the same emotion label from the target worker as the reconfirmation result for an extracted existing labeling result, and the inspection result for that reconfirmation result is a rejection, the processor 230 imposes the predetermined penalty on the target worker.

The emotion labeling apparatus 200 described with reference to FIG. 11 may be provided as a component of the server described above.

The emotion labeling method using sentence similarity of a crowdsourcing-based project according to an embodiment of the present invention, as described above, may be implemented as a program (or application) and stored in a medium so that it can be executed in combination with a computer, which is hardware.

So that the computer can read the program and execute the methods implemented as the program, the program may include code written in a computer language such as C, C++, JAVA, Ruby, or machine language that the processor (CPU) of the computer can read through the device interface of the computer. Such code may include functional code related to functions that define what is needed to execute the methods, and may include control code related to the execution procedure that the processor of the computer needs in order to execute those functions in a predetermined order. Such code may further include memory-reference code indicating at which location (address) in the internal or external memory of the computer the additional information or media needed for the processor to execute the functions should be referenced. In addition, when the processor of the computer needs to communicate with another remote computer or server in order to execute the functions, the code may further include communication-related code specifying how to communicate with the remote computer or server using the communication module of the computer, and what information or media to transmit and receive during communication.

The storage medium refers not to a medium that stores data for a short moment, such as a register, cache, or memory, but to a medium that stores data semi-permanently and can be read by a device. Specifically, examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. That is, the program may be stored on various recording media on various servers that the computer can access, or on various recording media on the user's computer. In addition, the medium may be distributed over network-connected computer systems, so that computer-readable code is stored in a distributed manner.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

10: requester
20: service provider
30: public
32: worker
34: inspector
200: emotion labeling apparatus
210: communication module
220: memory
230: processor

Claims

An emotion labeling method using sentence similarity of a crowdsourcing-based project, the method being performed by a computer and comprising:
allocating a plurality of labeling tasks of a crowdsourcing-based project (hereinafter, the project) to a plurality of workers and requesting that they perform labeling work on the emotion of a source text;
receiving emotion labels as a plurality of labeling work results from the plurality of workers;
determining, for a worker who has performed at least a predetermined quantity of labeling work (hereinafter, the target worker), whether a labeling criterion has changed, and performing a reconfirmation process on existing labeling work results;
allocating a plurality of labeling work results, including reconfirmation results, to a plurality of inspectors and requesting that they perform inspection; and
receiving, from the plurality of inspectors, a plurality of inspection results for the plurality of labeling work results, each as an inspection pass or a rejection,
wherein performing the reconfirmation process on the existing labeling work results comprises:
extracting, from among a plurality of existing labeling work results of the target worker, at least one existing labeling work result for which the same emotion label as in a new labeling work result of the target worker was input;
measuring a sentence similarity between the source text of the new labeling work result and the source text of the extracted at least one existing labeling work result;
determining, according to the measured sentence similarity, whether the labeling criterion of the target worker has changed;
requesting, according to the result of determining whether the labeling criterion has changed, that the target worker reconfirm the extracted at least one existing labeling work result; and
receiving, from the target worker, a reconfirmation result for the extracted at least one existing labeling work result,
and wherein determining whether the labeling criterion of the target worker has changed comprises:
calculating an average sentence similarity between the source text of the new labeling work result and the source text of the extracted at least one existing labeling work result; and
determining that the labeling criterion of the target worker has changed when the average sentence similarity is less than or equal to a predetermined reference value.

(Deleted)

The method of claim 1,
wherein receiving the reconfirmation result for the extracted at least one existing labeling work result from the target worker comprises
receiving, from the target worker, a corrected emotion label as the reconfirmation result for the extracted existing labeling work result.

The method of claim 3,
further comprising providing, according to the inspection result for the reconfirmation result, a predetermined compensation to the worker who performed the reconfirmation of the extracted existing labeling work result,
wherein providing the predetermined compensation comprises
providing the predetermined compensation to the worker when the inspection result for the reconfirmation result in which the corrected emotion label was input is an inspection pass.

The method of claim 1,
wherein receiving the reconfirmation result for the extracted at least one existing labeling work result from the target worker comprises
receiving, from the target worker, the same emotion label as the reconfirmation result for the extracted existing labeling work result.

The method of claim 5,
further comprising providing, according to the inspection result for the reconfirmation result, a predetermined penalty to the worker who performed the reconfirmation of the extracted existing labeling work result,
wherein providing the predetermined penalty comprises
providing the predetermined penalty to the worker when the inspection result for the reconfirmation result in which the same emotion label was input is a rejection.

The method of claim 1,
further comprising, before opening the project, embedding a plurality of source texts corresponding to the plurality of labeling tasks of the project into a plurality of vectors,
wherein measuring the sentence similarity comprises
measuring the sentence similarity using the vector of the source text of the new labeling work result and the vector of the source text of the extracted at least one existing labeling work result.

The method of claim 7,
wherein embedding the plurality of source texts into the plurality of vectors
uses the Sent2Vec algorithm.

The method of claim 1,
wherein the predetermined quantity is a quantity corresponding to a predetermined ratio of the maximum workable quantity per worker of the project.

A computer program stored in a computer-readable recording medium which, in combination with a computer, performs the emotion labeling method using sentence similarity of a crowdsourcing-based project according to any one of claims 1 and 3 to 9.