KR102345815B1

KR102345815B1 - Method and system for generating sentences containing target words

Info

Publication number: KR102345815B1
Application number: KR1020210061650A
Authority: KR
Inventors: 김기영
Original assignee: 주식회사 아티피셜 소사이어티
Priority date: 2021-05-12
Filing date: 2021-05-12
Publication date: 2021-12-31

Abstract

The present invention relates to a method and system for generating a sentence containing a target word. In one example of the present invention, the method comprises: receiving, by a system, a target word from a user; generating, by the system, a sentence containing the target word as a generation result using a learning model generated based on crawled or self-generated data; evaluating, by the system, the suitability of the generation result; and outputting, by the system, an evaluation result for the suitability, wherein, in the evaluating step, the suitability of the at least one sentence is evaluated based on the length of each sentence in the generation result, the number of sentences, and the degree of association between two adjacent words in each sentence. The present invention thereby enables a more natural sentence to be output.

Description

Method and system for generating sentences containing target words

The present invention relates to a method and system for generating a sentence containing a target word, and more particularly, to a sentence generation method and system that generate a sentence using a target word input by a user and evaluate its suitability so that a more natural sentence is output.

Recently, sentence formation learning tools have been developed in the form of computer programs. These can be used as language learning tools for infants and young children, foreign language learning tools, tools for improving vocabulary or sentence construction skills, assistive tools for overcoming learning disabilities caused by dyslexia or intellectual disability, tools that enable sentence expression for task performance even when language ability is lacking, and augmentative and alternative communication tools.

Such conventional sentence formation methods present sentences only within a predetermined frame, and are therefore limited to fill-in-the-blank practice on given sentences rather than the active generation of new sentences.

More recently, a generation system for proposing language sentence patterns has been disclosed. It can improve the effectiveness of language education by automatically presenting, and allowing re-searching of, a list of preceding and following context words that can each be matched with an input keyword.

Korean Patent No. 10-1670326 (published 2016.11.10.)

An object of the present invention is to provide a method and system for generating a sentence containing a target word.

More specifically, an object of the present invention is to provide a sentence generation method and system that generate a sentence using a target word input by a user and evaluate its suitability so that a more natural sentence is output.

A method for generating a sentence containing a target word according to an example of the present invention includes: receiving, by a system, at least one target word from a user; generating, by the system, at least one sentence containing the at least one target word as a generation result using a learning model generated based on crawled or self-generated data; evaluating, by the system, the suitability of the generation result; and outputting, by the system, an evaluation result for the suitability. In the evaluating step, the suitability of the at least one sentence is evaluated based on the length of each sentence in the generation result, the number of sentences, and the degree of association between two adjacent words in each sentence, where the degree of association is the frequency with which the two adjacent words are actually used adjacently in the crawled or self-generated data.

In the evaluating step, when the length of each of the at least one sentence is within a preset threshold range, the suitability score may be rated higher than a median value, and when the length of each of the at least one sentence is outside the threshold range, the suitability score may be rated lower than the median value or the sentence may be rated as unsuitable.

In the evaluating step, when the number of sentences in the generation result is within a preset threshold range, the suitability score may be rated higher than the median value, and when the number of sentences in the generation result exceeds the threshold range, the suitability score may be rated lower than the median value or the result may be rated as unsuitable.

In the evaluating step, a score for the degree of association between two adjacent words may be calculated for every word contained in each sentence of the generation result, and the suitability score may be evaluated based on the association score.

In the evaluating step, when a word is added before or after the target word in any of the at least one sentence so that the meaning of the target word is changed, the suitability score may be rated lower than the median value or the sentence may be rated as unsuitable.

In the evaluating step, the suitability of each of the at least one sentence may be evaluated based on the number of target words in the sentence and the number of first non-target words, which are words generated in the generating step that have the same part of speech as the target word.

In the evaluating step, when the number of first non-target words in each of the at least one sentence is smaller than the number of target words, the suitability score may be rated higher than the median value, and when the number of first non-target words is greater than the number of target words, the suitability score may be rated lower than the median value.
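The comparison between first non-target words and target words can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the 0 to 100 scale with a median of 50, the offsets of 30 points, the handling of a tie, the part-of-speech labels, and the function name `non_target_score` are all hypothetical, and the words are assumed to arrive already tagged with their parts of speech.

```python
MEDIAN_SCORE = 50  # assumed preset median on an assumed 0-100 scale


def non_target_score(tagged_words, targets, target_pos):
    """Score one sentence from its (word, part_of_speech) pairs.

    A first non-target word is a word that shares the target's part of
    speech but is not itself a target word.
    """
    target_count = sum(1 for word, _ in tagged_words if word in targets)
    non_target_count = sum(
        1 for word, pos in tagged_words
        if pos == target_pos and word not in targets
    )
    if non_target_count < target_count:
        return MEDIAN_SCORE + 30  # rated higher than the median
    if non_target_count > target_count:
        return MEDIAN_SCORE - 30  # rated lower than the median
    return MEDIAN_SCORE  # tie: not specified in the text, assumed neutral


sentence = [("travel", "NOUN"), ("drink", "VERB"), ("beer", "NOUN"), ("tasty", "ADJ")]
score = non_target_score(sentence, {"travel", "beer"}, "NOUN")
```

Here both nouns in the sentence are target words, so no first non-target word is counted and the sentence lands above the assumed median.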

In the evaluating step, the allowable threshold for the number of first non-target words may be varied according to the number of target words in each of the at least one sentence.

In the evaluating step, the suitability score may be varied according to the conceptual similarity between the first non-target word and the target word in each of the at least one sentence.

In the evaluating step, when the first non-target word in each of the at least one sentence falls under at least one of a superordinate concept, a subordinate concept, a contrasting concept, and a similar concept of the target word, it may be determined that conceptual similarity exists and the suitability score may be rated within the range of the median value; when there is no conceptual similarity, the suitability score may be rated lower than the median value.
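The conceptual-similarity rule can be illustrated with a simple lookup. The sketch below assumes a hand-built table of related concepts; the table contents, the name `RELATED_CONCEPTS`, the median of 50, and the 30-point penalty are hypothetical and stand in for whatever concept resource an actual system would use.

```python
MEDIAN_SCORE = 50  # assumed preset median on an assumed 0-100 scale

# Hypothetical table: for each target word, words that are superordinate,
# subordinate, contrasting, or similar concepts.
RELATED_CONCEPTS = {
    "beer": {"drink", "lager", "wine", "alcohol"},
}


def concept_score(target, non_target, related=RELATED_CONCEPTS):
    """Median score when the non-target word is conceptually related
    to the target word, and a lower score otherwise."""
    if non_target in related.get(target, set()):
        return MEDIAN_SCORE
    return MEDIAN_SCORE - 30


related_case = concept_score("beer", "wine")
unrelated_case = concept_score("beer", "desk")
```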

In the evaluating step, when the target word is a noun and the first non-target word is a pronoun or a dependent noun in each of the at least one sentence, the pronoun or dependent noun may be excluded from the suitability evaluation.

When the at least one target word includes a first target word having a first part of speech and a second target word having a second part of speech different from the first, the allowable threshold for the number of first non-target words for the first target word, or for the number of first non-target words for the second target word, may be increased in the evaluating step.

When the first target word is a noun and the second target word is a part of speech other than a noun, the evaluating step may increase the allowable threshold for the number of first non-target words for the first target word.

When the first target word is a noun and the second target word is a part of speech other than a noun, the evaluating step may vary the suitability evaluation according to the distance between the first target word and the second target word within each of the at least one sentence.

In the step of outputting the evaluation result, the at least one sentence of the generation result and a suitability score for each of the at least one sentence may be displayed.

After the step of outputting the evaluation result, the method may further include: receiving, by the system, a judgment on the appropriateness of the suitability determination; and assigning, by the system, a weight according to the result of the appropriateness judgment and, according to the weight, updating the crawled or self-generated data with the at least one sentence or adjusting sentence generation weights and thresholds.

The at least one sentence output as the generation result may include a first sentence and a second sentence. When the suitability score for the first sentence is higher than that for the second sentence in the step of outputting the evaluation result, but the second sentence is selected and input through the user's interaction in the step of receiving the judgment, the system may, in the updating step, assign a weight so that the second sentence has a higher frequency than the first sentence, and update the crawled or self-generated data or adjust the sentence generation weights and thresholds.
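The feedback step can be sketched as a frequency update. This is a minimal illustration, assuming the stored data keeps a per-sentence frequency count; the dictionary layout and the function name `apply_user_choice` are hypothetical, and the text does not specify by how much the weight should change.

```python
def apply_user_choice(frequency, first, second):
    """Return a copy of the frequency table in which `second`, the sentence
    the user selected, outranks `first`, the sentence the system scored higher."""
    updated = dict(frequency)
    # Raise the chosen sentence just above the other one; never lower it.
    updated[second] = max(updated.get(second, 0), updated.get(first, 0) + 1)
    return updated


before = {"sentence A": 5, "sentence B": 2}
after = apply_user_choice(before, first="sentence A", second="sentence B")
```

Returning a new table rather than mutating the input keeps the original counts available, which may be convenient if the judgment is later revised.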

A system for generating a sentence containing a target word according to an example of the present invention includes a memory and a processor connected to the memory and configured to execute instructions contained in the memory. The processor performs control to output a generation result for at least one sentence containing at least one target word input by a user, using a learning model generated based on crawled or self-generated data, evaluates the suitability of the generation result, and outputs an evaluation result for the suitability. The suitability of the at least one sentence is evaluated based on the length of each sentence in the generation result, the number of sentences, and the degree of association between two adjacent words in each sentence, where the degree of association is the frequency with which the two adjacent words are actually used adjacently in the crawled or self-generated data.

According to the present invention, a more natural sentence can be output by evaluating the suitability of a generated sentence in consideration of the sentence length, the number of sentences, and the degree of association between adjacent words.

In addition, by adding further items to the suitability evaluation of the generated sentence, sentences that are more natural and more frequently used in real life can be output.

FIG. 1 is a diagram for explaining an example of a system for generating a sentence containing a target word according to an example of the present invention.
FIG. 2 is a diagram for explaining a method of generating a sentence containing a target word according to a first embodiment of the present invention.
FIG. 3 is a diagram for explaining the evaluation items used in the step of evaluating the suitability of the generation result in FIG. 2.
FIG. 4A is a diagram for explaining a specific example of items (1) to (4) among the evaluation items in FIG. 3.
FIG. 4B is a diagram for explaining a specific example of item (5) among the evaluation items in FIG. 3.
FIG. 5 is a diagram for explaining a method of generating a sentence containing a target word according to a second embodiment of the present invention.

Hereinafter, the embodiments disclosed in this specification are described in detail with reference to the accompanying drawings. The same or similar components are assigned the same reference numerals regardless of the drawing, and redundant descriptions thereof are omitted. In addition, in describing the embodiments disclosed in this specification, detailed descriptions of related known technologies are omitted where they are judged likely to obscure the gist of the embodiments.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

A singular expression includes the plural expression unless the context clearly indicates otherwise.

In the present application, the steps described may be performed regardless of the order in which they are listed, except where a specific causal relationship requires them to be performed in the listed order.

In the present application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, the present invention is described with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining an example of a system for generating a sentence containing a target word according to an example of the present invention.

As shown in FIG. 1, the system for generating a sentence containing a target word according to an example of the present invention may include a processor 100, a memory 200, an input unit 300, and an output unit 400.

The system according to an example of the present invention shown in FIG. 1 may include a smartphone, a mobile phone, a tablet PC, a computer, a laptop, a PDA (Personal Digital Assistant), a PMP (Portable Multimedia Player), and the like.

Although FIG. 1 does not show a communication module capable of communicating with a spatially separated external terminal over the Internet, the system may be provided with a communication module in some cases.

In addition, although FIG. 1 illustrates a case in which both the input unit 300 and the output unit 400 are provided, the present invention is not limited thereto; in the system of FIG. 1, the input unit 300 and the output unit 400 may be omitted and a communication module may be provided instead. In this case, the input unit 300 and the output unit 400 may be provided in an external terminal linked to the system over the Internet.

Hereinafter, for convenience of description, a case in which the system of the present invention includes all of the memory 200, the processor 100, the input unit 300, and the output unit 400, as in FIG. 1, is described as an example.

The memory 200 is a computer-readable recording medium and may include a random access memory (RAM), a read-only memory (ROM), and a permanent mass storage device such as a disk drive. It may also take the form of web storage that performs the storage function of the memory 200 over the Internet.

In addition, an operating system or at least one program code may be stored in the memory 200. These software components may be loaded from a computer-readable recording medium separate from the memory 200.

Such a separate computer-readable recording medium may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like.

For example, the memory 200 may store a program for a learning model generated based on crawled or self-generated data, and a program for generating a sentence containing a target word. The memory 200 may also store intermediate or final result values generated during operation of the system (e.g., sentence generation results and suitability evaluation results).

Here, a target word means a word input by a user, and the word may have any of various parts of speech, such as a noun, an adjective, or an adverb.

In addition, crawled or self-generated data may mean sentences contained in web pages or documents collected mainly from the Internet, or data collected independently without going through the Internet. In this way, a large number of sentences actually used on the Internet may be stored in the memory 200 of the system according to an example of the present invention.

The processor 100 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 100 by the memory 200 or a communication module (not shown). For example, the processor 100 may be configured to execute received instructions according to program code stored in a recording device such as the memory 200.

Accordingly, the processor 100 may generate, as a generation result, a sentence containing the target word input by the user, using a learning model generated based on crawled or self-generated data.

In addition, the processor 100 may evaluate the suitability of the sentence in the generation result and output an evaluation result for the suitability.

Here, suitability may mean whether the generated sentence has an appropriate length, whether the number of sentences generated as the generation result is appropriate, whether the target word is appropriately expressed in the generated sentence, whether the generated sentence is one that is actually in frequent use, and so on.

The input unit 300 receives the target word from the user and may include, for example, a physical input device such as a keyboard, a mouse, a touch sensor, or a microphone.

The output unit 400 displays those generated sentences whose suitability is recognized and may include, for example, a physical display device such as a monitor, a touch screen, or a speaker.

FIG. 2 is a diagram for explaining a method of generating a sentence containing a target word according to a first embodiment of the present invention.

As shown in FIG. 2(a), the sentence generation method according to the first embodiment of the present invention may include a receiving step (S100), a generating step (S200), an evaluating step (S300), and an outputting step (S400).

In the receiving step (S100), as shown in FIG. 2(a), the system may receive a target word from the user. The system may receive at least one target word; that is, one or a plurality of target words may be input by the user.

For example, as shown in FIG. 2(a), in the receiving step (S100), the system may receive two target words, such as 여행 ("travel", T1) and 맥주 ("beer", T2), from the user through the input unit 300.

In the generating step (S200), as shown in FIG. 2(a), the system may generate, as a generation result, at least one sentence containing the at least one target word, using a learning model generated based on crawled or self-generated data.

For example, as shown in FIG. 2(b), in the generating step (S200), the system may generate, as the generation result, the sentence "여행 가서 마시는 맥주는 맛있다." ("Beer drunk while traveling is delicious."), which contains the target words 여행 (T1) and 맥주 (T2).

FIG. 2(b) illustrates a case in which one sentence is generated as the generation result, but the present invention is not limited thereto, and a plurality of sentences may be generated as the generation result.

Next, as shown in FIG. 2(a), in the evaluating step (S300), the system may evaluate the suitability of the generation result.

For example, in the evaluating step (S300), the suitability of each of the at least one sentence generated as the generation result may be evaluated based on (1) the length of each sentence, (2) the number of sentences, and (3) the degree of association between two adjacent words in each sentence.

As an example of (1), in the evaluating step (S300), when the length of each of the at least one sentence is within a preset threshold range (e.g., 10 words or fewer), the suitability score may be rated higher than the median value, and when the length is outside the threshold range, the suitability score may be rated lower than the median value.

Conversely, in the evaluating step (S300), when the length of each of the at least one sentence is at or above a preset threshold (e.g., 10 words or more), the suitability score may be rated higher than the median value, and when the length is below the threshold, the suitability score may be rated lower than the median value.

Here, the median value for the sentence length may be set in advance. The length of each sentence may be calculated by counting the number of words contained in the sentence.
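The length criterion can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the 0 to 100 scale, the median of 50, the 30-point offsets, and whitespace tokenization are assumptions, and only the "10 words or fewer" variant of the rule is shown.

```python
MEDIAN_SCORE = 50  # assumed preset median on an assumed 0-100 scale


def sentence_length(sentence: str) -> int:
    """Length as the number of whitespace-separated words (an assumption;
    the text also allows other word units such as morphemes)."""
    return len(sentence.split())


def length_score(sentence: str, max_words: int = 10) -> int:
    """Above the median inside the threshold range, below it outside."""
    if sentence_length(sentence) <= max_words:
        return MEDIAN_SCORE + 30
    return MEDIAN_SCORE - 30


example = "여행 가서 마시는 맥주는 맛있다."
example_score = length_score(example)
```

With whitespace tokenization the example sentence from FIG. 2(b) has five words, so it falls inside the assumed range.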

Alternatively, when a sentence receives a score at or below a specific value under the threshold range condition on its length, it may be judged unsuitable. In that case, the sentence judged unsuitable may not be output in the outputting step (S400), or the generating step (S200) may be performed again to regenerate the sentence.

As an example of (2), in the evaluating step (S300), when the number of sentences in the generation result is within a preset threshold range for the number of sentences (e.g., 2 or fewer), the suitability score may be rated higher than the median value, and when the number of sentences exceeds the threshold range, the suitability score may be rated lower than the median value.

Here, the number of sentences contained in one generation result may be calculated by counting the number of periods or the number of sentence-final endings.
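Counting sentences by terminal punctuation can be sketched with a regular expression. The sketch below is an assumption-laden illustration: it counts runs of ".", "!" or "?" that are followed by whitespace or the end of the text, so that a decimal point inside a number is not counted, and it does not attempt to detect Korean sentence-final endings, which would need morphological analysis.

```python
import re

# A run of terminal punctuation followed by whitespace or the end of the text.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def count_sentences(text: str) -> int:
    """Approximate the number of sentences by counting terminal punctuation."""
    return len(_SENTENCE_END.findall(text))


single = count_sentences("여행 가서 마시는 맥주는 맛있다.")
several = count_sentences("First one. Second one! Third one?")
```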

If, in the evaluating step (S300), the number of sentences in one generation result (e.g., 5) is greater than the threshold range (e.g., 2), the degree of association of the words contained in each sentence may be computed so that only the 2 sentences with relatively high association scores are output in the outputting step (S400), and the 3 sentences with relatively low association scores are not output.
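Selecting only the highest-scoring sentences when too many are generated can be sketched as a simple ranking. The function name `select_top_sentences` and the representation of the input as (sentence, association score) pairs are assumptions made for illustration.

```python
def select_top_sentences(scored, limit=2):
    """Keep at most `limit` sentences, preferring higher association scores.

    `scored` is a list of (sentence, association_score) pairs. When the
    count is already within the limit, the original order is kept.
    """
    if len(scored) <= limit:
        return [sentence for sentence, _ in scored]
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [sentence for sentence, _ in ranked[:limit]]


candidates = [("s1", 0.1), ("s2", 0.9), ("s3", 0.5), ("s4", 0.7), ("s5", 0.2)]
kept = select_top_sentences(candidates, limit=2)
```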

In addition, as an example of (3), in the evaluating step (S300), an association score between each pair of adjacent words may be calculated for all words included in each sentence of the generation result, and the suitability score may be evaluated based on the association score. Here, the degree of association may mean the frequency or probability with which two adjacent words are actually used next to each other in the crawled or self-generated data.

Here, a word may mean a unit delimited by spacing, an individual character, a morpheme, or a group of characters bundled according to a specific criterion.

For example, in the evaluating step (S300), for (b) of FIG. 2, the degree of association between each pair of adjacent words in the sentence, such as between "travel" (1) and "going" (2) and between "going" (2) and "drinking" (3), may be calculated based on the frequency or probability with which the two words are actually used next to each other in the crawled or self-generated data, and these values may be averaged to yield the association score of the sentence.
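A minimal sketch of the averaging described above is given below. It assumes whitespace tokenization and reads "frequency or probability" as the relative frequency with which the second word follows the first in a reference corpus; that reading, the function names, and the toy corpus are assumptions for illustration only.

```python
from collections import Counter


def build_counts(corpus: list[str]) -> tuple[Counter, Counter]:
    """Count single words and adjacent word pairs in a reference corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def association(w1: str, w2: str, unigrams: Counter, bigrams: Counter) -> float:
    """Relative frequency with which w2 directly follows w1 (0.0 if w1 is unseen)."""
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]


def sentence_association(sentence: str, unigrams: Counter, bigrams: Counter) -> float:
    """Average the association of every adjacent word pair in the sentence."""
    words = sentence.split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(association(a, b, unigrams, bigrams) for a, b in pairs) / len(pairs)
```

For the toy corpus `["a b c", "a b d"]`, the pair (a, b) scores 1.0 and (b, c) scores 0.5, so the sentence "a b c" averages to 0.75.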

In the outputting step (S400), the system may output the evaluation result for the suitability. For example, as shown in (b) of FIG. 2, at least one sentence of the generation result and a suitability score for each such sentence may be displayed in the outputting step (S400).

Here, the sentences that are output may be those evaluated as suitable in the evaluating step (S300).

As a more specific example, if one generation result in the generating step (S200) contains a single sentence and that sentence is evaluated as unsuitable in the evaluating step (S300), the sentence is not output in the outputting step (S400), and the system may perform the generating step (S200) again.

Likewise, if one generation result in the generating step (S200) contains four sentences but only two are evaluated as suitable in the evaluating step (S300), only those two sentences may be output with their suitability scores in the outputting step (S400), and the remaining two sentences evaluated as unsuitable may not be output.

FIG. 2 has been described using an example in which the suitability is evaluated on three items in the evaluating step (S300); however, the present invention is not limited thereto, and further evaluation items may be added so as to output sentences that are more natural and more commonly used in real life. These are described below.

FIG. 3 is a diagram for explaining the evaluation items in the step (S300) of evaluating the suitability of the generation result in FIG. 2, FIG. 4A is a diagram for explaining specific examples of items (1) to (4) among the evaluation items of FIG. 3, and FIG. 4B is a diagram for explaining a specific example of item (5) among the evaluation items of FIG. 3.

From FIG. 3 onward, content that overlaps with the description of FIGS. 1 and 2 is covered by the foregoing description, and the explanation focuses on the differences.

As shown in FIGS. 4A and 4B, in the step (S300) of evaluating the suitability of the generation result, in addition to the aforementioned (1) sentence length, (2) number of sentences, and (3) degree of association, (4) a change in the meaning of the target word and (5) the number of first non-target words relative to the number of target words may be considered as evaluation items.

Among the evaluation items in FIGS. 3 to 4B, items (1) to (3) are the same as those described with reference to FIG. 2, so a detailed description is omitted.

As shown in FIG. 3, in the evaluating step (S300), when a word is attached before or after the target word in a sentence so that the meaning of the target word changes, the suitability score may be evaluated as lower than the median value, or the sentence may be evaluated as unsuitable.

For example, as shown in FIG. 4A, when "서양" (the West) and "맥주" (beer) are input as target words and the sentence "대서양에서 맥주를 마셨다." (I drank beer in the Atlantic Ocean.) is generated, a word is attached in front of the target word "서양", yielding "대서양" (the Atlantic Ocean) and changing its meaning. The system may therefore assign a low suitability score to the sentence or evaluate it as unsuitable.
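One rough way to detect this kind of meaning change is sketched below. It is a heuristic under strong assumptions rather than the method of the specification: tokens are split on whitespace, a short hypothetical list of Korean particles is stripped from the end of each token, and a target word is flagged when it appears inside a longer remaining stem. A real system would more likely rely on a morphological analyzer.

```python
# Hypothetical, deliberately small particle list used only for this sketch.
PARTICLES = ("에서", "를", "을", "는", "은", "이", "가", "에")


def strip_particle(token: str) -> str:
    """Remove trailing punctuation and at most one trailing particle."""
    token = token.rstrip(".!?,")
    for particle in sorted(PARTICLES, key=len, reverse=True):
        if token.endswith(particle) and len(token) > len(particle):
            return token[: -len(particle)]
    return token


def meaning_changed_targets(sentence: str, targets: list[str]) -> set[str]:
    """Return target words that occur only as part of a longer stem."""
    changed = set()
    for token in sentence.split():
        stem = strip_particle(token)
        for target in targets:
            if target in stem and stem != target:
                changed.add(target)
    return changed
```

For the example sentence, "대서양에서" reduces to the stem "대서양", which contains but is not equal to "서양", so "서양" is flagged, while "맥주를" reduces to exactly "맥주" and is not.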

In addition, as in (5) of FIG. 3, in the evaluating step (S300), the suitability of each sentence may be evaluated based on the number of target words in the sentence and the number of first non-target words generated in the generating step (S200).

Here, a first non-target word may mean a word in the generated sentence that has the same part of speech as a target word but was generated by the system rather than input by the user.

A generated sentence may also contain words that are generated by the system but have a part of speech different from that of the target words; such a word may be defined as a second non-target word. Second non-target words may be excluded from the suitability evaluation.

Thus, as in (5) of FIG. 3, the number of first non-target words relative to the number of target words in a sentence may be added as an evaluation item.

Here, in the evaluating step (S300), if the number of first non-target words in a sentence is smaller than the number of target words, the suitability score may be evaluated as higher than the median value.

Conversely, if the number of first non-target words in a sentence is larger than the number of target words, the suitability score may be evaluated as lower than the median value.

As a more specific example, as shown in FIG. 3, in the evaluating step (S300), the allowable threshold for the number of first non-target words may be varied according to the number of target words in each sentence.

For example, as in (5-1) of FIG. 4B, when there are two target words, a sentence may be evaluated as suitable in the evaluating step (S300) if it contains up to one first non-target word, and when there are three target words, a sentence may be evaluated as suitable if it contains up to two first non-target words. Second non-target words, whose part of speech differs from that of the target words, may be excluded from the suitability evaluation.

Specifically, as in (5-1) of FIG. 4B, when the target words are "travel" and "apple", the sentence "Of the fruits eaten while traveling, the most delicious thing is the apple." may be generated.

Here, there are two target words, the nouns "travel" and "apple", and the noun-like first non-target words are "fruit" and "thing" (것). Of these, "thing" is treated as a pronoun and excluded from the evaluation, so only "fruit" is counted.

In addition, the remaining words in the sentence whose part of speech is not a noun are second non-target words and may be excluded from the count of first non-target words when the allowable threshold for the number of first non-target words is applied.
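The counting rule of item (5-1) can be sketched as follows. The sketch assumes the sentence has already been tagged with parts of speech by some external tool and is passed in as (word, tag) pairs with hypothetical tag names. The rule "allowed count = number of target words minus one" is an extrapolation from the two examples given (two targets allow one, three targets allow two), not a formula stated in the specification.

```python
NOUN_LIKE = {"noun", "pronoun", "dependent_noun"}   # hypothetical tag names
EXCLUDED = {"pronoun", "dependent_noun"}            # not counted, as with "thing"


def pos_class(tag: str) -> str:
    """Group noun-like tags into one class so they compare as the same part of speech."""
    return "noun" if tag in NOUN_LIKE else tag


def count_first_non_targets(tagged: list[tuple[str, str]], targets: set[str]) -> int:
    """Count system-generated words sharing a part of speech with a target word."""
    target_classes = {pos_class(tag) for word, tag in tagged if word in targets}
    count = 0
    for word, tag in tagged:
        if word in targets or tag in EXCLUDED:
            continue
        if pos_class(tag) in target_classes:
            count += 1
    return count


def allowed_first_non_targets(n_targets: int) -> int:
    """Assumed threshold: one fewer than the number of target words."""
    return max(n_targets - 1, 0)


def passes_non_target_check(tagged: list[tuple[str, str]], targets: set[str]) -> bool:
    return count_first_non_targets(tagged, targets) <= allowed_first_non_targets(len(targets))
```

Words with other parts of speech (the second non-target words) never match a target class in this sketch, so they do not contribute to the count.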

In addition, as in (5-2) of FIG. 3, the conceptual similarity between a target word and a first non-target word may be added as an evaluation item.

In this case, the suitability score may be varied according to the conceptual similarity between the first non-target word and the target word in each sentence.

Here, concepts may be regarded as similar when the first non-target word is a superordinate concept, a subordinate concept, a contrasting concept, a similar concept, or the like of the target word.

In this evaluating step (S300), for each generated sentence, when the first non-target word falls under at least one of a superordinate concept, a subordinate concept, a contrasting concept, and a similar concept of the target word, conceptual similarity is determined to exist and the suitability score may be evaluated within the range of the median value; when there is no conceptual similarity, the suitability score may be evaluated as lower than the median value.

As a specific example, as in (5-2) of FIG. 4B, when the sentence "Of the fruits eaten while traveling, the most delicious thing is the apple." is generated for the target words "travel" and "apple", the first non-target word "fruit" is conceptually similar to the target word "apple", so the suitability score may be evaluated within the range of the median value in the evaluating step (S300).

However, when a sentence such as "I went on a trip and ate an apple while using the computer." is generated, the first non-target word "computer" has no conceptual similarity to "apple", so the suitability score may be evaluated as lower than the median value in the evaluating step (S300).
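The conceptual-similarity item can be illustrated with a small lookup table. The table, its single entry, the relation labels, and the score values are all hypothetical; the specification does not say where concept relations come from, and a real system might draw them from a thesaurus or an ontology.

```python
# Hypothetical concept relations for the example target word "apple".
CONCEPT_RELATIONS = {
    "apple": {
        "superordinate": {"fruit"},
        "subordinate": set(),
        "contrast": set(),
        "similar": set(),
    },
}


def has_concept_similarity(non_target: str, target: str,
                           relations: dict = CONCEPT_RELATIONS) -> bool:
    """True if the non-target word is listed under any relation of the target word."""
    return any(non_target in words for words in relations.get(target, {}).values())


def concept_score(non_targets: list[str], targets: list[str],
                  median: int = 50, penalty: int = 30) -> int:
    """Median score if every first non-target word relates to some target, else lower."""
    for non_target in non_targets:
        if not any(has_concept_similarity(non_target, t) for t in targets):
            return median - penalty
    return median
```

With this table, "fruit" keeps the score at the median for the targets "travel" and "apple", whereas "computer" lowers it.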

In addition, as in (5-3) of FIG. 3, the target words may include a first target word having a first part of speech and a second target word having a second part of speech different from the first.

In this case, in the evaluating step (S300), the allowable threshold for the number of first non-target words for the first target word, or for the number of first non-target words for the second target word, may be increased.

For example, as in (5-3) of FIG. 4B, the first target word may be the noun "travel", the second target word may be "delicious", which has a part of speech other than a noun (e.g., an adjective), and the sentence "The apple eaten while traveling is delicious." may be generated in the generating step (S200).

In the evaluating step (S300), the allowable threshold for the number of first non-target words ("apple", one word) for the first target word ("travel") may be increased so that the sentence is evaluated as suitable.

In addition, when the first target word is a noun and the second target word has a part of speech other than a noun, the suitability may be evaluated differently in the evaluating step (S300) according to the separation distance between the first target word and the second target word within each sentence.

For example, as in (5-3) of FIG. 4B, when the first target word is a noun (e.g., "travel"), the second target word is an adjective (e.g., "delicious"), and the sentence "The apple eaten while traveling is delicious." is generated, the sentence may be evaluated as suitable when the first and second target words are separated by a preset number of words.
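The separation-distance check can be sketched as follows. The text does not say whether the preset number is an exact value or an upper bound; this sketch assumes an upper bound (`max_gap`, a hypothetical parameter) and works on a pre-tokenized list of base-form words.

```python
from typing import Optional


def words_between(words: list[str], first: str, second: str) -> Optional[int]:
    """Number of words strictly between the two target words, or None if either is missing."""
    if first not in words or second not in words:
        return None
    return abs(words.index(second) - words.index(first)) - 1


def distance_ok(words: list[str], first: str, second: str, max_gap: int = 3) -> bool:
    """Suitable when both targets occur and are at most max_gap words apart (assumed rule)."""
    gap = words_between(words, first, second)
    return gap is not None and gap <= max_gap
```

For the glossed example tokens `["travel", "go", "eat", "apple", "delicious"]`, three words lie between "travel" and "delicious".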

As described above, the present invention evaluates the suitability of a generated sentence in consideration of the sentence length, the number of sentences, and the degree of association between adjacent words, so that a more natural sentence can be output.

FIG. 5 is a diagram for explaining a method of generating a sentence including a target word according to a second embodiment of the present invention.

The present invention may receive from the user a judgment on the appropriateness of the sentences provided at the output stage of the system, and thereby further improve the quality of the crawled or self-generated data, or the weights and thresholds used in sentence generation.

To this end, as shown in FIG. 5, the sentence generation method according to the second embodiment of the present invention may include an input receiving step (S100), a generating step (S200), an evaluating step (S300), an outputting step (S400), a judgment receiving step (S500), and an updating step (S600).

Here, the input receiving step (S100), the generating step (S200), the evaluating step (S300), and the outputting step (S400) are the same as described above with reference to FIGS. 1 to 4, so the foregoing description applies.

After the outputting step (S400), in the judgment receiving step (S500), the system may receive a judgment on the appropriateness of the result of the suitability evaluation. Here, the judgment on appropriateness may be input through a user interaction.

Here, the user interaction may mean an action in which the user gives feedback on the suitability evaluation result by selecting one of the sentences through a click, touch, keyboard input, voice input, or the like.

In the updating step (S600), the system may assign a weight according to the result of the appropriateness judgment and, according to the weight, update the crawled or self-generated data with the at least one sentence, or adjust the weights and thresholds used in sentence generation.

For example, the at least one sentence output as the generation result in the outputting step (S400) may include a first sentence and a second sentence, and the suitability score of the first sentence may be higher than that of the second sentence.

When the second sentence is selected and input through the user interaction in the judgment receiving step (S500), the system may, in the updating step, assign a weight so that the second sentence has a higher frequency than the first sentence, and update the crawled or self-generated data or adjust the sentence generation weights and thresholds accordingly.

Accordingly, the crawled or self-generated data, or the sentence generation weights and thresholds, may be updated so that the frequency (or probability) of the second sentence becomes relatively higher than that of the first sentence.
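The feedback update can be sketched with a simple frequency table. This is only one possible reading: the data store is modeled as a hypothetical `Counter` of sentence frequencies, and the "weight" is modeled as raising the chosen sentence's count to one more than that of the higher-scored sentence. The specification does not define the update rule at this level of detail.

```python
from collections import Counter


def apply_feedback(frequencies: Counter, first: str, second: str, chosen: str) -> Counter:
    """Raise the frequency of `second` above `first` when the user picks `second`.

    `first` is the sentence the system scored higher; `second` is the one it
    scored lower. Nothing changes if the user agrees with the system's ranking.
    """
    if chosen == second and frequencies[second] <= frequencies[first]:
        frequencies[second] = frequencies[first] + 1
    return frequencies
```

After such an update, later frequency-based association scores computed from the same store would favor the sentence the user preferred.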

As a result, the quality of the crawled or self-generated data, or the sentence generation performance, can be further improved so that the system generates and outputs sentences that are more natural and have a higher frequency or probability of use in real life.

The technical features disclosed in each embodiment of the present invention are not limited to that embodiment; unless they are mutually incompatible, the technical features disclosed in each embodiment may be combined and applied to other embodiments.

Accordingly, although each embodiment is described with a focus on its own technical features, those features may be combined and applied together unless they are mutually incompatible.

The present invention is not limited to the above-described embodiments and the accompanying drawings, and various modifications and variations are possible from the viewpoint of those of ordinary skill in the art to which the present invention pertains. Accordingly, the scope of the present invention should be defined not only by the claims of this specification but also by their equivalents.

100: processor 200: memory
300: input unit 400: output unit

Claims

A method for generating a sentence including a target word, the method comprising:
receiving, by a system, at least one target word from a user;
generating, by the system, at least one sentence including the at least one target word as a generation result by using a learning model generated based on crawled or self-generated data;
evaluating, by the system, a suitability of the generation result; and
outputting, by the system, an evaluation result for the suitability,
wherein, in the evaluating step,
the suitability of the at least one sentence is evaluated based on the length of each sentence of the generation result, the number of sentences, and the degree of association between two adjacent words in each sentence, the degree of association being the frequency with which the two adjacent words are actually used adjacent to each other in the crawled or self-generated data, and
the suitability of each of the at least one sentence is evaluated based on the number of target words in the sentence and the number of first non-target words, which are generated in the generating step and have the same part of speech as the target word.

The method of claim 1,
wherein, in the evaluating step,
when the length of each of the at least one sentence is within a preset threshold range, the suitability score is evaluated as higher than a median value, and
when the length of each of the at least one sentence is outside the threshold range, the suitability score is evaluated as lower than the median value or the sentence is evaluated as unsuitable.

The method of claim 1,
wherein, in the evaluating step,
when the number of the at least one sentence of the generation result is within a preset threshold range, the suitability score is evaluated as higher than a median value, and
when the number of the at least one sentence of the generation result is greater than the threshold range, the suitability score is evaluated as lower than the median value or the generation result is evaluated as unsuitable.

The method of claim 1,
wherein, in the evaluating step,
a score for the degree of association between two adjacent words is calculated for each of the words included in each sentence of the generation result, and the suitability score is evaluated based on the score for the degree of association.

The method of claim 1,
wherein, in the evaluating step,
when the meaning of the target word is changed by a word attached before or after the target word in each of the at least one sentence, the suitability score is evaluated as lower than a median value or the sentence is evaluated as unsuitable.

(Deleted)

The method of claim 1,
wherein, in the evaluating step,
if the number of first non-target words in each of the at least one sentence is less than the number of target words, the suitability score is evaluated as higher than a median value, and
if the number of first non-target words in each of the at least one sentence is greater than the number of target words, the suitability score is evaluated as lower than the median value.

The method of claim 1,
wherein, in the evaluating step,
the allowable threshold for the number of first non-target words in each of the at least one sentence is varied according to the number of target words.

The method of claim 1,
wherein, in the evaluating step,
the suitability score is varied according to the conceptual similarity between the first non-target word and the target word in each of the at least one sentence.

The method of claim 9,
wherein, in the evaluating step,
in each of the at least one sentence,
when the first non-target word falls under at least one of a superordinate concept, a subordinate concept, a contrasting concept, and a similar concept of the target word, the conceptual similarity is determined to exist and the suitability score is evaluated within the range of a median value, and
when there is no conceptual similarity, the suitability score is evaluated as lower than the median value.

The method of claim 1,
wherein, in the evaluating step,
in each of the at least one sentence, when the target word is a noun and the first non-target word is a pronoun or a dependent noun, the pronoun or the dependent noun is excluded from the evaluation.

The method of claim 1,
wherein, when the at least one target word includes a first target word having a first part of speech and a second target word having a second part of speech different from the first part of speech,
in the evaluating step, the allowable threshold for the number of first non-target words for the first target word, or for the number of first non-target words for the second target word, is increased.

The method of claim 12,
wherein, when the first target word is a noun and the second target word has a part of speech other than a noun,
in the evaluating step, the evaluation is performed with an increased allowable threshold for the number of first non-target words for the first target word.

The method of claim 12,
wherein, when the first target word is a noun and the second target word has a part of speech other than a noun,
in the evaluating step, the suitability is evaluated differently according to the separation distance between the first target word and the second target word in each of the at least one sentence.

The method of claim 1,
wherein, in the step of outputting the evaluation result,
the at least one sentence of the generation result and a suitability score for each of the at least one sentence are displayed.

The method of claim 1, further comprising,
after the step of outputting the evaluation result:
receiving, by the system, a judgment on the appropriateness of the result of the suitability evaluation; and
updating, by the system, by assigning a weight according to the result of the appropriateness judgment and, according to the weight, updating the crawled or self-generated data with the at least one sentence or adjusting sentence generation weights and thresholds.

The method of claim 16,
wherein the at least one sentence output as the generation result includes a first sentence and a second sentence,
in the step of outputting the evaluation result, the suitability score of the first sentence is higher than the suitability score of the second sentence, and
when, in the step of receiving the judgment, the second sentence is selected and input through a user interaction,
in the updating step, the system assigns a weight so that the second sentence has a higher frequency than the first sentence, and updates the crawled or self-generated data or adjusts the sentence generation weights and thresholds.

A system for generating a sentence including a target word, the system comprising:
a memory; and
a processor coupled to the memory and configured to execute instructions contained in the memory,
wherein the processor:
controls output of a generation result of at least one sentence including at least one target word input from a user, using a learning model generated based on crawled or self-generated data;
evaluates a suitability of the generation result; and
outputs an evaluation result for the suitability,
wherein, in evaluating the suitability,
the suitability of the at least one sentence is evaluated based on the length of each sentence of the generation result, the number of sentences, and the degree of association between two adjacent words in each sentence, the degree of association being the frequency with which the two adjacent words are actually used adjacent to each other in the crawled or self-generated data, and
the suitability of each of the at least one sentence is evaluated based on the number of target words in the sentence and the number of first non-target words, which are generated in the process of controlling output of the generation result and have the same part of speech as the target word.