KR20220141521A

KR20220141521A - Apparatus and method for generating query

Info

Publication number: KR20220141521A
Application number: KR1020210047731A
Authority: KR
Inventors: 임경태; 이유한; 서호건; 류승형; 전병일; 김승근; 유용균
Original assignee: 한국원자력연구원
Priority date: 2021-04-13
Filing date: 2021-04-13
Publication date: 2022-10-20

Abstract

A method for generating a query, performed by a query generating apparatus according to an embodiment, comprises: receiving an image; inputting the image into a pre-trained deep learning model to extract features of the image; generating at least one word using the features; generating a first query by combining the at least one word; and generating a second query such that, with a preset probability, the first word used in the first query is not included. The pre-trained deep learning model may be pre-trained using, as input, a plurality of images and, as label data for each image, a plurality of questions about that image. Various kinds of creative queries can therefore be generated for an input image.

Description

Apparatus and method for generating a query {APPARATUS AND METHOD FOR GENERATING QUERY}

The present invention relates to an apparatus and method for generating a query.

Image captioning is an artificial intelligence technology that receives visual information (an image or video) as input and generates a natural-language sentence describing it.

Image captioning is used in applications such as automatic news article generation, document summarization, and photo description.

However, current image captioning technology generates only simple questions focused on depicting the image, such as a general description of it, and therefore cannot generate various kinds of creative queries.

Korean Patent Laid-Open Publication No. 10-2014-0066476 (published June 2, 2014)

An object of the present invention is to provide an apparatus and method for generating a query.

Generating various kinds of creative queries for an input image may also be among the objects of the present invention.

However, the objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from the description below.

A method for generating a query according to an embodiment of the present invention is performed by a query generating apparatus and comprises: receiving an image; inputting the image into a pre-trained deep learning model to extract features of the image; generating at least one word using the features; generating a first query by combining the at least one word; and generating a second query such that, with a preset probability, the first word used in the first query is not included, wherein the pre-trained deep learning model may be pre-trained using, as input, a plurality of images and, as label data for each of the plurality of images, a plurality of questions about each of the plurality of images.

The generating of the at least one word may include generating the first word using the features, and generating a second word using the first word and the features of the image.

The generating of the at least one word may include generating the second word so that it is connected to the first word in time series.

The generating of the at least one word may include terminating word generation when a word containing a predefined punctuation mark indicating the end of a sentence is generated.

The pre-trained deep learning model may be a deep learning model to which a Transformer network structure is applied.

An apparatus for generating a query according to an embodiment of the present invention includes: an input/output unit that receives an image; a memory; and a processor electrically connected to the memory, wherein the processor inputs the image into a pre-trained deep learning model to extract features of the image, generates at least one word using the features, generates a first query by combining the at least one word, and generates a second query such that, with a preset probability, the first word used in the first query is not included, and wherein the pre-trained deep learning model may be pre-trained using, as input, a plurality of images and, as label data for each of the plurality of images, a plurality of questions about each of the plurality of images.

The query generating apparatus according to an embodiment of the present invention can generate various kinds of creative queries for an input image.

In addition, the various kinds of creative queries generated by the apparatus can be used as a training data set for a deep learning model used in visual question answering (VQA), which reduces the cost of building the training data set.

However, the effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art to which the present disclosure pertains from the description below.

FIG. 1 is a block diagram of an apparatus for generating a query according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of a method of generating a query using a pre-trained deep learning model according to an embodiment of the present invention.
FIG. 3 is an exemplary flowchart of a method for generating a query according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the present invention pertains, and the present invention is defined only by the scope of the claims.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intentions or customs of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 1 is a block diagram of an apparatus for generating a query according to an embodiment of the present invention.

Referring to FIG. 1, the query generating apparatus 100 according to an embodiment of the present invention may include an input/output unit 101, a communication unit 102, a memory 110, and/or a processor 120.

The input/output unit 101 may, for example, pass a command or data input from a user or another external device to the other component(s) of the query generating apparatus 100, or output a command or data received from the other component(s) of the query generating apparatus 100 to the user or an external device.

For example, the input/output unit 101 may receive an image, but is not limited thereto and may also receive video as a dynamic image.

The communication unit 102 may support establishing a wired or wireless communication channel between the query generating apparatus 100 and an external device, and communicating over the established channel.

The memory 110 may store various data used by at least one component of the query generating apparatus 100 (the processor 120, the input/output unit 101, and/or the communication unit 102), for example software (e.g., a program) and input or output data for commands related to it. The memory 110 may include volatile memory or non-volatile memory.

The processor 120 (also referred to as a control unit, control device, or control circuit) may control at least one other connected component of the query generating apparatus 100 (e.g., a hardware component such as the input/output unit 101, the communication unit 102, and/or the memory 110, or a software component), and may perform various data processing and computation.

The processor 120 may also load a command or data received from at least one of the other components into volatile memory for processing, and store various data in non-volatile memory.

To this end, the processor 120 may be implemented as a dedicated processor (e.g., an embedded processor) for performing the operations, or as a general-purpose processor (e.g., a CPU, an application processor, or a microcontroller unit (MCU)) that performs the operations by executing one or more software programs stored in a memory device.

More specifically, the processor 120 may input the image received through the input/output unit 101 into a pre-trained deep learning model to extract features of the image.

Here, the pre-trained deep learning model may be a deep learning model to which a Transformer network structure based on multi-head attention layers is applied.

The processor 120 may then use the pre-trained deep learning model to generate at least one word related to the image received through the input/output unit 101.

In doing so, the processor 120 may use a pre-trained machine learning model already stored in the memory 110, or may load a pre-trained machine learning model from another external device and then use it, to generate the at least one word related to the image.

In more detail, the processor 120 may generate at least one word using the extracted image features, but is not limited thereto and may instead generate at least one word phrase (eojeol, the space-delimited unit of Korean text) using the extracted image features.

For example, the processor 120 may generate a first word using the extracted image features, and generate a second word using the generated first word and the image features.

Here, the processor 120 may generate the second word so that it is connected to the first word in time series.

That is, the processor 120 may first generate a word using the extracted image features, and then generate further words using the previously generated words together with the extracted image features. The processor 120 may generate these words so that each is connected in time series to the words generated before it.

Meanwhile, the processor 120 may terminate word generation when a word containing a predefined punctuation mark indicating the end of a sentence is generated.

Here, the predefined punctuation marks indicating the end of a sentence may include, but are not limited to, a period (.), a question mark (?), and an exclamation mark (!).
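
The word-by-word generation and the stop condition described above can be sketched as a simple loop. This is only an illustration under stated assumptions: `generate_words`, `next_word`, `max_words`, and `scripted_next_word` are hypothetical names that do not appear in the specification, and the toy `scripted_next_word` function merely stands in for the trained model.

```python
from typing import Callable, List, Sequence

# Assumed set of sentence-ending marks; the description lists these as examples only.
TERMINAL_MARKS = (".", "?", "!")


def generate_words(
    features: Sequence[float],
    next_word: Callable[[Sequence[float], List[str]], str],
    max_words: int = 20,
) -> List[str]:
    """Generate words one at a time until a word contains a terminal mark."""
    words: List[str] = []
    for _ in range(max_words):  # max_words is an assumed safety limit
        # Each new word depends on the image features and all earlier words.
        word = next_word(features, words)
        words.append(word)
        if any(mark in word for mark in TERMINAL_MARKS):
            break
    return words


# Toy stand-in for the trained model (hypothetical): replays a fixed sequence.
def scripted_next_word(features: Sequence[float], previous: List[str]) -> str:
    script = ["어떤", "색깔의", "콧수염", "입니까?"]
    return script[len(previous)]
```

Calling `generate_words([0.1, 0.2], scripted_next_word)` stops after the fourth word because that word contains a question mark.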

The processor 120 may generate a first query by combining the generated word(s), and may then generate a second query such that, with a preset probability, the first word used in the first query is not included.

The processor 120 may also generate a preset number of queries beyond the first and second queries. For example, the processor 120 may generate a third query such that, with a preset probability, the first word used in the second query is not included.

In more detail, after the first query is generated, the processor 120 generates the first word of the second query using the extracted image features, such that, with a preset probability (e.g., 85%), the first word used in the first query is not generated as the first word of the second query.

For example, because the probability that the first word used in the first query is not generated is set to 85% when the processor 120 generates the first word of the second query, the first word of the second query is unlikely to duplicate the first word of the first query.
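
One possible reading of this preset-probability exclusion is sketched below. The specification does not say how the exclusion is implemented, so the mechanism here (dropping the previous first word from the candidate set with probability `exclude_prob`, then taking the highest-scoring remaining candidate) is an assumption, as are the names `pick_first_word`, `scores`, and `exclude_prob`.

```python
import random
from typing import Dict, Optional


def pick_first_word(
    scores: Dict[str, float],
    previous_first: Optional[str],
    exclude_prob: float = 0.85,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the highest-scoring first word, usually skipping the previous one."""
    rng = rng or random.Random()
    candidates = dict(scores)
    # With probability exclude_prob, remove the previous query's first word
    # from the candidates, provided at least one other candidate remains.
    if (
        previous_first in candidates
        and len(candidates) > 1
        and rng.random() < exclude_prob
    ):
        del candidates[previous_first]
    return max(candidates, key=candidates.get)
```

With `exclude_prob=0.85`, the previous first word remains eligible in roughly 15% of calls, so repeats are rare but still possible.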

Accordingly, the query generating apparatus 100 according to an embodiment of the present invention is unlikely to generate duplicate queries. Because the first word of each newly generated query is unlikely to repeat the first word of the previously generated query, various types of queries can be generated.

Hereinafter, an example of a method of generating a query using the pre-trained deep learning model is described in detail with reference to FIG. 2.

FIG. 2 is a diagram illustrating an example of a method of generating a query using a pre-trained deep learning model according to an embodiment of the present invention.

Referring to FIG. 2, when the input/output unit 101 receives an image 200, a data preprocessing step 210, such as dividing the image into pieces of a preset size, is first performed so that the image can be input into the pre-trained deep learning model 220.
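
As a rough illustration of the "dividing into a preset size" preprocessing, the sketch below splits an image into non-overlapping square tiles. The specification does not define the division scheme, so square tiling, the grayscale nested-list image representation, and the name `split_into_patches` are all assumptions made for this example.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values (grayscale, for simplicity)


def split_into_patches(image: Image, patch: int) -> List[Image]:
    """Split an image into non-overlapping patch x patch tiles in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tiles: List[Image] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Take `patch` rows, and from each row the `patch` columns of this tile.
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles
```

A 4 x 4 image with a patch size of 2 yields four 2 x 2 tiles, which could then be passed to the model in sequence.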

The preprocessed image is then input into the pre-trained deep learning model 220, which receives the preprocessed image and may output at least one word (Output Probabilities).

Here, the pre-trained deep learning model 220 may be pre-trained using, as input, a plurality of images and, as label data for each of the plurality of images, a plurality of questions about each of the plurality of images.
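
The training data layout described here, in which each image is labeled with several questions, can be pictured as a mapping from an image to a list of questions. The sketch below flattens such a mapping into (image, question) pairs. The function name `to_training_pairs`, the file names, and the sample questions in the usage note are made up for illustration and are not taken from the specification.

```python
from typing import Dict, List, Tuple


def to_training_pairs(labels: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Expand {image_id: [question, ...]} into a flat list of (image_id, question) pairs."""
    return [
        (image_id, question)
        for image_id, questions in labels.items()
        for question in questions
    ]
```

For instance, `to_training_pairs({"img_001.jpg": ["What color is the mustache?", "Why is there a banana?"]})` yields two pairs that share the same image, one per question.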

In more detail, the pre-trained deep learning model 220 may include an encoder unit 213 and a decoder unit 225. The encoder unit 213 may receive the preprocessed image and extract features of the image, and the decoder unit 235 may receive the features extracted by the encoder unit 213 and generate at least one word.

Here, the decoder unit 235 receives the image features extracted by the encoder unit 213 and passes them through the Transformer network structure and then through each layer of a long short-term memory (LSTM) network structure, which is a time-series-based network structure, thereby generating, in time series, at least one word most closely related to the features of the input image.

For example, when the input/output unit 101 receives an image 200 in which a banana is positioned on a woman's face so that it looks like a mustache, the data preprocessing step 210, such as dividing the image into pieces of a preset size, is performed on the image 200, and the preprocessed image may be input into the pre-trained deep learning model 220.

The pre-trained deep learning model 220 receives the preprocessed image and may output at least one word (Output Probabilities).

For example, if the pre-trained deep learning model 220 generates “어떤” (“what”) as the first word, it may generate “색깔의” (“color of”) as the second word in consideration of the first word and the image features extracted by the encoder unit 233, and may generate “콧수염” (“mustache”) as the third word in consideration of the first word, the second word, and the image features extracted by the encoder unit 233. The pre-trained deep learning model 220 may then generate “입니까?” (“is it?”) as the fourth word in consideration of the first, second, and third words and the image features extracted by the encoder unit 233.

Here, because the fourth word (“입니까?”) contains a question mark (?), which is a predefined punctuation mark indicating the end of a sentence, the pre-trained deep learning model 220 may terminate word generation.

The processor 120 may then generate the first query (“어떤 색깔의 콧수염 입니까?”, roughly “What color is the mustache?”) by combining, in order, the four words generated by the pre-trained deep learning model 220.

The pre-trained deep learning model 220 may then generate the first word of the second query. At this point, with a preset probability (e.g., 85%), “어떤” (“what”), the first word of the first query, is not selected as the first word of the second query.

FIG. 3 is an exemplary flowchart of a method for generating a query according to an embodiment of the present invention. The method of FIG. 3 may be performed by the query generating apparatus 100 shown in FIG. 1. The method shown in FIG. 3 is merely exemplary.

Referring to FIG. 3, the input/output unit 101 may receive an image (step S10).

The processor 120 may then input the image received through the input/output unit 101 into the pre-trained deep learning model to extract features of the image (step S20).

Here, the pre-trained deep learning model may be pre-trained using, as input, a plurality of images and, as label data for each of the plurality of images, a plurality of questions about each of the plurality of images.

The processor 120 may then generate at least one word using the extracted features (step S30).

For example, the processor 120 may generate a first word using the extracted image features, and generate a second word using the generated first word and the image features. Here, the processor 120 may generate the second word so that it is connected to the first word in time series.

The processor 120 may also terminate word generation when a word containing a predefined punctuation mark indicating the end of a sentence is generated.

The processor 120 may then generate a first query by combining the generated word(s) (step S40).

Finally, the processor 120 may generate a second query such that, with a preset probability, the first word used in the first query is not included (step S50).

The processor 120 may also generate a preset number of queries beyond the first and second queries. For example, the processor 120 may generate a third query such that, with a preset probability, the first word used in the second query is not included.
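
The chaining of queries in steps S40 and S50, extended to a preset number of queries, might look like the following sketch. It assumes a hypothetical `generate_query` callable that accepts the previous query's first word (or `None` for the first query) and returns the words of a new query. How that callable avoids the given word is left open here, and none of these names come from the specification.

```python
from typing import Callable, List, Optional


def generate_queries(
    count: int,
    generate_query: Callable[[Optional[str]], List[str]],
) -> List[str]:
    """Generate `count` queries, passing each query's first word to the next call."""
    queries: List[str] = []
    previous_first: Optional[str] = None
    for _ in range(count):
        words = generate_query(previous_first)
        queries.append(" ".join(words))
        # Only the immediately preceding query's first word is carried forward,
        # mirroring the description of the third query.
        previous_first = words[0] if words else None
    return queries
```

Note that in this sketch a third query may reuse the first word of the first query, since only the immediately preceding query is taken into account.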

As described above, the query generating apparatus according to an embodiment of the present invention can generate various kinds of creative queries for an input image.

Meanwhile, the query generating apparatus 100 according to an embodiment of the present invention can be used on various platforms, for example in an artificial intelligence service system that offers questions a person can perceive (or think about) when looking at visual information.

Combinations of the blocks in the accompanying block diagram and the steps in the accompanying flowchart may be carried out by computer program instructions. These computer program instructions may be loaded onto an encoding processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, so that the instructions, when executed by the encoding processor of the computer or other programmable data processing equipment, create means for performing the functions described in each block of the block diagram or each step of the flowchart. These computer program instructions may also be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment to implement a function in a particular manner, so that the instructions stored in the computer-usable or computer-readable memory can produce an article of manufacture containing instruction means that perform the functions described in each block of the block diagram or each step of the flowchart. The computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on the computer or other programmable data processing equipment to create a computer-executed process, and the instructions that run on the computer or other programmable data processing equipment can provide steps for executing the functions described in each block of the block diagram and each step of the flowchart.

In addition, each block or step may represent a module, segment, or portion of code that includes one or more executable instructions for performing the specified logical function(s). It should also be noted that in some alternative embodiments the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially simultaneously, or may sometimes be performed in reverse order, depending on the functions involved.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from the essential qualities of the present invention. Accordingly, the embodiments disclosed herein are intended to describe, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention.

100: query generating apparatus
101: input/output unit
110: memory
120: processor

Claims

A method for generating a query, performed by an apparatus for generating a query, the method comprising:
receiving an image;
inputting the image to a pre-trained deep learning model and extracting features of the image;
generating at least one word using the features;
generating a first query in which the at least one word is combined; and
generating a second query such that a first word used in the first query is excluded with a preset probability,
wherein the pre-trained deep learning model is
pre-trained by receiving as input a plurality of images and, as label data for each of the plurality of images, a plurality of questions for each of the plurality of images.
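
As a rough illustration of the last step of this claim, the sketch below shows one way a word used in the first query could be barred from the second query with a preset probability. The function `pick_first_word`, the ranked candidate list, and the `exclude_prob` values are assumptions made for illustration only; the claim does not specify how the exclusion is implemented.

```python
import random


def pick_first_word(ranked_candidates, banned_word, exclude_prob, rng):
    """Return the top-ranked candidate, dropping banned_word with probability exclude_prob."""
    if rng.random() < exclude_prob:
        allowed = [w for w in ranked_candidates if w != banned_word]
    else:
        allowed = list(ranked_candidates)
    # Fall back to the full list if the ban left nothing to choose from.
    return (allowed or list(ranked_candidates))[0]


# Hypothetical first-word candidates for one image, highest model score first.
candidates = ["What", "Why", "How"]
rng = random.Random(0)

# First query: nothing is banned, so the top-ranked word is chosen.
first_word = pick_first_word(candidates, banned_word=None, exclude_prob=0.0, rng=rng)

# Second query: the first query's word is always banned here (probability 1.0),
# so the next-best candidate is chosen instead.
second_word = pick_first_word(candidates, banned_word=first_word, exclude_prob=1.0, rng=rng)
```

With a probability strictly between 0 and 1, the second query would sometimes reuse the first word and sometimes be forced onto a different one, which is how varied questions for the same image could arise.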

The method of claim 1,
wherein the generating of the at least one word comprises:
generating the first word using the features, and generating a second word using the first word and the features of the image.

The method of claim 2,
wherein, in the generating of the at least one word,
the second word is generated so as to be connected to the first word in time series.

The method of claim 1,
wherein, in the generating of the at least one word,
the word generation is terminated when a word including a predefined punctuation mark indicating the end of a sentence is generated.
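
The word-by-word generation and the stopping rule described in the preceding dependent claims can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `toy_next_word` is a hypothetical stand-in for the trained model, and the set of end marks and the `max_words` cap are illustrative choices not taken from the claims.

```python
def generate_query(image_features, next_word, end_marks=("?", ".", "!"), max_words=20):
    """Generate words one at a time until a word ends with an end-of-sentence mark."""
    words = []
    while len(words) < max_words:
        # Each new word depends on the image features and on all words generated so far.
        word = next_word(image_features, words)
        words.append(word)
        if word.endswith(end_marks):
            break
    return " ".join(words)


def toy_next_word(image_features, previous_words):
    """Hypothetical stand-in for the trained model: replays a fixed question."""
    script = ["What", "is", "the", image_features["main_object"], "doing?"]
    return script[len(previous_words)]


query = generate_query({"main_object": "dog"}, toy_next_word)
print(query)
```

The `max_words` cap is a safety limit added for the sketch so that the loop ends even if no end mark is ever produced.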

The method of claim 1,
wherein the pre-trained deep learning model is
a deep learning model to which a Transformer network structure is applied.

An apparatus for generating a query, comprising:
an input/output unit configured to receive an image;
a memory; and
a processor electrically connected to the memory,
wherein the processor is configured to
input the image to a pre-trained deep learning model, extract features of the image, generate at least one word using the features, generate a first query in which the at least one word is combined, and generate a second query such that a first word used in the first query is excluded with a preset probability, and
wherein the pre-trained deep learning model is
pre-trained by receiving as input a plurality of images and, as label data for each of the plurality of images, a plurality of questions for each of the plurality of images.

A computer-readable recording medium storing a computer program,
wherein the computer program includes instructions that, when executed by a processor, cause the processor to perform a method comprising:
receiving an image;
inputting the image to a pre-trained deep learning model and extracting features of the image;
generating at least one word using the features;
generating a first query in which the at least one word is combined; and
generating a second query such that a first word used in the first query is excluded with a preset probability,
wherein the pre-trained deep learning model is
pre-trained by receiving as input a plurality of images and, as label data for each of the plurality of images, a plurality of questions for each of the plurality of images.

A computer program stored in a computer-readable recording medium,
wherein the computer program includes instructions that, when executed by a processor, cause the processor to perform a method comprising:
receiving an image;
inputting the image to a pre-trained deep learning model and extracting features of the image;
generating at least one word using the features;
generating a first query in which the at least one word is combined; and
generating a second query such that a first word used in the first query is excluded with a preset probability,
wherein the pre-trained deep learning model is
pre-trained by receiving as input a plurality of images and, as label data for each of the plurality of images, a plurality of questions for each of the plurality of images.