KR102155768B1

KR102155768B1 - Method for providing question and answer data set recommendation service using adpative learning from evoloving data stream for shopping mall

Info

Publication number: KR102155768B1
Application number: KR1020190121899A
Authority: KR
Inventors: 한경훈
Original assignee: 한경훈
Priority date: 2019-10-02
Filing date: 2019-10-02
Publication date: 2020-09-14

Abstract

Provided is a method for providing a shopping mall question-and-answer (Q&A) recommendation service using a Q&A data set that evolves through learning. The method comprises the steps of: receiving product data describing a product from a seller terminal; preprocessing the text included in the product data, performing morpheme analysis through part-of-speech tagging, generating words of the smallest unit, and analyzing the generated words; performing text analysis in which the analyzed text is structured into a matrix and then modeled using a classifier; and extracting a previously stored Q&A data set on the basis of its similarity to the analyzed text and recommending that Q&A data set to the seller terminal.

Description

Method for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning {METHOD FOR PROVIDING QUESTION AND ANSWER DATA SET RECOMMENDATION SERVICE USING ADPATIVE LEARNING FROM EVOLOVING DATA STREAM FOR SHOPPING MALL}

The present invention relates to a method for providing a shopping mall Q&A recommendation service using a Q&A data set that evolves through learning. It provides a method that, when a product is uploaded, analyzes the product's text or images and automatically recommends a Q&A set for the product.

A chatbot is a conversational system or service that operates in the language people use every day, and major IT companies have already begun fierce platform competition, treating chatbots as a next-generation core industry. The fashion sector, which is sensitive to change, is also moving to use chatbots as a personalized customer contact channel, and the online commerce sector is highly interested in combining fashion with chatbots. However, most current chatbots fail to meet user expectations, which means that chatbots need qualitative verification rather than quantitative expansion. High quality is all the more essential for fashion chatbots that provide personalized recommendations to individual customers, yet in most shopping malls, let alone a chatbot, even the Q&A section often fails to answer everything customers want to know.

Methods of providing Q&A services based on natural language have been researched and developed. In this regard, the prior art Korean Patent Application Publication No. 2019-0059084 (published May 30, 2019) discloses: a configuration that parses natural-language text to generate a verification question and a verification answer; a configuration that analyzes the verification question from the question-answer generation device according to a natural language processing algorithm to recognize the answer type, and generates an answer corresponding to the recognized answer type from a knowledge base; and a configuration that verifies the answer from the question-answering device against the verification answer and, if the answer is wrong, collects information containing the verification answer from the web and updates the knowledge base with the collected information.

However, although the Q&A model described above can answer automatically, it cannot give answers that are not loaded in its database, so frustrated customers end up calling the ARS line and waiting to be connected to an agent. In addition, when a shopping mall operator uploads a product, the operator posts detailed product information, but because of the information imbalance or asymmetry between the customer and the supplying operator, the operator usually does not know what the customer does not know and therefore does not post all the information the customer wants in the Q&A. When a customer asks a question, the operator either registers a Q&A entry each time or forgets to register it at all and keeps repeating the same answer, so the operator's workload increases unnecessarily.

An embodiment of the present invention can provide a method for providing a shopping mall Q&A recommendation service using a Q&A data set that evolves through learning, in which the Q&A data set required for a product is recommended when a shopping mall seller uploads the product to the platform. As a result, the seller does not have to write Q&A data item by item, and customers do not have to wait to be connected to an agent because they could not find the Q&A they wanted. By adding whatever is missing or needed in the data set, the seller supplies training input that continuously raises the quality and quantity of the Q&A. Because the recommendation model evolves and is retrained, the Q&A database becomes richer the more it is used. Furthermore, when a chatbot is introduced, the number of queries the chatbot can answer grows, so fewer customers are routed to the customer center, which lowers the operator's workload, and customers receive answers immediately, which also improves their emotional quality of experience. However, the technical problems to be solved by this embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, an embodiment of the present invention includes the steps of: receiving, from a seller terminal, uploaded product data describing a product; preprocessing the text included in the product data, performing morpheme analysis through part-of-speech tagging, generating minimal-unit words (eojeol), and analyzing the generated words; performing text analysis in which the analyzed text is structured into a matrix and then modeled using a classifier; and extracting a pre-stored Q&A data set on the basis of its similarity to the analyzed text and recommending it to the seller terminal.
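
The matrix and similarity steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the text has already been preprocessed into tokens, it omits the classifier step entirely, and the names `tfidf_vectors`, `cosine`, and `recommend`, as well as the sample Q&A set labels, are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (dicts)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; the smoothing constants are an illustrative choice.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(product_tokens, stored, top_k=1):
    """Return the labels of the stored Q&A sets most similar to the product.

    `stored` is a list of (tokens, qa_set_label) pairs (hypothetical layout).
    """
    vectors, idf = tfidf_vectors([tokens for tokens, _ in stored])
    # Terms unseen in the stored sets get weight 0 in the query vector.
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(product_tokens).items()}
    ranked = sorted(zip(vectors, stored),
                    key=lambda pair: cosine(query, pair[0]),
                    reverse=True)
    return [label for _, (_, label) in ranked[:top_k]]


stored_sets = [
    (["lipstick", "color", "matte"], "cosmetics_qa"),
    (["shirt", "cotton", "size"], "apparel_qa"),
]
print(recommend(["red", "lipstick", "matte"], stored_sets))
```

For Korean product text, the tokens would come from morpheme analysis rather than whitespace splitting, as the step describes.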

Another embodiment of the present invention includes: an upload unit that receives, from a seller terminal, uploaded product data describing a product; an analysis unit that preprocesses the text included in the product data, performs morpheme analysis through part-of-speech tagging, generates minimal-unit words (eojeol), and analyzes the generated words; an execution unit that performs text analysis in which the analyzed text is structured into a matrix and then modeled using a classifier; and a recommendation unit that extracts a pre-stored Q&A data set on the basis of its similarity to the analyzed text and recommends it to the seller terminal.

Yet another embodiment of the present invention includes: a Q&A recommendation service providing server that includes an upload unit that receives uploaded product data describing a product, an analysis unit that preprocesses the text included in the product data, performs morpheme analysis through part-of-speech tagging, generates minimal-unit words (eojeol), and analyzes the generated words, an execution unit that performs text analysis in which the analyzed text is structured into a matrix and then modeled using a classifier, and a recommendation unit that extracts a pre-stored Q&A data set on the basis of its similarity to the analyzed text and recommends it to the seller terminal; and a seller terminal that uploads product data describing a product, receives the recommended Q&A data set from the Q&A recommendation service providing server, and either sets it in the Q&A format of the shopping mall platform or edits the received Q&A data set.

According to any one of the above means for solving the problem, the Q&A data set required for a product is recommended when a shopping mall operator uploads the product to the platform. As a result, the operator does not have to write Q&A data item by item, and customers do not have to wait to be connected to an agent because they could not find the Q&A they wanted. By adding whatever is missing or needed in the data set, the operator supplies training input that continuously raises the quality and quantity of the Q&A. Because the recommendation model evolves and is retrained, the Q&A database becomes richer the more it is used. Furthermore, when a chatbot is introduced, the number of queries the chatbot can answer grows, so fewer customers are routed to the customer center, which lowers the operator's workload, and customers receive answers immediately, which also improves their emotional quality of experience.

FIG. 1 is a diagram illustrating a system for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the Q&A recommendation service providing server included in the system of FIG. 1.
FIG. 3 is a diagram illustrating an embodiment in which the shopping mall Q&A recommendation service using a Q&A data set evolving through learning according to an embodiment of the present invention is implemented.
FIG. 4 is a diagram illustrating a process in which data is transmitted and received between the components included in the system of FIG. 1 according to an embodiment of the present invention.
FIG. 5 is an operation flowchart illustrating a method for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that this does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The terms of degree "about", "substantially", and the like, as used throughout the specification, mean at or close to the stated value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent unscrupulous infringers from unfairly exploiting disclosures in which exact or absolute values are stated to aid understanding of the invention. As used throughout the specification, the term "step of (doing) ~" or "step of ~" does not mean "step for ~".

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal, apparatus, or device may instead be performed by a server connected to that terminal, apparatus, or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal, apparatus, or device connected to that server.

In this specification, some of the operations or functions described as mapping or matching with a terminal may be interpreted as mapping or matching the terminal's unique number or an individual's identification information, which serves as the terminal's identifying data.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a system for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning according to an embodiment of the present invention. Referring to FIG. 1, the shopping mall Q&A recommendation service providing system 1 may include at least one seller terminal 100 and a Q&A recommendation service providing server 300. However, the system 1 of FIG. 1 is only one embodiment of the present invention, and the present invention is not to be construed as limited by FIG. 1.

The components of FIG. 1 are generally connected through a network 200. For example, as shown in FIG. 1, the at least one seller terminal 100 may be connected to the Q&A recommendation service providing server 300 through the network 200, and the Q&A recommendation service providing server 300 may be connected to the at least one seller terminal 100 through the network 200.

Here, the network refers to a connection structure that allows information to be exchanged between nodes such as a plurality of terminals and servers. Examples of such a network include, but are not limited to, RF, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a 5GPP (5th Generation Partnership Project) network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, an NFC network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

In the following, the term "at least one" is defined as covering both the singular and the plural, and even where the term "at least one" is not used, it is self-evident that each component may exist in the singular or the plural and may refer to either. Whether each component is provided singly or in plurality may vary depending on the embodiment.

The at least one seller terminal 100 may be a terminal that uploads product data using a web page, app page, program, or application related to the shopping mall Q&A recommendation service using a Q&A data set evolving through learning. The seller terminal 100 may also be a terminal that receives a Q&A set recommended on the basis of the uploaded product data. Further, the seller terminal 100 may be a terminal that inserts the received Q&A set as-is into the page where the product data was uploaded, or that modifies the Q&A set by adding to, deleting from, or altering it.

Here, the at least one seller terminal 100 may be implemented as a computer capable of accessing a remote server or terminal through a network. The computer may include, for example, a navigation device, a notebook equipped with a web browser, a desktop, or a laptop. The seller terminal 100 may also be implemented as a terminal capable of accessing a remote server or terminal through a network. For example, the seller terminal 100 may be a wireless communication device that guarantees portability and mobility, and may include any kind of handheld wireless communication device such as a navigation device, or a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal, a smartphone, a smart pad, or a tablet PC.

The Q&A recommendation service providing server 300 may be a server that provides a web page, app page, program, or application for the shopping mall Q&A recommendation service using a Q&A data set evolving through learning. The server 300 may receive product data uploaded from the seller terminal 100, analyze that product data, extract a Q&A data set corresponding to the analysis result, and recommend it to the seller terminal 100. To this end, the server 300 may analyze product data using four organically interworking engines: a text analysis engine, an OCR (Optical Character Recognition) text engine, a similar-image search engine, and a category analysis engine. The server 300 may recommend a Q&A data set by using the text analysis engine when the product data includes text; the OCR text engine when the product data includes text but only as characters rendered in an image; the similar-image search engine when the product data contains neither plain text nor text in image form; and category extraction from the product name when there is neither text nor an image. In addition, when the recommended Q&A data set is edited on the seller terminal 100, the server 300 may retrain the learning model using the product uploaded from the seller terminal 100 and the edited Q&A data set, so that Q&A data sets recommended afterward become more accurate and richer.
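
The engine selection described above can be summarized as a short Python sketch. The `ProductData` fields, the `select_engine` function, and the engine labels are hypothetical names used only for illustration; in particular, the sketch assumes that whether an image contains rendered text is already known as a flag, which the description does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductData:
    """Hypothetical container for one uploaded product."""
    name: str
    text: Optional[str] = None      # plain-text description, if any
    has_image: bool = False         # at least one product image is attached
    image_has_text: bool = False    # an image contains rendered characters


def select_engine(product: ProductData) -> str:
    """Pick the analysis engine following the four cases in the description."""
    if product.text and product.text.strip():
        return "text_analysis"           # plain text is available
    if product.has_image and product.image_has_text:
        return "ocr_text"                # text exists only inside an image
    if product.has_image:
        return "similar_image_search"    # image without any text
    return "category_analysis"           # only the product name remains


print(select_engine(ProductData(name="lipstick", has_image=True)))
```

Ordering the checks from the richest input (plain text) to the poorest (product name only) keeps the fallback chain explicit.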

Here, the Q&A recommendation service providing server 300 may be implemented as a computer capable of accessing a remote server or terminal through a network. The computer may include, for example, a navigation device, a notebook equipped with a web browser, a desktop, or a laptop.

FIG. 2 is a block diagram illustrating the Q&A recommendation service providing server included in the system of FIG. 1, and FIG. 3 is a diagram illustrating an embodiment in which the shopping mall Q&A recommendation service using a Q&A data set evolving through learning according to an embodiment of the present invention is implemented.

Referring to FIG. 2, the Q&A recommendation service providing server 300 may include an upload unit 310, an analysis unit 320, an execution unit 330, a recommendation unit 340, an OCR engine unit 350, a similar image unit 360, a category analysis unit 370, and an automatic evolution unit 380.

When the Q&A recommendation service providing server 300 according to an embodiment of the present invention, or another server (not shown) operating in conjunction with it, transmits an application, program, app page, web page, or the like for the shopping mall Q&A recommendation service using a Q&A data set evolving through learning to the at least one seller terminal 100, the seller terminal 100 may install or open that application, program, app page, or web page. The service program may also be run on the seller terminal 100 using a script executed in a web browser. Here, a web browser is a program that enables use of the World Wide Web (WWW) service and that receives and displays hypertext written in HTML (HyperText Markup Language); examples include Netscape, Explorer, and Chrome. An application means an application program on a terminal and includes, for example, an app running on a mobile terminal (smartphone).

Referring to FIG. 2, the Q&A recommendation service providing server 300 according to an embodiment of the present invention recommends a Q&A data set by running the four engines described above so that they work in conjunction with one another. The text analysis engine, the OCR text engine, the similar-image search engine, and the category analysis engine are described below, and what applies to them in common is not described repeatedly.

<Text Analysis Engine>

The upload unit 310 may receive uploaded product data describing a product from the seller terminal 100. The product data may be text or an image, and the text may be text contained in an image or text provided separately from the image, such as a detailed description. The image may be a photograph of the product's exterior or, as noted above, an image that also contains text.

The analysis unit 320 may preprocess the text included in the product data, perform morphological analysis through part-of-speech tagging, generate minimal word units (eojeol), and analyze the generated units. The analysis may use, for example, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm. Preprocessing may include spacing error correction, spelling error correction, and stop-word removal. The spacing correction model may use sequence labeling: all spaces are first deleted from the text in the product data, and a machine learning model then decides where spaces belong. Moving from the start of the sentence to the end, the method decides whether a space belongs between each pair of adjacent syllables and attaches a label (0 or 1 in the figure). The n syllables before and after the current position serve as features for this decision. The machine learning method may be, without limitation, CRFs (Conditional Random Fields) or a Structural SVM (Support Vector Machine), both known to capture transition information well in both directions (from the start to the current position and from the end to the current position). The spelling correction model may rely mainly on dictionaries or statistical information. In dictionary-based correction, a string that is not found in a corpus or in a general dictionary is assumed to be a spelling error and may be replaced with a nearby dictionary word chosen with an edit-distance or Metaphone algorithm. Because the dictionary-based method requires all candidate misspellings to be compiled in advance, probabilistic models based on statistical information may also be used. One example is the noisy channel model, a generative model that finds the word maximizing the probability of generating the observed noise (the misspelling). The probability is computed from the syllable-level (or letter-level) edit distance between the correct word and the erroneous word. For example, it may be computed from the probability that the erroneous word 'acress' is generated from the correct word 'actress' (that is, the probability that 't' is deleted from 'ct', or that 'ct' is replaced with 'c').
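The dictionary-based correction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the tiny dictionary, and the `max_distance` threshold are all assumptions introduced here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def correct(word, dictionary, max_distance=2):
    """Return word if it is in the dictionary, otherwise the closest
    dictionary entry within max_distance (hypothetical threshold)."""
    if word in dictionary:
        return word
    # sorted() only makes tie-breaking deterministic (alphabetical).
    best = min(sorted(dictionary), key=lambda w: edit_distance(word, w))
    return best if edit_distance(word, best) <= max_distance else word


# Hypothetical dictionary for illustration only.
DICTIONARY = {"actress", "dress", "lipstick"}
print(correct("acress", DICTIONARY))
```

If the dictionary also contained 'across' and 'access', all three candidates would sit at distance 1 from 'acress'. Edit distance alone cannot break that tie, which is the situation a probabilistic model such as the noisy channel model is meant to resolve.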

Korean is a typical agglutinative (affixing) language with a rich system of particles and endings. Unlike in English, the morpheme rather than the word plays the key role in sentiment analysis and in analyzing sentence structure. A morpheme is the smallest unit of language that carries meaning, and a word consists of one or more morphemes. For example, the word '예쁘고' consists of the morphemes '예쁘' (adjective) and '고' (connective ending). Reflecting this, text analysis of Korean generally takes the morpheme rather than the word as its basic unit, and a morpheme vector may be used as input instead of the word vector commonly used for English text, although word vectors are not excluded. A morpheme vector is a vector representation of a morpheme and can be derived by applying an ordinary word-vector method to a sentence written in morpheme units. A word vector converts a word into a vector of tens to hundreds of real-valued dimensions. Such vectors carry semantic features beyond mere numbers: words with similar semantic and syntactic properties are represented by vectors that are close in Euclidean distance or cosine similarity. For example, the Word2Vec model learns vector representations of words using a neural network structure called CBOW (Continuous Bag-of-Words) or Skip-Gram. CBOW predicts a target word from its context words. For example, in the sentence "This lipstick is beautiful in (_) and has a good sustainability.", it predicts a target word such as 'color' for (_) from the context words before and after the blank. Skip-Gram works in the opposite direction and predicts the context words from the target word.
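To make the difference between CBOW and Skip-Gram concrete, the sketch below only builds the training pairs each structure would see; it does not train a network. The function names and the `window` parameter are assumptions for illustration.

```python
def cbow_pairs(tokens, window=2):
    """For each position, pair the surrounding context tokens with the
    target token that CBOW would try to predict."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Skip-Gram reverses the direction: each target token is paired
    with every one of its context tokens."""
    return [(target, c)
            for context, target in cbow_pairs(tokens, window)
            for c in context]


tokens = "this lipstick is beautiful in color".split()
for context, target in cbow_pairs(tokens):
    print(context, "->", target)
```

The same functions work unchanged on a morpheme-level token list, which is how a morpheme vector would be obtained from an ordinary word-vector method.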

For example, take the sentence "이 립스틱은 컬러도 예쁘고 지속력도 좋다." ("This lipstick has a pretty color and lasts well."). If the sentence is split into eojeol (space-delimited units) and fed to CBOW, the result is word vectors for units such as '립스틱은' or '컬러도'. In that case, however, '립스틱을', '립스틱이', '컬러가', '컬러를', and '컬러는' are all treated as different words, so the number of words to consider grows inefficiently. Morphological analysis of the same sentence, although the output varies somewhat with the analyzer and the granularity of segmentation, gives a result such as "이 (determiner) 립스틱 (common noun) 은 (auxiliary particle) 컬러 (common noun) 도 (auxiliary particle) 예쁘 (adjective) 고 (connective ending) 지속력 (common noun) 도 (auxiliary particle) 좋 (adjective) 다 (final ending) . (period)". This yields the morpheme-level sentence "이 립스틱 은 컬러 도 예쁘 고 지속력 도 좋 다 .", and feeding it to CBOW produces morpheme vectors. The morpheme vectors may also be used as input to a CNN (Convolutional Neural Network) deep learning model.
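The vocabulary effect described above can be checked with a small count. The segmentations below are written by hand for illustration; a real system would obtain them from a morphological analyzer, and the two extra sentences are assumptions added to show the particle variants.

```python
from collections import Counter

# Hand-segmented toy data (assumption): the same three sentences,
# once split on spaces (eojeol) and once split into morphemes.
EOJEOL_SENTENCES = [
    ["이", "립스틱은", "컬러도", "예쁘고", "지속력도", "좋다"],
    ["립스틱을", "샀다"],
    ["립스틱이", "예쁘다"],
]
MORPHEME_SENTENCES = [
    ["이", "립스틱", "은", "컬러", "도", "예쁘", "고", "지속력", "도", "좋", "다"],
    ["립스틱", "을", "사", "았", "다"],
    ["립스틱", "이", "예쁘", "다"],
]


def token_counts(sentences):
    """Count how often each surface token occurs across all sentences."""
    return Counter(token for sentence in sentences for token in sentence)


eojeol = token_counts(EOJEOL_SENTENCES)
morpheme = token_counts(MORPHEME_SENTENCES)

# At the eojeol level 'lipstick' is scattered over three distinct tokens,
# at the morpheme level it is one token seen three times.
print([t for t in eojeol if t.startswith("립스틱")])
print(morpheme["립스틱"])
```

With morpheme units, every occurrence of the noun contributes to a single vector instead of being split across particle variants.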

In a typical feedforward neural network, also called a multi-layer perceptron, every unit in one layer is connected to every unit in the adjacent layer (fully connected). In addition to such fully connected layers, a CNN has convolution layers and pooling layers, in which only specific units of adjacent layers are connected. In a convolution layer, a filter in the form of a weight matrix smaller than the image is moved step by step across the image, which is itself represented as a matrix. At each position the corresponding elements are multiplied element-wise and summed, which is the convolution operation. A pooling layer then extracts a representative value, such as the maximum or the average, from each region of a given size in the convolution output. Convolution and pooling layers may alternate in pairs several times, or several convolution layers may be followed by a single pooling layer. These operations hierarchically extract useful features from the input image, and the extracted features are passed through one or more fully connected layers to classify the input into target classes. When a CNN is applied to a matrix that represents a sentence as a stack of word vectors, local n-gram features can be extracted according to the window size of the filter, and higher-level features can in turn be extracted from those. The convolution operation applies a filter to a window of words to extract a new feature, and the filter is applied to every possible window in the sentence to produce a feature map. The pooling operation takes the global maximum of the feature map or a local representative value for each region of a given size. Applying several filters with different window sizes extracts features of different kinds, and these features are fed to the fully connected layer for classification.
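The sentence-level convolution and pooling described above can be sketched in plain Python. This is a minimal illustration with hand-picked numbers: the ReLU activation is an assumption (the text does not name an activation), and the function names are hypothetical.

```python
def conv1d_text(sentence, kernel):
    """Slide one filter over every window of consecutive word vectors.

    sentence: list of word vectors, shape (n_words, dim)
    kernel:   list of weight vectors, shape (window, dim)
    Returns the feature map, one value per window position.
    """
    window = len(kernel)
    feature_map = []
    for start in range(len(sentence) - window + 1):
        total = 0.0
        for offset in range(window):
            word_vec = sentence[start + offset]
            total += sum(x * w for x, w in zip(word_vec, kernel[offset]))
        feature_map.append(max(0.0, total))  # ReLU (assumed activation)
    return feature_map


def global_max_pool(feature_map):
    """Keep only the strongest response of the filter over the sentence."""
    return max(feature_map)


# Four 2-dimensional word vectors and one bigram (window size 2) filter.
sentence = [[1, 0], [0, 1], [1, 1], [0, 0]]
bigram_filter = [[1, 0], [0, 1]]

feature_map = conv1d_text(sentence, bigram_filter)
print(feature_map, global_max_pool(feature_map))
```

Running several filters with different window sizes and concatenating their pooled values gives the feature vector that would be passed to the fully connected layer.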

After spacing correction, spelling correction, and stop-word removal, the analysis unit 320 may split the document into words, embed the words, and apply TF-IDF weights. A CNN requires two-dimensional input. To satisfy this, the analysis unit 320 may perform word embedding, which splits the document into words and assigns a number to each word. The embedding may number words sequentially in order of appearance, or it may use a word-level model that treats words as the basic unit or a character-level model that treats individual characters as the basic unit. The analysis unit 320 then computes a TF-IDF weight to indicate how important each embedded word is in the document and attaches the weights to the array of embedded words to form a single input for the convolutional neural network. TF-IDF is a weight used in information retrieval and text mining: a statistic that indicates how important a word is within a particular document of a collection. Term frequency (TF) measures how often a word appears in a document. Inverse document frequency (IDF) is high for words that appear in few of the other documents in the collection. The TF-IDF value is the product of the two. For example, the term "leather bag" rarely appears in the Q&A data set for dresses, so its IDF is high there and it can serve as a keyword of a document that contains it. Conversely, across the fashion category as a whole, leather bags appear on many shopping mall sites, so the IDF is low. TF can be computed from the raw count of the word in the document, but the value can keep growing as the number of words increases. Boolean frequency avoids this by setting TF to 0 or 1 according to whether the word occurs at all. Logarithmically scaled frequency takes the logarithm of the count, which dampens the effect of document size while still reflecting the actual count. Augmented frequency expresses the count relative to the document length so that the maximum does not exceed 1. As a result, TF-IDF is high when a word is frequent within a particular document and few documents in the collection contain it, which makes it possible to filter out words that appear frequently in every document.
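The TF variants and the TF-IDF product described above can be written out as a short sketch. The exact formulas are not given in the text, so the ones below are common textbook definitions used here as assumptions: log scaling as log(1 + count), augmented frequency as 0.5 + 0.5 * count / max count, and IDF as log(N / document frequency). The toy corpus is hypothetical.

```python
import math
from collections import Counter


def tf_variants(term, doc):
    """Four ways to score a term inside one non-empty tokenized document."""
    counts = Counter(doc)
    raw = counts[term]
    return {
        "raw": raw,                                # plain count
        "boolean": 1 if raw > 0 else 0,            # present or not
        "log": math.log(1 + raw),                  # dampened count
        "augmented": 0.5 + 0.5 * raw / max(counts.values()),  # at most 1
    }


def idf(term, corpus):
    """log(N / df); 0.0 when the term occurs in no document."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, doc, corpus):
    """Raw term frequency multiplied by inverse document frequency."""
    return Counter(doc)[term] * idf(term, corpus)


# Hypothetical tokenized Q&A documents.
CORPUS = [
    ["leather", "bag", "size"],
    ["dress", "size", "color"],
    ["dress", "length", "size"],
]
for term in ("bag", "dress", "size"):
    print(term, round(tf_idf(term, CORPUS[0], CORPUS), 3))
```

In this toy corpus 'size' occurs in every document, so its IDF and therefore its TF-IDF is zero, while 'bag' occurs in only one document and receives the highest weight there.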

The execution unit 330 may arrange the analyzed text into a matrix and then perform text analysis by modeling it with a classifier. A vector space model may be used here. In text mining it is also called a word space model: an algebraic model that represents a given text document as a vector of terms. Vector space models can be used for information filtering, information retrieval, indexing, and similarity ranking, and a term that occurs in a document has a non-zero value in that document's vector. This value, called the term weight, can be computed in various ways; one embodiment of the present invention uses the TF-IDF scheme described above. The classifier may be, without limitation, a CNN (Convolutional Neural Network) based document classification model. As described above, a CNN is a type of deep neural network whose structure suits two-dimensional data and which can be trained with the backpropagation algorithm. It may be used to search the Q&A data sets in the pre-built database with the analyzed text and to recommend them to the seller based on similarity.

The recommendation unit 340 may extract previously stored Q&A data sets based on their similarity to the analyzed text and recommend them to the seller terminal 100. For example, suppose the product description text for a "women's dress" is analyzed, three Q&A sets stored under "women's dress" are extracted, and their similarities are 90%, 80%, and 70%. The Q&A sets can then be sorted and recommended in descending order of similarity.
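The ranking step can be sketched as below. The text does not say which similarity measure is used, so cosine similarity over term-weight vectors is an assumption here, as are the function names, the set names, and the toy vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query_vec, qa_sets, top_k=3):
    """Score every stored Q&A set against the query vector and return
    the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in qa_sets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical term-weight vectors for three stored Q&A sets.
QA_SETS = {
    "dress_qa_a": [1.0, 1.0, 0.0],
    "dress_qa_b": [1.0, 0.0, 0.0],
    "bag_qa": [0.0, 0.0, 1.0],
}
for name, score in recommend([1.0, 1.0, 0.0], QA_SETS):
    print(f"{name}: {score:.0%}")
```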

After the upload unit 310 receives product data describing a product from the seller terminal 100, if the product data contains no text and only product images, the OCR engine unit 350 may convert the product images into text by applying an OCR (Optical Character Recognition) text-based image analysis model. On most shopping mall pages, anywhere from a few to several dozen product images are uploaded to a single page, and the characters are embedded in the images; text is rarely supplied separately. For example, 0061 (product number) - Jasmine Dress (product name) - FREE (size) - Beige (color) is rendered as characters inside the thumbnail and uploaded together with the product photo as one image, and details such as elasticity, thickness, sheerness, lining, washing instructions, and material are often embedded in the product image as well. Accordingly, when the product image and the product description text have all been rendered as images, the OCR engine unit 350 first reads the text out of the image file.

OCR, short for optical character recognition, is a method of recognizing optically captured characters. Unlike online recognition, which recognizes characters as they are represented on a computer, optical recognition can be performed offline after writing or printing is complete. It can recognize both handwritten and printed characters, but accuracy depends directly on the quality of the input document. An analog document is digitized with an optical scanner, and the text region is extracted by segmenting it into individual symbols. Preprocessing removes noise from the symbols to make extraction easier, the text is then extracted, and post-processing compares the extracted text and reconstructs the words and numbers of the original. The OCR text-based image analysis model proceeds in two broad stages: text detection and text recognition. For text detection, a deep learning CTPN (Connectionist Text Proposal Network) model, which locates text in an image, may be used. A CTPN consists broadly of a CNN model that distinguishes text from non-text in the image and an RNN (Recurrent Neural Network) model that refines the candidate text regions. The candidate region is cut into vertical slices several pixels wide; the CNN judges whether each slice is likely to be text, and the RNN checks whether neighboring slices belong to the same connected text. One embodiment of the present invention may use a structure of VGG16, BLSTM (Bidirectional Long Short-Term Memory), and FC (fully connected) layers.

VGG16 is a network architecture developed at the University of Oxford. Its input is an RGB image, and the structure includes 8 to 16 convolutional layers, depending on the variant, and three fully connected layers, although it is not limited to this structure. VGG16 is used here to extract image features. BLSTM (Bidirectional Long Short-Term Memory) is a type of recurrent neural network (RNN) used to classify and predict time-series data. A BLSTM concatenates the outputs of two LSTMs, one processing the sequence forward (left to right) and the other backward (right to left). A unidirectional LSTM considers only context from past time steps, whereas a BLSTM learns from both past and future context carried by the forward and backward passes. LSTM itself is a recurrent neural network designed to solve the vanishing gradient problem that arises when training RNNs, and it can retain long-term dependencies. An LSTM can therefore learn from history to classify, process, and predict time-series signals. An LSTM has special memory cells with self-connections in the recurrent hidden layer, which maintain state over time and remember previous states, and three gates (input gate, forget gate, and output gate) that control the flow of information into and out of each layer. With this recurrent structure, an LSTM can capture the contextual information of the input text. The temporal relationships learned by the BLSTM are then passed through the fully connected layers described above to produce the result.
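The gate structure and the bidirectional wiring described above can be illustrated with a scalar LSTM, where every weight is a single number. This is a teaching sketch under stated assumptions, not the network used in the embodiment: real layers use weight matrices and learned parameters, and the names `uniform_params`, `lstm_step`, `run_lstm`, and `run_blstm` are hypothetical.

```python
import math

GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def uniform_params(value):
    """Toy parameter set: input weight w*, recurrent weight u*, bias b*."""
    return {prefix + gate: value for prefix in ("w", "u", "b") for gate in GATES}


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g        # memory cell keeps part of the old state
    h = o * math.tanh(c)          # exposed hidden state
    return h, c


def run_lstm(xs, p):
    """Run the cell over a sequence and collect the hidden states."""
    h = c = 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def run_blstm(xs, p_forward, p_backward):
    """Pair each position's forward state with its backward state."""
    forward = run_lstm(xs, p_forward)
    backward = run_lstm(xs[::-1], p_backward)[::-1]
    return list(zip(forward, backward))


params = uniform_params(0.5)
for fwd, bwd in run_blstm([0.0, 1.0, 0.5], params, params):
    print(round(fwd, 4), round(bwd, 4))
```

The backward half of the first position already depends on later inputs, which is the "future context" a unidirectional LSTM cannot see.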

For the second stage, text recognition, a CRNN (Convolutional Recurrent Neural Network) model, which extracts the text from a text image, may be used. A CRNN is a deep learning structure that combines a CNN and an RNN and can be used for data in which both spatial and temporal information matter, such as video. For example, a CNN may extract feature information from a product image uploaded to a shopping mall page, and the extracted values may then be fed to an RNN to predict and extract the text sequence. The CRNN according to one embodiment of the present invention may consist of a convolution layer, a recurrent layer, and a transcription layer. The convolution layer extracts a feature sequence from the input image, the recurrent layer predicts a value for each frame of the feature sequence, and the transcription layer rewrites the sequence of predictions into the final label sequence, yielding the text. Unlike CTPN, CRNN learns from images of text regions rather than whole images. Once the characters in an image have been extracted, a named entity recognition step may therefore be needed to sort them into the fields present in the product image. BLSTM, CNN, and CRF (Conditional Random Field) models may additionally be used for this, trained on the characters and their entity labels.
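The transcription step, which turns per-frame predictions into a label sequence, can be sketched as follows. The text does not specify how the transcription layer works; the sketch assumes CTC-style greedy decoding (take the best symbol per frame, merge repeats, drop blanks), which is commonly paired with CRNN. The blank symbol, alphabet, and function names are assumptions.

```python
BLANK = "-"  # assumed symbol meaning "no character in this frame"


def ctc_greedy_decode(frame_scores, alphabet):
    """Collapse per-frame predictions into text.

    frame_scores: one list of scores per frame, aligned with alphabet.
    Steps: pick the best symbol per frame, merge consecutive repeats,
    then drop blanks.
    """
    best_path = [alphabet[max(range(len(scores)), key=scores.__getitem__)]
                 for scores in frame_scores]
    decoded = []
    previous = None
    for symbol in best_path:
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


def one_hot_scores(symbol, alphabet):
    """Helper that fakes a confident recurrent-layer output for one frame."""
    return [1.0 if s == symbol else 0.0 for s in alphabet]


ALPHABET = [BLANK, "F", "R", "E"]
frames = [one_hot_scores(s, ALPHABET) for s in "FF-RRE-EE"]
print(ctc_greedy_decode(frames, ALPHABET))
```

The blank between the two runs of 'E' is what allows a doubled letter to survive the merging of repeated frames.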

A named entity is a word that identifies one entity within a set of other entities sharing predefined, similar attributes, or a phrase that cannot be decomposed further within a sentence. Named entity recognition is the process of identifying the named entities expressed in a document, and it must precede information extraction based on the relations between entities. On a product page, the product number, product name, size, and color are generally written at fixed positions with predefined attributes. Named entity recognition can be treated as a sequence labeling problem over the information in a sentence: by learning which kind of entity appears at which position, the attribute information of each product can be obtained even from clothing or fashion pages where dozens of products are registered on one page. A linear-chain CRF-based recognition method can improve performance by maximizing the conditional log-likelihood given the string information. An LSTM (Long Short-Term Memory) network, a deep learning technique that performs well on sequence labeling problems, may also be applied bidirectionally with a conditional random field attached, forming a BLSTM-CRF model for named entity recognition. Named entity recognition uses entity dictionary information, which holds the entity names for each entity type, as a key feature. Building on the BLSTM model mentioned for the text-based engine, the mapping to the entity dictionary may be encoded as a binary vector and merged into the input vector.
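The dictionary feature described above, a binary vector merged into the input vector, can be sketched like this. The entity types, dictionary entries, and embedding values are hypothetical placeholders, and the function names are assumptions for illustration.

```python
# Hypothetical entity dictionary: entity type -> known surface forms.
GAZETTEER = {
    "COLOR": {"beige", "black", "ivory"},
    "SIZE": {"FREE", "S", "M", "L"},
}
ENTITY_TYPES = sorted(GAZETTEER)  # fixed order: ['COLOR', 'SIZE']


def dictionary_features(token):
    """Binary vector with one slot per entity type: 1 if the token is
    listed in that type's dictionary, otherwise 0."""
    return [1 if token in GAZETTEER[etype] else 0 for etype in ENTITY_TYPES]


def augment(tokens, embeddings):
    """Append the dictionary features to each token's embedding so the
    sequence labeler sees both."""
    return [embeddings[token] + dictionary_features(token) for token in tokens]


# Toy 2-dimensional embeddings (assumed values).
EMBEDDINGS = {"0061": [0.3, 0.1], "FREE": [0.7, 0.2], "beige": [0.1, 0.9]}
for token, vector in zip(EMBEDDINGS, augment(list(EMBEDDINGS), EMBEDDINGS)):
    print(token, vector)
```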

<Similar image search engine>

After the upload unit 310 receives product data describing a product from the seller terminal 100, if the product data contains no text and only product images, the similar image unit 360 may extract at least one feature of the product image using at least one deep learning model trained on a pre-stored database, represent the extracted features in a vector space, and search for images similar to the product image using a similarity distance algorithm or a clustering algorithm. The similar image unit 360 may then extract the text previously mapped to the retrieved similar images. The deep learning may use ImageNet, transfer learning may be used when extracting image features, and DenseNet, ResNet50, VGG16, InceptionV3, Xception, and the like may be used. The similarity distance algorithm may be the K-nearest neighbor algorithm, and the clustering algorithm may be the K-means clustering algorithm.

Transfer learning uses a deep network as a feature extractor and trains another model on the extracted features alone. Reusing an existing model speeds up training of the new model and improves its predictions. Starting from a model whose training is already complete (a pre-trained model) and adjusting it slightly for the target task is transfer learning, and this retraining of the neural network is called fine-tuning. In practice, CNNs are rarely trained from scratch (random initialization); a large data set such as ImageNet can be used instead. In short, transfer learning initializes the convolution layers with values learned from a large amount of image data instead of random values, which overcomes the limits of scarce training data and allows feature maps to be extracted efficiently.

The K-nearest neighbor algorithm uses the K closest neighbors. It is a very intuitive way to classify unlabeled samples according to their similarity to the samples in a training data set. Given an unlabeled sample, it finds the closest labeled samples in the training set and assigns the sample to the class that occurs most often within that subset. The algorithm is simple because it needs only a constant K, a set of labeled training samples, and a distance measure. The subset of same-class data that each data point represents can be computed with the Euclidean distance, and the distance can be set to the midpoint between the largest distance within the same class and the smallest distance to a different class.
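A minimal sketch of K-nearest neighbor classification with the Euclidean distance is shown below. The two-dimensional feature vectors and labels are toy values standing in for image features; the function name and `k=3` are assumptions. It requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter


def knn_predict(query, samples, k=3):
    """Classify query by majority vote among its k nearest labeled samples.

    samples: list of (feature_vector, label) pairs.
    """
    neighbors = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy labeled feature vectors (assumed values).
SAMPLES = [
    ((0.0, 0.0), "dress"),
    ((0.0, 1.0), "dress"),
    ((1.0, 0.0), "dress"),
    ((5.0, 5.0), "bag"),
    ((5.0, 6.0), "bag"),
]
print(knn_predict((0.5, 0.5), SAMPLES))
print(knn_predict((5.0, 5.5), SAMPLES))
```

For similar image search, the same nearest-neighbor lookup could return the neighbors themselves rather than a voted label, so that the text mapped to those images can be reused.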

<Category analysis engine>

Category analysis presupposes a case in which the seller provides neither text nor images in the product details on the shopping mall, so that not even a product image exists. It is based on the observation that, in this case, the category of the product can be extracted from the product name used in the shopping mall, and that once the category is extracted, the Q&A data set mapped to that category can be extracted. To this end, after the upload unit 310 receives product data describing a product from the seller terminal 100, if the product data contains no text or image and only a product name designating the product exists, the category analysis unit 370 may perform preprocessing, including removal of stop words contained in the product name. The category analysis unit 370 may then perform one-hot encoding on the text of the preprocessed product name on a per-syllable basis, input the resulting vectorized text as training data for machine learning, perform category prediction through a classification model based on the text input as training data, and extract the text previously mapped to and stored for the predicted category. Here, the classification model may be a deep neural network (DNN) classification model.

Here, a deep neural network (DNN) is an artificial neural network (ANN) comprising a plurality of hidden layers between an input layer and an output layer. Like a general artificial neural network, a DNN can model complex non-linear relationships, and its modes of operation may include a feed-forward network (FFN), backpropagation, and a recurrent neural network (RNN). Data may be numeric, textual, or categorical. Because machine learning and deep learning algorithms can process only numeric data, data is converted into a form the machine can handle, and categorical data is converted into one-hot encoded form. One-hot encoding sets only the applicable entry to 1 and fills the rest with 0. For example, assuming that the category "fruit" contains apple, pear, and persimmon, a column is created for each of apple, pear, and persimmon, and only the column of the applicable fruit is marked 1 while the others are marked 0. As another example, assuming that named entity recognition is performed on product names, many of the extracted named entities can hardly be regarded as carrying knowledge or meaning, and the differing number of entities extracted per item causes the dimension to vary during vectorization. Therefore, if a constraint is set that aligns the dimension by limiting the entities to the top few with the highest frequency of occurrence per item, the named entities extracted from the items can be vectorized through one-hot encoding, which completes the input data for training.
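As an illustration of per-syllable one-hot encoding of a product name, the following is a minimal sketch. It assumes that each Hangul syllable block is a single Unicode character, so iterating over characters iterates over syllables. The function names, the sample product names, and the zero-padding to a fixed length `max_len` (one possible way to keep the dimension constant) are assumptions for illustration, not details taken from the description.

```python
def build_syllable_vocab(product_names):
    """Map every distinct syllable (character) seen in the product names to a column index."""
    syllables = sorted({ch for name in product_names for ch in name if not ch.isspace()})
    return {syllable: index for index, syllable in enumerate(syllables)}

def one_hot_encode(product_name, vocab, max_len):
    """Return a max_len x len(vocab) matrix with one row per syllable.

    Each row has a single 1 in the column of its syllable and 0 elsewhere.
    Names shorter than max_len are padded with all-zero rows; longer names are truncated.
    """
    syllables = [ch for ch in product_name if not ch.isspace()][:max_len]
    rows = []
    for syllable in syllables:
        row = [0] * len(vocab)
        if syllable in vocab:  # unseen syllables stay all-zero
            row[vocab[syllable]] = 1
        rows.append(row)
    rows.extend([0] * len(vocab) for _ in range(max_len - len(rows)))
    return rows

# Hypothetical product names: "cotton T-shirt" and "jeans".
names = ["면 티셔츠", "청바지"]
vocab = build_syllable_vocab(names)
encoded = one_hot_encode("청바지", vocab, max_len=5)
for row in encoded:
    print(row)
```

The resulting fixed-size matrix could then be flattened and passed to a classification model as input; the model itself is outside the scope of this sketch.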

When test data is input to the trained DNN model, the behavior of a neuron can be divided into two classes based on the value output by the neuron. In the first, the output value falls within the expected range, and the neuron is said to be in the main case region. In the second, the output value falls within an unexpected range, and the neuron is said to be in the corner case region. Here, a neuron of the DNN is structured to derive its output value through an activation function using the input data, a weight, and a bias. In a trained DNN model, the weights and biases are fixed, so only the input data can affect the output value of a neuron. A neuron is said to be activated when it responds to the input data, that is, when its output value changes, and the category can be predicted when the neuron is activated by the input data. In other words, if the output value of a neuron produced by the input data falls within one of the divided intervals, that interval can be regarded as activated, and the category can therefore be predicted. Cases in which the neuron is not activated, that is, corner cases, are excluded.

The four engines described above are organically linked and complement one another. Whatever kind of product data the seller uploads, even if only a product name is entered, an appropriate Q&A can be recommended. The seller does not have to answer each inquiry individually, and the consumer does not have to search through a list of product inquiries and answers to find the desired answer.

After the recommendation unit 340 extracts a previously stored Q&A data set based on similarity using the analyzed text and recommends it to the seller terminal 100, if the seller terminal 100 adds Q&A data to the previously stored Q&A data set, or modifies or deletes previously stored Q&A data, the automatic evolution unit 380 may set the added, modified, or deleted data as input data for retraining the text-based model. The automatic evolution unit 380 may then input this data into the text-based model and retrain it periodically or in real time.
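One way to picture how seller edits become retraining input is the sketch below. The class names `QAEdit` and `RetrainingQueue`, the `batch_size` parameter, and the `retrain` callback are all hypothetical; the description states only that added, modified, or deleted data is used to retrain the text-based model periodically or in real time.

```python
from dataclasses import dataclass, field

@dataclass
class QAEdit:
    """A single seller edit to a recommended Q&A data set (illustrative structure)."""
    action: str          # "add", "modify", or "delete"
    question: str
    answer: str = ""

@dataclass
class RetrainingQueue:
    """Collects seller edits and hands them to a retraining callback in batches.

    batch_size=1 approximates real-time retraining; a larger value approximates
    periodic retraining.
    """
    batch_size: int = 3
    pending: list = field(default_factory=list)

    def record(self, edit, retrain):
        """Store the edit; call retrain(batch) once enough edits have accumulated."""
        self.pending.append(edit)
        if len(self.pending) >= self.batch_size:
            batch, self.pending = self.pending, []
            retrain(batch)
            return True
        return False

# Usage: the callback here only logs the batch; a real system would update the model.
queue = RetrainingQueue(batch_size=2)
log = []
queue.record(QAEdit("add", "Is this machine washable?", "Yes, cold wash."), log.append)
queue.record(QAEdit("delete", "Outdated question"), log.append)
print(len(log), "retraining batch(es) triggered")
```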

Hereinafter, the operation of the Q&A recommendation service providing server of FIG. 2 described above is explained in detail with reference to FIG. 3 as an example. It will be apparent, however, that this embodiment is only one of various embodiments of the present invention and that the invention is not limited thereto.

Referring to FIG. 3, (a) when product data is uploaded from the seller terminal 100, the Q&A recommendation service providing server 300 (b) analyzes the text or images included in the product data, searches the previously stored Q&A database, and recommends Q&A to the seller terminal 100 in descending order of similarity. The seller terminal 100 may use the recommended Q&A as is, or may edit it (add, delete, or modify) to suit the product being sold. When an edit occurs at the seller terminal 100, the product data analysis result and the edited content are used for retraining, so that a richer Q&A recommendation list can be provided. As shown in (c), by accumulating and learning Q&A data sets not only for the fashion field but for each category or field, an automatically evolving database can be built, and the quality and quantity of the database can grow organically through the direct participation of sellers, who are its users.

Matters not described with respect to the method of FIGS. 2 and 3 for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning are the same as, or can be readily inferred from, the description given above with reference to FIG. 1, and their description is therefore omitted below.

FIG. 4 is a diagram illustrating a process in which data is transmitted and received among the components included in the system of FIG. 1 for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning, according to an embodiment of the present invention. An example of this process is described below with reference to FIG. 4, but the present application is not to be construed as limited to this embodiment, and it is apparent to those skilled in the art that the process of transmitting and receiving data shown in FIG. 4 may be changed in accordance with the various embodiments described above.

Referring to FIG. 4, when the Q&A recommendation service providing server 300 receives product data uploaded from the seller terminal 100 (S4100), it checks whether text is included (S4200). If there is no text, it checks whether a product image exists (S4300), and if text is present within the product image (S4400), it extracts the text from the image using the OCR image text analysis model (S4710). If a product image exists but contains no text even in image form, the Q&A recommendation service providing server 300 uses the similar image search model to find the most similar image among the previously built images and detects the text mapped to it (S4600, S4720). If there is neither product text nor a product image, so that no information is available for recommending Q&A, the Q&A recommendation service providing server 300 analyzes the product name using category analysis (S4700).
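The branching among the engines in FIG. 4 can be summarized in a short sketch. The function name, the boolean parameters, and the returned labels are hypothetical; only the order of the checks (text, then image, then text within the image, with the product name as the fallback) follows the description.

```python
def select_engine(has_text, has_image, image_contains_text=False):
    """Pick the analysis path for uploaded product data, following the checks of FIG. 4."""
    if has_text:
        # Text is present: go straight to preprocessing and text analysis.
        return "text_analysis"
    if has_image:
        # No text, but an image exists: use OCR if the image contains text,
        # otherwise fall back to similar image search.
        return "ocr" if image_contains_text else "similar_image_search"
    # Neither text nor image: only the product name is left to analyze.
    return "category_analysis"

print(select_engine(has_text=False, has_image=True, image_contains_text=True))
print(select_engine(has_text=False, has_image=False))
```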

Meanwhile, if text is determined to be included in step S4200, or a product image exists in step S4300 and text is extracted from it, the process proceeds to preprocessing (S4800), part-of-speech tagging and morpheme analysis (S4810), and application of the text analysis algorithm (S4820), and concludes with a step of checking whether a Q&A data set mapped to any one of the text, the image, and the category exists (S4840). If the data set was instead extracted for an image containing no text, or for a product name alone, the process proceeds to a step similar to S4840. If the Q&A recommendation service providing server 300 finds no Q&A data set mapped to the text, or the similarity does not satisfy a preset reference value, it requests new data from the seller or notifies an administrator terminal (not shown) so that the missing content can be filled in (S4850), and the newly entered values are input as training data and used for retraining (S4851). However, if no recommendation is made in S4850, the seller must write the Q&A from beginning to end, so providing a data set to the seller even when its similarity does not satisfy the reference value is not excluded.
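The fallback behavior around the preset reference value might look like the following sketch. The function name, the default threshold of 0.5, and the dictionary layout are assumptions for illustration; the description does not give a concrete reference value.

```python
def decide_recommendation(scored_sets, threshold=0.5):
    """Decide what to do with similarity-scored Q&A data sets.

    scored_sets: list of (data_set_name, similarity) pairs.
    Below-threshold candidates are still returned, since offering them to the
    seller is not excluded even when the reference value is not met.
    """
    ranked = sorted(scored_sets, key=lambda pair: pair[1], reverse=True)
    if ranked and ranked[0][1] >= threshold:
        return {"action": "recommend", "candidates": ranked}
    # Nothing found or best match too weak: ask the seller or administrator for new data.
    return {"action": "request_new_data", "candidates": ranked}

print(decide_recommendation([("shirts", 0.8), ("shoes", 0.2)])["action"])
print(decide_recommendation([("shirts", 0.3)])["action"])
print(decide_recommendation([])["action"])
```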

Meanwhile, when the seller terminal 100 adds to, deletes from, or modifies the recommended Q&A data set (S4910), the Q&A recommendation service providing server 300 inputs the changes as training data and retrains the learning model, so that the database becomes richer and, further, evolves on its own (S4920).

The order of the steps described above (S4100 to S4920) is only an example, and the invention is not limited thereto. That is, the order of the steps (S4100 to S4920) may be changed, and some of the steps may be executed simultaneously or omitted.

Matters not described with respect to the method of FIG. 4 for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning are the same as, or can be readily inferred from, the description given above with reference to FIGS. 1 to 3, and their description is therefore omitted below.

FIG. 5 is an operation flowchart illustrating a method for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning, according to an embodiment of the present invention. Referring to FIG. 5, the Q&A recommendation service providing server receives product data describing a product uploaded from a seller terminal (S5100), preprocesses the text included in the product data, generates minimum-unit words after morpheme analysis through part-of-speech tagging, and analyzes the generated words (S5200).

The Q&A recommendation service providing server performs text analysis in which the analyzed text is formed into a matrix and then modeled using a classifier (S5300), extracts a previously stored Q&A data set based on similarity using the analyzed text, and recommends it to the seller terminal (S5400).
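To make the matrix and similarity steps concrete, the sketch below forms term-count vectors over a shared vocabulary and ranks stored Q&A data sets by cosine similarity. Plain cosine similarity over term counts is used here as a stand-in; the description does not name the classifier or the similarity measure, and the function names, tokens, and data set names are hypothetical.

```python
import math
from collections import Counter

def term_vector(tokens, vocab):
    """Count how often each vocabulary term occurs in the token list (one matrix row)."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_qa_sets(product_tokens, qa_sets):
    """Return (data_set_name, similarity) pairs in descending order of similarity."""
    vocab = sorted(set(product_tokens) | {t for tokens in qa_sets.values() for t in tokens})
    product_vec = term_vector(product_tokens, vocab)
    scored = [(name, cosine(product_vec, term_vector(tokens, vocab)))
              for name, tokens in qa_sets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Tokens as they might look after preprocessing and morpheme analysis (assumed values).
qa_sets = {
    "shirts": ["cotton", "shirt", "size"],
    "shoes": ["leather", "shoes", "size"],
}
ranking = rank_qa_sets(["cotton", "shirt", "white"], qa_sets)
print(ranking[0][0])
```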

Matters not described with respect to the method of FIG. 5 for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning are the same as, or can be readily inferred from, the description given above with reference to FIGS. 1 to 4, and their description is therefore omitted below.

The method for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning according to the embodiment described with reference to FIG. 5 may also be implemented in the form of a recording medium containing computer-executable instructions, such as an application or a program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. The computer-readable medium may also include any computer storage medium. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The method for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning according to the embodiment of the present invention described above may be executed by an application installed on a terminal by default (which may include a program included in a platform or operating system installed on the terminal by default), or by an application (that is, a program) that a user installs directly on a master terminal through an application providing server, such as an application store server or a web server associated with the application or the service. In this sense, the method according to the embodiment of the present invention described above may be implemented as an application (that is, a program) installed on the terminal by default or installed directly by the user, and may be recorded on a computer-readable recording medium of the terminal or the like.

The foregoing description of the present invention is for illustration, and a person of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims set forth below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

A method for providing a Q&A recommendation service, executed in a Q&A recommendation service providing server, the method comprising:
Receiving product data describing a product uploaded from a seller terminal;
Preprocessing text included in the product data, generating minimum-unit words after morpheme analysis through part-of-speech tagging, and analyzing the generated words;
Performing text analysis in which the analyzed text is formed into a matrix and then modeled using a classifier; and
Extracting a previously stored Q&A data set based on similarity using the analyzed text and recommending the Q&A data set to the seller terminal,
Wherein the method further comprises, after the step of receiving product data describing a product uploaded from the seller terminal, when no text exists in the product data and only a product image exists, converting the product image into text by applying an OCR (Optical Character Recognition) image text analysis model to the product image, the OCR image text analysis model performing text detection, which finds the location of text in an image, and text recognition, which extracts a text image as text, and
Wherein the method further comprises, after the step of receiving product data describing a product uploaded from the seller terminal, when no text exists in the product data and only a product image exists: extracting at least one feature of the product image from a previously stored database using at least one deep learning model; expressing the extracted at least one feature in a vector space and searching for a similar image resembling the product image using a similarity distance calculation algorithm or a clustering algorithm; and extracting text previously mapped to the retrieved similar image, the method being a method for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning.

delete

The method of claim 1,
Further comprising, after the step of receiving product data describing a product uploaded from the seller terminal:
Performing preprocessing, including removal of stop words included in a product name, when no text and no image exist in the product data and only the product name designating the product exists;
Performing one-hot encoding on the text included in the preprocessed product name on a per-syllable basis, and inputting the vectorized text of the product name as training data for machine learning;
Performing category prediction through a classification model based on the text input as the training data; and
Extracting text previously mapped to and stored for the predicted category,
The method being a method for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning.

The method of claim 1,
Further comprising, after the step of extracting a previously stored Q&A data set based on similarity using the analyzed text and recommending the Q&A data set to the seller terminal:
Setting added, modified, or deleted data as input data for retraining a text-based model when the seller terminal adds Q&A data to the previously stored Q&A data set, or modifies or deletes previously stored Q&A data; and
Inputting the input data into the text-based model and retraining the text-based model periodically or in real time,
The method being a method for providing a shopping mall Q&A recommendation service using a Q&A data set evolving through learning.

delete