KR102214754B1

KR102214754B1 - Method and apparatus for generating product evaluation criteria

Info

Publication number: KR102214754B1
Application number: KR1020190026902A
Authority: KR
Inventors: 김종우; 민은주; 김준호; 정상형; 여운영; 이지현
Original assignee: 한양대학교 산학협력단
Priority date: 2019-03-08
Filing date: 2019-03-08
Publication date: 2021-02-09
Also published as: KR20200107571A

Abstract

A method and apparatus for generating product evaluation criteria are disclosed. The method includes: (a) crawling review data for products included in a subcategory; (b) performing morphological analysis on the crawled review data and extracting the words belonging to each preset part of speech; (c) applying topic modeling to the extracted words to extract an evaluation criteria candidate group; and (d) using a previously generated evaluation criteria case-base model for the major category to select evaluation criteria items from the extracted candidate group and extract them as final evaluation criteria items.

Description

TECHNICAL FIELD [Method and apparatus for generating product evaluation criteria]

The present invention relates to a method and apparatus for generating product evaluation criteria.

Conventionally, product evaluation criteria have been selected using the dictionary-based extraction and frequency-based topic extraction techniques used by existing e-commerce services.

Conventional dictionary-based extraction generally relies on a pre-built dictionary: when a criterion word, or words related to a candidate, appear in a review, the corresponding evaluation criterion is extracted. For example, if words such as 'performance', 'function', and 'power' appear in a humidifier review, those words are looked up in the pre-built dictionary, and if the word "performance" exists there, it is extracted as an evaluation criterion.

Frequency-based topic extraction, in turn, extracts topics based on the frequency of the nouns that appear in reviews.
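The two conventional approaches described above can be sketched roughly as follows. This is only an illustration: the dictionary contents, the function names, and the `top_k` parameter are assumptions made for the example and do not come from the specification.

```python
from collections import Counter

# Hypothetical pre-built dictionary: criterion -> related words (illustrative only).
CRITERIA_DICTIONARY = {
    "performance": {"performance", "function", "power"},
    "design": {"design", "color", "look"},
}


def dictionary_based_criteria(review_words):
    """Return every criterion that has at least one related word in the review."""
    words = set(review_words)
    return sorted(
        criterion
        for criterion, related in CRITERIA_DICTIONARY.items()
        if words & related
    )


def frequency_based_topics(review_nouns, top_k=2):
    """Return the top_k most frequent nouns as topics."""
    return [noun for noun, _ in Counter(review_nouns).most_common(top_k)]
```

Neither function looks at how consumers actually weigh a word in context, which is the limitation the specification goes on to describe.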

However, these conventional techniques either select evaluation criteria arbitrarily or fail to derive appropriate criteria through frequency-based extraction, and they have difficulty reflecting the consumers' evaluation criteria latent in reviews.

An object of the present invention is to provide a method and apparatus for generating product evaluation criteria that can extract the consumers' evaluation criteria latent in reviews through topic modeling and then refine them.

According to one aspect of the present invention, there is provided a method for generating product evaluation criteria that reflect consumer intent.

According to an embodiment of the present invention, there may be provided a method for generating product evaluation criteria, including: (a) crawling review data for products included in a subcategory; (b) performing morphological analysis on the crawled review data and extracting the words belonging to each preset part of speech; (c) applying topic modeling to the extracted words to extract an evaluation criteria candidate group; and (d) using a previously generated evaluation criteria case-base model for the major category to select evaluation criteria items from the extracted candidate group and extract them as final evaluation criteria items.

Step (d) may include: predicting tag information for each evaluation criteria item in the extracted candidate group based on the evaluation criteria case-base model; and extracting final evaluation criteria items, from among the items whose tag information has been predicted, according to mention frequency (or mention ratio).

In extracting the final evaluation criteria items according to mention frequency (or mention ratio) from among the items whose tag information has been predicted, different criteria may be applied depending on the tag information.

Among the evaluation criteria items in the candidate group, items whose tag information is predicted as a first value are not extracted as final evaluation criteria items; items whose tag information is predicted as a second value are selected according to mention ratio by applying a first criterion; and items whose tag information is predicted as a third value may be selected according to mention ratio by applying a second criterion.

Before step (d), the method may further include vectorizing each evaluation criteria item in the candidate group to produce word embeddings.

Generating the evaluation criteria case-base model may include: crawling review data corresponding to a major category; performing morphological analysis on the crawled review data to extract each word; applying topic modeling to the extracted words to extract a preliminary evaluation criteria candidate group; assigning tag information to each evaluation criteria item in the preliminary candidate group according to preset criteria; vectorizing each evaluation criteria item in the preliminary candidate group to produce word embeddings; and applying the vectorized preliminary candidate group and the corresponding tag information to a KNN (K-Nearest Neighbor) algorithm to generate the evaluation criteria case-base model for the major category.

According to another aspect of the present invention, there is provided an apparatus capable of generating product evaluation criteria that reflect consumer intent.

According to an embodiment of the present invention, there may be provided an apparatus for generating evaluation criteria, including: a preprocessing model unit that generates an evaluation criteria case-base model based on review data corresponding to a major category; and an evaluation criteria extraction model unit that uses the case-base model to predict tag information for each word in review data corresponding to a subcategory, then applies different criteria depending on the tag information and extracts the selected words as final evaluation criteria items.

The evaluation criteria extraction model unit may include: a collection unit that crawls review data for products included in the subcategory; an analysis unit that performs morphological analysis on the crawled review data and extracts the words belonging to each preset part of speech; a topic modeling unit that applies topic modeling to the extracted words to extract an evaluation criteria candidate group; and an evaluation criteria extraction unit that uses the evaluation criteria case-base model to select evaluation criteria items from the extracted candidate group and extract them as final evaluation criteria items.

The evaluation criteria extraction unit may extract the final evaluation criteria items from the candidate group by applying different criteria depending on the tag information.

Among the evaluation criteria items in the candidate group, the evaluation criteria extraction unit does not extract items whose tag information is predicted as a first value as final evaluation criteria items; it selects items whose tag information is predicted as a second value according to mention ratio by applying a first criterion; and it may select items whose tag information is predicted as a third value according to mention ratio by applying a second criterion.

The evaluation criteria extraction model unit may further include a word embedding unit that vectorizes each evaluation criteria item in the candidate group to produce word embeddings.

The preprocessing model unit may include: a collection unit that crawls review data corresponding to a major category; an analysis unit that performs morphological analysis on the crawled review data to extract each word; a topic modeling unit that applies topic modeling to the extracted words to extract a preliminary evaluation criteria candidate group; a weight application unit that assigns tag information to each evaluation criteria item in the preliminary candidate group according to preset criteria; a word embedding unit that vectorizes each evaluation criteria item in the preliminary candidate group to produce word embeddings; and a model generation unit that applies the vectorized preliminary candidate group and the corresponding tag information to a KNN (K-Nearest Neighbor) algorithm to generate the evaluation criteria case-base model for the major category.

By providing the method and apparatus for generating product evaluation criteria according to an embodiment of the present invention, the consumers' evaluation criteria latent in reviews can be extracted through topic modeling and then refined.

FIG. 1 is a diagram schematically showing the configuration of an apparatus for generating product evaluation criteria according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating product classification categories.
FIG. 3 is a flowchart of a method for generating product evaluation criteria according to an embodiment of the present invention.
FIG. 4 is a flowchart of the preprocessing process according to an embodiment of the present invention.
FIG. 5 is a flowchart of the process of extracting evaluation criteria for a subcategory, which is the post-processing process according to an embodiment of the present invention.
FIG. 6 shows an example of evaluation criteria extracted using different criteria depending on tag information according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a method of assigning different tag information to the evaluation criteria candidate group according to an embodiment of the present invention.

As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "consists of" or "includes" should not be construed as necessarily including all of the components or steps described in the specification; some of the components or steps may be omitted, or additional components or steps may be included. In addition, terms such as "... unit" and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram schematically showing the configuration of an apparatus for generating product evaluation criteria according to an embodiment of the present invention, and FIG. 2 is a diagram illustrating product classification categories.

As shown in FIG. 1, the apparatus 100 for generating product evaluation criteria according to an embodiment of the present invention includes a preprocessing model unit 110 and an evaluation criteria extraction model unit 120.

The preprocessing model unit 110 generates an evaluation criteria case-base model corresponding to a major category. The preprocessing model unit 110 includes information on the evaluation-criterion likelihood weight of each candidate in the evaluation criteria candidate group extracted for the major category.

In the evaluation criteria case-base model, each candidate in the candidate group is tagged with its evaluation-criterion likelihood weight. Accordingly, when the evaluation criteria extraction model unit 120 selects evaluation criteria from the candidate group for a subcategory, it can use the case-base model to apply a different criterion to each candidate and thereby generate the final evaluation criteria. This will be understood more clearly from the description below.

First, the detailed functions of the preprocessing model unit 110 are described.

As shown in FIG. 1, the preprocessing model unit 110 includes a collection unit 111, an analysis unit 112, a topic modeling unit 113, a weight application unit 114, a word embedding unit 115, and a model generation unit 116.

The collection unit 111 is a means for crawling review data.

As described above, the preprocessing model unit 110 is the component that generates the evaluation criteria case-base model used by the evaluation criteria extraction model unit 120. Accordingly, the collection unit 111 may crawl review data corresponding to a major category.

More specifically, the collection unit 111 may crawl review data for the top n products (n is a natural number) among the product groups in each middle category of the major category. For example, assume that the product classification categories consist of major categories, middle categories, and subcategories, as shown in FIG. 2. The categories shown in FIG. 2 are given for convenience of understanding and explanation, and the actual product classification categories may of course differ. The invention applies without limitation to any product classification that has a multi-level hierarchy.

Referring to FIG. 2, in the hierarchical product classification, each major category contains a plurality of middle categories, and each middle category contains a plurality of subcategories.

The evaluation criteria case-base model generated by the preprocessing model unit 110 according to an embodiment of the present invention is information used to generate evaluation criteria for a requested subcategory product, and may include weights for the evaluation criteria candidate group corresponding to the major category. The case-base model can therefore contain comprehensive information usable for generating product evaluation criteria for each subcategory.

Accordingly, the collection unit 111 of the preprocessing model unit 110 may crawl review data corresponding to the major category so that weight information for the candidate group of that major category can be included. For example, when crawling review data for the major category "Digital/Home Appliances/Computers", it may crawl review data for the top n products in each middle category of "Digital/Home Appliances/Computers".
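The selection of crawl targets described above can be sketched as follows. The nested-dictionary catalog layout, the product identifiers, and the function name are hypothetical, and ranking "top" products by review count is an assumption suggested by the later description of the n most-reviewed products.

```python
def products_to_crawl(catalog, major, n):
    """Pick the n most-reviewed products from each middle category of `major`.

    `catalog` maps major category -> middle category -> list of
    (product_id, review_count) pairs (a hypothetical layout).
    """
    selected = []
    for products in catalog[major].values():
        # Rank the products of one middle category by review count, descending.
        ranked = sorted(products, key=lambda item: item[1], reverse=True)
        selected.extend(product_id for product_id, _ in ranked[:n])
    return selected
```

The returned identifiers would then be handed to whatever crawler fetches the review text, which is outside the scope of this sketch.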

The analysis unit 112 is a means for performing morphological analysis on the crawled review data and extracting each word. For example, the analysis unit 112 may extract the words belonging to each preset part of speech through morphological analysis of the crawled review data. Assume that the preset parts of speech are nouns, adjectives, adverbs, and verbs. The analysis unit 112 may analyze the review data in morpheme units and then extract the words corresponding to nouns, adjectives, adverbs, and verbs. Particles, pronouns, and other words that do not belong to the preset parts of speech may be excluded from extraction.

For example, assume that the crawled review data is as follows.

"I haven't used it yet, but the packaging was meticulous and the price was cheap, so I'm satisfied. However, delivery was a little slow because of a stock shortage.", "The performance is excellent, and the design and color are pretty. Very satisfied for the price.", "It is recognized well, it's small, delivery came in one day, and it's good. I recommend it.", "I'm satisfied with the delivery speed. I'm also satisfied with the product size. But the product's write speed is really bad."

The words extracted through morphological analysis may be: 'yet', 'use', 'before', 'packaging', 'meticulous', 'price', 'cheap', 'satisfied', 'however', 'stock', 'shortage', 'delivery', 'a little', 'slow', 'performance', 'excellent', 'design', 'color', 'pretty', 'price', 'relative to', 'very', 'satisfied', 'recognition', 'become', 'small', 'delivery', 'day', 'come', 'good', 'recommend', 'do', 'delivery', 'speed', 'satisfied', 'product', 'size', 'satisfied', 'product', 'write', 'speed', 'too', 'bad'.
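The part-of-speech filtering step can be illustrated with a short sketch. It assumes that a morphological analyzer has already produced (word, part-of-speech) pairs; the tag names and the function name are hypothetical and are not tied to any particular analyzer.

```python
# Preset parts of speech named in the text; the tag spellings are illustrative.
KEPT_POS = {"noun", "adjective", "adverb", "verb"}


def extract_words(tagged_tokens, kept_pos=KEPT_POS):
    """Keep tokens whose part of speech is preset; drop particles, pronouns, etc."""
    return [word for word, pos in tagged_tokens if pos in kept_pos]
```

For instance, a tagged fragment such as `[("performance", "noun"), ("also", "particle"), ("excellent", "adjective")]` would be reduced to `["performance", "excellent"]`.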

The topic modeling unit 113 applies topic modeling to the extracted words to extract an evaluation criteria candidate group. For example, the topic modeling unit 113 may extract the candidate group by repeatedly extracting topics with LDA (Latent Dirichlet Allocation). Topic modeling with the LDA algorithm is itself well known to those skilled in the art, so a separate description is omitted.
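Although the specification leaves LDA to the reader, a toy collapsed Gibbs sampler may help show how topic words could become evaluation criteria candidates. This is a minimal sketch under stated assumptions, not the implementation used in the invention: the hyperparameters `alpha` and `beta`, the iteration count, and the idea of taking each topic's top words as candidates are all illustrative choices.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns per-topic word counts."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                      # n(d, k)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                                     # n(k)
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)
    # Resample each token's topic from its conditional distribution.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=3):
    """Top-n words of each topic, usable as evaluation criteria candidates."""
    return [sorted(counts, key=counts.get, reverse=True)[:n] for counts in topic_word]
```

In practice an established LDA library would normally replace this sampler; the sketch only makes the counting behind the candidate extraction visible.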

The weight application unit 114 is a means for applying weights, according to preset criteria, to the evaluation criteria candidates extracted by the topic modeling unit 113.

For example, the weight application unit 114 may group the candidate group into a plurality of subgroups and then apply a different weight to each subgroup. As shown in FIG. 7, the weight application unit 114 may divide the candidate group into a first subgroup of items suitable as evaluation criteria items and a third subgroup of items not suitable as evaluation criteria items. In addition, an item that is a sub-concept of an evaluation criteria item in the first subgroup, or that applies only to a specific product, may be classified into a second subgroup.

The weight application unit 114 applies the highest, first weight to the first subgroup, and may apply a third weight ("0") to the evaluation criteria items in the third subgroup so that they can be excluded when evaluation criteria are later extracted. A weight between the first weight and the third weight may then be applied to the second subgroup. For example, "2" may be applied as the weight for the first subgroup, "1" for the second subgroup, and "0" for the third subgroup.

In this way, the weight application unit 114 groups the candidate group into subgroups according to preset criteria and sets a weight for each subgroup, and each evaluation criteria item can be tagged with its weight.
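The tagging step can be written down as a simple lookup. The weights 2, 1, and 0 are the example values given in the text; the subgroup labels, the sample words, and the function name are assumptions made for illustration.

```python
# Example weights from the text: first subgroup 2, second subgroup 1, third subgroup 0.
SUBGROUP_WEIGHTS = {"first": 2, "second": 1, "third": 0}


def tag_candidates(subgroup_of):
    """Map each candidate word to the weight (tag) of the subgroup it was placed in."""
    return {word: SUBGROUP_WEIGHTS[group] for word, group in subgroup_of.items()}
```

How each word is assigned to a subgroup is governed by the preset criteria and is taken as given here.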

The word embedding unit 115 is a means for vectorizing the items in the candidate group (candidate words) using the FastText algorithm and embedding them in a vector space. FastText is a Word2Vec-based algorithm that is well known to those skilled in the art, so a detailed description is omitted.

In addition, by vectorizing each item in the candidate group with the FastText algorithm and embedding it in a vector space, the word embedding unit 115 according to an embodiment of the present invention also makes similarity information between words available. Deriving the similarity between vectorized evaluation criteria items with the Word2Vec-based FastText algorithm is itself well known to those skilled in the art, so a description is omitted.
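One common way to read similarity off embedded words is cosine similarity between their vectors. The sketch below assumes the vectors have already been produced by an embedding model; the two-dimensional vectors in the usage example are made up for illustration and are not FastText output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    return max(
        (other for other in embeddings if other != word),
        key=lambda other: cosine_similarity(embeddings[word], embeddings[other]),
    )
```

With toy vectors such as `{"speed": [0.9, 0.1], "write speed": [0.8, 0.3], "design": [0.1, 0.9]}`, `most_similar("speed", ...)` picks "write speed", since its vector points in nearly the same direction.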

The model generation unit 116 generates the evaluation criteria case-base model using the vectorized candidate group and the tagging information.

For example, the model generation unit 116 may generate the case-base model by applying the vectorized candidate group and the tagging information to the KNN algorithm. Here, K may be set to an odd number. For example, with K=3, the vectorized candidate group and the tagging information may be applied to the KNN algorithm to generate the evaluation criteria case-base model.

Because the case-base model is generated by applying the KNN algorithm to the vectorized candidate group and the tagging information, it can be used in the post-processing model to classify the tagging information (weights) of an evaluation criteria candidate group.

Likewise, the case-base model generated in this way can be used by the evaluation criteria extraction model unit 120, the post-processing model, as a reference when determining the final evaluation criteria from among the candidates.
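The role of the case base in predicting a tag can be illustrated with a plain K-nearest-neighbor vote. K=3 follows the example in the text; the use of Euclidean distance, the (vector, tag) layout of the case base, and the function name are assumptions of this sketch.

```python
import math
from collections import Counter


def knn_predict_tag(case_base, query, k=3):
    """Predict a tag by majority vote among the k cases nearest to `query`.

    `case_base` is a list of (vector, tag) pairs built from the major category.
    Distance is Euclidean here, which is an assumption of this sketch.
    """
    nearest = sorted(case_base, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(tag for _, tag in nearest)
    # On a tie, most_common keeps the tag that was encountered first (the nearer case).
    return votes.most_common(1)[0][0]
```

An odd K avoids ties when only two tag values compete, although with three tag values a one-vote-each tie is still possible, which is why the tie-breaking behavior is noted in the code.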

In this way, the preprocessing model unit 110 can generate the evaluation criteria case-base model from review data crawled for the major category. The operation of the evaluation criteria extraction model unit 120, the post-processing model that uses this model to generate the final evaluation criteria for a subcategory, is described next.

The evaluation criteria extraction model unit 120 is a means for generating evaluation criteria from the review data of a subcategory, using the evaluation criteria case-base model generated by the preprocessing model unit 110.

The evaluation criteria extraction model unit 120 includes a collection unit 121, an analysis unit 122, a topic modeling unit 123, a word embedding unit 124, and an evaluation criteria extraction unit 125.

The collection unit 121 is a means for crawling review data for a subcategory. The collection unit 121 may crawl review data for the n most-reviewed products among the products in the subcategory for which evaluation criteria generation has been requested.

The analysis unit 122 is a means for performing morphological analysis on the crawled review data and extracting each word. The analysis unit 122 may analyze the crawled review data morphologically and extract the words corresponding to the preset parts of speech (for example, nouns, adjectives, adverbs, and verbs).

본 발명의 일 실시예에서는 수집부(121), 분석부(122) 등이 전처리 모델부(110)와 상이하게 별도로 구성되는 것으로 도시되어 있으나, 이는 이해와 설명의 편의를 위해 도시한 것으로 전처리 모델부(110)와 후처리 모델인 평가 기준 추출 모델부(120)에 동일하게 포함되는 각 구성(예를 들어, 수집부, 분석부, 토픽 모델링부, 워드 임베딩부)는 각각 공유될 수도 있다. In an embodiment of the present invention, the collection unit 121, the analysis unit 122, and the like are shown to be configured separately from the preprocessing model unit 110, but this is illustrated for convenience of understanding and explanation, and the preprocessing model Each component (eg, a collection unit, an analysis unit, a topic modeling unit, and a word embedding unit) included in the unit 110 and the evaluation criterion extraction model unit 120 which is a post-processing model may be shared.

The topic modeling unit 123 applies topic modeling to the extracted words to extract an evaluation criterion candidate group. The preprocessing model unit 110 crawls and topic-models review data for the n most-reviewed products in each middle classification included in the major classification. In contrast, the evaluation criterion extraction model unit 120, the post-processing model, crawls review data for the n most-reviewed products in the requested sub-classification and then topic-models the words extracted from that data.

The word embedding unit 124 vectorizes each evaluation criterion item (word) included in the extracted evaluation criterion candidate group and embeds it in a vector space.

Using the evaluation criterion case base, the evaluation criterion extraction unit 125 extracts the final evaluation criterion items (words) from the vectorized evaluation criterion candidate group.

For example, the evaluation criterion extraction unit 125 may obtain tag information for each vectorized evaluation criterion item (word) based on the evaluation criterion case base. It may then sort the evaluation criterion items in descending order of mention ratio (frequency). Finally, it may extract the final evaluation criterion items by applying a different selection criterion to the sorted items depending on their tag information.

For example, the tag information of each evaluation criterion item included in the evaluation criterion candidate group may be predicted (classified) as "0", "1", or "2". The evaluation criterion extraction unit 125 may then extract a plurality of final items from the sorted evaluation criterion items, applying a different criterion for each tag value. An item whose tag information is "0" may be excluded from extraction. An item whose tag information is "1" is selected if its mention ratio is 10% or more, and an item whose tag information is "2" is selected if its mention ratio is 1% or more. The selected items form the final evaluation criteria.
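The tag-dependent selection in this example can be sketched as follows. The thresholds (10% for tag "1", 1% for tag "2", exclusion for tag "0") are taken from the text above; the function name, the tuple layout, and the sample words and ratios are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple

# Minimum mention ratio per tag, from the example in the text.
# Tag 0 has no entry, so items tagged 0 are never selected.
MIN_RATIO_BY_TAG = {1: 0.10, 2: 0.01}


def select_final_criteria(candidates: List[Tuple[str, int, float]]) -> List[str]:
    """Select final items from (word, predicted_tag, mention_ratio) tuples."""
    # Sort in descending order of mention ratio, as the extraction unit does.
    ranked = sorted(candidates, key=lambda item: item[2], reverse=True)
    final = []
    for word, tag, ratio in ranked:
        threshold = MIN_RATIO_BY_TAG.get(tag)
        if threshold is not None and ratio >= threshold:
            final.append(word)
    return final


# Made-up candidates: (word, predicted tag, mention ratio).
candidates = [
    ("delivery", 0, 0.30),
    ("noise", 1, 0.12),
    ("design", 1, 0.05),
    ("humidification", 2, 0.02),
    ("tank", 2, 0.005),
]
final_items = select_final_criteria(candidates)
```

With these made-up inputs, "delivery" is dropped despite its high ratio because it is tagged "0", while "humidification" is kept at only 2% because tag "2" uses the lower threshold.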

FIG. 3 is a flowchart showing a method of generating product evaluation criteria according to an embodiment of the present invention.

In step 310, the product evaluation criterion generating apparatus 100 performs a preprocessing step in which it generates an evaluation criterion case base model from review data corresponding to a major classification.

The evaluation criterion case base model is reference information used to predict tags for the evaluation criterion candidate group of a sub-classification, and it may cover a broader range of information than the sub-classification itself. Accordingly, the evaluation criterion case base model may be generated for the major classification that contains the particular sub-classification. This is described in more detail with reference to FIG. 4.

In step 315, the product evaluation criterion generating apparatus 100 uses the evaluation criterion case base model to predict tag information for each word included in the review data corresponding to the requested sub-classification. It then applies a different criterion depending on the tag information and extracts the selected words as the final evaluation criterion items.

This is described in more detail below with reference to FIG. 5.

FIG. 4 is a flowchart showing a preprocessing process according to an embodiment of the present invention.

In step 410, the product evaluation criterion generating apparatus 100 crawls, for each middle classification included in the major classification, review data for the top n products with the most reviews.

In step 415, the product evaluation criterion generating apparatus 100 performs morphological analysis on the crawled review data and extracts the words corresponding to each predetermined part of speech.

In step 420, the product evaluation criterion generating apparatus 100 applies topic modeling to the extracted words to extract an evaluation criterion candidate group.

In step 425, the product evaluation criterion generating apparatus 100 vectorizes each evaluation criterion item (word) included in the extracted evaluation criterion candidate group and embeds it in a vector space. For example, each evaluation criterion item (word) may be vectorized using the FastText algorithm.

In step 430, the product evaluation criterion generating apparatus 100 assigns tag information to each evaluation criterion item included in the evaluation criterion candidate group according to a preset criterion. As this has already been described above, a redundant description is omitted.

In step 435, the product evaluation criterion generating apparatus 100 applies the KNN algorithm to the vectorized evaluation criterion candidate group and the corresponding tag information to generate an evaluation criterion case base model for the major classification.
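As a rough illustration of step 435, a case base can be viewed as a set of stored (vector, tag) pairs that is queried by nearest-neighbor voting. The sketch below is a plain-Python KNN written for this purpose. The patent names only the KNN algorithm, so the Euclidean distance, the value k = 3, the majority vote, and the class and method names are all assumptions, and the two-dimensional vectors are made up rather than real word embeddings.

```python
import math
from collections import Counter
from typing import List, Sequence


class CaseBase:
    """Hypothetical case base: stored (vector, tag) pairs queried by KNN."""

    def __init__(self, vectors: List[Sequence[float]], tags: List[int], k: int = 3):
        if len(vectors) != len(tags):
            raise ValueError("each vector needs exactly one tag")
        self.cases = list(zip(vectors, tags))
        self.k = k  # number of neighbors consulted (assumed value)

    def predict_tag(self, vector: Sequence[float]) -> int:
        """Predict a tag by majority vote among the k nearest stored cases."""
        nearest = sorted(self.cases, key=lambda case: math.dist(case[0], vector))[: self.k]
        votes = Counter(tag for _, tag in nearest)
        return votes.most_common(1)[0][0]


# Made-up 2-D vectors standing in for embedded evaluation criterion items.
case_base = CaseBase(
    vectors=[(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
    tags=[0, 0, 0, 2, 2, 2],
)
tag_near_origin = case_base.predict_tag((0.2, 0.1))
tag_near_cluster = case_base.predict_tag((5.5, 5.2))
```

In this reading, building the case base only stores the tagged vectors from the major classification; the actual prediction work happens later, when a candidate word from a sub-classification is looked up.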

The details are the same as those described with reference to FIG. 1, so a redundant description is omitted.

FIG. 5 is a flowchart showing the post-processing process, that is, the extraction of evaluation criteria for a sub-classification, according to an embodiment of the present invention.

In step 510, the product evaluation criterion generating apparatus 100 crawls review data corresponding to the requested sub-classification. As described above, it crawls review data for the n products with the most reviews among the products included in the sub-classification.

In step 515, the product evaluation criterion generating apparatus 100 analyzes the crawled review data morpheme by morpheme and extracts the words corresponding to each preset part of speech.

In step 520, the product evaluation criterion generating apparatus 100 applies topic modeling to the extracted words to extract an evaluation criterion candidate group. That is, the apparatus may topic-model the words with the LDA algorithm to extract the evaluation criterion candidate group.

In step 525, the product evaluation criterion generating apparatus 100 vectorizes each evaluation criterion item (word) included in the evaluation criterion candidate group and word-embeds it. Vectorizing words and embedding them in a vector space using the FastText algorithm, word2vec, or the like is well known to those skilled in the art, so a separate description is omitted.

In step 530, the product evaluation criterion generating apparatus 100 uses the evaluation criterion case base model to predict tag information for each word (that is, each evaluation criterion item) included in the vectorized evaluation criterion candidate group.

In step 535, the product evaluation criterion generating apparatus 100 applies a different criterion depending on the predicted tag information to extract the final evaluation criterion items from the evaluation criterion candidate group. For example, the apparatus may exclude some candidates based on their tag information and select the remaining items by mention ratio (frequency), using a criterion that differs by tag, to obtain the final evaluation criterion items.

For example, assume that an evaluation criterion candidate group has been extracted as shown at 610 in FIG. 6, and that each evaluation criterion item in the group has been predicted to have tag information "0", "1", or "2".

The product evaluation criterion generating apparatus 100 may exclude evaluation criterion items whose tag information is "0" from the final selection. Evaluation criterion items whose tag information is "1" may be sorted by mention ratio and extracted as final evaluation criterion items if their mention ratio is 10% or more.

Likewise, evaluation criterion items whose tag information is "2" may be sorted by mention ratio and extracted as final evaluation criterion items if their mention ratio is 1% or more.
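The 10% and 1% thresholds depend on a mention ratio for each item, but this section does not spell out how that ratio is computed. One plausible reading, stated here purely as an assumption, is the share of crawled reviews in which the word appears at least once. The sketch below follows that reading; the function name and the sample reviews are hypothetical.

```python
from typing import Dict, List, Set


def mention_ratios(reviews: List[Set[str]], candidates: List[str]) -> Dict[str, float]:
    """Return, for each candidate word, the fraction of reviews that mention it.

    Each review is given as the set of words extracted from it.
    This definition of "mention ratio" is an assumption, not the patent's.
    """
    total = len(reviews)
    if total == 0:
        return {word: 0.0 for word in candidates}
    return {
        word: sum(1 for review in reviews if word in review) / total
        for word in candidates
    }


# Four made-up reviews, each reduced to its extracted words.
reviews = [{"noise", "design"}, {"noise"}, {"tank"}, {"noise", "tank"}]
ratios = mention_ratios(reviews, ["noise", "tank", "price"])
```

Under this reading a word repeated many times in a single review counts once for that review, which keeps one long review from dominating the ratio.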

In this way, the product evaluation criterion generating apparatus 100 may predict tag information for each evaluation criterion item included in the evaluation criterion candidate group through the evaluation criterion case base model, and then apply a different criterion depending on the tag information to extract the final evaluation criterion items.

The apparatus and method according to embodiments of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that it may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a limiting sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100: evaluation criterion generating apparatus
110: preprocessing model unit
120: evaluation criterion extraction model unit

Claims

A method of generating product evaluation criteria in an evaluation criterion generating apparatus, the method comprising:
(a) crawling review data on products included in a sub-classification;
(b) performing morphological analysis on the crawled review data and extracting words for each preset part of speech;
(c) topic modeling the extracted words to extract an evaluation criterion candidate group; and
(d) selecting evaluation criterion items from the extracted evaluation criterion candidate group using a pre-generated evaluation criterion case base model for a major classification, and extracting them as final evaluation criterion items,
wherein generating the evaluation criterion case base model comprises:
crawling review data according to the major classification;
extracting each word by performing morphological analysis on the crawled review data;
extracting a preliminary evaluation criterion candidate group by topic modeling the extracted words;
assigning tag information to each evaluation criterion item included in the preliminary evaluation criterion candidate group according to a preset criterion;
vectorizing each evaluation criterion item included in the preliminary evaluation criterion candidate group and word-embedding it; and
applying a K-Nearest Neighbor (KNN) algorithm to the vectorized preliminary evaluation criterion candidate group and the respective tag information to generate the evaluation criterion case base model for the major classification.

The method of claim 1,
wherein step (d) comprises:
predicting tag information for each evaluation criterion item included in the extracted evaluation criterion candidate group based on the evaluation criterion case base model; and
extracting final evaluation criterion items, according to mention frequency, from among the evaluation criterion items for which the tag information has been predicted.

The method of claim 2,
wherein extracting the final evaluation criterion items comprises
applying different criteria depending on the tag information when extracting the final evaluation criterion items, according to mention frequency, from among the evaluation criterion items for which the tag information has been predicted.

The method of claim 3,
wherein, among the evaluation criterion items included in the evaluation criterion candidate group, items whose tag information is predicted as a first value are not extracted as final evaluation criterion items,
items whose tag information is predicted as a second value are selected according to mention ratio by applying a first criterion, and
items whose tag information is predicted as a third value are selected according to mention ratio by applying a second criterion.

The method of claim 1,
further comprising, before step (d),
vectorizing each evaluation criterion item included in the evaluation criterion candidate group and word-embedding it.

(Deleted)

A computer-readable recording medium on which program code for performing the method according to claim 1 is recorded.

An evaluation criterion generating apparatus comprising:
a preprocessing model unit that generates an evaluation criterion case base model based on review data corresponding to a major classification; and
an evaluation criterion extraction model unit that uses the evaluation criterion case base model to predict tag information for each word included in review data corresponding to a sub-classification, applies different criteria depending on the tag information, and extracts the selected words as final evaluation criterion items,
wherein the preprocessing model unit comprises:
a collection unit that crawls review data corresponding to the major classification;
an analysis unit that extracts each word by performing morphological analysis on the crawled review data;
a topic modeling unit that topic-models the extracted words and extracts a preliminary evaluation criterion candidate group;
a weight application unit that assigns tag information to each evaluation criterion item included in the preliminary evaluation criterion candidate group according to a preset criterion;
a word embedding unit that vectorizes each evaluation criterion item included in the preliminary evaluation criterion candidate group and word-embeds it; and
a model generator that applies a K-Nearest Neighbor (KNN) algorithm to the vectorized preliminary evaluation criterion candidate group and the respective tag information to generate the evaluation criterion case base model for the major classification.

The apparatus of claim 8,
wherein the evaluation criterion extraction model unit comprises:
a collection unit that crawls review data on products included in the sub-classification;
an analysis unit that performs morphological analysis on the crawled review data and extracts words for each preset part of speech;
a topic modeling unit that topic-models the extracted words and extracts an evaluation criterion candidate group; and
an evaluation criterion extraction unit that selects evaluation criterion items from the extracted evaluation criterion candidate group using the evaluation criterion case base model and extracts them as final evaluation criterion items.

The apparatus of claim 9,
wherein the evaluation criterion extraction unit
extracts final evaluation criterion items from the evaluation criterion candidate group by applying different criteria depending on the tag information.

The apparatus of claim 10,
wherein, in the evaluation criterion extraction unit,
among the evaluation criterion items included in the evaluation criterion candidate group, items whose tag information is predicted as a first value are not extracted as final evaluation criterion items,
items whose tag information is predicted as a second value are selected according to mention ratio by applying a first criterion, and items whose tag information is predicted as a third value are selected according to mention ratio by applying a second criterion.

The apparatus of claim 9,
wherein the evaluation criterion extraction model unit further comprises
a word embedding unit that vectorizes each evaluation criterion item included in the evaluation criterion candidate group and word-embeds it.

(Deleted)