KR102457455B1

KR102457455B1 - Device and Method for Artwork Price Prediction Using Artificial Intelligence

Info

Publication number: KR102457455B1
Application number: KR1020210100237A
Authority: KR
Inventors: 김다산; 김상수; 김민석; 이승경
Original assignee: (주)위세아이텍
Priority date: 2021-07-29
Filing date: 2021-07-29
Publication date: 2022-10-21

Abstract

A device for predicting the price of an artwork comprises: an information collection unit that collects artwork information including the image, artist, and price of each of a plurality of artworks; a learning data generation unit that generates artificial intelligence (AI) training data corresponding to the plurality of artworks based on the artwork information; a model generation unit that generates a price prediction model from the AI training data using at least one AI learning algorithm; and a price prediction unit that predicts the price of an input artwork using the price prediction model.

Description

Device and Method for Artwork Price Prediction Using Artificial Intelligence

This application relates to an apparatus and method for predicting the prices of domestic (Korean) artworks using artificial intelligence.

An artwork is unique, and this uniqueness has led artworks to be used as a means of investment.

In particular, the art market is formed mainly through auctions. Artworks are often resold at auction some time after they were first traded.

The resale price is influenced by many factors during price setting. Before an auction, art experts calculate an estimated hammer price (the expected successful bid) by analyzing these factors.

However, the estimated hammer price at an art auction is shaped by the needs and opinions of stakeholders, so even experts' estimates have low accuracy. Combined with the high buyer's commission charged at auction, this means that artworks bought as investments inevitably lead to losses.

Existing methods for predicting artwork prices include repeat-sales regression and the hedonic pricing model. However, because very few artworks are identical and repeated sales of the same work rarely occur, these methods are limited by low accuracy.

Accordingly, the purpose of the present application is to predict domestic artwork prices with artificial intelligence. Instead of relying on qualitative techniques, it collects data that affect artwork prices and analyzes that data objectively.

The technology that forms the background of the present application is disclosed in Korean Patent Application Laid-Open No. 10-2020-0052431.

The present application is intended to solve the problems of the prior art described above. Its purpose is to analyze, from various viewpoints, two things that ultimately reduce the reliability of artwork prices: the scarcity of domestic art data and the closed (non-public) nature of artwork information.

The present application also addresses the problems of the prior art described above by defining the various factors that affect artwork prices and collecting data on those factors. To this end, its purpose is to provide an apparatus and method that apply artificial intelligence to predict the prices of domestic artworks.

The present application further addresses the problems of the prior art described above. Its purpose is to provide an apparatus and method that predict future artwork transaction prices. To do so, they select past transaction information, economic indicators, and data that has undergone natural language processing and image processing, and train on the selected data.

However, the technical problems addressed by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problems, an apparatus for predicting the price of an artwork according to an embodiment of the present application may include four units. An information collection unit collects artwork information including the image, artist, and price of each of a plurality of artworks. A learning data generation unit generates AI training data corresponding to the plurality of artworks based on the artwork information. A model generation unit generates a price prediction model from the AI training data using at least one AI learning algorithm. A price prediction unit predicts the price of an input artwork using the price prediction model.

In addition, according to an embodiment of the present application, the learning data generation unit may include an influence factor extraction unit and a generation unit. The influence factor extraction unit uses the artwork information to extract a plurality of factors affecting the price of each of the plurality of artworks. The generation unit generates AI training data corresponding to the plurality of artworks based on the artwork information and the plurality of factors.

In addition, according to an embodiment of the present application, the generation unit may include a text processing unit, an image processing unit, and a training data processing unit. The text processing unit converts text included in the artwork information into numerical values to generate a first feature. The image processing unit generates a second feature using colors extracted from an image included in the artwork information. The training data processing unit generates AI training data using the first feature and the second feature.

In addition, according to an embodiment of the present application, the generation unit may determine the relative importance of at least two of the plurality of factors. It may then generate AI training data corresponding to the plurality of artworks based on the determined importance.

In addition, according to an embodiment of the present application, the text may be vectorized.
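The embodiment does not say which vectorization is used. The minimal sketch below assumes a plain bag-of-words count vector over a fixed vocabulary and shows how text could become the numeric first feature. Whitespace tokenization, the sample vocabulary, and the name `vectorize_text` are illustrative assumptions, not the patent's method.

```python
from collections import Counter

def vectorize_text(text, vocabulary):
    """Return a bag-of-words count vector for `text` over `vocabulary`.

    Whitespace tokenization is an assumption; Korean text would normally be
    split with a morphological analyzer before counting.
    """
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

vocab = ["solo", "exhibition", "award", "auction"]  # hypothetical vocabulary
print(vectorize_text("Solo exhibition after award award", vocab))  # [1, 1, 2, 0]
```

TF-IDF weighting or learned embeddings are common alternatives. The count vector is only the simplest case.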

In addition, according to an embodiment of the present application, the image processing unit may select at least one main color from among the colors extracted from an image included in the artwork information, based on the priority of each color. It may then generate the second feature based on the selected main color or colors.

In addition, according to an embodiment of the present application, the image processing unit may determine the priority based on a histogram of the extracted colors.
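One way to read "priority based on a histogram" is as a three-step procedure. First, quantize each pixel into a coarse color bin. Next, count the bins. Finally, treat the most frequent bins as the main colors. The sketch assumes RGB pixels with 0-255 channels and four levels per channel. The names `quantize` and `main_colors` are hypothetical, and the bin size is a tunable assumption.

```python
from collections import Counter

def quantize(rgb, levels=4):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color bin."""
    step = 256 // levels
    return tuple(channel // step for channel in rgb)

def main_colors(pixels, k=2, levels=4):
    """Return the k most frequent color bins with their share of all pixels.

    The histogram count of each bin serves as its priority.
    """
    histogram = Counter(quantize(p, levels) for p in pixels)
    total = len(pixels)
    return [(color, count / total) for color, count in histogram.most_common(k)]

pixels = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 3 + [(240, 240, 240)]
print(main_colors(pixels, k=2))  # [((3, 0, 0), 0.6), ((0, 0, 3), 0.3)]
```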

In addition, according to an embodiment of the present application, the image processing unit may generate the second feature using the hue, saturation, and brightness extracted from the image included in the artwork information.

In addition, according to an embodiment of the present application, the image processing unit may generate the second feature using the color of each pixel of the image included in the artwork information.
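The following is a minimal sketch of a hue/saturation/brightness feature computed from every pixel. It assumes RGB pixels with 0-255 channels and uses Python's standard `colorsys` module. Averaging the per-pixel values into three numbers is an illustrative choice, and the name `hsv_features` is hypothetical. Because hue is circular, it is averaged as an angle.

```python
import colorsys
import math

def hsv_features(pixels):
    """Average hue, saturation and value (brightness) over all pixels.

    Hue lies on a circle in [0, 1), so it is averaged via sin/cos: two reds
    at hue 0.01 and 0.99 average to red (near 0), not cyan (0.5).
    """
    sin_sum = cos_sum = s_sum = v_sum = 0.0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        angle = 2 * math.pi * h
        sin_sum += math.sin(angle)
        cos_sum += math.cos(angle)
        s_sum += s
        v_sum += v
    n = len(pixels)  # assumes a non-empty image
    mean_hue = (math.atan2(sin_sum, cos_sum) / (2 * math.pi)) % 1.0
    return mean_hue, s_sum / n, v_sum / n

print(hsv_features([(255, 0, 0), (255, 0, 0)]))
```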

In addition, according to an embodiment of the present application, the model generation unit may generate a first price prediction model from the AI training data using a first AI algorithm. It may also generate a second price prediction model from the AI training data using a second AI algorithm.

In addition, according to an embodiment of the present application, the generation unit may generate first AI training data corresponding to the plurality of artworks based on at least one factor in a first group of the plurality of factors. It may also generate second AI training data corresponding to the plurality of artworks based on at least one factor in a second group. The model generation unit may then use the AI algorithm to generate a first price prediction model from the first AI training data and a second price prediction model from the second AI training data.

In addition, according to an embodiment of the present application, the price prediction unit may predict the price of the artwork by comparing the first prediction result of the first price prediction model with the second prediction result of the second price prediction model.
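The comparison rule is not specified. One hypothetical rule, sketched below, averages the two predictions and reports whether they agree within a relative tolerance. Disagreement between the first and second models could then be flagged for review. The function name and the 20% default tolerance are assumptions.

```python
def combine_predictions(first, second, tolerance=0.2):
    """Compare two models' price predictions.

    Returns (price, agreed): the mean of the two predictions, and whether
    their absolute difference is within `tolerance` times that mean.
    """
    price = (first + second) / 2
    agreed = abs(first - second) <= tolerance * price
    return price, agreed

print(combine_predictions(100.0, 110.0))  # (105.0, True)
print(combine_predictions(100.0, 200.0))  # (150.0, False)
```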

In addition, according to an embodiment of the present application, the generation unit may apply a positive or negative weight to each of the plurality of factors. Based on the result, it may determine the relative importance of at least two of the factors and generate AI training data corresponding to the plurality of artworks according to that importance.

In addition, according to an embodiment of the present application, the model generation unit may generate the price prediction model from the AI training data based on the determined importance.

In addition, according to an embodiment of the present application, the plurality of factors may include at least one of the following: whether the artist is alive, the number of the artist's existing artworks, the period since the artwork was made public, the ranking of the artist's artworks, the artist's career, the range of price changes of the artist's artworks, the artist's academic major, the artist's award history, the artist's recognition index, the grade of the museum where the work was exhibited, related news article information, the size of the artwork, and visitor information for the exhibiting museum.

In addition, according to an embodiment of the present application, a method for predicting the price of an artwork may include four steps. The first collects artwork information including the image, artist, and price of each of a plurality of artworks. The second generates AI training data corresponding to the plurality of artworks based on the artwork information. The third generates a price prediction model from the AI training data using at least one AI learning algorithm. The fourth predicts the price of an input artwork using the price prediction model.

The means for solving the problems described above are merely exemplary and should not be construed as limiting the present application. Additional embodiments beyond those described above may exist in the drawings and the detailed description.

According to the means for solving the problems described above, the present application can overcome two limitations: the low accuracy of traditional price prediction methods and the qualitative, expert-dependent estimation of auction hammer prices.

According to the means for solving the problems described above, the present application quantifies data by analyzing the factors that determine artwork value. It then applies AI algorithms to compensate technically for scarce data, which secures prediction accuracy for artwork value and improves its reliability.

FIG. 1 is a schematic system diagram for an artificial intelligence-based artwork price prediction method according to an embodiment of the present application.
FIG. 2 is a schematic system diagram of the learning data generation unit 1200 of FIG. 1 according to an embodiment of the present application.
FIG. 3 is a schematic system diagram of the generation unit 1220 of FIG. 2 according to an embodiment of the present application.
FIG. 4 is a flowchart of an artificial intelligence-based artwork price prediction method according to an embodiment of the present application.
FIG. 5 is an example of artificial intelligence-based artwork price prediction according to an embodiment of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can easily implement them. However, the present application may be implemented in various forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted for clarity, and similar reference numerals denote similar parts throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this covers two cases: the part may be "directly connected," or it may be "electrically connected" with another element interposed therebetween.

Throughout this specification, a member may be said to be positioned "on," "above," "at the top of," "under," "below," or "at the bottom of" another member. This covers both the case where the two members are in contact and the case where yet another member is present between them.

Throughout this specification, when a part "includes" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a schematic diagram of an artificial intelligence-based artwork price prediction apparatus according to an embodiment of the present application. Referring to FIG. 1, the artwork price prediction apparatus 1000 may include an information collection unit 1100, a learning data generation unit 1200, a model generation unit 1300, and a price prediction unit 1400.
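The four units can be pictured as stages of a single pipeline. The skeleton below is a sketch under stated assumptions. The class and method names are hypothetical. The "model" is a placeholder that returns each artist's mean past price and falls back to the overall mean; it stands in for the learned model of the model generation unit 1300.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional

@dataclass
class Artwork:
    title: str
    artist: str
    image: bytes = b""             # raw image; unused by this placeholder model
    price: Optional[float] = None  # None when the price is to be predicted

class ArtPricePredictor:
    """Hypothetical skeleton of apparatus 1000: units 1100-1400 as methods."""

    def collect(self, records: List[Artwork]) -> List[Artwork]:  # unit 1100
        # Keep only records with a known price for training.
        return [r for r in records if r.price is not None]

    def make_training_data(self, works: List[Artwork]) -> Dict[str, List[float]]:  # unit 1200
        # Placeholder "features": observed prices grouped by artist.
        data: Dict[str, List[float]] = {}
        for w in works:
            data.setdefault(w.artist, []).append(w.price)
        return data

    def fit(self, data: Dict[str, List[float]]) -> Callable[[Artwork], float]:  # unit 1300
        # Placeholder model: the artist's mean price, else the overall mean.
        per_artist = {artist: mean(prices) for artist, prices in data.items()}
        overall = mean(p for prices in data.values() for p in prices)
        return lambda w: per_artist.get(w.artist, overall)

    def predict(self, model: Callable[[Artwork], float], work: Artwork) -> float:  # unit 1400
        return model(work)

records = [Artwork("A", "Kim", price=100.0), Artwork("B", "Kim", price=300.0),
           Artwork("C", "Lee", price=50.0), Artwork("D", "Lee")]
p = ArtPricePredictor()
model = p.fit(p.make_training_data(p.collect(records)))
print(p.predict(model, Artwork("E", "Kim")))   # 200.0
print(p.predict(model, Artwork("F", "Park")))  # 150.0
```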

The information collection unit 1100 according to an embodiment of the present application may collect artwork information including the image, artist, and price of each of a plurality of artworks. The artwork information may be collected within a preset period and may include information that affects the price. When external data is updated, the artwork information may be updated along with it. For example, the artwork information may include results from recent auctions in addition to existing auction records.

To extract the data set required for data analysis, the information collection unit 1100 may search for collectible articles using a keyword and an article publication period entered by the user as conditions. It may then collect the text data of all retrieved articles. The information collection unit 1100 may generate a URL from the keywords and option values entered by the user and use that URL to search for articles published on the web. It may then fetch from the web the HTML files containing the list of retrieved articles and the article bodies. The information collection unit 1100 may also perform encoding to support Korean text and extract the text corresponding to each article's title and body from the HTML. Finally, it may store the extracted text data in a database (not shown).
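The keyword-to-URL and HTML-to-text steps can be sketched with Python's standard library. The search endpoint `https://news.example.com/search` and its query parameter names are hypothetical placeholders, because every news portal uses its own format. The extractor assumes the title is in an `<h1>` tag and the body in `<p>` tags. Downloaded bytes would need to be decoded before parsing, for example with `.decode("utf-8")`, or with `.decode("cp949")` for pages served in that legacy Korean encoding.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

SEARCH_BASE = "https://news.example.com/search"  # hypothetical endpoint

def build_search_url(keyword, start_date, end_date, page=1):
    """Build a search URL from the user's keyword and publication period.

    urlencode percent-encodes Korean keywords as UTF-8.
    """
    params = {"query": keyword, "from": start_date, "to": end_date, "page": page}
    return f"{SEARCH_BASE}?{urlencode(params)}"

class ArticleExtractor(HTMLParser):
    """Collect the text inside <h1> (title) and <p> (body paragraphs)."""

    def __init__(self):
        super().__init__()
        self._current = None
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "h1":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data

url = build_search_url("미술품 경매", "2021-01-01", "2021-06-30")
parser = ArticleExtractor()
parser.feed("<h1>경매 낙찰</h1><p>첫 문단</p><p>둘째 문단</p>")
print(url)
print(parser.title, parser.paragraphs)
```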

The learning data generation unit 1200 according to an embodiment of the present application may generate AI training data corresponding to the plurality of artworks based on the artwork information. For example, the AI training data may include, per artist, the artwork name, price, transaction date, or transaction location. The learning data generation unit 1200 may also quantify the data by analyzing the factors that determine artwork value.

The learning data generation unit 1200 may preprocess crawled web pages based on a plurality of preset factors. More specifically, it may preprocess the text data of the articles contained in the crawled pages according to those factors. It may separate the text data of articles that mention the preset factors from the text data of articles that do not. It may also remove text elements other than the preset factors from the article text.

From the preprocessed article text, the learning data generation unit 1200 may extract nouns (substantives) related to the plurality of factors. It excludes particles, adjectives, adverbs, and other parts of speech that are unnecessary for data analysis. According to an embodiment of the present application, the learning data generation unit 1200 may select the words of two or more characters from the set of extracted nouns and store them in the database (not shown).
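This preprocessing can be sketched in simplified form. The sketch keeps only articles that mention a preset factor term, then keeps tokens of two or more characters. Real noun extraction needs a Korean morphological analyzer, for example a tagger from the KoNLPy package. Here a regular-expression tokenizer plus a stopword list stands in for it, and all function names are hypothetical.

```python
import re

def mentions_factor(text, factor_terms):
    """True if the article text mentions at least one preset factor term."""
    return any(term in text for term in factor_terms)

def candidate_terms(text, stopwords):
    """Split text into word tokens and keep words of two or more characters.

    Stand-in for noun extraction: a stopword list replaces a real
    part-of-speech filter.
    """
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if len(t) >= 2 and t not in stopwords]

def preprocess(articles, factor_terms, stopwords):
    """Keep articles that mention a factor and return their candidate terms."""
    return [candidate_terms(a, stopwords)
            for a in articles if mentions_factor(a, factor_terms)]

articles = ["작가 수상 소식 및 전시 개최", "오늘 날씨 맑음"]
print(preprocess(articles, factor_terms=["수상", "전시"], stopwords={"소식"}))
```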

The learning data generation unit 1200 may select data from a data set in the database (not shown) and preprocess it. Specifically, it may choose, from the columns of the stored data set, the column on which data clustering is to be performed.

The learning data generation unit 1200 may preprocess the selected column by removing data that does not contain any of the plurality of factors. In other words, it may deduplicate the data in the selected column and remove null values (entries that contain no term).

Terms whose forms match exactly do not need to be clustered. Null values corresponding to blanks (entries that contain no term) likewise need no term clustering, so the learning data generation unit 1200 may remove them. Alternatively, the user may replace a null value with another term as needed. In that case, the learning data generation unit 1200 fills the entries that contain no term with the replacement term supplied by the user.
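The null-removal and deduplication step might look like the minimal sketch below. `clean_column` is a hypothetical name, and the optional `replacement` argument models the user-supplied substitute term.

```python
def clean_column(values, replacement=None):
    """Deduplicate a column and handle null (empty) entries.

    Entries that are None or blank are dropped, or replaced by `replacement`
    when the user supplies one. Exact duplicates are removed while keeping
    first-seen order, since identical terms need no clustering.
    """
    cleaned, seen = [], set()
    for value in values:
        if value is None or not str(value).strip():
            if replacement is None:
                continue
            value = replacement
        if value not in seen:
            seen.add(value)
            cleaned.append(value)
    return cleaned

column = ["유화", None, "유화", " ", "판화"]
print(clean_column(column))                     # ['유화', '판화']
print(clean_column(column, replacement="미상"))  # ['유화', '미상', '판화']
```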

The learning data generation unit 1200 may select the terms to be clustered using the frequencies and weights of the separated morphemes. It may determine the priority of those terms using the weights and the ranked frequencies. It may also determine the priority of recommended terms using the ranked morphemes and a weight based on morpheme length.
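The exact weighting is not given. Under one hypothetical scoring rule, a term's priority is its frequency multiplied by a weight that grows with morpheme length, so longer and usually more specific terms rank slightly higher. `term_priorities` and `length_weight` are illustrative names.

```python
from collections import Counter

def term_priorities(morphemes, length_weight=0.5):
    """Rank morphemes by frequency scaled by a length-based weight.

    Hypothetical score: frequency * (1 + length_weight * (len - 1)).
    Returns (term, score) pairs, highest priority first, ties by term.
    """
    freq = Counter(morphemes)
    scores = {t: c * (1 + length_weight * (len(t) - 1)) for t, c in freq.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

print(term_priorities(["수채화", "유화", "유화", "유화", "판화"]))
# [('유화', 4.5), ('수채화', 2.0), ('판화', 1.5)]
```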

The learning data generation unit 1200 may split the original terms into phonemes and compute the similarity between the phoneme-separated terms. When an original term is Korean, the unit may split it into phonemes according to the Hangul alphabet (initial consonant, medial vowel, and final consonant) and compute the similarity using an AI-based algorithm. For example, the learning data generation unit 1200 may compute the similarity between original terms using a Fuzzy Data Matching algorithm, although it is not limited to this. The Fuzzy Data Matching algorithm matches data using values computed from the edit distance (Levenshtein distance).
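Hangul syllables can be split into initial, medial, and final jamo with the standard Unicode arithmetic for the precomposed block U+AC00 to U+D7A3 (19 initials, 21 medials, 28 final slots). The decomposed strings can then be compared by Levenshtein distance. This is a minimal sketch. Normalizing the distance into a 0-to-1 similarity is an assumption, and the patent's "AI-based" similarity may differ. The function names are hypothetical.

```python
def to_jamo(text):
    """Decompose precomposed Hangul syllables into conjoining jamo.

    syllable index = (initial * 21 + medial) * 28 + final; other characters
    pass through unchanged.
    """
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:
            initial, rest = divmod(code, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(chr(0x1100 + initial))  # choseong
            out.append(chr(0x1161 + medial))   # jungseong
            if final:                          # jongseong (0 means none)
                out.append(chr(0x11A7 + final))
        else:
            out.append(ch)
    return "".join(out)

def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def jamo_similarity(a, b):
    """1 - (jamo edit distance / longer jamo length), in [0, 1]."""
    ja, jb = to_jamo(a), to_jamo(b)
    longest = max(len(ja), len(jb)) or 1
    return 1 - levenshtein(ja, jb) / longest

print(jamo_similarity("유화", "유하"))  # 0.75
```

Comparing jamo rather than whole syllables gives near-spellings a higher score. The terms 유화 and 유하 differ in one of four jamo (similarity 0.75) but in one of two syllables (0.5).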

The learning data generation unit 1200 may compute the similarity between unstructured data items selected by the user and cluster them. It may compute the similarity between phoneme-separated terms. When the similarity exceeds a certain value, it may cluster those terms into the factor terms according to the terms' priorities. In other words, the learning data generation unit 1200 may cluster original terms whose similarity is greater than or equal to a preset threshold. The user may adjust this threshold as needed.
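Threshold clustering by priority can be sketched greedily. The terms are visited from highest to lowest priority. Each term joins the first existing representative it is at least `threshold` similar to; otherwise it starts a new cluster. The default similarity is `difflib.SequenceMatcher(...).ratio()` from the standard library, a stand-in for the Levenshtein-based score the text describes. The function name and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher

def cluster_terms(terms_by_priority, threshold=0.8, similarity=None):
    """Greedily cluster terms whose similarity to a representative >= threshold.

    `terms_by_priority` is ordered highest priority first, so each cluster is
    represented by its highest-priority member.
    """
    if similarity is None:
        similarity = lambda a, b: SequenceMatcher(None, a, b).ratio()
    clusters = {}  # representative -> members
    for term in terms_by_priority:
        for rep in clusters:
            if similarity(term, rep) >= threshold:
                clusters[rep].append(term)
                break
        else:  # no representative was similar enough
            clusters[term] = [term]
    return clusters

terms = ["exhibition", "exhibitions", "award", "awards", "auction"]
print(cluster_terms(terms))
```

Any similarity function with the same two-argument shape, such as a jamo-level score for Korean terms, can be passed in as `similarity`.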

The model generation unit 1300 according to an embodiment of the present application may generate a price prediction model from the AI training data using at least one AI learning algorithm. The learning algorithm may be, for example, RandomForest or XGBoost. Both learn an ensemble built by combining many decision trees. With these algorithms, the model generation unit 1300 can compensate technically for the scarcity of artwork data. It can also secure prediction accuracy for artwork value and improve reliability.
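The following deliberately tiny sketch makes the RandomForest idea concrete without external dependencies. It draws bootstrap samples and grows regression trees that, at each node, split on the squared-error-minimizing threshold over a randomly chosen feature subset. The final prediction is the average over trees. This is an illustration, not the patent's implementation. In practice one would use a library model such as scikit-learn's `RandomForestRegressor` or the `xgboost` package's `XGBRegressor`. All names and hyperparameters below are hypothetical.

```python
import random
from statistics import mean

def _sse(values):
    """Sum of squared errors around the mean (the split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, depth, max_features, rng, min_leaf=2):
    """Grow a regression tree: a leaf is a number, an inner node a tuple."""
    if depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None
    for f in rng.sample(range(len(X[0])), max_features):  # random feature subset
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return mean(y)
    _, f, t, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  depth - 1, max_features, rng, min_leaf)
    return (f, t, grow(left), grow(right))

def predict_tree(node, x):
    """Follow splits (feature <= threshold goes left) down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

class TinyRandomForest:
    """Bagged regression trees with per-node random feature subsets."""

    def __init__(self, n_trees=25, max_depth=4, max_features=1, seed=0):
        self.n_trees, self.max_depth, self.max_features = n_trees, max_depth, max_features
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, self.max_features, self.rng))
        return self

    def predict(self, x):
        return mean(predict_tree(tree, x) for tree in self.trees)

# Hypothetical data: [career_years, unrelated 0/1 flag] -> price
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1], [7, 0], [8, 1]]
y = [100 * row[0] for row in X]
forest = TinyRandomForest(n_trees=25, max_depth=3, seed=42).fit(X, y)
print(round(forest.predict([2, 0])), round(forest.predict([7, 0])))
```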

The price prediction unit 1400 according to an embodiment of the present application may predict the price of an input artwork using the price prediction model. In this way, the artwork price prediction apparatus 1000 can overcome the low accuracy of traditional price prediction methods and the limitations of qualitative, expert-dependent estimation of auction hammer prices.

FIG. 2 is a schematic diagram of the learning data generation unit 1200 of FIG. 1 according to an embodiment of the present application.

Referring to FIG. 2, the extraction unit 1210 may use the artwork information to extract a plurality of factors that affect the price of each of the plurality of artworks. A factor is an element that influences the price, so the plurality of factors are the elements that influence the price. For example, the artist's career is one such factor: the more extensive the career, the more strongly it may correlate with the price.

For example, the plurality of factors may include at least one of the following: whether the artist is alive, the number of the artist's existing artworks, the period since the artwork was made public, the ranking of the artist's artworks, the artist's career, the range of price changes of the artist's artworks, the artist's academic major, the artist's award history, the artist's recognition index, the grade of the museum where the work was exhibited, related news article information, the size of the artwork, and visitor information for the exhibiting museum. A factor may be extracted when it affects the price, and the more strongly it affects the price, the higher its weight may be. Some representative examples follow. If the artist has died, the price of the artwork may rise because of its scarcity. The price of an artwork generally rises in proportion to its size. As the artist's recognition increases, demand for the work, and therefore its price, may rise. The price may fall when duplicates of the same artwork exist. Other factors may be extracted for similar reasons.
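The factors for a single artwork could be held in a small record type that flattens into a feature vector, with the directions of effect named in the text kept alongside. The field set below is a hypothetical subset of the listed factors. Encodings such as a 0/1 flag for whether the artist has died are assumptions.

```python
from dataclasses import astuple, dataclass, fields

@dataclass
class PriceFactors:
    """Hypothetical numeric encoding of some of the listed price factors."""
    artist_deceased: int      # 1 if the artist is no longer alive, else 0
    existing_works: int       # number of the artist's existing artworks
    career_years: float       # length of the artist's career
    award_count: int          # number of awards received
    recognition_index: float  # artist recognition index
    museum_grade: int         # grade of the exhibiting museum
    size: float               # artwork size, in the dataset's unit
    duplicate_copies: int     # identical copies of the work in circulation

    def to_vector(self):
        """Flatten the record into a list in field-declaration order."""
        return list(astuple(self))

# Directions of effect stated in the text: +1 tends to raise the price, -1 to lower it.
EXPECTED_DIRECTION = {
    "artist_deceased": +1,    # scarcity once the artist has died
    "size": +1,               # price roughly proportional to size
    "recognition_index": +1,  # higher recognition, higher demand
    "duplicate_copies": -1,   # duplicates of the same work lower the price
}

FIELD_NAMES = [f.name for f in fields(PriceFactors)]
print(PriceFactors(1, 120, 30.0, 5, 0.8, 2, 50.0, 0).to_vector())
```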

The generation unit 1220 may generate AI training data corresponding to the plurality of artworks based on the artwork information and the plurality of factors. Here, the AI training data may be training data built on the plurality of factors.

Among the plurality of factors, the generator 1220 may assign higher importance to factors that correlate more strongly with price. For example, whether the artist is alive may be given higher importance than the number of the artist's existing artworks. By taking the importance of the factors into account, the generator 1220 can build a more accurate art price prediction model.

Specifically, the generator 1220 may determine the relative importance of at least two of the plurality of factors and generate the artificial intelligence learning data corresponding to the plurality of artworks based on the determined importance. The generator 1220 may rank the plurality of features by their calculated relevance to the artist and select a predetermined number of top-ranked features as the main features. In other words, the generator 1220 may select, as the main features, a predetermined number of features in descending order of their relevance to the artist.

The generator 1220 may apply a positive or negative weight to each of the plurality of factors, determine the relative importance of at least two of the factors according to the result, and generate the artificial intelligence learning data corresponding to the plurality of artworks based on the determined importance. For example, the more closely the correlation plot of price against recognition approaches a directly proportional relationship, the more the generator 1220 may increase the weight. More generally, a factor's weight may be raised to the extent that its correlation plot with price shows a proportional relationship, and lowered to the extent that the plot is scattered. Relationships between price and a factor other than direct proportionality or scatter may also exist.
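The correlation-based weighting and top-k selection described above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: it assumes each factor is already encoded as a numeric column and uses the Pearson correlation coefficient as a stand-in for the "correlation distribution" in the text. The sign of the coefficient serves as the positive or negative weight, its magnitude as the importance, and the top-k factors by magnitude are kept as the main features. The factor names and values are hypothetical.

```python
import numpy as np

def signed_factor_weights(X, price, names):
    """Pearson correlation of each factor column with price.

    Positive r -> positive weight, negative r -> negative weight;
    |r| is used as the factor's importance (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    price = np.asarray(price, dtype=float)
    return {name: float(np.corrcoef(X[:, j], price)[0, 1])
            for j, name in enumerate(names)}

def top_k_factors(weights, k):
    """Rank factors by |weight| (strongest first) and keep the top k."""
    return sorted(weights, key=lambda n: abs(weights[n]), reverse=True)[:k]

# Hypothetical toy data: one row per artwork
names = ["artist_deceased", "num_copies", "size_cm2"]
X = [[0, 5, 100], [0, 3, 200], [1, 1, 150], [1, 0, 300]]
price = [10, 14, 30, 40]

weights = signed_factor_weights(X, price, names)
main_factors = top_k_factors(weights, k=2)
```

With this toy data, the number of duplicate copies receives a negative weight and whether the artist is deceased a positive one, matching the directions described for these factors.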

FIG. 3 is a schematic diagram of the generator 1220 of FIG. 2 according to an embodiment of the present application.

Referring to FIG. 3, the generator 1220 may include a text processing unit 1221, an image processing unit 1222, and a learning data processing unit 1223.

The text processing unit 1221 according to an embodiment of the present application may generate the first feature by converting text included in the artwork information into numerical form. The text may be a document related to the artwork or the artist, for example articles about Lee Jung-seop's bull (소) works published within a preset period. Specifically, the first feature may be a weight of the artist-related text determined in proportion to the number of such documents within the period. For example, if keywords associated with Lee Jung-seop's bull works, such as national spirit or a weary, hardship-laden mood, appear repeatedly, the weight associated with the artist Lee Jung-seop may increase. Alternatively, the weight of the keyword itself may increase; for instance, the weight of keywords related to a weary mood may increase.

Text digitization may mean vectorization, for example encoding the text as a one-hot matrix or vectorizing it with Word2Vec. That is, text digitization may mean converting the text included in the artwork information into vectors.

Specifically, the text processing unit 1221 may vectorize each extracted word by applying a text analysis algorithm (e.g., TF-IDF). For example, the text processing unit 1221 may vectorize the main words using a vector extraction package (e.g., feature_extraction.text, a subpackage of Scikit-Learn's feature_extraction module). This package provides document preprocessing classes; among them, a class that weights words based on their frequency (e.g., TfidfTransformer) may be used to vectorize the main nouns.

The image processing unit 1222 according to an embodiment of the present application may generate the second feature using colors extracted from an image included in the artwork information. The extracted colors may be expressed as ratios of red (R), green (G), and blue (B), and may be grouped into main colors; for example, a mixture of blue and green may appear as cyan. Specifically, the main colors may include at least one of red, orange, yellow, green, blue, indigo, violet, cyan, black, gray, purple, light green, scarlet, tangerine, yellowish green, sea blue, bluish violet, reddish purple, and carmine.

Specifically, the image processing unit 1222 may generate the second feature using the hue, saturation, and brightness extracted from the image included in the artwork information. Saturation refers to the intensity of a color: the more vivid the color, the higher its saturation. In HSV space, saturation is the difference between the largest and smallest of the R, G, and B components divided by the largest component, S = (max - min) / max. Brightness (value) refers to how light the color is; in HSV space it is the largest of the R, G, and B components, V = max(R, G, B), so a value of 0 corresponds to black.
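The saturation and value formulas above can be checked against Python's standard colorsys module. This sketch assumes RGB pixels in the 0-255 range and summarizes an image by its mean saturation and mean value; hue is left to histogram-based selection, since a plain arithmetic mean of hue (a circular quantity) is not meaningful. The function names are hypothetical.

```python
import colorsys
import numpy as np

def saturation_value(r, g, b):
    """HSV saturation and value from R, G, B as described in the text:
    S = (max - min) / max (0 when max is 0), V = max."""
    mx, mn = max(r, g, b), min(r, g, b)
    s = 0.0 if mx == 0 else (mx - mn) / mx
    return s, mx

def mean_saturation_value(pixels_rgb):
    """pixels_rgb: iterable of (R, G, B) in 0-255. Returns (mean S, mean V) in [0, 1]."""
    rgb = np.asarray(pixels_rgb, dtype=float).reshape(-1, 3) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in rgb])
    return float(hsv[:, 1].mean()), float(hsv[:, 2].mean())
```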

Specifically, the image processing unit 1222 may select at least one main color from among the plurality of colors extracted from the image included in the artwork information, based on the priority of each color, and generate the second feature based on the selected main color or colors. The priority may be determined by criteria such as which of the extracted colors account for the highest average proportion over a preset period. For example, if, on average over the preset period, bright colors such as yellow, orange, or light green occupy a higher proportion than dark colors such as black or gray, the image processing unit 1222 may give greater weight, that is, greater importance, to the bright colors. Alternatively, the image processing unit 1222 may weight differently the saturation levels of the colors that occupy a high proportion on average, or weight differently the brightness levels of those colors.

The image processing unit 1222 may generate the second feature using the color of each of a plurality of pixels of the image included in the artwork information. Each pixel may carry hue, saturation, and brightness information, in which case the second feature may include hue, saturation, and brightness features.

Specifically, the image processing unit 1222 may determine the priority based on a histogram of the extracted colors. The histogram quantifies the extracted colors and may be visualized as a bar chart over the preset period. In other words, the color histogram may be a bar chart of the colors corresponding to the pixels of the plurality of artworks.
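A histogram-based priority can be sketched by mapping each pixel to the nearest entry of a small palette and counting. The palette below is a hypothetical, coarse stand-in for the patent's list of main colors, nearest-neighbor assignment in RGB space is an assumption, and the optional boost factor illustrates the bright-color weighting described for the image processing unit 1222.

```python
import numpy as np

# Hypothetical coarse palette (RGB centroids); not the patent's actual color list
PALETTE = {
    "red": (220, 30, 30), "orange": (240, 140, 20), "yellow": (240, 220, 40),
    "green": (40, 160, 60), "cyan": (30, 170, 170), "blue": (30, 60, 200),
    "purple": (130, 50, 160), "black": (15, 15, 15), "gray": (128, 128, 128),
}

def color_histogram(pixels_rgb):
    """Proportion of pixels whose nearest palette color is each entry."""
    names = list(PALETTE)
    centroids = np.array([PALETTE[n] for n in names], dtype=float)
    px = np.asarray(pixels_rgb, dtype=float).reshape(-1, 3)
    # Euclidean distance from every pixel to every palette centroid
    dists = np.linalg.norm(px[:, None, :] - centroids[None, :, :], axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(names))
    return dict(zip(names, counts / counts.sum()))  # proportions sum to 1

def main_colors(hist, k=2, boost=None):
    """Rank colors by (optionally boosted) proportion and keep the top k."""
    boost = boost or {}
    score = {n: p * boost.get(n, 1.0) for n, p in hist.items()}
    return sorted(score, key=score.get, reverse=True)[:k]

pixels = [(220, 30, 30)] * 5 + [(30, 60, 200)] * 3 + [(15, 15, 15)] * 2
hist = color_histogram(pixels)
```

Without a boost, the two most frequent colors (red, then blue) are selected; a multiplicative boost lets a less frequent color class outrank them.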

The learning data processing unit 1223 according to an embodiment of the present application may generate the artificial intelligence learning data, which may be obtained by normalizing the first feature and the second feature. Specifically, the learning data processing unit 1223 may normalize the regression coefficient vector or the feature importance vector of each model through min-max normalization, which maps each value of the vector into the range 0 to 1.
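Min-max normalization maps each component x of a vector to (x - min) / (max - min). A minimal NumPy sketch follows; mapping a constant vector to all zeros, to avoid division by zero, is an assumption, since the text does not address that case.

```python
import numpy as np

def min_max_normalize(v):
    """Map each component of v into [0, 1] via (x - min) / (max - min)."""
    v = np.asarray(v, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:  # constant vector: avoid division by zero (assumed convention)
        return np.zeros_like(v)
    return (v - lo) / (hi - lo)

# e.g. a regression coefficient vector with a negative entry
normalized = min_max_normalize([-1.0, 0.0, 3.0])
```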

Specifically, to select the main features, the learning data processing unit 1223 may use a plurality of feature selection models to extract the feature coefficients or feature importance vectors that drive feature selection. In other words, the learning data processing unit 1223 may use the feature selection models to extract a regression coefficient vector or a feature importance vector for the plurality of features. The feature selection models may include, for example, at least one of RandomForest and XGBoost.

Specifically, a tree-based regression model yields the importance of each feature as a by-product of training. For such models, the feature importances (i.e., the feature importance vector) can be read from the fitted model's 'feature_importances_' attribute.
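Reading feature importances from a fitted tree ensemble can be sketched with scikit-learn's RandomForestRegressor, whose fitted models expose feature_importances_ (the scikit-learn-style XGBRegressor in the xgboost package exposes an attribute of the same name). The data below are synthetic and hypothetical, constructed so that factor 0 drives the price.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Hypothetical synthetic data: 3 factors, price driven mainly by factor 0
X = rng.random((200, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.05, 200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = model.feature_importances_   # one value per feature, summing to 1
ranking = np.argsort(importances)[::-1]    # feature indices, most important first
```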

The model generating unit 1300 according to an embodiment of the present application may generate a first price prediction model from the artificial intelligence learning data using a first artificial intelligence algorithm, and a second price prediction model from the same data using a second artificial intelligence algorithm. Specifically, each of the first and second artificial intelligence algorithms may be, for example, RandomForest or XGBoost. A random forest model, for instance, builds many sub-trees, considering a randomly chosen subset of features at each node, and combines their outputs (averaging them for regression). Because it ensembles trees that overfit on different features, a random forest mitigates the overfitting that is characteristic of a single decision tree. The number of features randomly considered at each split can be limited with the max_features parameter. The larger max_features is, the more similar the sub-trees become, and the more readily they fit the most prominent features in the data.
Conversely, the smaller max_features is, the more the sub-trees differ from one another, and each tree may need to grow deeper to make its predictions.
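A sketch of building two price prediction models with different algorithms follows. scikit-learn's GradientBoostingRegressor is used here as a stand-in for XGBoost so that the example needs only scikit-learn; the synthetic data and the max_features value are hypothetical. The helper reports the mean depth of the forest's trees, which can be used to observe how max_features affects tree depth on a given dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.random((300, 4))                                       # hypothetical factor matrix
y = 3 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(0, 0.1, 300)   # hypothetical prices

# First model: random forest; max_features caps how many features each split considers
model_1 = RandomForestRegressor(n_estimators=200, max_features=2, random_state=0).fit(X, y)

# Second model: gradient-boosted trees, standing in for XGBoost in this sketch
model_2 = GradientBoostingRegressor(random_state=0).fit(X, y)

def mean_tree_depth(forest):
    """Average depth of the individual trees in a fitted random forest."""
    return float(np.mean([tree.get_depth() for tree in forest.estimators_]))
```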

Specifically, the model generating unit 1300 may generate first artificial intelligence learning data corresponding to the plurality of artworks based on at least one factor in a first group of the plurality of factors, and second artificial intelligence learning data based on at least one factor in a second group. For example, the first artificial intelligence learning data may be based on a predetermined number of factors selected by importance, such as whether the artist is alive, the public display period of the artwork, the priority among the artist's artworks, and the artist's career. The second artificial intelligence learning data may likewise be based on a predetermined number of factors selected by importance, such as whether the artist is alive, the grade of the exhibiting museum, related article information, the size of the artwork, and visitor information for the exhibiting museum.

The model generating unit 1300 may generate the price prediction model from the artificial intelligence learning data based on the determined importance. Specifically, the model generating unit 1300 may grow each tree so as to maximize the information gain at each node, so that in each of the plurality of models the most important features are split on first. For example, the first node may split on whether the artist has died, that is, living versus deceased artists, and the second node may split on the artist's major, that is, whether or not the artist majored in fine art. Further nodes may be created according to the characteristics of the other factors, with the nodes ordered by importance. The model generating unit 1300 may also build an ensemble from the plurality of models described above.
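The split-selection idea can be illustrated with the impurity decrease used by regression trees, where variance plays the role of impurity and the weighted drop in variance is the "information gain" of a split. Using variance here is an assumption for a price (regression) target; the toy prices and binary factors are hypothetical. On them, splitting on whether the artist is deceased removes far more variance than splitting on the artist's major, so it would be chosen for the first node.

```python
import numpy as np

def variance_reduction(prices, mask):
    """Impurity decrease of a binary split, with variance as the impurity."""
    prices = np.asarray(prices, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    left, right = prices[mask], prices[~mask]
    if len(left) == 0 or len(right) == 0:   # degenerate split: no gain
        return 0.0
    n = len(prices)
    weighted_child = (len(left) * left.var() + len(right) * right.var()) / n
    return float(prices.var() - weighted_child)

# Hypothetical toy data for four artworks
prices = [10, 12, 30, 32]
factors = {
    "artist_deceased": [False, False, True, True],
    "art_major": [True, False, True, False],
}
gains = {name: variance_reduction(prices, m) for name, m in factors.items()}
best = max(gains, key=gains.get)  # factor chosen for the first node
```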

The model generating unit 1300 may generate a first price prediction model from the first artificial intelligence learning data and a second price prediction model from the second artificial intelligence learning data, using an artificial intelligence algorithm. Specifically, the model generating unit 1300 may combine the first and second price prediction models into an ensemble. For example, the model generating unit 1300 may select a value when the first and second price prediction models predict the same value. As another example, it may take the average of the predictions of the two models.

The price prediction unit 1400 may predict the price of the artwork by comparing the first prediction result of the first price prediction model with the second prediction result of the second price prediction model. Alternatively, the predicted price may be the average of the artwork prices given by the first and second prediction results.
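The two-group ensemble can be sketched as follows. The factor groups, the synthetic data, and the choice of two random forests are assumptions for illustration; the combination rule is the simple average described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 200
# Hypothetical factor table, columns:
# 0 artist_deceased, 1 display_years, 2 career_years, 3 museum_grade, 4 size
X = np.column_stack([rng.integers(0, 2, n), rng.random(n), rng.random(n),
                     rng.integers(1, 4, n), rng.random(n)]).astype(float)
y = 10 * X[:, 0] + 3 * X[:, 2] + 2 * X[:, 3] + 4 * X[:, 4] + rng.normal(0, 0.2, n)

GROUP_1 = [0, 1, 2]   # e.g. alive/deceased, display period, career
GROUP_2 = [0, 3, 4]   # e.g. alive/deceased, museum grade, size

model_1 = RandomForestRegressor(random_state=0).fit(X[:, GROUP_1], y)
model_2 = RandomForestRegressor(random_state=0).fit(X[:, GROUP_2], y)

def ensemble_predict(rows):
    """Average the first and second models' predictions for each row."""
    p1 = model_1.predict(rows[:, GROUP_1])
    p2 = model_2.predict(rows[:, GROUP_2])
    return (p1 + p2) / 2.0
```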

The price prediction unit 1400 may be built from a model that minimizes the mean squared error (MSE), i.e., the mean of the squared differences between the actual and predicted values, and the model's performance may generally be measured by the coefficient of determination (R-squared), the proportion of the variance of the target that the model explains.
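Both metrics are simple to compute directly. In the sketch below, MSE is the mean of the squared residuals and R-squared is 1 - SS_res / SS_tot; scikit-learn's sklearn.metrics.mean_squared_error and r2_score compute the same quantities.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared differences between actual and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For actual prices [1, 2, 3, 4] and predictions [1, 2, 3, 5], the single error of 1 gives an MSE of 0.25 and an R-squared of 1 - 1/5 = 0.8.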

Below, the operation flow of the present application is briefly reviewed based on the details described above.

FIG. 4 is a flowchart of an artificial intelligence-based art price prediction method according to an embodiment of the present application.

In step S11, the art price prediction device 1000 may collect artwork information including the image, artist, and price of each of a plurality of artworks.

In step S12, the art price prediction device 1000 may generate artificial intelligence learning data corresponding to the plurality of artworks based on the artwork information.

In step S13, the art price prediction device 1000 may generate a price prediction model from the artificial intelligence learning data using at least one artificial intelligence learning algorithm.

In step S14, the art price prediction device 1000 may predict the price of an input artwork using the price prediction model.
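Steps S11 to S14 can be tied together in a compact end-to-end sketch. Everything here is hypothetical: the records are synthetic, the only factors are whether the artist is deceased and the artwork size, the color feature is simply the mean RGB of a tiny stand-in image, and a single random forest replaces the two-model ensemble.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def collect(n=50, seed=3):
    """S11: collect artwork information (synthetic stand-in records)."""
    rng = np.random.default_rng(seed)
    records = []
    for _ in range(n):
        image = rng.integers(0, 256, size=(8, 8, 3))      # tiny stand-in image
        deceased = int(rng.integers(0, 2))
        size = float(rng.uniform(10, 100))
        price = 5 * deceased + 0.1 * size + image.mean() / 255  # synthetic price
        records.append({"image": image, "artist_deceased": deceased,
                        "size": size, "price": price})
    return records

def to_features(rec):
    """S12 helper: factors plus a mean-RGB color feature in [0, 1]."""
    mean_rgb = rec["image"].reshape(-1, 3).mean(axis=0) / 255.0
    return np.concatenate([[rec["artist_deceased"], rec["size"]], mean_rgb])

def build_learning_data(records):
    """S12: build the learning data matrix and price targets."""
    X = np.array([to_features(r) for r in records])
    y = np.array([r["price"] for r in records])
    return X, y

def train(X, y):
    """S13: fit a price prediction model."""
    return RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def predict(model, rec):
    """S14: predict the price of one input artwork."""
    return float(model.predict(to_features(rec)[None, :])[0])
```

Typical use: `records = collect()`, `X, y = build_learning_data(records)`, `model = train(X, y)`, then `predict(model, record)` for a new artwork record of the same shape.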

FIG. 5 shows an example of artificial intelligence-based art price prediction according to an embodiment of the present application. For example, taking 2021 as the reference year, the art price prediction device 1000 may predict the 2022 price of Lee Jung-seop's bull (소) work shown in FIG. 5 as 4.7 billion won. The art price prediction device 1000 may likewise predict the price at any other desired point in time.

The art price prediction method described above may also be implemented as a computer program or application that is stored on a recording medium and executed by a computer.

The art price prediction method according to an embodiment of the present application may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present application, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present application, and vice versa.

The foregoing description of the present application is illustrative, and those of ordinary skill in the art to which the present application pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present application is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

1000: art price prediction device
1100: information collection unit
1200: learning data generating unit
1300: model generating unit
1400: price prediction unit
1210: extraction unit
1220: generator
1221: text digitization unit
1222: image processing unit
1223: learning data processing unit

Claims

A device for predicting the price of a work of art, comprising:
an information collection unit that collects artwork information including the image, artist, and price of each of a plurality of artworks;
a learning data generating unit that generates artificial intelligence learning data corresponding to the plurality of artworks based on the artwork information;
a model generating unit that generates a price prediction model from the artificial intelligence learning data by using at least one artificial intelligence learning algorithm; and
a price prediction unit that predicts the price of an input artwork using the price prediction model,
wherein the learning data generating unit comprises:
an influence factor extracting unit that extracts, using the artwork information, a plurality of factors affecting the price of each of the plurality of artworks; and
a generator that generates artificial intelligence learning data corresponding to the plurality of artworks based on the artwork information and the plurality of preset factors,
wherein the generator comprises:
a text processing unit that generates a first feature by vectorizing text included in the artwork information; an image processing unit that generates a second feature using colors extracted from an image included in the artwork information; and a learning data processing unit that generates the artificial intelligence learning data based on the first feature and the second feature.

delete

The device of claim 1,
wherein the generator
determines the relative importance of at least two of the plurality of factors and generates the artificial intelligence learning data corresponding to the plurality of artworks based on the determined importance.

delete

The device of claim 1,
wherein the image processing unit selects at least one main color from among a plurality of colors extracted from the image included in the artwork information, based on the priority of each of the plurality of colors, and generates the second feature based on the selected at least one main color.

7. The device of claim 6,
wherein the image processing unit determines the priority based on a histogram of the extracted plurality of colors.

The device of claim 1,
wherein the image processing unit generates the second feature using the hue, saturation, and brightness extracted from the image included in the artwork information.

The device of claim 1,
wherein the image processing unit generates the second feature using the color of each of a plurality of pixels of the image included in the artwork information.

The device of claim 1,
wherein the model generating unit
generates a first price prediction model from the artificial intelligence learning data using a first artificial intelligence algorithm, and
generates a second price prediction model from the artificial intelligence learning data using a second artificial intelligence algorithm.

The device of claim 1,
wherein the generator
generates first artificial intelligence learning data corresponding to the plurality of artworks based on at least one factor included in a first group of the plurality of factors, and generates second artificial intelligence learning data corresponding to the plurality of artworks based on at least one factor included in a second group of the plurality of factors, and
wherein the model generating unit
generates a first price prediction model from the first artificial intelligence learning data using the artificial intelligence learning algorithm, and
generates a second price prediction model from the second artificial intelligence learning data using the artificial intelligence learning algorithm.

12. The device of claim 11,
wherein the price prediction unit
predicts the price of the artwork by comparing the first prediction result of the first price prediction model with the second prediction result of the second price prediction model.

The device of claim 1,
wherein the generator applies a positive or negative weight to each of the plurality of factors, determines the relative importance of at least two of the plurality of factors according to the result, and generates the artificial intelligence learning data corresponding to the plurality of artworks based on the determined importance.

14. The device of claim 13,
wherein the model generating unit
generates the price prediction model from the artificial intelligence learning data based on the determined importance.

14. The device of claim 13,
wherein the plurality of factors include at least one of: whether the artist is alive, the number of the artist's existing artworks, the period for which the artwork has been publicly shown, the priority among the artist's artworks, the artist's career, the range of price changes of the artist's artworks, the artist's major, the artist's award history, the artist's recognition index, the grade of the exhibiting museum, related article information, the size of the artwork, and visitor information for the exhibiting museum.

A method for predicting the price of an artwork, performed by an art price prediction device, the method comprising:
collecting artwork information including the image, artist, and price of each of a plurality of artworks;
generating artificial intelligence learning data corresponding to the plurality of artworks based on the artwork information;
generating a price prediction model from the artificial intelligence learning data using at least one artificial intelligence learning algorithm; and
predicting the price of an input artwork using the price prediction model,
wherein generating the learning data comprises:
an influence factor extraction step of extracting, using the artwork information, a plurality of factors affecting the price of each of the plurality of artworks; and
a generating step of generating artificial intelligence learning data corresponding to the plurality of artworks based on the artwork information and the plurality of factors,
wherein the generating step comprises:
a text processing step of generating a first feature by vectorizing text included in the artwork information; an image processing step of generating a second feature using colors extracted from an image included in the artwork information; and a learning data processing step of generating the artificial intelligence learning data based on the first feature and the second feature.

A computer-readable recording medium storing a program for executing the method of claim 16 on a computer.