KR102363961B1

KR102363961B1 - Analysis method based on product image, apparatus and program

Info

Publication number: KR102363961B1
Application number: KR1020210103186A
Authority: KR
Inventors: 이기원; 김서영; 양희; 김도희
Original assignee: 재단법인차세대융합기술연구원; 서울대학교 산학협력단
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2022-02-16

Abstract

Disclosed is an analysis device based on a product image. The analysis device includes a data collection unit, a text extraction unit, and one or more processors for pre-processing the extracted text. By providing the analysis device, market research for market entry can be effectively performed.

Description

Analysis method, device and program based on product image {ANALYSIS METHOD BASED ON PRODUCT IMAGE, APPARATUS AND PROGRAM}

The present invention relates to an apparatus that extracts text or images from product image data and analyzes information related to a product's nutritional components, functions, and certifications.

When developing or selling a product, market research on that product is essential. The traditional approach is to visit distribution channels in person, which takes considerable time and money.

With advances in Internet communication technology, market research that collects product information directly from online shopping sites costs less time and money than in-person visits. It still requires staff to search for and collect the information, however, so the time and cost burden grows as the number of products to be analyzed increases.

Accordingly, a more efficient method of conducting product-related market research is needed.

To derive such a method, a study titled "[...] and Natural Language Processing-Based Export Market Analysis" was conducted as part of the national R&D project "Common Technology Research to Support Export Research Groups" under the Ministry of Agriculture, Food and Rural Affairs.

Korean Laid-Open Patent Publication No. 10-2021-0007775 (publication date: 2021.01.20)

To solve the problems described above, an object of the present invention is to extract and process text from product image data and to provide analysis information on nutritional components, functions, certifications, and the like from the processed text.

Another object of the present invention is to provide a method of extracting and processing an image from product image data and recognizing a certification mark in the processed image.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the description below.

To achieve the above objects, a product image-based analysis apparatus according to the present invention may include a data collection unit that collects one or more items of product image data, a text extraction unit that extracts text from the product image data collected through the data collection unit, and one or more processors that preprocess the extracted text.

The processor may perform any of the commonly used basic text preprocessing operations, such as stopword removal and lemmatization.

Based on a pre-stored data pool, the processor may generate from the preprocessed text a nutritional component list containing one or more items of nutritional component information, a function list containing one or more items of function information, and a certification list containing one or more items of certification information.

Based on at least one of the generated lists, the processor may be configured to determine a correlation between two or more items of information selected, with repetition allowed, from among the nutritional component information, the function information, and the certification information.

The text extraction unit may extract text from the product image data using optical character recognition (OCR).

The processor that preprocesses the extracted text may split the extracted text into words or phrases, check whether each word or phrase is included in a pre-stored standard dictionary, and exclude any word or phrase whose frequency of use is at or below a predetermined frequency.

The processor may also perform preprocessing that identifies the part of speech (e.g., verb, adjective, noun) of the extracted text.

In addition, the processor may perform preprocessing that removes meaningless special characters.

Even if a word or phrase is not included in the standard dictionary, when it is included in at least one of the nutritional component information, function information, and certification information used by a predetermined expert group, the processor may be configured to include that word or phrase in the list mapped to the corresponding information.

To achieve the above objects, a product image-based analysis apparatus according to an embodiment of the present invention may include an image extraction unit that extracts an image from product image data collected through a data collection unit, and one or more processors that preprocess the extracted image.

The processor may recognize a certification mark included in the preprocessed image based on a certification mark recognition model trained in advance to recognize certification marks in an input image.

The processor that preprocesses the extracted image may exclude advertisement data unrelated to a given product category, exclude duplicate data based on product codes, and exclude images whose image quality is at or below a predetermined level.
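The three exclusion rules for image preprocessing can be sketched as a simple filter. This is a minimal illustration, not the patented implementation: the `ImageRecord` fields, the `filter_images` function, and the use of pixel count as a proxy for image quality are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class ImageRecord:
    # Hypothetical record layout; the source does not define one.
    product_code: str   # identifier used to detect duplicates
    category: str       # product category the image was tagged with
    is_ad: bool         # True if the image is advertising material
    width: int
    height: int

def filter_images(records, target_category, min_pixels=300 * 300):
    """Apply the three exclusion rules and return the surviving records."""
    seen_codes = set()
    kept = []
    for rec in records:
        # Rule 1: drop advertisements unrelated to the target category.
        if rec.is_ad and rec.category != target_category:
            continue
        # Rule 2: drop duplicates, keeping the first image per product code.
        if rec.product_code in seen_codes:
            continue
        # Rule 3: drop images at or below the quality threshold
        # (pixel count is an assumed stand-in for image quality).
        if rec.width * rec.height <= min_pixels:
            continue
        seen_codes.add(rec.product_code)
        kept.append(rec)
    return kept

records = [
    ImageRecord("A1", "green tea", False, 800, 800),
    ImageRecord("A1", "green tea", False, 800, 800),   # duplicate product code
    ImageRecord("B2", "electronics", True, 800, 800),  # unrelated advertisement
    ImageRecord("C3", "green tea", False, 100, 100),   # below quality threshold
    ImageRecord("D4", "green tea", False, 640, 480),
]
print([r.product_code for r in filter_images(records, "green tea")])
```

A real system would likely measure quality with something richer than resolution (sharpness or compression artifacts, for instance); the threshold value here is arbitrary.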

The certification mark recognition model may be trained using training image data in which at least one of the image quality, color, shooting angle, tilt, degree of partial loss, and degree of partial distortion of the certification mark differs from one image to another.

The processor may provide training image data to the certification mark recognition model such that the composition of the training data varies for each certification mark according to the representational complexity of that mark.

The product image-based analysis apparatus according to an embodiment of the present invention may further include a display.

The processor may combine nutritional component information, function information, certification information, and certification marks for each product or product category and output the result to the display.

The processor may also output to the display, for each specific product category, at least one of the nutritional information, function information, and certification information whose acquisition is required or recommended.

When one or more items of certification information are included, the processor may preferentially provide to the certification mark recognition model an image of the region of the product image data in which the certification mark corresponding to that certification information is frequently placed.
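One way to read the region-prioritization idea is as an ordering problem: given the certification names found in the text, rank image regions by how often the matching marks have appeared there. The sketch below assumes a fixed five-region grid and a hypothetical table of past placement counts (`REGION_COUNTS`); neither is specified in the source.

```python
# Hypothetical statistics: how often each mark was found in each region
# of previously analyzed product images.
REGION_COUNTS = {
    "usda organic": {"top_left": 12, "bottom_right": 40, "center": 3},
    "non-gmo": {"top_left": 25, "bottom_right": 10, "center": 5},
}

# Assumed fixed set of candidate regions.
ALL_REGIONS = ["top_left", "top_right", "center", "bottom_left", "bottom_right"]

def region_order(cert_names):
    """Return regions ordered so the most likely mark locations come first."""
    scores = {region: 0 for region in ALL_REGIONS}
    for name in cert_names:
        for region, count in REGION_COUNTS.get(name, {}).items():
            scores[region] += count
    # sorted() is stable, so regions with equal scores keep ALL_REGIONS order.
    return sorted(ALL_REGIONS, key=lambda region: -scores[region])

print(region_order(["usda organic"]))
```

The crops would then be passed to the recognition model in this order, so that likely locations are examined before the rest of the image.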

To achieve the above objects, a product image-based analysis method according to the present invention may include: collecting one or more items of product image data; extracting text from the collected product image data; preprocessing the extracted text; generating from the preprocessed text, based on a pre-stored data pool, a nutritional component list containing one or more items of nutritional component information, a function list containing one or more items of function information, and a certification list containing one or more items of certification information; and determining, based on at least one of the generated lists, a correlation between two or more items of information selected, with repetition allowed, from among the nutritional component information, the function information, and the certification information.

To achieve the above objects, a product image-based analysis method according to the present invention may include: collecting one or more items of product image data; extracting an image from the collected product image data; preprocessing the extracted image; and recognizing a certification mark included in the preprocessed image based on a certification mark recognition model trained in advance to recognize certification marks in an input image.

To achieve the above objects, a computer program according to the present invention may be combined with a computer, which is hardware, and stored in a medium in order to execute the product image-based analysis method of the analysis apparatus.

In addition, other methods, apparatuses, and systems for implementing the present invention, as well as a computer-readable recording medium storing a computer program for executing the method, may be further provided.

Providing the product image-based analysis apparatus according to the present invention as described above saves the time and cost of market research and can improve user convenience.

Furthermore, according to the present invention, product data can be collected automatically online; reliable product-related information can be extracted automatically, because the information printed on product packaging is extracted using trusted public databases; and analysis performed on the extracted data can be used for new product development and for deriving marketing strategies.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a diagram schematically illustrating an apparatus that extracts, processes, and analyzes various information from a product image according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a product image-based analysis apparatus according to an embodiment of the present invention.
FIG. 3 shows the result of text preprocessing performed by the product image-based analysis apparatus according to an embodiment of the present invention.
FIG. 4 shows the product image-based analysis apparatus presenting the correlation between nutritional component information and function information according to an embodiment of the present invention.
FIG. 5 shows a method of increasing the certification mark recognition rate of the certification mark recognition model using label information according to an embodiment of the present invention.
FIG. 6 is a sequence diagram of an analysis method using text extracted from product image data according to an embodiment of the present invention.
FIG. 7 is a sequence diagram of an analysis method using an image extracted from product image data according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not limited to the embodiments disclosed below, however, and may be implemented in various different forms. These embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art to which it pertains; the invention is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components other than those stated. Like reference numerals refer to like components throughout the specification, and "and/or" includes each of the stated components and every combination of one or more of them. Although "first", "second", and so on are used to describe various components, these components are not limited by those terms; the terms serve only to distinguish one component from another. Accordingly, a first component mentioned below may equally be a second component within the technical idea of the present invention.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meaning commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless they are explicitly and specifically defined.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

Before the description, the meaning of the terms used in this specification is briefly explained. These explanations are intended to aid understanding of the specification, so unless a term is explicitly stated to limit the present invention, it should not be read as limiting the technical idea of the invention.

FIG. 1 is a diagram schematically illustrating an apparatus that extracts, processes, and analyzes various information from a product image according to an embodiment of the present invention. This apparatus is referred to below as the product image-based analysis apparatus 100.

The product image-based analysis apparatus 100 may extract text or images from images of product surfaces, packaging, and the like displayed online, and may process the extracted text or images (e.g., by preprocessing) for use in analysis.

Here, the product is not limited to a finished product, and a product image may include not only a product surface or packaging image but anything that represents the product online or offline in image form. For example, a product image may include not only an image of the product itself but also the product's web or mobile page itself, or various product-related information on that page (review information, price information, rating information, descriptive information, and so on).

Product image data includes one or more product images and may be built as big data and stored in a cloud, a server, a user terminal, or the like.

Referring to FIG. 1, the product image-based analysis apparatus 100 may display a product (Ar) on the display 140 and extract text from the displayed product (Ar) using optical character recognition (OCR). The text may include, for example, a product name 21, nutritional component information 22, function information 23, certification information 24, and manufacturer information 26. In an optional or additional embodiment, the text may include various product-related information from a web or mobile page (e.g., review information, price information, rating information, descriptive information).

The product image-based analysis apparatus 100 may process the extracted text in order to carry out an optimal analysis process.

Preprocessing of the extracted text may include any of the commonly used basic text preprocessing operations, such as stopword removal and lemmatization.

For example, the product image-based analysis apparatus 100 may perform a preprocessing step that splits the extracted text into words or phrases. Here, a phrase may be a unit containing two or more words, such as a word group or a sentence. The product image-based analysis apparatus 100 may also perform preprocessing that identifies the part of speech (e.g., verb, adjective, noun) of the extracted text.

In addition, the product image-based analysis apparatus 100 may perform preprocessing that removes meaningless special characters.
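The basic preprocessing steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `preprocess` function and the tiny `STOPWORDS` set are hypothetical, the example is English-only, and lemmatization and part-of-speech tagging are omitted because they need a language-specific library.

```python
import re

# Hypothetical stopword list; a real system would use a fuller,
# language-specific one.
STOPWORDS = {"a", "an", "and", "of", "the", "with", "for", "in"}

def preprocess(text):
    """Lowercase, drop special characters, split into words, remove stopwords."""
    # Keep letters, digits, whitespace, and hyphens (so "non-gmo" survives);
    # every other character is replaced by a space.
    cleaned = re.sub(r"[^0-9a-z\s-]", " ", text.lower())
    tokens = cleaned.split()
    # Discard stopwords and tokens made only of hyphens.
    return [t for t in tokens if t not in STOPWORDS and t.strip("-")]

print(preprocess("★ Rich in Vitamin C & Iron!! Certified non-GMO ★"))
```

Keeping the hyphen is a deliberate choice in this sketch, since certification terms such as "non-GMO" would otherwise be split into unrelated tokens.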

The product image-based analysis apparatus 100 may generate lists of nutritional component information, function information, certification information, and the like from the preprocessed text and carry out the analysis process.

Here, based on at least one of the generated lists, the product image-based analysis apparatus 100 may determine a correlation between two or more items of information selected, with repetition allowed, from among the nutritional component information, the function information, and the certification information.

For example, for a product, the product image-based analysis apparatus 100 may determine at least one of: a correlation among items of nutritional component information, a correlation among items of function information, a correlation among items of certification information, a correlation between nutritional component information and function information, a correlation between function information and certification information, and a correlation between nutritional component information and certification information.
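The source does not say how the correlation is computed. As one plausible reading, the sketch below counts how many products contain each pair of items, handling both same-category pairs (e.g., nutrient with nutrient) and cross-category pairs (e.g., nutrient with function). The `cooccurrence` function, the dictionary keys, and the sample products are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, product

def cooccurrence(products, key_a, key_b):
    """Count in how many products each (item_a, item_b) pair appears together."""
    counts = Counter()
    for p in products:
        a_items = sorted(set(p[key_a]))
        b_items = sorted(set(p[key_b]))
        if key_a == key_b:
            # Same category chosen twice: unordered pairs of distinct items.
            pairs = combinations(a_items, 2)
        else:
            # Two different categories: every cross pair.
            pairs = product(a_items, b_items)
        counts.update(pairs)
    return counts

# Hypothetical per-product lists, as they might look after list generation.
products = [
    {"nutrients": ["vitamin c", "iron"], "functions": ["immune support"]},
    {"nutrients": ["vitamin c"], "functions": ["immune support", "energy"]},
    {"nutrients": ["calcium"], "functions": ["bone health"]},
]

nutrient_function = cooccurrence(products, "nutrients", "functions")
nutrient_nutrient = cooccurrence(products, "nutrients", "nutrients")
print(nutrient_function.most_common(1))
```

Raw counts could be normalized (for example into lift or a correlation coefficient) before being visualized; that choice is left open here because the source does not specify a measure.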

The product image-based analysis apparatus 100 may also extract the certification marks 25a and 25b in the product image in image form and recognize which certification mark each extracted image shows. A model trained by neural network-based supervised learning may be used for this purpose.

FIG. 2 is a block diagram showing the configuration of the product image-based analysis apparatus 100 according to an embodiment of the present invention.

The product image-based analysis apparatus 100 may include a data collection unit 110, a text extraction unit 120, an image extraction unit 130, a display 140, a memory 150, and a processor 190. The product image-based analysis apparatus 100 may include only some of these components or may further include other components.

The data collection unit 110 may collect one or more items of product image data. The data collection unit 110 may be equipped with various interfaces for collecting various kinds of data.

For example, the data collection unit 110 may collect, as big data from online shopping sites, surface images of the front, back, left, and right sides (typically the front) of products belonging to a specific product category (e.g., infant products, green tea).

The text extraction unit 120 may extract text from the product image data collected through the data collection unit 110. For example, the text extraction unit 120 may extract text from the product image data using optical character recognition (OCR), although embodiments are not limited to this. The text extraction unit 120 may also verify whether the text was extracted correctly.

The image extraction unit 130 may extract images from the product image data collected through the data collection unit 110.

In an optional embodiment, the image extraction unit 130 may extract images of both the entire area and partial areas of the product image data.

The display 140 may display various data on a screen under the control of the processor 190, and may be implemented as a backlight-based display or as a display equipped with self-emissive elements.

The memory 150 is a module that collectively refers to various forms of storage, and may store the information needed to perform computations using artificial intelligence, machine learning, and artificial neural networks. The memory 150 may store various learning models. These models may be used to infer a result for new input data rather than training data, and the inferred value may serve as the basis for a decision to perform some operation.

The learning models may be trained based on label information, and a backpropagation algorithm or the like may be used to increase training accuracy. In this specification, the memory 150 may store a certification mark recognition model (MM) that recognizes certification marks in images extracted from product image data.

The processor 190 may be implemented as one or more processors, and even when referred to in the singular it may be understood as plural. The processor 190 is a module that controls the components of the product image-based analysis apparatus 100, and may refer to a data processing device embedded in hardware that has circuitry physically structured to perform functions expressed as code or instructions in a program. Examples of such a hardware-embedded data processing device include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field-programmable gate array (FPGA), although the scope of the present invention is not limited to these. The processor 190 may have a separate learning processor for performing artificial intelligence computations, or may itself include a learning processor.

The processor 190 may preprocess the extracted text, which can improve search speed and accuracy.

The processor 190 may perform any of the commonly used basic text preprocessing operations, such as stopword removal and lemmatization.

The processor 190 may also split the extracted text into words or phrases and check whether each word or phrase is included in a pre-stored standard dictionary. A separate standard dictionary may be provided for each product category.

When the frequency of use of a word or phrase is at or below a predetermined frequency, the processor 190 may exclude that word or phrase. The predetermined frequency may be a usage frequency of less than 1%, although embodiments are not limited to this.
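The frequency-based exclusion can be sketched as follows. The source does not define how "frequency of use" is measured, so this example assumes it is a term's share of all tokens in the collected corpus; the `drop_rare_terms` function and the sample data are hypothetical.

```python
from collections import Counter

def drop_rare_terms(documents, min_share=0.01):
    """Remove terms whose share of all tokens is at or below min_share.

    documents: a list of token lists, one per product.
    """
    counts = Counter(token for doc in documents for token in doc)
    total = sum(counts.values())
    # Assumption: "frequency" means the term's share of all tokens.
    keep = {term for term, n in counts.items() if n / total > min_share}
    return [[token for token in doc if token in keep] for doc in documents]

# 100 products mention common terms; one contains an OCR misread.
documents = [["vitamin", "iron"] for _ in range(100)] + [["xqz"]]
filtered = drop_rare_terms(documents)
print(filtered[0], filtered[-1])
```

An alternative reading would measure the fraction of products in which a term appears (document frequency); swapping the counting line is enough to change the definition.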

프로세서(190)는 구분된 단어 또는 문구가 표준 사전에 포함되지 않더라도, 상기 구분된 단어 또는 문구가 소정의 전문가 그룹이 사용하는 영양 성분 정보, 기능 정보 및 인증 정보 중 적어도 하나에 포함된 경우, 해당 구분된 단어 또는 문구를 후술할 영양 성분 리스트, 기능 리스트 및 인증 리스트에 포함할 수 있다. The processor 190 determines that even if the divided word or phrase is not included in the standard dictionary, when the divided word or phrase is included in at least one of nutritional information, function information, and authentication information used by a predetermined expert group, the corresponding The separated words or phrases may be included in the nutritional ingredient list, function list, and certification list to be described later.

For the pre-processed text, the processor 190 may generate, based on a pre-stored data pool, a nutritional component list containing one or more items of nutritional component information, a function list containing one or more items of function information, and an authentication list containing one or more items of authentication information.
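
As a non-limiting illustration, list generation against a data pool may be sketched as a simple lookup. The contents of `DATA_POOL` and the function name `build_lists` are hypothetical; an actual data pool would be populated from the databases used by the embodiment.

```python
# Hypothetical data pool: kind of information -> known terms of that kind.
DATA_POOL = {
    "nutritional_component": {"iron", "dha", "lutein"},
    "function": {"brain", "eye", "immunity"},
    "authentication": {"non-gmo", "organic"},
}


def build_lists(tokens):
    """Sort pre-processed tokens into the three lists, without duplicates."""
    lists = {kind: [] for kind in DATA_POOL}
    for token in tokens:
        for kind, terms in DATA_POOL.items():
            if token in terms and token not in lists[kind]:
                lists[kind].append(token)
    return lists
```

Tokens that match no entry in the pool are simply left out of all three lists.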

In an optional embodiment, the processor 190 may use a public nutrition database (e.g., USDA) to generate the list of nutritional components contained in the collected data as a whole. For example, to support later commercialization strategy, the public nutrition database may include, for each country, the components eligible for nutrition labeling as provided by that country's agriculture and food ministry.

In addition, the processor 190 may use a public functionality database to generate the list of functions contained in the collected data as a whole. For example, the processor 190 may use resources such as MeSH terms, which include function-related information and nutritional component information.

The processor 190 may use various databases (e.g., a standard dictionary used by a predetermined expert group, or an authentication-related list compiled by researchers) to generate the list of authentication information contained in the collected data as a whole.

For example, even if an authentication-related word (e.g., non-GMO, GMO) is not in the standard dictionary, the processor 190 may extract that word rather than exclude it, so that the correlation between nutritional components and authentication can be identified.

Based on at least one of the generated lists, the processor 190 may determine a correlation between two or more items of information selected, with repetition allowed, from among nutritional component information, function information, and authentication information.

가령, 프로세서(190)는 생성된 적어도 하나의 리스트에 기초하여, 영양 성분 정보 사이의 상관 관계, 기능 정보 사이의 상관 관계, 인증 정보 사이의 상관 관계, 영양 성분 정보와 기능 정보 사이의 상관 관계, 기능 정보와 인증 정보 사이의 상관 관계 및 영양 성분 정보와 인증 정보 사이의 상관 관계 중 적어도 하나를 결정할 수 있다.For example, the processor 190 may determine, based on the generated at least one list, a correlation between nutritional component information, a correlation between functional information, a correlation between authentication information, a correlation between nutritional component information and functional information, At least one of a correlation between the functional information and the authentication information and a correlation between the nutritional component information and the authentication information may be determined.

선택적 또는 부가적 실시 예로, 프로세서(190)는 영양 성분 정보, 기능 정보 및 인증 정보 사이의 상관 관계를 결정할 수 있다.In an optional or additional embodiment, the processor 190 may determine a correlation between the nutritional component information, the functional information, and the authentication information.

When a particular nutritional component is emphasized, the processor 190 may identify the other nutritional components mentioned alongside it. For example, in the infant and toddler product category, the probability that DHA and iron are mentioned together, and the probability that DHA is also mentioned when iron is used, are both high, so the processor 190 may map the two together.

The term "correlation" as used in this specification is a broad term covering relationships between items of information, association rules, and the like, and is not limited to correlation in the statistical sense.

When expressing the correlation between nutritional component information and function information, the processor 190 may identify which of a particular nutritional component's effects is being emphasized. For example, the processor 190 may find that, in the U.S. infant and toddler product category, lutein is associated more with the brain than with the eyes, and may map them together.
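
As a non-limiting illustration, identifying which function is most often emphasized together with a given nutritional component may be sketched as a co-mention count. The data layout (one pair of sets per product) and the name `functions_mentioned_with` are assumptions made for this example only.

```python
from collections import Counter


def functions_mentioned_with(products, nutrient):
    """Count functions co-mentioned with `nutrient`, most frequent first.

    `products` is a list of (nutrients, functions) pairs, one per product,
    where each element of the pair is a set of terms found on that product.
    """
    counts = Counter()
    for nutrients, functions in products:
        if nutrient in nutrients:
            counts.update(functions)
    return counts.most_common()
```

In a sample where lutein appears twice with "brain" and once with "eye", "brain" is ranked first, which corresponds to the kind of mapping described above.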

When a particular nutritional component is emphasized, the processor 190 may identify which authentication is mentioned alongside it. For example, because nutrients on the scale of carbohydrates, fats, and proteins are mentioned together with authentication-related words at a higher rate than micronutrients are, the processor 190 may map them together.

That is, the processor 190 may identify related items of information (nutritional component information, function information, authentication information, and so on) that appear side by side in the product image data, and may use various databases to determine the correlations among them.

Meanwhile, the processor 190 may pre-process the images extracted by the image extraction unit 130. From the extracted images, the processor 190 may exclude advertisement data unrelated to a predetermined product category, exclude duplicate data on the basis of product code, and exclude images whose image quality is at or below a predetermined level.
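
As a non-limiting illustration, the three image exclusion rules may be sketched as a single filtering pass. The `ProductImage` record, its fields, and the use of pixel count as a stand-in for image quality are hypothetical; the embodiment does not specify how quality is measured or how advertisements are detected.

```python
from dataclasses import dataclass


@dataclass
class ProductImage:
    product_code: str
    category: str
    is_advertisement: bool
    width: int
    height: int


def filter_images(images, category, min_pixels=200 * 200):
    """Drop unrelated advertisements, duplicates, and low-quality images."""
    seen_codes = set()
    kept = []
    for img in images:
        if img.is_advertisement and img.category != category:
            continue  # advertisement unrelated to the target category
        if img.width * img.height <= min_pixels:
            continue  # quality at or below the threshold (pixel count as proxy)
        if img.product_code in seen_codes:
            continue  # duplicate of an image already kept
        seen_codes.add(img.product_code)
        kept.append(img)
    return kept
```

A product code is recorded only when its image is kept, so a rejected low-quality image does not block a later, usable image of the same product.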

The processor 190 may recognize an authentication mark contained in a pre-processed image by using an authentication mark recognition model (MM) trained in advance to recognize authentication marks contained in input images.

The authentication mark recognition model (MM) may include one or more neural network algorithms, for example one or more convolutional neural networks (CNNs) or deep neural networks (DNNs). A wide variety of training data showing authentication marks may be collected for training the model, and pairing that training data with label information may improve the model's recognition efficiency.

During training, the authentication mark recognition model (MM) may use training image data that differ from one another in at least one of image quality, color, and shooting angle, which may raise the probability of recognizing an authentication mark. A backpropagation algorithm may be used, and training may be repeated until the value of the loss function reaches a predetermined target.
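
As a non-limiting illustration, the stopping rule "repeat training until the loss reaches a target" may be sketched with a deliberately tiny model. The example trains a single sigmoid unit on a one-dimensional feature by gradient descent, so the gradient computation reduces to one chain-rule step; it is not the multi-layer network or the backpropagation setup of the embodiment, and all names and hyperparameters are hypothetical.

```python
import math


def train_until_target(samples, target_loss=0.1, learning_rate=1.0, max_epochs=10_000):
    """Train w, b of a sigmoid unit until mean cross-entropy <= target_loss.

    `samples` is a list of (feature, label) pairs with label 0 or 1.
    Returns (w, b, final_loss, epochs_used).
    """
    if not samples:
        raise ValueError("samples must not be empty")
    w, b = 0.0, 0.0
    loss = float("inf")
    n = len(samples)
    for epoch in range(max_epochs):
        loss = grad_w = grad_b = 0.0
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # forward pass
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            grad_w += (p - y) * x  # gradient of the loss with respect to w
            grad_b += p - y        # gradient of the loss with respect to b
        loss /= n
        if loss <= target_loss:
            return w, b, loss, epoch  # target reached: stop training
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b, loss, max_epochs
```

The `max_epochs` cap is an added safeguard so that the loop terminates even if the target loss is never reached.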

In an optional embodiment, the authentication mark recognition model (MM) may be implemented on the basis of Google's AutoML Vision. Training data may be collected so as to resemble the large-scale image data to which the model will later be applied, and the model may be trained with the number of training samples balanced across authentication marks.

The authentication mark recognition model (MM) may be trained using training image data that differ from one another in at least one of the authentication mark's image quality, color (including black and white), shooting angle, tilt, degree of partial loss, and degree of partial distortion. The model may take into account cases in which the product is cylindrical and the authentication mark is therefore curved. In addition, product images of 50 KB or less may be excluded from the training data.
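
As a non-limiting illustration, selecting training images may be sketched as a size filter followed by per-mark balancing. Balancing by undersampling to the smallest class, the interpretation of 1 KB as 1024 bytes, and the name `select_training_samples` are assumptions made for this example; the embodiment does not state how the balance is achieved.

```python
import random
from collections import defaultdict

MIN_BYTES = 50 * 1024  # 50 KB, assuming 1 KB = 1024 bytes


def select_training_samples(samples, seed=0):
    """Return {mark_label: [paths]} with small files removed and classes balanced.

    `samples` is a list of (mark_label, size_in_bytes, path) tuples.
    """
    by_label = defaultdict(list)
    for label, size, path in samples:
        if size > MIN_BYTES:  # images of 50 KB or less are excluded
            by_label[label].append(path)
    if not by_label:
        return {}
    # Undersample every class to the size of the smallest one.
    n = min(len(paths) for paths in by_label.values())
    rng = random.Random(seed)
    return {label: sorted(rng.sample(paths, n)) for label, paths in by_label.items()}
```

A fixed seed keeps the selection reproducible between runs.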

Based on the representational complexity of each authentication mark, the processor 190 may provide training image data to the authentication mark recognition model (MM) such that the composition of the training data varies from mark to mark. As a result, even a simply composed authentication mark (e.g., a combination of plain text and a simple shape) may be recognized readily.

Once training is complete, when one or more items of authentication information are present, the processor 190 may preferentially provide the authentication mark recognition model with the image of the region of the product image data in which the authentication mark corresponding to that authentication information is frequently placed. Because only a partial region, rather than the entire image, then needs to be searched, authentication mark recognition may become faster.
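
As a non-limiting illustration, the region-first search may be sketched as an ordered list of crop boxes. The table `FREQUENT_REGIONS`, its fractional coordinates, and the fallback to the full image as the last candidate are hypothetical choices for this example; the embodiment does not state how frequently used regions are stored.

```python
# Hypothetical prior: mark -> regions where it often appears, given as
# (left, top, right, bottom) fractions of the image width and height.
FREQUENT_REGIONS = {
    "organic": [(0.0, 0.0, 0.5, 0.5)],
    "non-gmo": [(0.5, 0.5, 1.0, 1.0)],
}


def candidate_boxes(width, height, authentication_terms):
    """Return pixel boxes to try in order: likely regions first, full image last."""
    boxes = []
    for term in authentication_terms:
        for left, top, right, bottom in FREQUENT_REGIONS.get(term, []):
            box = (int(left * width), int(top * height),
                   int(right * width), int(bottom * height))
            if box not in boxes:
                boxes.append(box)
    full_image = (0, 0, width, height)
    if full_image not in boxes:
        boxes.append(full_image)
    return boxes
```

A caller would pass each box to the recognition model in turn and stop at the first box in which a mark is recognized.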

Meanwhile, the processor 190 may combine nutritional component information, function information, authentication information, and authentication marks for each product or product category and output the result on the display. In this way, the processor 190 may present market research results at a glance to the users who need them.

In addition, to support product strategy, the processor 190 may output on the display 140, for each specific product category, at least one of the nutritional information, function information, and authentication information that needs to be, or is recommended to be, obtained. This may help individuals and companies that aim to launch a product.

Even when the product image data contains no authentication mark, the processor 190 may compare against the text extracted by OCR to analyze the state of authentication-related text.
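
As a non-limiting illustration, the comparison between recognized marks and OCR text may be sketched as a set comparison. The function name `compare_text_and_marks` and the three result buckets are assumptions made for this example and are not defined in the embodiment.

```python
def compare_text_and_marks(text_terms, recognized_marks):
    """Compare authentication terms found in OCR text with recognized marks."""
    text_terms = set(text_terms)
    recognized_marks = set(recognized_marks)
    return {
        "both": sorted(text_terms & recognized_marks),
        "text_only": sorted(text_terms - recognized_marks),   # stated in text, no mark found
        "mark_only": sorted(recognized_marks - text_terms),   # mark found, not stated in text
    }
```

The "text_only" bucket corresponds to the case described above, in which a product carries authentication-related text but no authentication mark.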

In an optional embodiment, the product image-based analysis device 100 need not include all of the components described above. It may be implemented with only the components related to text extraction, namely the data collection unit 110, text extraction unit 120, memory 150, and processor 190, or with only the components related to image extraction, namely the data collection unit 110, image extraction unit 130, memory 150, and processor 190. In this case, a plurality of product image-based analysis devices may be provided, and various components, including a display, may be added as needed.

선택적 또는 부가적 실시 예로, 제품 이미지 기반의 분석 장치(100)은 통신부를 더 포함하여, 다양한 단말로부터 제품 이미지 데이터를 수집하거나 분석된 결과에 관련된 정보를 통신부를 통해 제공할 수 있다.In an optional or additional embodiment, the product image-based analysis apparatus 100 may further include a communication unit to collect product image data from various terminals or provide information related to an analysis result through the communication unit.

FIGS. 3 to 5 are diagrams for explaining specific operations of the product image-based analysis device 100 according to an embodiment of the present invention.

FIG. 3 shows the result of text preprocessing performed by the product image-based analysis device 100 according to an embodiment of the present invention.

Referring to FIG. 3, the product image-based analysis device 100 may split the text of product image data into words or phrases. It may do so across a large volume of product image data, and may also provide the text of a single item of product image data split into words or phrases.

FIG. 4 shows the product image-based analysis device 100 presenting the correlation between nutritional component information and function information according to an embodiment of the present invention.

The product image-based analysis device 100 may display association rules derived by market basket analysis in tabular form. It may provide information on which effect a given item of function information is mainly associated with, whether the correlation between nutritional components is strong or weak, and what authentication information is related to them.
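
As a non-limiting illustration, association rules of the kind used in market basket analysis may be sketched with the usual support, confidence, and lift measures. Treating each product as one "basket" of extracted terms, restricting rules to single-item pairs, and the threshold values are assumptions made for this example.

```python
from itertools import permutations


def association_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules (a, b, support, confidence, lift) meaning 'a implies b'.

    `transactions` is a list of sets, one set of terms per product.
    """
    n = len(transactions)
    if n == 0:
        return []

    def support(itemset):
        # Share of products that contain every term in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({item for t in transactions for item in t})
    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue
        confidence = s_ab / support({a})  # estimate of P(b | a)
        if confidence < min_confidence:
            continue
        lift = confidence / support({b})  # > 1 suggests a positive association
        rules.append((a, b, s_ab, confidence, lift))
    return rules
```

A high confidence for the rule "iron implies DHA" would correspond to the observation that DHA tends to be mentioned when iron is used, and the lift value gives one way to distinguish strong from weak associations.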

In an optional or additional embodiment, the product image-based analysis device 100 may analyze the amounts of the general nutritional components listed in the nutritional component information, but the embodiment is not limited thereto.

In an optional or additional embodiment, for food, the product image-based analysis device 100 may analyze the product name, food type, names of the selling and producing companies, expiration date, packaging unit and weight, raw material names, nutritional content, packaging material, item report number, ingredient names, storage method, precautions, and so on. For cosmetics, it may analyze the product name, product information, directions for use, ingredient names, precautions, volume, names of the manufacturing and selling companies, batch number, expiration date, and so on. The embodiment is not limited to the analyses listed above.

FIG. 5 shows a method for raising the authentication mark recognition rate of the authentication mark recognition model (MM) by using label information according to an embodiment of the present invention.

The processor 190 may compare the class information of the authentication mark recognized by the authentication mark recognition model (MM) against verification data (the correct answer) to generate a correct-answer class.
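
As a non-limiting illustration, comparing recognized classes against verification labels may be sketched as a per-class recognition rate. This metric and the name `per_class_recognition_rate` are assumptions made for this example; the embodiment does not specify how the comparison result is summarized.

```python
from collections import defaultdict


def per_class_recognition_rate(predicted, expected):
    """Return {class: share of verification samples of that class recognized correctly}."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for p, e in zip(predicted, expected):
        totals[e] += 1
        if p == e:
            correct[e] += 1
    return {label: correct[label] / totals[label] for label in totals}
```

Classes with a low rate are candidates for additional or more varied training images.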

FIG. 6 is a sequence diagram of an analysis method using text extracted from product image data according to an embodiment of the present invention. Each step of the analysis method may be performed under the overall control of the processor 190.

먼저, 제품 이미지 기반의 분석 장치(100)는 제품 이미지 데이터를 수집한다(S610).First, the product image-based analysis apparatus 100 collects product image data (S610).

여기서, 제품 이미지 데이터는 빅 데이터로 구현될 수 있으며, 제품 이미지 기반의 분석 장치(100)의 메모리(150)에 저장될 수 있으며, 온라인의 다양한 저장 장소(가령, 클라우드)에 저장될 수 있다. 또한, 이미지 기반의 분석 장치(100)는 디바이스, 서버, 클라우드 시스템 등으로 구현될 수 있으나, 실시 예가 이에 국한되는 것은 아니다.Here, the product image data may be implemented as big data, may be stored in the memory 150 of the product image-based analysis device 100 , and may be stored in various online storage locations (eg, cloud). In addition, the image-based analysis apparatus 100 may be implemented as a device, a server, a cloud system, etc., but the embodiment is not limited thereto.

그 후에, 제품 이미지 기반의 분석 장치(100)는 텍스트를 추출하고 전처리한다(S620).After that, the product image-based analysis apparatus 100 extracts and pre-processes the text ( S620 ).

The image-based analysis device 100 may perform the basic text preprocessing steps in common use, such as stopword removal and lemmatization.

그 다음, 제품 이미지 기반의 분석 장치(100)는 영양 성분 리스트, 기능 리스트 및 인증 리스트를 생성한다(S630).Next, the product image-based analysis apparatus 100 generates a nutritional component list, a function list, and a certification list (S630).

After that, based on at least one of the generated lists, the product image-based analysis device 100 analyzes the correlation between two or more items of information selected, with repetition allowed, from among nutritional component information, function information, and authentication information (S640).

FIG. 7 is a sequence diagram of an analysis method using images extracted from product image data according to an embodiment of the present invention.

First, the product image-based analysis device 100 trains the authentication mark recognition model (S710).

그 다음, 제품 이미지 기반의 분석 장치(100)는 제품 이미지 데이터를 수집한다(S720).Next, the product image-based analysis apparatus 100 collects product image data (S720).

그 후에, 제품 이미지 기반의 분석 장치(100)는 이미지 추출 및 전처리를 수행한다(S730).After that, the product image-based analysis apparatus 100 performs image extraction and pre-processing (S730).

Next, the device recognizes authentication marks using the trained authentication mark recognition model (S740).

이상에서 전술한 본 발명의 일 실시예에 따른 방법은, 하드웨어인 컴퓨터와 결합되어 실행되기 위해 프로그램(또는 어플리케이션)으로 구현되어 매체에 저장될 수 있다. 여기서, 컴퓨터는 앞에서 설명한 제품 이미지 기반의 분석 장치(100)일 수 있다.The method according to an embodiment of the present invention described above may be implemented as a program (or application) to be executed in combination with a computer, which is hardware, and stored in a medium. Here, the computer may be the product image-based analysis apparatus 100 described above.

For the computer to read the program and execute the methods implemented by it, the program may include code written in a computer language such as C, C++, Python, JAVA, or machine language that the processor (CPU) of the computer can read through the device interface of the computer. Such code may include functional code related to functions that define the operations needed to execute the methods, and may include execution-procedure control code needed for the processor of the computer to execute those operations in a predetermined order. The code may further include memory-reference code indicating at which location (address) in the internal or external memory of the computer any additional information or media needed by the processor to execute the operations should be referenced. In addition, when the processor of the computer needs to communicate with any other remote computer or server in order to execute the operations, the code may further include communication-related code specifying how to communicate with that remote computer or server using the communication module of the computer, and what information or media to transmit and receive during communication.

The steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. A software module may reside in random access memory (RAM), read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any form of computer-readable recording medium well known in the art to which the present invention pertains.

Embodiments of the present invention have been described above with reference to the accompanying drawings, but a person of ordinary skill in the art to which the present invention pertains will understand that the invention may be practiced in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: product image-based analysis device
110: data collection unit
120: text extraction unit
130: image extraction unit
140: display
150: memory
190: processor

Claims

A product image-based analysis device, comprising:
a data collection unit that collects one or more items of product image data;
a text extraction unit that extracts text from the product image data collected through the data collection unit;
an image extraction unit that extracts images from the product image data collected through the data collection unit; and
one or more processors that pre-process the extracted text and images,
wherein the processor is configured to:
generate, for the pre-processed text and based on a pre-stored data pool that includes a predetermined nutrition database, a functionality database, and an authentication-related database, a nutritional component list containing one or more items of nutritional component information, a function list containing one or more items of function information, and an authentication list containing one or more items of authentication information;
determine, based on at least one of the generated lists, a correlation between two or more items of information selected, with repetition allowed, from among the nutritional component information, the function information, and the authentication information;
recognize an authentication mark contained in a pre-processed image by using an authentication mark recognition model trained in advance to recognize authentication marks contained in input images;
split the extracted text into words or phrases, exclude any word or phrase whose frequency of use is at or below a predetermined frequency, and remove stopwords;
include a word or phrase in the list corresponding to the relevant information when, even though the word or phrase is not contained in the standard dictionary, it is contained in at least one of the nutritional component information, function information, and authentication information used by a predetermined expert group; and
exclude, from the extracted images, advertisement data unrelated to a predetermined product category, duplicate data identified on the basis of product code, and images whose image quality is at or below a predetermined level.

The analysis device of claim 1,
wherein the text extraction unit extracts text from the product image data based on optical character recognition (OCR), and
wherein the processor that pre-processes the extracted text checks whether each word or phrase is contained in a pre-stored standard dictionary.

delete

The analysis device of claim 1,
wherein the authentication mark recognition model is trained using training image data that differ from one another in at least one of the authentication mark's image quality, color, shooting angle, tilt, degree of partial loss, and degree of partial distortion.

The analysis device of claim 7,
wherein the processor is configured to provide training image data to the authentication mark recognition model such that the composition of the training data varies from mark to mark based on the representational complexity of each authentication mark.

The analysis device of claim 1,
further comprising a display,
wherein the processor is configured to combine nutritional component information, function information, authentication information, and authentication marks for each product or product category and output the result on the display.

The analysis device of claim 9,
wherein the processor is configured to output on the display, for each specific product category, at least one of the nutritional information, function information, and authentication information that needs to be, or is recommended to be, obtained.

The analysis device of claim 1,
wherein the processor is configured to, when one or more items of authentication information are present, preferentially provide the authentication mark recognition model with the image of the region of the product image data in which the authentication mark corresponding to that authentication information is frequently placed.

A product image-based analysis method performed by a processor, comprising:
collecting one or more product image data;
extracting text and images from the collected product image data;
pre-processing the extracted text and images;
generating, for the pre-processed text and based on a pre-stored data pool that includes a predetermined nutrition database, a functionality database, and an authentication-related database, a nutritional component list containing one or more items of nutritional component information, a function list containing one or more items of function information, and an authentication list containing one or more items of authentication information; and
determining, based on at least one of the generated lists, a correlation between two or more items of information selected, with repetition allowed, from among the nutritional component information, the function information, and the authentication information,
wherein the processor is configured to recognize an authentication mark contained in a pre-processed image by using an authentication mark recognition model trained in advance to recognize authentication marks contained in input images, and
wherein the processor is further configured to:
split the extracted text into words or phrases, exclude any word or phrase whose frequency of use is at or below a predetermined frequency, and remove stopwords;
include a word or phrase in the list corresponding to the relevant information when, even though the word or phrase is not contained in the standard dictionary, it is contained in at least one of the nutritional component information, function information, and authentication information used by a predetermined expert group; and
exclude, from the extracted images, advertisement data unrelated to a predetermined product category, duplicate data identified on the basis of product code, and images whose image quality is at or below a predetermined level.

delete

A computer program for providing a product image-based analysis method of an analysis device, which is stored in a computer-readable medium to perform the product image-based analysis method of claim 12 in combination with a computer that is hardware.