KR102221355B1

KR102221355B1 - Method and system for similarity evaluation of patent documents

Info

Publication number: KR102221355B1
Application number: KR1020200093205A
Authority: KR
Inventors: 송인석
Original assignee: 한국과학기술정보연구원
Priority date: 2020-07-27
Filing date: 2020-07-27
Publication date: 2021-03-02

Abstract

The present invention implements a new similar patent classification method that realizes a fine-grained, concrete technique for calculating the similarity between patents based on CPC codes, using not only the co-occurrence frequency of the CPC codes assigned to a patent but also characteristics of the CPC codes such as the code hierarchy and the code attributes (inventive/additional).

Description

Similar patent classification method and similar patent classification system {METHOD AND SYSTEM FOR SIMILARITY EVALUATION OF PATENT DOCUMENTS}

The present invention relates to technology for classifying similar patents, and more particularly to technology for classifying similar patents based on measuring the similarity between patent documents.

Securing intellectual property rights is increasingly important to a company's technological competitiveness. Because patents in particular contain a company's core and elemental technologies, research on measuring corporate value and analyzing competing technology fields through patent analysis is actively under way.

Such research through patent analysis presupposes a technique for accurately classifying and selecting patents, for example identifying the patents similar to a specific patent among a large number of patents.

Conventionally, patents have been classified and selected by computing similarity from the frequency and occurrence of keywords in patent documents. However, the accuracy of this approach suffers from lexical and syntactic ambiguity.

Meanwhile, patents are now assigned Cooperative Patent Classification (CPC) codes, which cover the latest technology fields and provide a detailed technical classification, and techniques for classifying and selecting patents based on these CPC codes are being studied.

However, at the level of technology studied so far, concrete techniques that take the characteristics of CPC codes into account and classify and select patents on that basis remain insufficient.

Accordingly, the present invention proposes an approach that can significantly improve the performance and accuracy of patent classification and selection by realizing a fine-grained, concrete technique that calculates the similarity between patents based on CPC codes, using characteristics of the CPC codes assigned to a patent such as code attributes and the code hierarchy.

The present invention was devised in view of the above circumstances. Its object is to implement a new similar patent classification method that realizes a fine-grained, concrete technique for calculating the similarity between patents based on CPC codes, using characteristics of the CPC codes assigned to a patent such as code attributes and the code hierarchy.

A similar patent classification method according to one aspect of the present invention for achieving the above object includes: a technical field similarity calculation step of generating a normalized CPC set from the CPC (Cooperative Patent Classification) codes of a reference patent and the CPC codes of a target patent, and calculating a technical field similarity between the reference patent and the target patent based on the generated CPC set; a technical attribute similarity calculation step of identifying a common normalized CPC code from the CPC codes of the reference patent and the CPC codes of the target patent, and calculating a technical attribute similarity between the reference patent and the target patent by reflecting the code interval from the common normalized CPC code to the CPC code of the reference patent and the code interval from the common normalized CPC code to the CPC code of the target patent; and a similarity measurement step of measuring the similarity between the reference patent and the target patent based on the calculated technical field similarity and technical attribute similarity.

Specifically, the method may further include a selection step of selecting a comparison target patent group in relation to the reference patent, and the target patent may be each patent belonging to the comparison target patent group.

Specifically, the technical field similarity calculation step may: normalize, based on the CPC schema, each CPC code of the reference patent and each CPC code of the target patent to the main-group level; generate, as the CPC set and for each of the reference patent and the target patent, a co-occurrence frequency matrix indicating, for each normalized CPC code, the frequency with which that code co-occurs in the patent document; and calculate the technical field similarity between the reference patent and the target patent based on the co-occurrence frequency matrices generated for the two patents.
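The normalization and counting described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes CPC symbols are given as text such as "G06F 16/355", obtains the main group by replacing the subgroup with "00", and reads the "co-occurrence frequency" of a normalized code as the number of a patent's original codes that fall under that main group. The function names and the regular expression are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed textual form of a CPC symbol: subclass (e.g. "G06F"), main group
# digits, a slash, and subgroup digits. Whitespace after the subclass is optional.
_CPC_PATTERN = re.compile(r"^([A-HY]\d{2}[A-Z])\s*(\d{1,4})/(\d{2,6})$")


def normalize_to_main_group(code: str) -> str:
    """Map a CPC symbol to its main-group symbol, e.g. 'G06F 16/355' -> 'G06F16/00'."""
    match = _CPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {code!r}")
    subclass, main_group, _subgroup = match.groups()
    return f"{subclass}{main_group}/00"


def main_group_counts(codes: Iterable[str]) -> Counter:
    """Count how many of a patent's CPC codes fall under each main group.

    Assumption: this per-main-group count stands in for the patent's
    co-occurrence frequency per normalized CPC code.
    """
    return Counter(normalize_to_main_group(code) for code in codes)


if __name__ == "__main__":
    print(main_group_counts(["G06F 16/355", "G06F16/35", "G06Q 50/18"]))
```

A lookup against the official CPC scheme could replace the regular expression where symbols arrive in other formats.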

Specifically, based on the co-occurrence frequency matrices per normalized CPC code generated for the reference patent and the target patent, the technical field similarity calculation step may calculate the technical field similarity between the reference patent and the target patent using three quantities: the total, over the normalized CPC codes, of the products of each code's co-occurrence frequency in the reference patent and in the target patent; the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent; and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent.
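One way to combine the three quantities named above is sketched below. The passage lists the quantities but does not state how they are combined, so dividing the sum of products by the product of the two totals is an assumption made only for illustration; `field_similarity` and the example frequencies are hypothetical.

```python
from typing import Mapping


def field_similarity(ref: Mapping[str, int], tgt: Mapping[str, int]) -> float:
    """Assumed form: sum_c ref[c] * tgt[c] / (sum_c ref[c] * sum_c tgt[c]).

    `ref` and `tgt` map each normalized (main-group) CPC code to its
    co-occurrence frequency in the reference and target patent.
    """
    # Only codes present in both patents contribute to the sum of products.
    sum_of_products = sum(ref[c] * tgt[c] for c in ref.keys() & tgt.keys())
    ref_total = sum(ref.values())
    tgt_total = sum(tgt.values())
    if ref_total == 0 or tgt_total == 0:
        return 0.0  # a patent without codes is treated as dissimilar
    return sum_of_products / (ref_total * tgt_total)


if __name__ == "__main__":
    reference = {"G06F16/00": 2, "G06Q50/00": 1}
    target = {"G06F16/00": 1, "H04L9/00": 1}
    print(field_similarity(reference, target))
```

With non-negative frequencies this ratio stays between 0 and 1, which keeps it comparable across patents with different numbers of codes.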

Specifically, the technical attribute similarity calculation step may: normalize, based on the CPC schema, each CPC code of the reference patent and each CPC code of the target patent to the main-group level; identify, for each of the reference patent and the target patent and for each normalized CPC code, a code interval set consisting of the upper-level codes that lie between the original CPC code and the normalized CPC code; measure, for each normalized CPC code common to the reference patent and the target patent, the similarity between the original CPC codes using the code interval set identified for the reference patent and the code interval set identified for the target patent with respect to that normalized CPC code; and calculate the technical attribute similarity between the reference patent and the target patent by reflecting the similarity between original CPC codes measured for each common normalized CPC code.

Specifically, for each common normalized CPC code, the similarity between the original CPC codes may be measured using the number of codes in the intersection and the number of codes in the union of two sets: the code interval set identified between the original CPC code and the normalized CPC code of the reference patent, and the code interval set identified between the original CPC code and the normalized CPC code of the target patent. The more codes the intersection contains, the higher the measured similarity.
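A measure built from the intersection and union counts can be sketched as a Jaccard-style ratio. The passage only says that both counts are used and that a larger intersection yields a higher similarity; taking the ratio |intersection| / |union| is an assumption. The code labels in the example are placeholders, not a statement about the actual CPC hierarchy.

```python
from typing import AbstractSet


def original_code_similarity(
    ref_interval: AbstractSet[str], tgt_interval: AbstractSet[str]
) -> float:
    """Assumed Jaccard-style ratio between two code interval sets.

    Each set holds the upper-level codes lying between a patent's original
    CPC code and the shared normalized (main-group) code.
    """
    union = ref_interval | tgt_interval
    if not union:
        # Both original codes are the main group itself; returning 0.0 here
        # is an arbitrary choice for this sketch.
        return 0.0
    return len(ref_interval & tgt_interval) / len(union)


if __name__ == "__main__":
    # Placeholder interval sets under one shared main group.
    ref_set = {"X/10", "X/12"}
    tgt_set = {"X/10", "X/14"}
    print(original_code_similarity(ref_set, tgt_set))
```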

Specifically, the technical attribute similarity calculation step may generate, for each of the reference patent and the target patent, a co-occurrence frequency matrix indicating, for each normalized CPC code, the frequency with which that code co-occurs in the patent document. Based on these matrices, it may calculate the technical attribute similarity between the reference patent and the target patent using the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent, and reflecting the similarity between original CPC codes measured for each common normalized CPC code.

Specifically, the method may further include a classification step of classifying, within the comparison target patent group, the patents similar to the reference patent, based on the similarity to the reference patent measured in the similarity measurement step for each target patent belonging to the comparison target patent group.

Specifically, each CPC code assigned to a patent is classified by a first classification value or a second classification value. When calculating the technical field similarity and the technical attribute similarity between the reference patent and the target patent, the technical field similarity calculation step and the technical attribute similarity calculation step may, according to a preset calculation option, perform the calculation in one of four ways: using the CPC codes assigned to the reference patent and the target patent without distinguishing the first and second classification values; using only the CPC codes classified by the first classification value; using only the CPC codes classified by the second classification value; or applying specific weights to the result obtained using only the CPC codes of the first classification value and the result obtained using only the CPC codes of the second classification value.
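The four calculation options can be sketched as a small dispatcher around any similarity function. Everything named here is hypothetical: the option strings, the "first"/"second" tags, the default weights 0.7 and 0.3, and the toy set-overlap similarity used in the example. The passage does not specify weight values or how the weighted results are combined, so the weighted sum is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

TaggedCode = Tuple[str, str]  # (cpc_code, "first" or "second")
Similarity = Callable[[List[str], List[str]], float]


def similarity_with_option(
    ref: Sequence[TaggedCode],
    tgt: Sequence[TaggedCode],
    sim: Similarity,
    option: str = "all",
    w_first: float = 0.7,   # hypothetical weight for first-value codes
    w_second: float = 0.3,  # hypothetical weight for second-value codes
) -> float:
    """Apply `sim` to the subset of codes selected by the calculation option."""

    def pick(codes: Sequence[TaggedCode], kind: str) -> List[str]:
        return [code for code, tag in codes if tag == kind]

    if option == "all":  # ignore the first/second distinction
        return sim([c for c, _ in ref], [c for c, _ in tgt])
    if option in ("first", "second"):  # use one kind of code only
        return sim(pick(ref, option), pick(tgt, option))
    if option == "weighted":  # assumed combination: weighted sum of both results
        first = sim(pick(ref, "first"), pick(tgt, "first"))
        second = sim(pick(ref, "second"), pick(tgt, "second"))
        return w_first * first + w_second * second
    raise ValueError(f"unknown calculation option: {option!r}")


def toy_overlap(a: List[str], b: List[str]) -> float:
    """Toy similarity for the example: shared codes over all distinct codes."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


if __name__ == "__main__":
    ref_codes = [("A", "first"), ("B", "second")]
    tgt_codes = [("A", "first"), ("C", "second")]
    for opt in ("all", "first", "second", "weighted"):
        print(opt, similarity_with_option(ref_codes, tgt_codes, toy_overlap, opt))
```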

Specifically, the method may further include a step of providing the patents similar to the reference patent in a priority order given by the magnitude of a similarity that is calculated by applying, to the result obtained using only the CPC codes of the first classification value, which is more important than the second classification value, a weight larger than the weight applied to the result obtained using only the CPC codes of the second classification value.

A similar patent classification system according to one aspect of the present invention for achieving the above object includes: a technical field similarity calculation unit that generates a normalized CPC set from the CPC (Cooperative Patent Classification) codes of a reference patent and the CPC codes of a target patent, and calculates a technical field similarity between the reference patent and the target patent based on the generated CPC set; a technical attribute similarity calculation unit that identifies a common normalized CPC code from the CPC codes of the reference patent and the CPC codes of the target patent, and calculates a technical attribute similarity between the reference patent and the target patent by reflecting the code interval from the common normalized CPC code to the CPC code of the reference patent and the code interval from the common normalized CPC code to the CPC code of the target patent; and a similarity measurement unit that measures the similarity between the reference patent and the target patent based on the calculated technical field similarity and technical attribute similarity.

Specifically, the technical field similarity calculation unit may: normalize, based on the CPC schema, each CPC code of the reference patent and each CPC code of the target patent to the main-group level; generate, as the CPC set and for each of the reference patent and the target patent, a co-occurrence frequency matrix indicating, for each normalized CPC code, the frequency with which that code co-occurs in the patent document; and calculate the technical field similarity between the reference patent and the target patent based on the co-occurrence frequency matrices generated for the two patents.

Specifically, based on the co-occurrence frequency matrices per normalized CPC code generated for the reference patent and the target patent, the technical field similarity calculation unit may calculate the technical field similarity between the reference patent and the target patent using three quantities: the total, over the normalized CPC codes, of the products of each code's co-occurrence frequency in the reference patent and in the target patent; the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent; and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent.

Specifically, the technical attribute similarity calculation unit may: normalize, based on the CPC schema, each CPC code of the reference patent and each CPC code of the target patent to the main-group level; identify, for each of the reference patent and the target patent and for each normalized CPC code, a code interval set consisting of the upper-level codes that lie between the original CPC code and the normalized CPC code; measure, for each normalized CPC code common to the reference patent and the target patent, the similarity between the original CPC codes using the code interval set identified for the reference patent and the code interval set identified for the target patent with respect to that normalized CPC code; and calculate the technical attribute similarity between the reference patent and the target patent by reflecting the similarity between original CPC codes measured for each common normalized CPC code.

Specifically, the technical attribute similarity calculation unit may generate, for each of the reference patent and the target patent, a co-occurrence frequency matrix indicating, for each normalized CPC code, the frequency with which that code co-occurs in the patent document. Based on these matrices, it may calculate the technical attribute similarity between the reference patent and the target patent using the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent, and reflecting the similarity between original CPC codes measured for each common normalized CPC code.

Thus, according to the present invention, it is possible to implement a new similar patent classification method that realizes a fine-grained, concrete technique for calculating the similarity between patents based on CPC codes, using characteristics of the CPC codes assigned to a patent such as code attributes and the code hierarchy.

As a result, the present invention achieves the effect of significantly improving the performance and accuracy of patent classification and selection.

FIG. 1 is a block diagram showing the configuration of a similar patent classification system according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram showing the operation flow of a similar patent classification method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described with reference to the accompanying drawings.

Conventionally, there are also techniques that classify and select patents by calculating the similarity between patents based on the International Patent Classification (IPC) codes assigned to patents.

However, the IPC does not cover the latest technology fields and its technical classification is not sufficiently detailed, so the accuracy of patent classification and selection based on it also falls short of a satisfactory level.

Meanwhile, patents are now assigned Cooperative Patent Classification (CPC) codes, which cover the latest technology fields and provide a detailed technical classification, but concrete techniques that take the characteristics of CPC codes into account and classify and select patents on that basis remain insufficient.

Accordingly, the present invention proposes an approach (hereinafter, the similar patent classification method) that can significantly improve the performance and accuracy of patent classification and selection by realizing a fine-grained, concrete technique that calculates the similarity between patents based on CPC codes, using characteristics of the CPC codes assigned to a patent such as code attributes and the code hierarchy.

The similar patent classification method according to an embodiment of the present invention can be broadly divided into four steps: selecting the comparison target patent group whose similarity to the reference patent is to be measured; measuring the similarity between the reference patent and each target patent in the comparison target patent group; classifying similar patents based on the measured similarity; and providing (e.g., displaying) the classified similar patents.

Among these steps, the core of the similar patent classification method of the present invention lies in measuring the similarity between the reference patent and the target patent and in providing (e.g., displaying) the similar patents.

More specifically, in the course of measuring the similarity between the reference patent and the target patent, the present invention can realize a fine-grained, concrete technique that calculates and measures the similarity between patents based on CPC codes, using characteristics of the CPC codes assigned to a patent such as code attributes and the code hierarchy.

FIG. 1 shows the configuration of a system that realizes the similar patent classification method according to an embodiment of the present invention.

As shown in FIG. 1, the present invention can perform the similar patent classification method by selecting the comparison target patent group whose similarity to the reference patent is to be measured, measuring the similarity between the reference patent and each target patent in that group, classifying similar patents based on the measured similarity, and providing (e.g., displaying) the classified similar patents.

More specifically, as shown in FIG. 1, the system that realizes the similar patent classification method of the present invention (hereinafter, the similar patent classification system (100)) may include a technical field similarity calculation unit (120), a technical attribute similarity calculation unit (130), and a similarity measurement unit (140).

The similar patent classification system (100) according to an embodiment of the present invention may further include a reference/comparison target patent group selection unit (110) and a similar patent classification unit (150).

In addition to the components described above, the similar patent classification system (100) according to an embodiment of the present invention may further include a communication unit (not shown) responsible for actual communication with external devices such as the patent document DB (10), a related institution server (not shown), and a personal device (not shown) of an operator or user.

All or at least part of the similar patent classification system (100) may be implemented as a hardware module, as a software module, or as a combination of hardware and software modules.

Here, a software module may be understood as, for example, instructions executed by a processor that controls operations within the similar patent classification system (100), and these instructions may be stored in a memory within the similar patent classification system (100).

The similar patent classification system (100) according to an embodiment of the present invention thus realizes the similar patent classification method proposed in the present invention through the configuration described above. Each component of the similar patent classification system (100) is described in more detail below.

The technical field similarity calculation unit (120) generates a normalized CPC set from the CPC (Cooperative Patent Classification) codes of the reference patent and the CPC codes of the target patent, and calculates the technical field similarity between the reference patent and the target patent based on the generated CPC set.

The technical attribute similarity calculation unit (130) identifies a common normalized CPC code from the CPC codes of the reference patent and the CPC codes of the target patent, and calculates the technical attribute similarity between the reference patent and the target patent by reflecting the code interval from the common normalized CPC code to the CPC code of the reference patent and the code interval from the common normalized CPC code to the CPC code of the target patent.

The similarity measurement unit (140) measures the similarity between the reference patent and the target patent based on the technical field similarity calculated by the technical field similarity calculation unit (120) and the technical attribute similarity calculated by the technical attribute similarity calculation unit (130).

Thus, in calculating the similarity between the reference patent and the target patent, the present invention proposes a technique that calculates the technical field similarity and the technical attribute similarity separately and then sums them to obtain the similarity.
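The summation of the two partial similarities, followed by ordering the target patents by the result, can be sketched as below. The patent identifiers and scores are made up for illustration, and the function names are hypothetical; the plain unweighted sum follows the description above, while the descending sort is just one way to use the totals for classification.

```python
from typing import Dict, List, Tuple


def total_similarity(field_sim: float, attribute_sim: float) -> float:
    """Sum the technical field similarity and the technical attribute similarity."""
    return field_sim + attribute_sim


def rank_targets(scores: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Order target patents by total similarity to the reference patent, highest first.

    `scores` maps a target patent identifier to its
    (field similarity, attribute similarity) pair.
    """
    totals = {pid: total_similarity(f, a) for pid, (f, a) in scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up identifiers and scores.
    example = {"T1": (0.20, 0.10), "T2": (0.50, 0.30), "T3": (0.05, 0.00)}
    for patent_id, score in rank_targets(example):
        print(patent_id, round(score, 3))
```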

The functions performed by the technical field similarity calculation unit (120) and the technical attribute similarity calculation unit (130) are described in detail below.

Before the detailed description, the reference patent and the target patent are defined for convenience of explanation.

The reference/comparison target patent group selection unit (110) may select a reference patent to serve as the basis for classifying similar patents.

The reference patent may be a patent specified by an operator or user, or a patent automatically extracted and specified, according to a predefined reference selection policy, from a technical field specified by the operator or user.

For example, the reference patent may be selected manually or automatically from a list of results obtained by a keyword search over bibliographic information that includes the title, abstract, CPC codes, list of core products/technologies, applicant, and the like.

As such, the present invention places no limitation on how the reference patent is selected.

The reference/comparison target patent group selection unit 110 may also select a comparison target patent group in relation to the reference patent.

In a specific embodiment, the reference/comparison target patent group selection unit 110 may identify the core product/technology of the reference patent, group the patents whose core product/technology is similar or identical to it, and select that group as the comparison target patent group.

Here, the core product/technology is the subject that the patent document centrally addresses. For example, for a patent titled "motion sensor for detecting the activity of person", "motion sensor" can be identified as the field of the core product/technology (modifier (motion) + headword (sensor)), and "detecting the activity of person" can be identified as the attribute of the core product/technology (function (detecting) + object acted on (activity of person)).

In addition, the reference/comparison target patent group selection unit 110 may use an SAO search to expand the field of the core product/technology of the reference patent (modifier (motion) + headword (sensor)) to its synonyms when selecting the comparison target patent group.

In another embodiment, rather than using the initially selected comparison target patent group as is, the reference/comparison target patent group selection unit 110 may refine it through additional filtering to select the comparison target patent group more precisely.

For example, from the initially selected comparison target patent group, the reference/comparison target patent group selection unit 110 may keep only the patents that include at least N of the reference patent's CPC codes and select them as the comparison target patent group.

Even for a similar or identical core product/technology, for example a dirt cleaner, the similarity depends on the technical field and attributes, which can vary widely (vacuum cleaners, decontamination chemicals, cleaning tools, and so on). The minimum condition on the similarity level can therefore be defined as a number N of CPC codes.
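The CPC-count filter described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `filter_by_shared_cpc`, the dictionary layout, and the sample patent identifiers and codes are all assumptions made for the example.

```python
def filter_by_shared_cpc(reference_codes, candidates, n_min):
    """Keep only candidates that share at least n_min CPC codes with the reference.

    reference_codes: iterable of CPC code strings of the reference patent.
    candidates: dict mapping a (hypothetical) patent id to its CPC code list.
    n_min: the minimum number N of shared CPC codes.
    """
    reference = set(reference_codes)
    return {
        patent_id: codes
        for patent_id, codes in candidates.items()
        if len(reference & set(codes)) >= n_min
    }


# Toy data, invented for illustration only.
reference = ["A47L9/28", "A47L5/22", "G05D1/02"]
first_pass_group = {
    "P1": ["A47L9/28", "A47L5/22"],
    "P2": ["C11D3/00"],
    "P3": ["A47L9/28", "G05D1/02", "B25J9/16"],
}
comparison_group = filter_by_shared_cpc(reference, first_pass_group, n_min=2)
```

With N = 2, the chemical-composition candidate `P2` is dropped even though a keyword search for "dirt cleaner" might have returned it.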

Accordingly, the target patent referred to in connection with the technical field similarity calculation unit 120 and the technology attribute similarity calculation unit 130 may be each patent belonging to the comparison target patent group selected for the reference patent.

First, the function of the technical field similarity calculation unit 120 is described.

Based on the CPC schema, the technical field similarity calculation unit 120 normalizes each CPC code of the reference patent and each CPC code of the target patent to the main-group level.

A patent document is assigned a plurality of CPC codes. CPC codes classify technology in a hierarchy of section, class, sub-class, main-group, sub-group, and levels below the sub-group, and express the classification in code form.

Based on the schema/hierarchy of the CPC codes described above, the technical field similarity calculation unit 120 may normalize each CPC code of the reference patent to the main-group level and each CPC code of the target patent to the main-group level.
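As a rough sketch of main-group normalization, the snippet below parses a CPC symbol into section, class, sub-class, and main group, and rewrites the part after the slash as `00`, which is how CPC main-group symbols are written. The function name `normalize_to_main_group` and the regular expression are assumptions for illustration; the text does not specify how the normalization is carried out.

```python
import re

# Section letter, two-digit class, sub-class letter, main group, "/", sub-group digits.
_CPC_PATTERN = re.compile(
    r"^\s*([A-HY])\s*(\d{2})\s*([A-Z])\s*(\d{1,4})\s*/\s*(\d{2,6})\s*$"
)


def normalize_to_main_group(code):
    """Return the main-group symbol (e.g. 'A47L9/00') for a CPC code string."""
    match = _CPC_PATTERN.match(code.upper())
    if match is None:
        raise ValueError(f"not a recognizable CPC code: {code!r}")
    section, cls, subclass, main_group, _subgroup = match.groups()
    return f"{section}{cls}{subclass}{main_group}/00"


normalized = [normalize_to_main_group(c) for c in ["A47L9/2805", "A47L9/2836", "G06F 16/35"]]
```

Here the two `A47L9/...` codes collapse to the same normalized code, which is why the number of normalized codes never exceeds the number of original codes.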

Then, for each of the reference patent and the target patent, the technical field similarity calculation unit 120 generates, as the CPC set, a co-occurrence frequency matrix indicating how many times each normalized CPC code of the reference patent and the target patent co-occurs within the patent document.

Specifically, when the plurality of CPC codes assigned to a patent are normalized to the main-group level, the number of normalized CPC codes is less than or equal to the number of original (pre-normalization) CPC codes.

For each normalized CPC code of the reference patent, the technical field similarity calculation unit 120 may generate, as the CPC set of the reference patent, a co-occurrence frequency matrix indicating the number of co-occurrences within the patent document, that is, within the reference patent.

Likewise, for each normalized CPC code of the target patent, the technical field similarity calculation unit 120 may generate, as the CPC set of the target patent, a co-occurrence frequency matrix indicating the number of co-occurrences within the patent document, that is, within the target patent.

Assuming the reference patent is A and the target patent is B, the co-occurrence frequency matrix generated as the CPC set of the reference patent can be denoted Ai, and the co-occurrence frequency matrix generated as the CPC set of the target patent can be denoted Bi.

Here, Ai (the co-occurrence frequency matrix of the reference patent) and Bi (the co-occurrence frequency matrix of the target patent) share the same n matrix entries, namely the n normalized CPC codes that make up the full set of normalized CPC codes obtained from document A and document B. Each entry (normalized CPC code) holds the number of times that normalized CPC code appears in the corresponding patent.

For example, suppose a normalized CPC code that exists only in document A, the reference patent, appears three times in document A (that is, document A is assigned three CPC codes that are identical from the section through the main group and differ in the sections below). Then the entry for that normalized CPC code holds 3 in Ai and 0 in Bi.
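The construction of Ai and Bi over a shared set of entries can be sketched as below. The sketch assumes the codes have already been normalized to the main-group level and represents each "matrix" as a plain list aligned to a sorted code list; the function name `frequency_vectors` and the sample codes are invented for illustration.

```python
from collections import Counter


def frequency_vectors(normalized_a, normalized_b):
    """Build aligned frequency vectors for patents A and B.

    normalized_a / normalized_b: lists of main-group-normalized CPC codes,
    one list item per original CPC code assigned to the patent.
    Returns (codes, a_vector, b_vector), where codes is the sorted full set
    of normalized codes of A and B, and each vector holds the number of
    occurrences of each code in the respective patent (0 if absent).
    """
    count_a, count_b = Counter(normalized_a), Counter(normalized_b)
    codes = sorted(set(count_a) | set(count_b))
    return codes, [count_a[c] for c in codes], [count_b[c] for c in codes]


# Toy example: A47L9/00 occurs three times in A and never in B.
codes, a_vec, b_vec = frequency_vectors(
    ["A47L9/00", "A47L9/00", "A47L9/00", "G05D1/00"],
    ["G05D1/00", "B25J9/00"],
)
```

The entry for `A47L9/00` is 3 in the vector for A and 0 in the vector for B, matching the example in the text.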

Then, based on the per-normalized-CPC-code co-occurrence frequency matrices Ai and Bi generated for the reference patent (A) and the target patent (B), respectively, the technical field similarity calculation unit 120 may calculate the technical field similarity between the reference patent (A) and the target patent (B).

More specifically, based on the co-occurrence frequency matrices Ai and Bi, the technical field similarity calculation unit 120 may calculate the technical field similarity between the reference patent (A) and the target patent (B) using three quantities: the total sum, over the normalized CPC codes, of the product of the co-occurrence frequencies in the reference patent (A) and the target patent (B); the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent (A); and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent (B).

According to a specific embodiment, based on the co-occurrence frequency matrices Ai and Bi, the technical field similarity calculation unit 120 may calculate the total sum, over the normalized CPC codes, of the product of the co-occurrence frequencies in the reference patent (A) and the target patent (B) according to Equation 1 below.

The technical field similarity calculation unit 120 may also perform the calculation of Equation 2 below, based on the co-occurrence frequency matrices Ai and Bi, using the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent (A) and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent (B).

Using the results of Equation 1 and Equation 2 above as the numerator and the denominator, respectively, the technical field similarity calculation unit 120 may calculate the technical field similarity (Classification Similarity, CS(A,B)) between the reference patent (A) and the target patent (B) according to Equation 3 below.
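The calculation of CS(A,B) can be sketched as follows. The numerator follows the text directly (the sum of per-code frequency products). The equations themselves are not reproduced in this text, so the exact form of Equation 2 is unknown here: the sketch assumes, purely for illustration, that the denominator is the product of the two frequency sums. The function name `classification_similarity` is likewise hypothetical.

```python
def classification_similarity(a_vector, b_vector):
    """Sketch of CS(A,B) from aligned frequency vectors Ai and Bi."""
    if len(a_vector) != len(b_vector):
        raise ValueError("Ai and Bi must have the same entries")
    # Equation 1 as described in the text: sum over codes of Ai * Bi.
    numerator = sum(a * b for a, b in zip(a_vector, b_vector))
    # ASSUMPTION: Equation 2 is taken to be (sum of Ai) * (sum of Bi).
    # The text only says it "uses" the two sums; the real form may differ.
    denominator = sum(a_vector) * sum(b_vector)
    return numerator / denominator if denominator else 0.0


cs = classification_similarity([3, 0, 1], [0, 1, 1])
```

In this toy case only the third code is shared, so the numerator is 1 and, under the assumed denominator of 4 × 2, the result is 0.125. A different denominator (for example a cosine-style norm) would change the scale but not the role of the shared codes.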

Hereinafter, the function of the technology attribute similarity calculation unit 130 is described.

Based on the CPC schema, the technology attribute similarity calculation unit 130 normalizes each CPC code of the reference patent and each CPC code of the target patent to the main-group level.

The technology attribute similarity calculation unit 130 may reuse the normalization result produced by the technical field similarity calculation unit 120 described above, or may perform the normalization separately.

Hereinafter, for convenience of description and as above, the reference patent is assumed to be A and the target patent B.

Then, for each of the reference patent (A) and the target patent (B), the technology attribute similarity calculation unit 130 identifies, for each normalized CPC code, a code section set consisting of the codes of the upper sections that lie between the original CPC code and the normalized CPC code.

Specifically, for each normalized CPC code of the reference patent (A), the technology attribute similarity calculation unit 130 identifies a code section set consisting of the codes of the upper sections that lie between the original CPC code and the normalized CPC code.

For example, suppose that, among the CPC codes assigned to document A (the reference patent), three CPC codes are identical from the section through the main group and differ in the following section. These three different original CPC codes normalize to a single normalized CPC code.

In this case, for that single normalized CPC code of the reference patent (A), the technology attribute similarity calculation unit 130 may identify, as the code section set of that normalized CPC code, all of the upper-section codes (upper from the viewpoint of each original CPC code) that lie between each of the three original CPC codes and the normalized CPC code, namely the sub-group codes and the codes below them.

In this way, the technology attribute similarity calculation unit 130 may identify a code section set for each normalized CPC code of the reference patent (A).

Likewise, the technology attribute similarity calculation unit 130 may identify a code section set for each normalized CPC code of the target patent (B).
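One way to picture the code section set is as the chain of codes walked from each original CPC code up to, but not including, its main group. The sketch below assumes a `parent` lookup (child code to parent code) derived from the CPC scheme is available. That lookup, the function name `code_section_set`, the choice to include the original code itself in the set, and the toy hierarchy are all assumptions made for illustration.

```python
def code_section_set(original_codes, main_group, parent):
    """Collect the codes between each original code and its main group.

    original_codes: original CPC codes that normalize to main_group.
    main_group: the normalized (main-group) CPC code.
    parent: dict mapping a CPC code to its parent code in the hierarchy
            (hypothetical lookup; assumed to be acyclic).
    """
    section = set()
    for code in original_codes:
        current = code
        # Walk upward until the main group (or an unknown code) is reached.
        while current is not None and current != main_group:
            section.add(current)
            current = parent.get(current)
    return section


# Toy hierarchy, invented for illustration only.
toy_parent = {
    "A47L9/28": "A47L9/00",
    "A47L9/2805": "A47L9/28",
    "A47L9/2836": "A47L9/28",
    "A47L9/10": "A47L9/00",
}
section_a = code_section_set(["A47L9/2805", "A47L9/10"], "A47L9/00", toy_parent)
```

Two patents whose original codes pass through the same intermediate sub-groups then end up with overlapping code section sets, which is what the subsequent intersection/union comparison relies on.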

Then, for each normalized CPC code common to the reference patent (A) and the target patent (B), the technology attribute similarity calculation unit 130 may measure the similarity between the original CPC codes using the code section set identified for the reference patent (A) and the code section set identified for the target patent (B) with respect to that normalized CPC code.

Here, the similarity between original CPC codes is measured, for each normalized CPC code common to the reference patent (A) and the target patent (B), using the number of codes in the intersection and the number of codes in the union of two sets: the code section set identified between the original CPC code and the normalized CPC code of the reference patent (A), and the code section set identified between the original CPC code and the normalized CPC code of the target patent (B). The more codes there are in the intersection, the higher the measured similarity.

Specifically, the technology attribute similarity calculation unit 130 may measure the similarity between the original CPC codes of the reference patent (A) and the target patent (B) according to Equation 4 below, which, for each normalized CPC code common to the two patents, uses the number of codes in the intersection of the two code section sets as the numerator and the number of codes in their union as the denominator.

More specifically, for each of the reference patent (A) and the target patent (B), the technology attribute similarity calculation unit 130 may generate a co-occurrence frequency matrix indicating, for each normalized CPC code, the number of co-occurrences within the patent document.

The technology attribute similarity calculation unit 130 may reuse the co-occurrence frequency matrices generated by the technical field similarity calculation unit 120 described above, or may generate them separately.

Hereinafter, for convenience of description and as above, the reference patent is assumed to be A and the target patent B, the co-occurrence frequency matrix generated as the CPC set of the reference patent is denoted Ai, and the co-occurrence frequency matrix generated as the CPC set of the target patent is denoted Bi.

Accordingly, based on the co-occurrence frequency matrices Ai and Bi, the technology attribute similarity calculation unit 130 may measure the similarity between the original CPC codes of the reference patent (A) and the target patent (B) according to Equation 4 below. Equation 4 takes the total sum, over the normalized CPC codes, of the product of the co-occurrence frequencies in the reference patent (A) and the target patent (B) (Equation 1 above) and applies to it, for each normalized CPC code, the ratio of the number of codes in the intersection (numerator) to the number of codes in the union (denominator) of the code section set of the reference patent (A) and the code section set of the target patent (B).

As Equation 4 shows, the original CPC code similarity between the reference patent (A) and the target patent (B) uses the code section sets, which reflect the code section (distance) between the original CPC code and the normalized CPC code. The more the code sections (distances) of the reference patent (A) and the target patent (B) coincide, that is, the more codes there are in the intersection, the higher the measured original CPC code similarity between the two patents.

Accordingly, the technology attribute similarity calculation unit 130 may calculate the technology attribute similarity between the reference patent (A) and the target patent (B) by reflecting the similarity between original CPC codes measured for each common normalized CPC code through Equation 4 above.

Specifically, based on the co-occurrence frequency matrices Ai and Bi, the technology attribute similarity calculation unit 130 may calculate the technology attribute similarity between the reference patent (A) and the target patent (B) using the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent (A) and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent (B), and reflecting the similarity between original CPC codes measured for each common normalized CPC code.

According to a specific embodiment, the technology attribute similarity calculation unit 130 may calculate the technology attribute similarity (Technical feature Similarity, TS(A,B)) between the reference patent (A) and the target patent (B) according to Equation 5 below. The denominator is the result of Equation 2 above, computed from Ai and Bi using the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent (A) and the corresponding sum in the target patent (B). The numerator is the similarity between original CPC codes calculated according to Equation 4 above.
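The calculation of TS(A,B) can be sketched as follows. The numerator follows the description of Equation 4: for each normalized CPC code common to A and B, the frequency product is scaled by the intersection-over-union ratio of the two code section sets. As in the CS calculation, the form of Equation 2 is not reproduced in this text, so the denominator (product of the two frequency sums) is an assumption for illustration. The function names and the dictionary-based inputs are hypothetical.

```python
def section_overlap(section_a, section_b):
    """Codes in the intersection divided by codes in the union (0.0 if both empty)."""
    union = section_a | section_b
    return len(section_a & section_b) / len(union) if union else 0.0


def technical_feature_similarity(freq_a, freq_b, sections_a, sections_b):
    """Sketch of TS(A,B).

    freq_a / freq_b: dict of normalized CPC code -> occurrence count.
    sections_a / sections_b: dict of normalized CPC code -> code section set.
    """
    common = set(freq_a) & set(freq_b)
    # Equation 4 as described: frequency product scaled by the set overlap.
    numerator = sum(
        freq_a[c] * freq_b[c]
        * section_overlap(sections_a.get(c, set()), sections_b.get(c, set()))
        for c in common
    )
    # ASSUMPTION: same illustrative denominator as in the CS sketch.
    denominator = sum(freq_a.values()) * sum(freq_b.values())
    return numerator / denominator if denominator else 0.0


ts = technical_feature_similarity(
    {"A47L9/00": 2, "G05D1/00": 1},
    {"A47L9/00": 1},
    {"A47L9/00": {"A47L9/28", "A47L9/2805"}},
    {"A47L9/00": {"A47L9/28", "A47L9/2836"}},
)
```

In the toy input the two section sets share one of three distinct codes, so the shared main group contributes only a third of its frequency product. Note that this sketch scores two patents whose original codes are both bare main groups (empty section sets) as 0 for that code, which is a design choice of the sketch rather than something the text states.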

Accordingly, the similarity measurement unit 140 may measure the similarity (Patent Similarity, PS(A,B)) between the reference patent (A) and the target patent (B) according to Equation 6 below, which sums the technical field similarity (CS(A,B)) calculated by the technical field similarity calculation unit 120 and the technology attribute similarity (TS(A,B)) calculated by the technology attribute similarity calculation unit 130.

That is, in calculating the similarity between the reference patent and the target patent, the present invention realizes a refined and specific scheme that considers not only the co-occurrence frequency of CPC codes but also the code hierarchy, calculating the technical field similarity (CS) and the technology attribute similarity (TS) separately and then summing them.

Furthermore, in the present invention, the similar patent classification unit 150 may classify, from the comparison target patent group, the patents similar to the reference patent, based on the similarity (PS) to the reference patent measured by the similarity measurement unit 140 for each target patent in the comparison target patent group.

For example, the similar patent classification unit 150 may classify as similar patents the preset top M% of the target patents in the comparison target patent group, ranked by similarity (PS) to the reference patent, or may classify as similar patents the target patents whose similarity (PS) to the reference patent is greater than or equal to a preset threshold similarity.
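Both selection rules can be sketched in a few lines. The function names, the rounding-up of the top-M% cutoff, and the sample scores are assumptions for illustration; the text does not say how a fractional cutoff is handled.

```python
import math


def classify_top_percent(scores, m_percent):
    """Return the ids of the top M% of target patents by PS, highest first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = math.ceil(len(ranked) * m_percent / 100)  # ASSUMPTION: round up
    return ranked[:keep]


def classify_by_threshold(scores, threshold):
    """Return the ids whose PS is at or above the threshold, highest first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [patent_id for patent_id, score in ranked if score >= threshold]


# Toy PS values, invented for illustration only.
ps_scores = {"P1": 0.9, "P2": 0.2, "P3": 0.5, "P4": 0.7}
top_half = classify_top_percent(ps_scores, m_percent=50)
above_threshold = classify_by_threshold(ps_scores, threshold=0.5)
```

The percentile rule always returns a fixed share of the group, while the threshold rule can return anything from none to all of it, which is the practical difference between the two options.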

As described above, the present invention implements a new method of classifying similar patents that realizes a refined and specific technique for calculating the similarity between patents based on CPC codes, using not only the co-occurrence frequency of the CPC codes assigned to a patent but also characteristics of the CPC codes such as the code hierarchy.

Furthermore, the similar patent classification method of the present invention may use the code attribute (inventive/additional) of CPC codes to realize an even more refined and specific CPC-code-based calculation of the similarity between patents.

Specifically, a CPC code assigned to a patent may be classified by a first classification value or a second classification value.

Under the current CPC code structure, the first classification value can be defined as inventive and the second classification value as additional.

Accordingly, when calculating the technical field similarity (CS) between the reference patent (A) and the target patent (B), the technical field similarity calculation unit 120 may, according to a preset calculation option, do one of the following: use the CPC codes assigned to the reference patent (A) and the target patent (B) without distinguishing the first classification value (inventive) from the second classification value (additional) (CS(A,B)); use only the CPC codes classified as the first classification value (inventive) (CS(i)(A,B)); use only the CPC codes classified as the second classification value (additional) (CS(a)(A,B)); or apply specific weights to the result obtained using only the inventive CPC codes (CS(i)(A,B)) and the result obtained using only the additional CPC codes (CS(a)(A,B)).

Likewise, when calculating the technology attribute similarity (TS) between the reference patent (A) and the target patent (B), the technology attribute similarity calculation unit 130 may, according to a preset calculation option, do one of the following: use the CPC codes assigned to the reference patent (A) and the target patent (B) without distinguishing the first classification value (inventive) from the second classification value (additional) (TS(A,B)); use only the CPC codes classified as the first classification value (inventive) (TS(i)(A,B)); use only the CPC codes classified as the second classification value (additional) (TS(a)(A,B)); or apply specific weights to the result obtained using only the inventive CPC codes (TS(i)(A,B)) and the result obtained using only the additional CPC codes (TS(a)(A,B)).

In this case, the similarity between the reference patent (A) and the target patent (B) that the similar patent classification unit 150 measures according to Equation 6 above may be defined as follows.

· Using both the first classification value (inventive) and the second classification value (additional): PS(A,B) = CS(A,B) + TS(A,B)

· Using only the CPC codes classified as the first classification value (inventive): PS(i)(A,B) = CS(i)(A,B) + TS(i)(A,B)

· Using only the CPC codes classified as the second classification value (additional): PS(a)(A,B) = CS(a)(A,B) + TS(a)(A,B)

· Using both the first classification value (inventive) and the second classification value (additional), with weights: PS(A,B) = α(CS(i)(A,B) + TS(i)(A,B)) + β(CS(a)(A,B) + TS(a)(A,B))

Here, the weights α and β may be set by the operator or user who wants to classify similar patents; may be set according to a predefined analysis policy by analyzing the searches the operator or user performed when selecting the reference patent and the comparison target patent group; or may be set according to various changeable or configurable criteria, such as the number of patents classified as similar patents, the manner in which similar patents are provided, and the specifications of the device that outputs (displays) the provided similar patents.
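The four PS variants listed above reduce to two small functions, sketched below. The function names and the sample values of CS, TS, α, and β are invented for illustration; how CS and TS are obtained for the inventive-only and additional-only code subsets is outside this sketch.

```python
def patent_similarity(cs, ts):
    """PS = CS + TS, for whichever code subset CS and TS were computed on."""
    return cs + ts


def weighted_patent_similarity(cs_inv, ts_inv, cs_add, ts_add, alpha, beta):
    """PS = alpha * (CS(i) + TS(i)) + beta * (CS(a) + TS(a))."""
    return alpha * patent_similarity(cs_inv, ts_inv) + beta * patent_similarity(cs_add, ts_add)


# Toy values: inventive codes weighted more heavily than additional codes.
ps_weighted = weighted_patent_similarity(
    cs_inv=0.4, ts_inv=0.2, cs_add=0.1, ts_add=0.1, alpha=0.7, beta=0.3
)
```

With α = 0.7 and β = 0.3 the inventive part contributes 0.42 and the additional part 0.06. Swapping the weights would instead favor patents that match mainly on additional codes, which is the kind of tuning the weights are meant to allow.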

In one embodiment, the similar patent classification unit 150 may provide the similar patents classified from among the target patents in the comparison target patent group, outputting (displaying) them on the device of the operator or user who requested the current similar patent classification.

In doing so, when providing similar patents for the reference patent, the similar patent classification unit 150 may apply a weight α to the result obtained using only the CPC codes classified by the first classification value (inventive), which is more important than the second classification value (additional), where α is larger than the weight β applied to the result obtained using only the CPC codes classified by the second classification value (additional) (α > β). The similar patents may then be provided in order of the magnitude of the similarity so calculated, which serves as the provision priority.

As a result, the device of the operator or user receives the similar patents ranked in order of a similarity calculated with greater weight on the first classification value (inventive), which is more important than the second classification value (additional).
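The ranking described in this embodiment might look like the following Python sketch. It assumes that each target patent already has an inventive-only score PS(i) and an additional-only score PS(a), and that "order of magnitude" means highest similarity first. The function name, the default weights, and the patent identifiers are hypothetical.

```python
def rank_similar_patents(component_scores, alpha=0.7, beta=0.3):
    """Rank target patents by alpha*PS(i) + beta*PS(a), highest first.

    component_scores maps a patent id to a (ps_inventive, ps_additional) pair.
    The default weights are illustrative assumptions, not values from the text.
    """
    if not alpha > beta:
        raise ValueError("this embodiment assumes alpha > beta")
    weighted = {
        pid: alpha * ps_inv + beta * ps_add
        for pid, (ps_inv, ps_add) in component_scores.items()
    }
    # Sort patent ids by their weighted similarity, largest first.
    return sorted(weighted, key=weighted.get, reverse=True)


ranking = rank_similar_patents(
    {"P1": (0.2, 0.9), "P2": (0.8, 0.1), "P3": (0.5, 0.5)}
)
```

In this toy input, P1 has the largest additional-only score but ranks last, because the inventive-only score carries the larger weight.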

Beyond the embodiment described above, the present invention computes similarity in a subdivided manner, calculating the technical field similarity (CS) and the technology attribute similarity (TS) separately and then summing them. The way the weights α and β are set can therefore be changed flexibly, so that diverse requirements and environments are dynamically reflected in similar patent classification and a purposefully directed classification result is obtained.

As described above, the present invention implements a new method of classifying similar patents that realizes a subdivided, concrete technique for calculating the similarity between patents based on CPC codes, using not only the co-occurrence frequency of the CPC codes assigned to a patent but also characteristics of the CPC codes such as the code hierarchy and the code attributes (inventive/additional).

In addition, the new similar patent classification method proposed in the present invention, when combined with various keyword search methods that include other patent bibliographic information, is a core information processing step that forms a common basis for intellectual-property business intelligence such as prior art search, patent infringement disputes, technology trend analysis, and technology opportunity discovery. It can be readily linked with existing domestic and overseas patent information services, and can be expected to improve the quality of the data currently provided and to support new in-depth analysis functions.

Hereinafter, the operation flow of the similar patent classification method according to an embodiment of the present invention is described in detail with reference to FIG. 2.

First, for convenience of explanation, the similar patent classification system 100 is described as the entity that performs the similar patent classification method of the present invention.

According to the similar patent classification method of the present invention, the similar patent classification system 100 may select a reference patent that serves as the basis for classifying similar patents, and a comparison target patent group for that reference patent (S10).

The similar patent classification system 100 may identify the core product/technology of the reference patent and group the patents whose core product/technology is similar or identical to it, selecting that group as the comparison target patent group.

In addition, through SAO search, the similar patent classification system 100 may expand the field of the core product/technology of the reference patent (modifier (motion) + headword (sensor)) to its synonyms when selecting the comparison target patent group.

In another embodiment, rather than using the initially selected comparison target patent group as is, the similar patent classification system 100 may refine the comparison target patent group through additional filtering.

For example, the similar patent classification system 100 may filter the initially selected comparison target patent group so that only patents containing at least N of the CPC codes of the reference patent remain in the comparison target patent group.
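The "at least N shared CPC codes" filter can be illustrated with a short Python sketch. The data layout (a dictionary from patent id to a list of CPC code strings), the function name, and the sample codes are assumptions made for the example.

```python
def filter_comparison_group(reference_codes, candidates, n):
    """Keep only candidates sharing at least n CPC codes with the reference.

    reference_codes: iterable of CPC code strings of the reference patent.
    candidates: dict mapping a patent id to an iterable of its CPC codes.
    """
    reference = set(reference_codes)
    return {
        pid: codes
        for pid, codes in candidates.items()
        if len(reference & set(codes)) >= n
    }


# Toy data; the patent ids and code assignments are hypothetical.
group = filter_comparison_group(
    ["G06F 16/35", "G06Q 50/18", "G06F 40/30"],
    {
        "P1": ["G06F 16/35", "G06Q 50/18"],
        "P2": ["G06F 16/35", "H04L 9/32"],
        "P3": ["A61K 31/00"],
    },
    n=2,
)
```

Only P1 survives with N = 2, since P2 shares one code and P3 shares none.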

Then, according to the similar patent classification method of the present invention, the similar patent classification system 100 generates a normalized CPC set from the CPC codes of the reference patent and the CPC codes of the target patent, and calculates the technical field similarity between the reference patent and the target patent based on the generated CPC set (S20).

Here, the target patent may be each patent belonging to the comparison target patent group selected for the reference patent.

Specifically, the similar patent classification system 100 normalizes each CPC code of the reference patent and each CPC code of the target patent to the main-group level based on the CPC schema.

That is, based on the schema/hierarchy of CPC codes described above, the similar patent classification system 100 may normalize each CPC code of the reference patent to the main-group level, and likewise normalize each CPC code of the target patent to the main-group level.
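Main-group normalization can be sketched in Python as below. The sketch assumes that CPC codes are written in the usual "subclass group/subgroup" form (for example `G06F 16/355`) and that the main group is written with `/00` after the group number. The regular expression and the function name are illustrative assumptions, not part of the specification.

```python
import re

# Subclass (e.g. "G06F"), group number, subgroup number.
_CPC_PATTERN = re.compile(r"^\s*([A-HY]\d{2}[A-Z])\s*(\d+)/(\d+)\s*$")


def to_main_group(code):
    """Normalize a CPC code string to its main-group symbol, e.g. 'G06F 16/00'."""
    match = _CPC_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognized CPC code: {code!r}")
    subclass, group, _subgroup = match.groups()
    return f"{subclass} {group}/00"


normalized = [to_main_group(c) for c in ["G06F 16/355", "G06F 16/35", "H01L21/02"]]
```

Two different subgroup codes under the same main group, such as the first two inputs here, collapse to the same normalized code, which is what makes a per-code frequency count meaningful in the next step.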

Then, for each of the reference patent and the target patent, the similar patent classification system 100 generates, as the CPC set, a co-occurrence frequency matrix indicating, for each normalized CPC code, the frequency with which that code co-occurs in the patent document.

Specifically, for each normalized CPC code of the reference patent, the similar patent classification system 100 may generate a co-occurrence frequency matrix indicating the frequency of co-occurrence within the patent document, that is, within the reference patent, as the CPC set of the reference patent.

Likewise, for each normalized CPC code of the target patent, the similar patent classification system 100 may generate a co-occurrence frequency matrix indicating the frequency of co-occurrence within the patent document, that is, within the target patent, as the CPC set of the target patent.
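A per-patent co-occurrence frequency matrix can be sketched in Python with a simple counter. The sketch assumes that the frequency of a normalized code is the number of the patent's original CPC codes that normalize to it. That reading, the function name, and the sample codes are assumptions.

```python
from collections import Counter


def cooccurrence_frequencies(normalized_codes):
    """Count how often each normalized (main-group) CPC code occurs in one patent.

    normalized_codes: list of main-group symbols, one per original CPC code.
    """
    return Counter(normalized_codes)


# Three original codes of one patent, already normalized to main-group level.
freq_a = cooccurrence_frequencies(["G06F 16/00", "G06F 16/00", "G06Q 50/00"])
```

A `Counter` returns 0 for codes the patent does not contain, which is convenient when the reference and target patents are compared code by code.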

Then, based on the per-code co-occurrence frequency matrices generated for the reference patent (A) and the target patent (B), namely Ai and Bi, the similar patent classification system 100 may calculate the technical field similarity between the reference patent (A) and the target patent (B).

More specifically, based on Ai and Bi, the similar patent classification system 100 may compute, according to Equation 1 above, the total sum over the normalized CPC codes of the product of the co-occurrence frequencies in the reference patent (A) and the target patent (B).

Also based on Ai and Bi, the similar patent classification system 100 may perform the calculation of Equation 2 above, using the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent (A) and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent (B).

Using the result of Equation 1 as the numerator and the result of Equation 2 as the denominator, the similar patent classification system 100 may then calculate the technical field similarity (Classification Similarity, CS(A,B)) between the reference patent (A) and the target patent (B) according to Equation 3 above.
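Equations 1 to 3 are not reproduced in this part of the text, so the following Python sketch is only one plausible reading of the description. The numerator follows the description of Equation 1 (sum of per-code frequency products). The denominator is an assumption: a cosine-style normalization, sqrt(Σ Ai²)·sqrt(Σ Bi²), is used here, and the actual Equation 2 may differ. The function name is hypothetical.

```python
import math


def classification_similarity(freq_a, freq_b):
    """CS(A,B) sketch: sum of frequency products over a normalizing term.

    freq_a, freq_b: dicts mapping a normalized CPC code to its frequency.
    The cosine-style denominator is an assumption, not the patent's Equation 2.
    """
    shared = freq_a.keys() & freq_b.keys()
    numerator = sum(freq_a[code] * freq_b[code] for code in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    denominator = norm_a * norm_b
    return numerator / denominator if denominator else 0.0


cs = classification_similarity(
    {"G06F 16/00": 2, "G06Q 50/00": 1},
    {"G06F 16/00": 1},
)
```

Under this assumed normalization, two patents with identical frequency vectors score 1 and two patents with no common main group score 0.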

Meanwhile, according to the similar patent classification method of the present invention, the similar patent classification system 100 may identify the common normalized CPC codes from the CPC codes of the reference patent and the CPC codes of the target patent, and calculate the technology attribute similarity between the reference patent and the target patent by reflecting the code segment from each common normalized CPC code to the CPC code of the reference patent and the code segment from that common normalized CPC code to the CPC code of the target patent (S30).

Specifically, the similar patent classification system 100 normalizes each CPC code of the reference patent and each CPC code of the target patent to the main-group level based on the CPC schema.

Then, for each of the reference patent (A) and the target patent (B), the similar patent classification system 100 identifies, for each normalized CPC code, a code segment set consisting of the upper-segment codes that lie between the original CPC code and the normalized CPC code.

Specifically, for each normalized CPC code of the reference patent (A), the similar patent classification system 100 identifies the code segment set consisting of the upper-segment codes that lie between the original CPC code and the normalized CPC code.

In this case, for the single normalized CPC code of the reference patent (A) described above, the similar patent classification system 100 may take, for each of the three different original CPC codes, the codes in the segments above the original CPC code that lie between it and the normalized CPC code, that is, the sub-group and the codes below it, and identify all of them as the code segment set for that one normalized CPC code.

In this manner, the similar patent classification system 100 may identify a code segment set for each normalized CPC code of the reference patent (A).

Likewise, the similar patent classification system 100 may identify a code segment set for each normalized CPC code of the target patent (B).
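Identifying a code segment set requires the CPC hierarchy, which cannot be read off the code strings alone. The Python sketch below therefore assumes a hypothetical `parent` lookup table (child code to parent code) built from the CPC scheme. It also assumes that the segment includes the original code itself and excludes the main group, which is one reading of the description. The codes used are toy values, not real CPC entries.

```python
def code_segment_set(original_codes, parent, main_group):
    """Collect every code on the path from each original code up to the main group.

    original_codes: original CPC codes that normalize to main_group.
    parent: dict mapping a code to its parent in the hierarchy (hypothetical table).
    The original codes are included and the main group is excluded (assumption).
    """
    segment = set()
    for code in original_codes:
        node = code
        while node is not None and node != main_group:
            segment.add(node)
            node = parent.get(node)  # None ends the walk if the chain is broken
    return segment


# Toy hierarchy under a made-up main group "X01A 1/00".
toy_parent = {
    "X01A 1/12": "X01A 1/10",
    "X01A 1/14": "X01A 1/10",
    "X01A 1/10": "X01A 1/00",
}
segment_a = code_segment_set(["X01A 1/12", "X01A 1/14"], toy_parent, "X01A 1/00")
```

Because the result is a set, intermediate codes shared by several original codes (here `X01A 1/10`) are counted once.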

Then, for each normalized CPC code common to the reference patent (A) and the target patent (B), the similar patent classification system 100 may measure the similarity between the original CPC codes by using the code segment set identified for the reference patent (A) and the code segment set identified for the target patent (B) with respect to that normalized CPC code.

Specifically, for each normalized CPC code common to the reference patent (A) and the target patent (B), the similar patent classification system 100 may measure the similarity between the original CPC codes of the reference patent (A) and the target patent (B) according to Equation 4 above. Equation 4 uses as its numerator the number of codes in the intersection of the code segment set identified between the original and normalized CPC codes of the reference patent (A) and the code segment set identified between the original and normalized CPC codes of the target patent (B), and as its denominator the number of codes in their union.
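The intersection-over-union ratio described here is the Jaccard index of the two code segment sets. A minimal Python sketch follows. The function name and the toy codes are assumptions, and returning 0 for two empty sets is a design choice made for the example.

```python
def segment_set_similarity(segment_a, segment_b):
    """Ratio of |intersection| to |union| of two code segment sets (Jaccard index)."""
    union = segment_a | segment_b
    if not union:
        return 0.0  # assumption: two empty sets are treated as dissimilar
    return len(segment_a & segment_b) / len(union)


# Toy code segment sets for one common normalized CPC code.
ratio = segment_set_similarity(
    {"X01A 1/10", "X01A 1/12", "X01A 1/14"},
    {"X01A 1/10", "X01A 1/12"},
)
```

The ratio grows as the two patents share more of the path below the common main group, so original codes that diverge only at the deepest level still score high.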

More specifically, for each of the reference patent (A) and the target patent (B), the similar patent classification system 100 may generate a co-occurrence frequency matrix indicating, for each normalized CPC code, the frequency of co-occurrence in the patent document.

Based on the per-code co-occurrence frequency matrices Ai and Bi generated for the reference patent (A) and the target patent (B), the similar patent classification system 100 may then measure the similarity between the original CPC codes of the reference patent (A) and the target patent (B) according to Equation 4 above. Equation 4 applies, to the total sum over the normalized CPC codes of the product of the co-occurrence frequencies in the reference patent (A) and the target patent (B) (Equation 1 above), the ratio for each normalized CPC code of the number of codes in the intersection of the code segment sets of the reference patent (A) and the target patent (B) (numerator) to the number of codes in their union (denominator).

The similar patent classification system 100 may then calculate the technology attribute similarity between the reference patent (A) and the target patent (B) by reflecting the similarity between the original CPC codes measured for each common normalized CPC code through Equation 4 above.

Specifically, based on Ai and Bi, the similar patent classification system 100 uses as the denominator the result calculated according to Equation 2 above from the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent (A) and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent (B), and uses as the numerator the similarity between the original CPC codes calculated according to Equation 4 above, thereby calculating the technology attribute similarity (Technical feature Similarity, TS(A,B)) between the reference patent (A) and the target patent (B) according to Equation 5 above.
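As with CS, the exact forms of Equations 2, 4, and 5 are not shown in this part of the text, so the Python sketch below is a hedged reading. The numerator weights each per-code frequency product by the intersection-over-union ratio of the code segment sets, as described. The cosine-style denominator is an assumption. All names are hypothetical.

```python
import math


def technology_attribute_similarity(freq_a, freq_b, segments_a, segments_b):
    """TS(A,B) sketch: frequency products weighted by segment-set overlap.

    freq_a, freq_b: normalized CPC code -> frequency.
    segments_a, segments_b: normalized CPC code -> code segment set.
    The denominator is an assumed cosine-style normalization.
    """
    numerator = 0.0
    for code in freq_a.keys() & freq_b.keys():
        seg_a = segments_a.get(code, set())
        seg_b = segments_b.get(code, set())
        union = seg_a | seg_b
        overlap = len(seg_a & seg_b) / len(union) if union else 0.0
        numerator += freq_a[code] * freq_b[code] * overlap
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    denominator = norm_a * norm_b
    return numerator / denominator if denominator else 0.0


ts = technology_attribute_similarity(
    {"g1": 2, "g2": 1},
    {"g1": 1},
    {"g1": {"x", "y"}},
    {"g1": {"x"}},
)
```

When every common code has identical segment sets in both patents, the overlap factor is 1 everywhere and this TS equals the CS computed with the same assumed denominator. Lower overlap pulls TS below CS.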

Accordingly, according to the similar patent classification method of the present invention, the similar patent classification system 100 may measure the similarity (Patent Similarity, PS(A,B)) between the reference patent (A) and the target patent (B) according to Equation 6 above, which sums the technical field similarity (CS(A,B)) calculated in step S20 and the technology attribute similarity (TS(A,B)) calculated in step S30 (S40).

According to the similar patent classification method of the present invention, the similar patent classification system 100 may classify, from the comparison target patent group, the similar patents for the reference patent based on the similarity (PS) to the reference patent measured in step S40 for each target patent in the comparison target patent group (S50).

For example, among the target patents in the comparison target patent group, the similar patent classification system 100 may classify as similar patents a preset top M% by similarity (PS) to the reference patent, or may classify as similar patents the target patents whose similarity (PS) to the reference patent is greater than or equal to a preset threshold similarity.
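The two selection rules in this example, top M% and threshold, can be sketched in Python as follows. The function name, the parameter names, and the rounding of the top-M% cut (rounded up here) are assumptions, since the text does not say how a fractional count is handled.

```python
import math


def classify_similar(scores, top_percent=None, threshold=None):
    """Select similar patents by PS score, using exactly one of the two rules.

    scores: dict mapping a target patent id to PS(reference, target).
    top_percent: keep the top M percent by score (count rounded up; assumption).
    threshold: keep every patent whose score is >= threshold.
    """
    if (top_percent is None) == (threshold is None):
        raise ValueError("give exactly one of top_percent or threshold")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        return [pid for pid, score in ranked if score >= threshold]
    keep = math.ceil(len(ranked) * top_percent / 100)
    return [pid for pid, _ in ranked[:keep]]


toy_scores = {"P1": 0.9, "P2": 0.5, "P3": 0.7, "P4": 0.1}
by_percent = classify_similar(toy_scores, top_percent=50)
by_threshold = classify_similar(toy_scores, threshold=0.6)
```

The top-M% rule always returns a fixed share of the comparison group, while the threshold rule can return anything from none to all of it, depending on how close the group is to the reference patent.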

Accordingly, according to the similar patent classification method of the present invention, in the process (S20) of calculating the technical field similarity (CS) between the reference patent (A) and the target patent (B), the similar patent classification system 100 may, according to a preset calculation option: use the CPC codes assigned to the reference patent (A) and the target patent (B) without distinguishing the first classification value (inventive) from the second classification value (additional) (CS(A,B)); use only the CPC codes classified by the first classification value (inventive) (CS(i)(A,B)); use only the CPC codes classified by the second classification value (additional) (CS(a)(A,B)); or apply specific weights to the inventive-only result (CS(i)(A,B)) and the additional-only result (CS(a)(A,B)).

Likewise, in the process (S30) of calculating the technology attribute similarity (TS) between the reference patent (A) and the target patent (B), the similar patent classification system 100 may, according to a preset calculation option: use the CPC codes assigned to the reference patent (A) and the target patent (B) without distinguishing the first classification value (inventive) from the second classification value (additional) (TS(A,B)); use only the CPC codes classified by the first classification value (inventive) (TS(i)(A,B)); use only the CPC codes classified by the second classification value (additional) (TS(a)(A,B)); or apply specific weights to the inventive-only result (TS(i)(A,B)) and the additional-only result (TS(a)(A,B)).
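Selecting the CPC codes that feed the CS and TS calculations under each option might be sketched in Python as below. The representation of a patent's codes as (code, attribute) pairs, the option names, and the sample data are assumptions for illustration.

```python
def select_codes(tagged_codes, option="all"):
    """Pick the CPC codes to use for a given calculation option.

    tagged_codes: list of (code, attribute) pairs, where attribute is
    "inventive" or "additional".
    option: "all" (no distinction), "inventive", or "additional".
    """
    if option == "all":
        return [code for code, _ in tagged_codes]
    if option in ("inventive", "additional"):
        return [code for code, attribute in tagged_codes if attribute == option]
    raise ValueError(f"unknown option: {option!r}")


# Toy patent with two inventive codes and one additional code.
patent_codes = [
    ("G06F 16/35", "inventive"),
    ("G06Q 50/18", "inventive"),
    ("G06F 40/30", "additional"),
]
inventive_only = select_codes(patent_codes, "inventive")
```

The weighted option would run the similarity calculation twice, once on the inventive-only codes and once on the additional-only codes, and then combine the two results with the weights.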

In this case, the similarity between the reference patent (A) and the target patent (B) that the similar patent classification system 100 measures in step S40 according to Equation 6 above can be defined, for each calculation option, in the same four ways given above for the similar patent classification unit 150.

In one embodiment, the similar patent classification system 100 may provide the similar patents classified from among the target patents in the comparison target patent group, outputting (displaying) them on the device of the operator or user who requested the current similar patent classification.

In doing so, when providing similar patents for the reference patent, the similar patent classification system 100 may apply a weight α to the result obtained using only the CPC codes classified by the first classification value (inventive), which is more important than the second classification value (additional), where α is larger than the weight β applied to the result obtained using only the CPC codes classified by the second classification value (additional) (α > β). The similar patents may then be provided in order of the magnitude of the similarity so calculated, which serves as the provision priority.

Beyond the embodiment described above, the present invention computes similarity in a subdivided manner, calculating the technical field similarity (CS) and the technology attribute similarity (TS) separately and then summing them. The way the weights α and β are set can therefore be changed flexibly, so that diverse requirements and environments are dynamically reflected in similar patent classification and a purposefully directed classification result is obtained.

The similar patent classification method (technique) according to an embodiment of the present invention described above may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

Although the present invention has been described in detail with reference to preferred embodiments, it is not limited to those embodiments. The technical idea of the present invention extends to the full range within which anyone of ordinary skill in the art to which the invention pertains can make various changes or modifications without departing from the gist of the invention as claimed in the following claims.

The similar patent classification method (technique) of the present invention realizes a new, subdivided and concrete technique for calculating the similarity between patents based on CPC codes, and thereby overcomes the limits of existing techniques. Accordingly, beyond mere use of the related technology, devices to which it is applied have sufficient prospects of being marketed or sold, and the invention can clearly be practiced in reality, so it has industrial applicability.

Claims

A similar patent classification method comprising: a technical field similarity calculation step of generating a normalized CPC set from the CPC (Cooperative Patent Classification) codes of a reference patent and the CPC codes of a target patent, and calculating a technical field similarity between the reference patent and the target patent based on the generated CPC set;
a technology attribute similarity calculation step of identifying a common normalized CPC code from the CPC codes of the reference patent and the CPC codes of the target patent, and calculating a technology attribute similarity between the reference patent and the target patent by reflecting the code segment from the common normalized CPC code to the CPC code of the reference patent and the code segment from the common normalized CPC code to the CPC code of the target patent; and
a similarity measurement step of measuring a similarity between the reference patent and the target patent based on the calculated technical field similarity and technology attribute similarity;
wherein, in the technical field similarity calculation step,
each CPC code of the reference patent and each CPC code of the target patent is normalized to the main-group level based on the CPC schema,
for each of the reference patent and the target patent, a co-occurrence frequency matrix indicating, for each normalized CPC code of the reference patent and the target patent, the frequency of co-occurrence in the patent document is generated as the CPC set, and
the technical field similarity between the reference patent and the target patent is calculated based on the co-occurrence frequency matrices generated per normalized CPC code for the reference patent and the target patent.

The method of claim 1,
further comprising a selection step of selecting a comparison target patent group for the reference patent,
wherein the target patent is each patent belonging to the comparison target patent group.

(Deleted)

The method of claim 1,
wherein, in the technical field similarity calculation step,
the technical field similarity between the reference patent and the target patent is calculated, based on the co-occurrence frequency matrices generated per normalized CPC code for the reference patent and the target patent, using the total sum over the normalized CPC codes of the product of the co-occurrence frequencies in the reference patent and the target patent, the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent, and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent.

The method of claim 1,
In the technology attribute similarity calculation step,
Each CPC code of the reference patent and each CPC code of the target patent is normalized to the main-group level based on the CPC schema,
For each of the reference patent and the target patent, a code segment set consisting of the codes of the upper segments existing between the original CPC code and the normalized CPC code is identified for each normalized CPC code,
For each normalized CPC code common to the reference patent and the target patent, the similarity between the original CPC codes is measured using the code segment set identified for the reference patent and the code segment set identified for the target patent with respect to that normalized CPC code, and
A similar patent classification method, characterized in that the technology attribute similarity between the reference patent and the target patent is calculated by reflecting the similarity between the original CPC codes measured for each of the common normalized CPC codes.

The method of claim 5,
The similarity between the original CPC codes is,
A similar patent classification method, characterized in that, for each of the common normalized CPC codes, the similarity is measured using the number of codes in the intersection and the number of codes in the union of the code segment set identified between the original CPC code and the normalized CPC code of the reference patent and the code segment set identified between the original CPC code and the normalized CPC code of the target patent, such that a higher similarity is measured as the number of codes in the intersection increases.
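
A measure built from the intersection and union sizes that grows with the intersection is, for example, the Jaccard index; using it here is an assumption, as the claim does not name the measure. The sketch below also assumes a hypothetical `PARENT` map describing the hierarchy below a main group (the codes are toy values, not real CPC data) and assumes that the segment set includes the original code but not the main group itself.

```python
# Toy hierarchy: child code -> parent code. Not real CPC data.
PARENT = {
    "X01A1/10": "X01A1/00",
    "X01A1/12": "X01A1/10",
    "X01A1/14": "X01A1/10",
    "X01A1/20": "X01A1/00",
}


def segment_set(code: str, main_group: str, parent: dict = PARENT) -> set:
    """Collect the codes on the path from `code` up to, but excluding, `main_group`."""
    segments = set()
    current = code
    while current != main_group:
        segments.add(current)
        current = parent[current]
    return segments


def jaccard(a: set, b: set) -> float:
    """Intersection size over union size; two empty sets count as identical (assumed)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


ref_segments = segment_set("X01A1/12", "X01A1/00")  # {"X01A1/12", "X01A1/10"}
tgt_segments = segment_set("X01A1/14", "X01A1/00")  # {"X01A1/14", "X01A1/10"}
similarity = jaccard(ref_segments, tgt_segments)    # 1 shared code out of 3
```

Two original codes that branch from the same intermediate code share part of their path and score above zero, while codes that only meet at the main group score zero.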

The method of claim 5,
In the technology attribute similarity calculation step,
For each of the reference patent and the target patent, a co-occurrence frequency matrix representing, for each normalized CPC code, the frequency of simultaneous appearances in the patent document is generated,
A similar patent classification method, characterized in that the technology attribute similarity between the reference patent and the target patent is calculated, based on the co-occurrence frequency matrix of the normalized CPC codes generated for each of the reference patent and the target patent, by reflecting the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent, the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent, and the similarity between the original CPC codes measured for each of the common normalized CPC codes.

The method of claim 2,
A similar patent classification method, further comprising a classification step of classifying, from the comparison target patent group, patents similar to the reference patent based on the similarity to the reference patent measured through the similarity measurement step for each target patent belonging to the comparison target patent group.
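
The classification step can be pictured as ranking the comparison target patent group by measured similarity and keeping the closest entries. The sketch below is illustrative only: the cut-off threshold, the function name, and the patent identifiers are assumptions not taken from the claims.

```python
def classify_similar(similarities: dict, threshold: float = 0.5) -> list:
    """Return (patent_id, similarity) pairs at or above the threshold, best first."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [(patent_id, score) for patent_id, score in ranked if score >= threshold]


# Hypothetical similarities of three target patents to one reference patent.
measured = {"P1": 0.2, "P2": 0.9, "P3": 0.6}
similar = classify_similar(measured)
```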

The method of claim 8,
Each CPC code granted to a patent is classified as a first classification value or a second classification value,
In the technical field similarity calculation step and the technology attribute similarity calculation step,
When calculating the technical field similarity and the technology attribute similarity between the reference patent and the target patent, according to a preset calculation option,
A similar patent classification method, characterized in that the calculation is performed using the CPC codes granted to the reference patent and the target patent without distinguishing between the first and second classification values, using only the CPC codes classified as the first classification value, using only the CPC codes classified as the second classification value, or by applying specific weights to the result obtained using only the CPC codes classified as the first classification value and the result obtained using only the CPC codes classified as the second classification value.
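
The four calculation options can be sketched as a dispatch around any similarity function. Everything named here is hypothetical: the `Option` enum, the `"first"`/`"second"` tags, the toy `set_overlap` measure standing in for the claimed similarity calculations, and the linear weighting with example weights 0.7 and 0.3.

```python
from enum import Enum


class Option(Enum):
    ALL = "all"                  # no distinction between classification values
    FIRST_ONLY = "first_only"    # only codes with the first classification value
    SECOND_ONLY = "second_only"  # only codes with the second classification value
    WEIGHTED = "weighted"        # weighted mix of the two single-value results


def codes_with(tagged, value=None):
    """Select codes from (code, classification_value) pairs; None selects all."""
    return [code for code, tag in tagged if value is None or tag == value]


def similarity_with_option(ref, tgt, similarity, option, w_first=0.7, w_second=0.3):
    if option is Option.ALL:
        return similarity(codes_with(ref), codes_with(tgt))
    first = similarity(codes_with(ref, "first"), codes_with(tgt, "first"))
    second = similarity(codes_with(ref, "second"), codes_with(tgt, "second"))
    if option is Option.FIRST_ONLY:
        return first
    if option is Option.SECOND_ONLY:
        return second
    return w_first * first + w_second * second  # assumed linear weighting


def set_overlap(a, b):
    """Toy stand-in similarity: shared codes over all codes."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


ref = [("G06F16/00", "first"), ("G06Q50/00", "second")]
tgt = [("G06F16/00", "first"), ("G06N20/00", "second")]
weighted = similarity_with_option(ref, tgt, set_overlap, Option.WEIGHTED)
```

Passing the similarity function as a parameter lets the same option handling wrap both the technical field calculation and the technology attribute calculation.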

The method of claim 9,
A similar patent classification method, further comprising a step of, when providing patents similar to the reference patent, setting the provision priority in descending order of the similarity calculated by applying a greater weight to the result obtained using only the CPC codes classified as the first classification value, which has higher importance than the second classification value, than the weight applied to the result obtained using only the CPC codes classified as the second classification value.

A technical field similarity calculation unit that generates a normalized CPC set from the CPC (Cooperative Patent Classification) codes of a reference patent and the CPC codes of a target patent, and calculates a technical field similarity between the reference patent and the target patent based on the generated CPC set;
A technology attribute similarity calculation unit that identifies a common normalized CPC code from the CPC codes of the reference patent and the CPC codes of the target patent, and calculates a technology attribute similarity between the reference patent and the target patent by reflecting the code section from the common normalized CPC code to the CPC code of the reference patent and the code section from the common normalized CPC code to the CPC code of the target patent; And
A similarity measurement unit that measures a similarity between the reference patent and the target patent based on the calculated technical field similarity and the calculated technology attribute similarity;
The technical field similarity calculation unit,
Normalizes each CPC code of the reference patent and each CPC code of the target patent to the main-group level based on the CPC schema,
Generates, as the CPC set, for each of the reference patent and the target patent, a co-occurrence frequency matrix indicating, for each normalized CPC code, the number of simultaneous appearances in the patent document, and
A similar patent classification system, characterized in that the technical field similarity between the reference patent and the target patent is calculated based on the co-occurrence frequency matrix of the normalized CPC codes generated for each of the reference patent and the target patent.

(Deleted)

The system of claim 11,
The technical field similarity calculation unit,
A similar patent classification system, characterized in that the technical field similarity between the reference patent and the target patent is calculated, based on the co-occurrence frequency matrix of the normalized CPC codes generated for each of the reference patent and the target patent, using the sum over the normalized CPC codes of the products of the co-occurrence frequencies in the reference patent and in the target patent, the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent, and the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent.

The system of claim 11,
The technology attribute similarity calculation unit,
Normalizes each CPC code of the reference patent and each CPC code of the target patent to the main-group level based on the CPC schema,
Identifies, for each of the reference patent and the target patent, a code segment set consisting of the codes of the upper segments existing between the original CPC code and the normalized CPC code for each normalized CPC code,
Measures, for each normalized CPC code common to the reference patent and the target patent, the similarity between the original CPC codes using the code segment set identified for the reference patent and the code segment set identified for the target patent with respect to that normalized CPC code, and
A similar patent classification system, characterized in that the technology attribute similarity between the reference patent and the target patent is calculated by reflecting the similarity between the original CPC codes measured for each of the common normalized CPC codes.

The system of claim 14,
The similarity between the original CPC codes is,
A similar patent classification system, characterized in that, for each of the common normalized CPC codes, the similarity is measured using the number of codes in the intersection and the number of codes in the union of the code segment set identified between the original CPC code and the normalized CPC code of the reference patent and the code segment set identified between the original CPC code and the normalized CPC code of the target patent, such that a higher similarity is measured as the number of codes in the intersection increases.

The system of claim 14,
The technology attribute similarity calculation unit,
Generates, for each of the reference patent and the target patent, a co-occurrence frequency matrix representing, for each normalized CPC code, the frequency of simultaneous appearances in the patent document, and
A similar patent classification system, characterized in that the technology attribute similarity between the reference patent and the target patent is calculated, based on the co-occurrence frequency matrix of the normalized CPC codes generated for each of the reference patent and the target patent, by reflecting the sum of the co-occurrence frequencies of the normalized CPC codes in the reference patent, the sum of the co-occurrence frequencies of the normalized CPC codes in the target patent, and the similarity between the original CPC codes measured for each of the common normalized CPC codes.