KR101269441B1

KR101269441B1 - Apparatus and method for assessing patent infringement risks based on semantic patent claim analysis

Info

Publication number: KR101269441B1
Application number: KR1020110041326A
Authority: KR
Inventors: 박용태; 이창용; 송보미
Original assignee: 서울대학교산학협력단
Priority date: 2011-05-02
Filing date: 2011-05-02
Publication date: 2013-05-30
Also published as: KR20120123781A

Abstract

An apparatus and method for determining patent infringement based on semantic patent claim analysis are disclosed. The apparatus includes a data collection unit that extracts information on the claims of a patent, a vector construction unit that converts the claim information into hierarchical subject-action-object (SAO) vectors, and an infringement determination unit that applies a tree matching algorithm to the hierarchical SAO vectors and measures similarity to a target technology to determine the possibility of patent infringement. In assessing patent infringement risk, the apparatus considers both the structural similarity and the content similarity of the claims, which form the basis for deciding whether infringement exists, and can therefore produce more accurate assessment results.

Description

Apparatus and method for assessing patent infringement risks based on semantic patent claim analysis

The present invention relates to a service system and, more particularly, to an apparatus and method for determining patent infringement based on semantic patent claim analysis.

Patent disputes arising from infringement have recently been increasing as patent trolls, companies that assert intellectual property rights with no intention of commercializing their patents, have become more active. Patent disputes incur not only primary monetary costs such as large royalties and litigation fees but also the administrative and time costs of responding to them, and they can lead to secondary financial losses through sales suspensions and launch delays of related products or services, reduced market share, and falling stock prices. For example, after losing its patent suit against Polaroid in 1991, Kodak paid a total of USD 920 million in damages, closed a USD 1.5 billion plant, and laid off 700 employees, and economic losses of more than USD 3 billion drove it into a state of bankruptcy. In Korea as well, Samsung Electronics must pay USD 400 million by 2012 and LG Electronics USD 290 million by 2010 in royalties to InterDigital, a representative patent troll.

As the risk of patent infringement grows, monitoring for potential infringement is becoming strategically more important. Countries around the world have revised major international trade laws so that companies can retaliate against infringers through trade sanctions, while at the national level they are working to publicize the risks of patent infringement and to foster patent information support services. At the company level, efforts such as monitoring other companies' patent activity in order to purchase in advance patents that pose an infringement risk, or to build patent portfolios, are also increasing.

Conventionally, patent infringement risk has mainly been assessed by having experts read the relevant patents closely. This is a time-consuming, labor-intensive task, and it is becoming much harder as the number of patents grows exponentially and technologies become more complex. Demand is therefore rising for automated methods and systems that can support experts in assessing patent infringement risk.

Recently, helped by the greater accessibility and availability of patent databases and by advances in computer algorithms, there have been more attempts to provide automated methods that support expert decision making in the patent infringement risk assessment process. Conventional techniques have used text mining, bibliometric analysis, data mining, and information visualization to propose patent analysis and visualization techniques such as claim point maps, patent claim maps, keyword-based patent maps, semantic patent maps, and patent topic maps. These techniques can be used to analyze large volumes of patent documents and to select patents that require close reading by experts, but in providing information useful for infringement risk assessment they have the following problems in terms of data, methodology, and practicality.

First, in terms of data, in-depth analysis of claims is lacking. Whether a patent is infringed is determined on the basis of the claims, which legally define the scope of the patent right, so claim analysis is essential for providing information useful for infringement risk assessment. Conventional techniques, however, do not focus on claim analysis and, in particular, do not consider the semi-structured nature of claims, which consist of citation relationships between independent and dependent claims. This limits their ability to provide information useful for infringement risk assessment.

Second, in terms of methodology, approaches specialized for infringement risk assessment are lacking. Conventional techniques focus on general-purpose patent analysis such as statistical analysis of bibliographic information, citation analysis, and similarity-based classification. Because no specialized approach exists for comparing patents and measuring similarity at the claim level, they are limited in providing accurate information for infringement risk assessment.

Finally, in terms of practicality, validation of usefulness and performance is lacking. For methods that use text mining to measure similarity between patents based on keyword vectors, performance depends heavily on the keywords, and keyword selection can vary by industry, technology, and analyst. For keyword-vector-based conventional techniques to gain validity and deliver practical utility in the field, robustness with respect to the keywords must be verified, yet conventional techniques have paid little attention to this.

The background art described above is technical information that the inventors possessed for, or acquired in the course of, deriving the present invention, and it is not necessarily known art disclosed to the general public before the filing of the present invention.

An object of the present invention is to provide an apparatus and method for determining patent infringement based on semantic patent claim analysis that, by providing automated patent collection, processing, analysis, and infringement risk assessment, can reduce the human, time, and economic costs required for patent information search and analysis.

Another object of the present invention is to provide such an apparatus and method that, in assessing infringement risk, can produce more accurate results by considering both the structural similarity and the content similarity of the claims on which infringement decisions are based.

Another object of the present invention is to provide such an apparatus and method that can be used as a tool for assessing infringement risk at the early stage of research and development, thereby heading off involvement in patent disputes, reducing R&D costs, and enabling effective use of resources. Because it can identify patents likely to come into dispute with the patents and technologies a company holds, it can also serve as a tool supporting the monitoring of patent infringement and violations, patent purchases in preparation for disputes, and the planning of patent portfolio strategy.

Another object of the present invention is to provide such an apparatus and method that can perform a first-pass selection of patents requiring close reading by experts in the course of assessing infringement risk.

Technical objects other than those presented here will be readily understood from the following description.

According to one aspect of the present invention, there is provided an apparatus for determining patent infringement based on semantic patent claim analysis, including: a data collection unit that extracts information on the claims of a patent; a vector construction unit that converts the claim information into hierarchical subject-action-object (SAO) vectors; and an infringement determination unit that applies a tree matching algorithm to the hierarchical SAO vectors and measures similarity to a target technology to determine the possibility of patent infringement.

Here, the vector construction unit may include a keyword extraction unit that extracts SAO-related keywords from the claims, an SAO set generation unit that combines the extracted keywords to generate a set of SAO_j (j = 1 to n), and an SAO vector generation unit that generates an SAO vector by assigning 1 when a claim included in the claims contains a generated SAO_j and 0 when it does not.
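The binary SAO vector described above can be illustrated with a short Python sketch. It assumes that SAO extraction has already been done and that each SAO structure is represented as a (subject, action, object) tuple; the function name `build_sao_vector` and the sample data are hypothetical and do not come from the patent.

```python
from typing import List, Set, Tuple

# Assumed representation: one SAO structure as a (subject, action, object) tuple.
SAO = Tuple[str, str, str]


def build_sao_vector(claim_saos: Set[SAO], sao_set: List[SAO]) -> List[int]:
    """Return a binary vector over sao_set: 1 if the claim contains SAO_j, else 0."""
    return [1 if sao in claim_saos else 0 for sao in sao_set]


# Hypothetical SAO set (SAO_1 .. SAO_3) built from extracted keywords.
sao_set: List[SAO] = [
    ("sensor", "detect", "temperature"),
    ("controller", "adjust", "valve"),
    ("display", "show", "alert"),
]

# Hypothetical claim containing the first two SAO structures.
claim_1: Set[SAO] = {
    ("sensor", "detect", "temperature"),
    ("controller", "adjust", "valve"),
}

print(build_sao_vector(claim_1, sao_set))  # [1, 1, 0]
```

Every claim is encoded over the same ordered SAO set, so vectors from different claims have equal length and can be compared position by position.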

The vector construction unit may further include a tree structure generation unit that generates nodes corresponding to the plurality of claims included in the claims and builds a tree structure by connecting the nodes with links according to the citation relationships between the corresponding claims.

The vector construction unit may further include a node setting unit that assigns to each node of the generated tree structure the SAO vector of its corresponding claim and sets the level of each node according to the citation level of the corresponding claim.
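A minimal Python sketch of the claim tree follows. It rests on assumptions not stated in the text: each dependent claim cites exactly one earlier claim, independent claims sit at level 1, and each further citation adds one level. The names `ClaimNode` and `build_claim_tree` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClaimNode:
    claim_no: int
    sao_vector: List[int]
    parent: Optional[int] = None  # claim number cited; None for an independent claim
    level: int = 1                # assumed: independent claims are at level 1
    children: List[int] = field(default_factory=list)


def build_claim_tree(
    cites: Dict[int, Optional[int]], vectors: Dict[int, List[int]]
) -> Dict[int, ClaimNode]:
    """Create one node per claim, link nodes by citation, and set node levels."""
    nodes = {no: ClaimNode(no, vectors[no], parent) for no, parent in cites.items()}

    # Link each dependent claim to the claim it cites.
    for no, node in nodes.items():
        if node.parent is not None:
            nodes[node.parent].children.append(no)

    # A node's level is one more than the level of the claim it cites.
    def level_of(no: int) -> int:
        parent = nodes[no].parent
        return 1 if parent is None else level_of(parent) + 1

    for no in nodes:
        nodes[no].level = level_of(no)
    return nodes


# Hypothetical claim set: claim 1 is independent, 2 and 3 cite 1, 4 cites 2.
cites = {1: None, 2: 1, 3: 1, 4: 2}
vectors = {1: [1, 1, 0], 2: [1, 0, 0], 3: [0, 1, 0], 4: [0, 0, 1]}
tree = build_claim_tree(cites, vectors)
print({no: node.level for no, node in tree.items()})  # {1: 1, 2: 2, 3: 2, 4: 3}
```

Claims that cite several claims at once would need a different structure, which this sketch does not attempt to handle.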

The infringement determination unit may determine the possibility of patent infringement by applying the tree matching algorithm to the hierarchical SAO vectors and measuring the similarity between the claims of a first patent and those of a second patent.

Here, the infringement determination unit may include a node matching unit that measures a first similarity between claims whose nodes are at the same level and matches them with each other in descending order of the first similarity. The first similarity may be measured by the following cosine similarity index.

Here, X is the SAO vector of a claim of the first patent and Y is the SAO vector of a claim of the second patent.

The node matching unit may add a dummy node to the second patent when the second patent has no claim corresponding to the level of a node of the first patent.

Here, the SAO vector of the dummy node may be a zero vector.
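The node matching step can be sketched in Python as follows. The cosine similarity is the standard definition, the dot product of X and Y divided by the product of their norms. Two things are assumptions for illustration: "matching in descending order of the first similarity" is read here as a greedy one-to-one pairing, and the second patent is padded with zero-vector dummy nodes whenever it has fewer nodes at a level. Treating the similarity with a zero vector as 0 is also a choice made here, since cosine similarity is otherwise undefined for it. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumed convention for dummy (zero-vector) nodes
    return dot / (norm_x * norm_y)


def match_level(
    first: List[List[int]], second: List[List[int]]
) -> List[Tuple[int, int, float]]:
    """Greedily pair same-level nodes of two patents, highest similarity first.

    Returns (index in first, index in padded second, similarity) triples.
    """
    dim = len(first[0]) if first else 0
    # Pad the second patent with zero-vector dummy nodes if it has fewer nodes.
    padded = list(second) + [[0] * dim for _ in range(len(first) - len(second))]

    candidates = [
        (cosine_similarity(x, y), i, j)
        for i, x in enumerate(first)
        for j, y in enumerate(padded)
    ]
    candidates.sort(key=lambda t: -t[0])  # stable sort, descending similarity

    used_first, used_second, matches = set(), set(), []
    for sim, i, j in candidates:
        if i not in used_first and j not in used_second:
            used_first.add(i)
            used_second.add(j)
            matches.append((i, j, sim))
    return matches


# Hypothetical level with two nodes in the first patent and one in the second.
matches = match_level([[1, 1, 0], [0, 0, 1]], [[1, 0, 0]])
for i, j, sim in matches:
    print(i, j, round(sim, 3))
```

In this example the first node of the first patent pairs with the only real node of the second patent, and the remaining node pairs with the dummy node at similarity 0.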

The infringement determination unit may further include a similarity measurement unit that, taking as an element either a single independent claim or an independent claim together with the dependent claims that cite it, measures a second similarity defined by the following equation on the basis of the matched nodes and thereby determines whether the i-th elements are similar.

Here, L is the number of levels in the claim element, N_j is the number of nodes at the j-th level, w(j) is the similarity weight for the j-th level, w(N_j) is the similarity weight of a node at level j, and the similarity between corresponding nodes (L_j) is the cosine similarity index between the matched nodes.

w(N_j) may be the reciprocal of the number of nodes at level j (1/N_j), and w(j) may be specified by the following equation.
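The equations for the second similarity and for w(j) are not reproduced in this text, so the following Python sketch is only an illustration built from the symbol definitions above. It assumes the second similarity is a weighted sum over levels, in which each level contributes w(j) times the w(N_j)-weighted sum of its matched-node cosine similarities. The node weight 1/N_j comes from the text. The level weight is passed in as a parameter because its actual form is unknown, and the uniform weight in the example is purely hypothetical.

```python
from typing import Callable, Dict, List


def element_similarity(
    level_sims: Dict[int, List[float]],
    level_weight: Callable[[int], float],
) -> float:
    """Assumed form: sum over levels j of w(j) * sum over nodes of w(N_j) * sim.

    level_sims maps level j to the cosine similarities of its matched node pairs.
    """
    total = 0.0
    for j, sims in level_sims.items():
        if not sims:
            continue
        node_weight = 1.0 / len(sims)  # w(N_j) = 1 / N_j, as stated in the text
        total += level_weight(j) * sum(node_weight * s for s in sims)
    return total


# Hypothetical matched-node similarities for a two-level claim element.
level_sims = {1: [0.8], 2: [0.5, 0.0]}


# Hypothetical uniform level weight; the patent's own w(j) is not shown here.
def uniform_weight(j: int) -> float:
    return 1.0 / len(level_sims)


score = element_similarity(level_sims, uniform_weight)
print(round(score, 3))  # 0.525
```

With level weights that sum to 1 and node weights of 1/N_j, the result stays between 0 and 1 whenever the node similarities do, which makes scores comparable across claim elements of different sizes.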

Here, the infringement determination unit may further include a patent infringement determination unit that uses the second similarity to derive a third similarity defined by the following equation and determines the possibility of patent infringement between the first patent and the second patent.

Here, n is the number of elements included in the claims.

According to another aspect of the present invention, there is provided an apparatus for determining patent infringement based on semantic patent claim analysis, including: a data collection unit that extracts information on the claims of a patent; a vector construction unit that extracts SAO-related keywords from the claims, combines the extracted keywords to generate a set of SAO_j (j = 1 to n), generates an SAO vector by assigning 1 when a claim included in the claims contains a generated SAO_j and 0 when it does not, builds the citation relationships among the plurality of claims into a tree structure, assigns to each node of the tree structure the SAO vector of its corresponding claim, and sets the level of each node according to the citation level of the corresponding claim; and an infringement determination unit that measures a first similarity between claims whose nodes are at the same level and matches them with each other in descending order of the first similarity, measures a second similarity on the basis of the matched nodes, taking as an element either a single independent claim or an independent claim together with the dependent claims that cite it, to determine whether elements are similar, and uses the second similarity to derive a third similarity to determine the possibility of patent infringement between a first patent and a second patent.

According to yet another aspect of the present invention, there is provided a method by which a patent infringement determination apparatus based on semantic patent claim analysis determines patent infringement, the method including: extracting information on the claims of a patent; converting the claim information into hierarchical SAO vectors; and applying a tree matching algorithm to the hierarchical SAO vectors and measuring similarity to a target technology to determine the possibility of patent infringement.

Here, the information extracting step may further include extracting SAO-related keywords from the claims, combining the extracted keywords to generate a set of SAO_j (j = 1 to n), and generating an SAO vector by assigning 1 when a claim included in the claims contains a generated SAO_j and 0 when it does not.

The vector converting step may further include building the citation relationships among the plurality of claims into a tree structure, assigning to each node of the tree structure the SAO vector of its corresponding claim, and setting the level of each node according to the citation level of the corresponding claim.

Here, the step of determining the possibility of patent infringement may measure a first similarity between claims whose nodes are at the same level and match them with each other in descending order of the first similarity; measure a second similarity on the basis of the matched nodes, taking as an element either a single independent claim or an independent claim together with the dependent claims that cite it, to determine whether elements are similar; and use the second similarity to derive a third similarity to determine the possibility of patent infringement between a first patent and a second patent.

Here, the first similarity may be measured by the following cosine similarity index.

The second similarity may be measured by the following equation.

The third similarity may be measured by the following equation.

According to yet another aspect of the present invention, there is provided a recording medium readable by a digital processing apparatus, on which is recorded a program tangibly embodying instructions executable by the digital processing apparatus to perform the above-described method for determining patent infringement based on semantic patent claim analysis.

Other aspects, features, and advantages will become apparent from the following drawings, claims, and detailed description of the invention.

The apparatus and method for determining patent infringement based on semantic patent claim analysis according to the present invention provide automated patent collection, processing, analysis, and infringement risk assessment, and can thereby reduce the human, time, and economic costs required for patent information search and analysis.

In assessing infringement risk, the apparatus and method according to the present invention also consider both the structural similarity and the content similarity of the claims on which infringement decisions are based, and can thereby produce more accurate assessment results.

The apparatus and method according to the present invention can also be used as a tool for assessing infringement risk at the early stage of research and development, thereby heading off involvement in patent disputes, reducing R&D costs, and enabling effective use of resources. Because they can identify patents likely to come into dispute with the patents and technologies a company holds, they can serve as a tool supporting the monitoring of patent infringement and violations, patent purchases in preparation for disputes, and the planning of patent portfolio strategy.

The present invention can also perform a first-pass selection of patents requiring close reading by experts in the course of assessing infringement risk.

FIG. 1A is a block diagram of a patent infringement determination apparatus based on semantic patent claim analysis according to an embodiment of the present invention.
FIG. 1B is a block diagram of a vector construction unit according to an embodiment of the present invention.
FIG. 1C is a block diagram of an infringement determination unit according to an embodiment of the present invention.
FIG. 1D is a flowchart of a patent infringement determination method based on semantic patent claim analysis according to an embodiment of the present invention.
FIG. 2 illustrates the main patent infringement analysis methods.
FIG. 3 illustrates hierarchical SAO vectors of claims according to an embodiment of the present invention.
FIG. 4 illustrates a procedure for comparing similarity to determine patent infringement according to an embodiment of the present invention.
FIG. 5 illustrates four patent sets for verifying the patent infringement determination method based on semantic patent claim analysis according to an embodiment of the present invention.
FIG. 6 illustrates a hierarchical SAO vector according to an embodiment of the present invention.
FIG. 7 illustrates a keyword similarity matrix of set 1 according to an embodiment of the present invention.
FIG. 8 illustrates a keyword similarity matrix of set 2 according to an embodiment of the present invention.
FIG. 9 illustrates a keyword similarity matrix of set 3 according to an embodiment of the present invention.
FIG. 10 summarizes the t-test results of the patent infringement determination method based on semantic patent claim analysis according to an embodiment of the present invention.
FIGS. 11 to 13 compare the hit ratio of the patent infringement determination method based on semantic patent claim analysis according to an embodiment of the present invention with that of the prior art.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the invention to particular embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of the plurality of related listed items.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may be present in between. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component is present in between.

The terms used in this specification are used only to describe particular embodiments and are not intended to limit the invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, terms such as "... unit", "... module", and "... means" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

In the description with reference to the accompanying drawings, identical components are given the same reference numerals regardless of the figure number, and duplicate descriptions of them are omitted. In describing the present invention, detailed descriptions of related known art are omitted where they would unnecessarily obscure the gist of the invention.

FIG. 1A is a block diagram of an apparatus for determining patent infringement based on semantic patent claim analysis according to an embodiment of the present invention. Referring to FIG. 1A, the patent infringement determination apparatus 100 may include a data collection unit 110, a vector construction unit 120, an infringement determination unit 130, and a control unit 140.

In assessing patent infringement risk, this embodiment derives a more accurate result by considering at the same time the structural similarity and the content similarity of the claims, which form the basis for determining infringement. That is, the embodiment provides a way to express both the structure and the content of the claims through a hierarchical SAO vector, and it develops and applies a tree matching algorithm that measures the similarity between hierarchical SAO vectors, so that the likelihood of infringement can be estimated more accurately.

The present invention determines infringement by comparing a patent with a target technology, for example a first patent with a second patent, or a patent with a specific product. When a patent is compared with a specific product, the components of the product may be expressed in the form of claims and then compared with the patent. The following description focuses on an embodiment in which one patent is compared with another.

The patent infringement determination apparatus 100 may be a device that a user operates through an attached input device, such as a local server or a computer terminal, or a device that a user accesses and operates over the wired or wireless Internet. Accordingly, the apparatus 100 may be installed locally in a company or home, or connected to the wired or wireless Internet so that external users can access it.

The data collection unit 110 extracts information about the claims of a patent. The data collection unit 110 may extract this information from a predetermined patent database or from a patent document.

The patent database may be a database that stores patent documents and bibliographic information, such as the filing date, application number, applicant, registration date, information about the claims, citing inventions, and cited inventions. The information about the claims may include the claim number, the claim content, and information indicating that a passage is a claim (for example, phrases such as "Claims", "Claim n", "We claim", or "What is claimed is").

In response to a technical field, a specific patent number, a search expression, or a specific patent classification entered by the user, the data collection unit 110 may retrieve the patent numbers belonging to that technical field, the specific patent number itself, the patent numbers returned by the search expression, or the patent numbers belonging to that classification (IPC, USPC, F-Term, ECLA, etc.), and then extract the information about the claims corresponding to those patent numbers.

When the data collection unit 110 extracts information about the claims from a patent document, it may use a data parsing technique. Parsing can be done in various ways; for example, the tokens produced by lexical analysis may be analyzed according to a grammar to build a parse tree. The parsing technique according to the present invention may also locate in the patent document the information indicating a claim described above, and extract the claim information contained in the structure associated with that phrase.
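The marker-based extraction described above can be illustrated with a minimal Python sketch. It is not the parser of the embodiment: it assumes a plain-text document in English, a claims section introduced by "What is claimed is" or "We claim", and claims numbered "1.", "2.", and so on at the start of a line. The names `SECTION_MARKER`, `CLAIM_START`, and `extract_claims` are hypothetical.

```python
import re

# Phrases that introduce the claims section (examples taken from the description).
SECTION_MARKER = re.compile(r"(?:What is claimed is|We claim)\s*:?", re.IGNORECASE)
# Assumed layout: each claim starts on its own line as "<number>. <text>".
CLAIM_START = re.compile(r"^\s*(\d+)\s*\.\s+", re.MULTILINE)


def extract_claims(document_text):
    """Return {claim_number: claim_text} for the claims section of a plain-text patent."""
    marker = SECTION_MARKER.search(document_text)
    if marker is None:
        return {}
    # Everything after the marker is treated as the claims section.
    section = document_text[marker.end():]
    starts = list(CLAIM_START.finditer(section))
    claims = {}
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(section)
        # Collapse line breaks and repeated spaces inside one claim.
        claims[int(match.group(1))] = " ".join(section[match.end():end].split())
    return claims
```

For Korean documents the same idea would apply with markers such as "특허청구범위" and "청구항 n"; those patterns are not shown here.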

The vector construction unit 120 converts the information about the claims into a hierarchical SAO vector. This embodiment may use semantic structural text analysis for the conversion. A hierarchical SAO vector expresses the reference relationships among the claims as a tree structure and the content of each claim as an SAO vector.

The hierarchical SAO vector is described as follows. A set of claims consists of one or more independent claims and of dependent claims that limit or further specify an independent claim or another dependent claim, and the reference relationships among claims can be identified from specific phrases (such as "제n항에 있어서", i.e., "according to claim n"). The hierarchical SAO vector consists of nodes corresponding to the claims, and the nodes are connected by links according to the reference relationships among the claims, forming a tree structure (see FIG. 3).

A node corresponding to the patent is created at level 0. A node corresponding to each independent claim is created at level 1 and linked to the level 0 node. A node corresponding to each dependent claim is created one level below the claim it refers to and linked to the node of that claim. The vector construction unit 120 according to this embodiment may store the node information for the claims and the link information that connects the nodes in a separate storage unit.
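The level assignment described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the embodiment's implementation: claims are given as a `{number: text}` dictionary, a dependent claim refers to an earlier-numbered claim with the English phrase "claim n", and only the first such reference is used (multiple dependency is ignored). The names `CLAIM_REF` and `build_claim_tree` are hypothetical.

```python
import re

# Assumed reference phrase in English claims, e.g. "The device of claim 1".
CLAIM_REF = re.compile(r"\bclaim\s+(\d+)", re.IGNORECASE)


def build_claim_tree(claims):
    """Return {claim_number: {"parent": int or None, "level": int}}.

    Node 0 stands for the patent itself (level 0). Independent claims hang
    off node 0 at level 1; a dependent claim sits one level below the claim
    it refers to.
    """
    tree = {0: {"parent": None, "level": 0}}
    for number in sorted(claims):
        ref = CLAIM_REF.search(claims[number])
        # Treat the claim as independent when no earlier claim is referenced.
        parent = int(ref.group(1)) if ref and int(ref.group(1)) in tree else 0
        tree[number] = {"parent": parent, "level": tree[parent]["level"] + 1}
    return tree
```

Each entry holds the node information (its level) and the link information (its parent), which corresponds to the node and link data the description says may be stored.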

Each node of the hierarchical SAO vector, except the node at level 0, has an SAO vector representing the content of the corresponding claim. The SAO vector may be generated by the following procedure: the vector construction unit 120 extracts SAO-related keywords from the claims, derives an SAO set (SAO_j, j = 1 to n) from the extracted keywords, and assigns 1 to the j-th element if the claim contains SAO_j and 0 if it does not.

To this end, as shown in FIG. 1B, the vector construction unit 120 may include a keyword extraction unit 121 that extracts SAO-related keywords from the claims, an SAO set generation unit 123 that combines the extracted keywords to generate the set SAO_j (j = 1 to n), and an SAO vector generation unit 125 that generates the SAO vector by assigning 1 when a claim contains a generated SAO_j and 0 when it does not.
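The binary encoding performed by the SAO vector generation unit can be sketched as below. The sketch assumes that SAO extraction has already been done by some upstream parser and that each claim is represented as a set of `(subject, action, object)` tuples; the extraction itself is not shown, and the function name `build_sao_vectors` is hypothetical.

```python
def build_sao_vectors(claim_saos):
    """Encode each claim as a binary vector over the pooled SAO set.

    claim_saos: {claim_number: set of (subject, action, object) tuples},
    assumed to come from an upstream SAO extraction step.
    Returns (sao_set, vectors), where sao_set is the ordered list SAO_1..SAO_n
    and vectors[claim][j] is 1 if the claim contains SAO_j, else 0.
    """
    # Pool every SAO seen in any claim and fix an order so indices are stable.
    sao_set = sorted(set().union(*claim_saos.values()))
    vectors = {
        number: [1 if sao in saos else 0 for sao in sao_set]
        for number, saos in claim_saos.items()
    }
    return sao_set, vectors
```

When two patents are to be compared, the SAO set would need to be pooled over the claims of both so that their vectors share the same dimensions.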

The vector construction unit 120 may further include a tree structure generation unit 127 that builds the reference relationships among the plurality of claims into a tree structure. The tree structure generation unit 127 may store the node information for the claims and the link information that connects the nodes, as described above.

The vector construction unit 120 may further include a node setting unit 129 that sets the level of each node according to the reference level of the corresponding claim. As described above, the node setting unit 129 sets the level of each node according to whether it represents the patent, an independent claim, or a dependent claim, so that the claims form a single tree structure. Here, the reference level of a claim indicates its dependency depth. For example, level 1 is an independent claim, level 2 is a dependent claim that refers directly to an independent claim (a first dependent claim), and level 3 is a dependent claim that refers directly to a first dependent claim (a second dependent claim), and so on.

The infringement determination unit 130 applies a tree matching algorithm to the converted hierarchical SAO vectors and measures the similarity with the target technology to determine the likelihood of patent infringement. That is, the infringement determination unit 130 performs a claim-by-claim comparison of the hierarchical SAO vectors based on the tree matching algorithm, thereby measuring the similarity between sets of claims and assessing the likelihood of infringement.

The tree matching algorithm may include three steps: matching claims and adjusting structure, measuring component similarities, and deriving the final similarity. In this specification, the similarity used to match claims is referred to as the first similarity, the component similarity measured to decide whether elements are similar as the second similarity, and the similarity calculated to determine the final likelihood of infringement as the third similarity.

As shown in FIG. 1C, the infringement determination unit 130 may further include a node matching unit that measures the first similarity between claims whose nodes are at the same level and matches them in descending order of first similarity. The first similarity may be measured by various criteria; for example, it may be the cosine similarity index described in detail later.
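The matching step can be illustrated with the following Python sketch. The cosine similarity is the measure named in the description; the greedy one-to-one pairing in descending order of similarity is an assumption about how "matching in descending order" is carried out, and the function names are hypothetical. Both inputs hold the claims of one level of each patent, encoded over the same SAO set.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_claims(vectors_a, vectors_b):
    """Greedily pair same-level claims of two patents, most similar pair first.

    vectors_a, vectors_b: {claim_number: binary SAO vector}.
    Returns a list of (claim_a, claim_b, first_similarity); each claim is used once.
    """
    candidates = sorted(
        ((cosine_similarity(u, v), a, b)
         for a, u in vectors_a.items()
         for b, v in vectors_b.items()),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    used_a, used_b, matches = set(), set(), []
    for similarity, a, b in candidates:
        if a not in used_a and b not in used_b:
            matches.append((a, b, similarity))
            used_a.add(a)
            used_b.add(b)
    return matches
```

Claims left unpaired when one patent has more claims at a level than the other are simply omitted here; how the embodiment adjusts the structure in that case is described separately.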

The infringement determination unit 130 may further include a similarity measurement unit that treats the matched claims as elements and measures the second similarity to decide whether they are similar. The second similarity is calculated from the level of the element, the number of nodes, weights, the first similarity, and the like, as described in detail below.

The infringement determination unit 130 may further include a patent infringement determination unit that derives the third similarity from the second similarity and determines the likelihood of infringement between the first patent and the second patent. The third similarity is calculated by reflecting the number of claims in the second similarity, as described in detail below.

The control unit 140 controls the data collection unit 110, the vector construction unit 120, and the infringement determination unit 130 so that their functions operate efficiently.

FIG. 1D is a flowchart of a method for determining patent infringement based on semantic patent claim analysis according to an embodiment of the present invention. Each step described below may be performed by the corresponding functional unit of the patent infringement determination apparatus 100.

In step S110, the data collection unit 110 extracts information about the claims of a patent from a patent database or a patent document. As described above, the patent database may be an online-accessible database that stores patent documents and bibliographic information such as the filing date, application number, applicant, registration date, information about the claims, citing inventions, and cited inventions. When the claim information is extracted from a patent document, the document may be one from which text can be extracted.

In step S120, the vector construction unit 120 converts the information about the claims into a hierarchical SAO vector. As described above, the hierarchical SAO vector represents the reference relationships among the claims as a tree structure and the content of each claim as an SAO vector, and the SAO vector may have a dimension set in advance to suit the situation.

In step S130, the infringement determination unit 130 applies the tree matching algorithm to the converted hierarchical SAO vectors and measures the similarity with the target technology to determine the likelihood of patent infringement. The tree matching algorithm may include three steps, namely claim matching and structure adjustment, component similarity measurement, and final similarity derivation, and the similarity measured at each step may be computed by a different formula.

According to an embodiment of the present invention, a semantic patent claim analysis method is presented to support the comparison of claims between patents. With recent advances in software technology, linguists have been improving on conventional text analysis methods.

One such method, semantic text analysis, examines the relationships among words and phrases in a text at various levels, starting from the clauses of individual sentences, and can provide a semantic structure that helps the reader better understand the meaning of a document.

In particular, the term "semantic" in this specification may refer to the use of methods that can identify the dependency relationships among the elements of patent claims and among patents, in order to obtain information better suited to determining infringement and indicating its risk.

The method according to an embodiment of the present invention is based on a hierarchical SAO vector, which represents both the unstructured text information of the claims and the structured dependency relationships among them, and on a tree matching algorithm that enables claim-by-claim comparison. As a preliminary step in predicting infringement, the method can screen the patents that experts should read manually.

The method proposed according to an embodiment of the present invention may include a data collection and preliminary preprocessing step, a hierarchical SAO vector generation step, and a claim-by-claim patent comparison step using the tree matching algorithm. The first step is data collection and preliminary preprocessing. Once the technical field under investigation is selected, the related patents can be collected in electronic document format.

The claims, which specify the scope of protection of a patented invention, are mixed with other information such as the abstract, the detailed description, and citation relationships; this other information is removed by parsing the selected patent documents.

Second, the patent claims are converted into hierarchical SAO vectors using a Java program for structural text mining developed in-house. Finally, the likelihood of infringement is examined by comparing claims between patents with a similarity index developed from the tree matching algorithm, which takes into account both keyword occurrence and the semantic relationships among the claims.

The hierarchical SAO vector and the tree matching algorithm developed according to an embodiment of the present invention strengthen prior methods by overcoming their data limitations and methodological limitations, respectively. The practicality of these methods is verified by statistically examining different keyword lists and by comparing performance against prior keyword-based techniques.

New technology databases and intelligent computing algorithms mean that patent engineering and management no longer need to depend solely on technology and management experts. Judgment and consensus methods such as Delphi have seen only limited use, and attempts to apply intelligent computing techniques to provide researchers and industry practitioners with productive, well-organized information have recently been increasing.

Compared with prior methods, the semantic patent claim analysis method according to an embodiment of the present invention differs in that it considers each individual claim. This can be particularly useful in patent invalidation proceedings, patent litigation, and R&D exploration, and it can serve as a starting point for developing more general models.

The invention is described below in the following order: the general background, the proposed technique, an application of an embodiment to DNA chip technology, and the conclusions.

Although the literature gives various definitions and scopes, patent infringement is a general term for the unauthorized making, use, or transfer of a patented invention, and it arises in various situations such as patent disputes, civil litigation, criminal litigation, and patent invalidation proceedings.

Infringement determination centers on claim analysis. The claims define, in technical terms, the scope of protection that a patent confers, and they play an important role in the checklists used to determine infringement.

The United States Patent and Trademark Office (USPTO) has recorded about 6 million patents since 1976, and the number grows by about 150,000 each year. At this volume, manual reading, the usual way of judging infringement, demands an impractical amount of time and labor.

As a result, several attempts have been made to reduce and systematize the manual work of experts, one of which is to automatically screen the patents that need to be read manually.

The items of a patent that can be analyzed fall into two broad categories: structured items and unstructured items. Structured items, such as the patent number, filing date, inventors, and applicant (assignee), are expressed with consistent meaning and format. Unstructured items, such as the detailed description and the claims, are expressed as free text and may differ in structure and format.

As shown in FIG. 2, the main techniques for determining patent infringement can be divided into two broad categories according to the type of data and the methods used.

The first group uses text mining to transform unstructured items into structured ones and analyzes infringement risk by comparing the keyword vectors of patents to find the similarity between them.

Yoon, Yoon & Park (2002) (On the development and application of a self-organizing feature map-based patent map. R&D Management, 32(4), 291-300.) proposed a claim point map using a self-organizing feature map (SOFM). Keyword vectors were derived from patent claims and then clustered and visualized by the SOFM algorithm, so that the claim point map placed the main technologies related to the patents on a two-dimensional display.

Huang, Ke & Yang (2008) (Structure clustering for Chinese patent documents. Expert Systems with Applications, 34(4), 2290-2297.) did similar work on Chinese patent documents and proposed a structured SOFM that reflects the explicit elements of patent documents and the implicit structure of claims.

Lee, Yoon & Park (2009) (An approach to discovering new technology opportunities: Keyword-based patent map approach. Technovation, 29(6/7), 481-497.) used principal component analysis (PCA) instead of SOFM to cluster and visualize the keyword vectors of patents. Bergman et al. (2008) (Evaluating the risk of patent infringement by means of semantic patent analysis: The case of DNA chips. R&D Management, 38(5), 550-562.) proposed semantic patent analysis to improve the performance of keyword vector-based techniques, extended the concept by integrating it with SAO analysis, and showed that the similarity between patents can be calculated from the number of identical SAOs.

Similar technical efforts have been used to extract structured information (Nishida & Takamatsu, 1982) (Structure-information extraction from patent-claim sentences. Information Processing and Management, 18(1), 1-13.), to categorize patents (Kim & Choi, 2007) (Patent document categorization based on semantic structural information. Information Processing and Management, 43(5), 1200-1215.), and to retrieve patents (Kang, Na, Kim, & Lee, 2007) (Cluster-based patent retrieval. Information Processing and Management, 43(5), 1173-1182.).

The second group statistically analyzes the relationships among structured patent items (citations, classifications, etc.) to obtain information that may indicate the likelihood of infringement. In this approach, the likelihood of infringement is taken to be higher the more a patent cites earlier-filed patents.

Sternitzke, Bartkowski, & Schramm (2008) (Visualizing patent statistics by means of social network analysis tools. World Patent Information, 30(2), 115-131.) applied social network analysis to patent citation data and showed that the patents of Rohm (against which Nichia filed suit) cite more than 80 patents in the technology network, and in particular many patents held by Nichia. Kasravi & Risov (2009) (Multivariate patent similarity detection. Proceedings of Hawaii International Conference on System Sciences, 42(2), 1132-1139.) used several patent items, such as citations, classifications, and keywords among the text elements, to monitor trends in key technologies, find similar patents, and trace technology history.

Shin & Park (2005) (Generation and application of patent claim map: text mining and network analysis. Journal of Intellectual Property Rights, 10, 198-205.) used social network analysis to develop a patent claim map based on citation and classification data, in which the nodes represent patent numbers, classifications, or applicants (assignees) and indicate the similarity as judged by experts.

All of these methods can be useful in some situations, but they have the following limitations, which the present invention is intended to overcome.

First, the prior art gives insufficient consideration to claim data, which is central to litigation (data limitation). Some studies (Yoon, Yoon, & Park, 2002; Shin & Park, 2005; Huang, Ke, & Yang, 2008) use claims to examine patent infringement, but they still do not take into account the semi-structured nature of claim data, in which unstructured elements are joined in structured ways such as dependency relationships.

Most prior methods are inadequately adapted versions of conventional statistical methods; they can support general-purpose patent analysis but do not focus on infringement (methodological limitation). They are often too complex to understand and operate, and they do not handle the data well.

The performance of the prior methods has rarely been validated. Text mining performance can vary with how keywords are defined, so the prior methods need to be made more robust before they can pass external validation and be useful to industry practitioners (practicality issue).

The basic concept of the semantic patent claim analysis model according to an embodiment of the present invention is described below. As shown in FIG. 1D, this embodiment uses several methods, including data parsing, structured text mining, and a tree matching algorithm, to assess the possibility of patent infringement.

Whereas many conventional analysis methods and complex algorithms can rest on mistaken concepts and be applied incorrectly, this embodiment addresses the problem in three steps: collecting and preprocessing the data; transforming the patent claims into hierarchical SAO vectors that represent both the textual information and the dependency (semantic) relationships among the claims; and assessing the possibility of infringement through a similarity indicator based on a tree matching algorithm.

Although claim drafting requirements differ from country to country, this embodiment relies on data from the USPTO database. Patents filed in other countries are often filed in the United States as well, and the United States is the world's largest and most representative market. The US database is also well maintained in terms of search terms and reliability, provides electronic documents going back to 1975, and displays 50 titles per screen, each linked to the full patent document.

The United States mainly follows an approach called peripheral claiming, under which the claims define the outer boundary of the invention and thus the full scope of the patent right. This concept is very important in patent litigation.

Claims can be divided into components that are linked to one another by structure or function. An independent claim generally stands on its own without referring to any other claim, whereas a dependent claim generally expresses a specific feature of the invention and refers to one or more other independent or dependent claims. The reference relationships among claims can be an important factor in analyzing a patent and determining its scope of legal protection accurately.

As Internet services have developed, patent databases have become easier to access electronically. Patent documents in the selected field are collected from the USPTO database according to the relevant search conditions. At this stage some of the patent data is unstructured, that is, expressed only as text, so it must be processed to separate the claims from elements that are not needed, such as the abstract, the detailed description, and the citation information. The claims may be located between the reference information and the detailed description, and they generally begin with a phrase such as "Claims", "We claim", or "What is claimed is".
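The separation step described above can be sketched as follows. This is a minimal illustration, not the inventors' Java program: the function name `extract_claims_section`, the regular expression, and the `end_marker` default are assumptions, and real USPTO full-text layouts may need different markers.

```python
import re

# Opening phrases taken from the examples in the text; the pattern is an assumption.
CLAIM_OPENERS = re.compile(
    r"^\s*(What is claimed is|We claim|Claims)\b[:.]?\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def extract_claims_section(full_text: str, end_marker: str = "Description") -> str:
    """Return the text between the claim opener and an assumed end marker."""
    match = CLAIM_OPENERS.search(full_text)
    if match is None:
        return ""                      # no recognizable claims heading
    rest = full_text[match.end():]
    end = rest.find(end_marker)        # hypothetical start of the description
    return (rest if end == -1 else rest[:end]).strip()
```

The opener must occupy a line of its own, so a sentence such as "Claims priority to ..." in the body text is not mistaken for the start of the claims.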

Because independent and dependent claims can be told apart by reference phrases such as "제n항에 있어서" ("in claim n"), "the method of claim n wherein", and "according to claim n", the patent claims can be transformed into hierarchical SAO vectors that represent not only the textual information but also the semantic (that is, dependency) relationships among the claims.
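As a rough illustration of how such reference phrases yield a claim tree, the sketch below maps each claim to its parent. The pattern `CLAIM_REF` and the function `build_claim_tree` are hypothetical names; only English phrases are handled, and a claim that refers to several claims is attached to the first one mentioned.

```python
import re
from collections import defaultdict

# Reference phrases of the kind named in the text; the exact pattern is an assumption.
CLAIM_REF = re.compile(
    r"\b(?:of|according to|in|as claimed in)\s+claim\s+(\d+)", re.IGNORECASE
)

def build_claim_tree(claims: dict[int, str]) -> dict[int, list[int]]:
    """Map each parent claim to its children; key 0 stands for the root (the patent)."""
    children = defaultdict(list)
    for number, text in sorted(claims.items()):
        match = CLAIM_REF.search(text)
        parent = int(match.group(1)) if match else 0   # no reference: independent claim
        children[parent].append(number)
    return dict(children)
```

Claims listed under key 0 are the independent claims, which become the first-level nodes of the tree.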

FIG. 3 shows a tree structure in which nodes correspond to patent claims and links correspond to the relationships between them. The patent number is the root node, the first-level nodes represent independent claims, and the nodes at subsequent levels represent dependent claims.

A keyword vector can be generated for each node by applying structured text mining to the claims through a four-step keyword extraction process: removing supplementary words, identifying stem words, running a statistical analysis, and generating the keyword vector. As shown in FIG. 3, a 1 is assigned to a vector field when the claim contains the corresponding keyword.
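The sketch below illustrates the binary keyword vector under simplifying assumptions: the keyword list is taken as given (so the statistical analysis step is skipped), `STOP_WORDS` is a small illustrative list, and `crude_stem` is a stand-in for a real stemmer. None of these names come from the source.

```python
import re

# Illustrative supplementary-word list (an assumption, not the inventors' list).
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "on", "is", "said", "wherein"}

def crude_stem(word: str) -> str:
    """Very rough suffix stripping; a placeholder for a proper stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def keyword_vector(claim_text: str, keywords: list[str]) -> list[int]:
    """Return 1 for each keyword whose stem occurs in the claim, else 0."""
    tokens = re.findall(r"[a-z]+", claim_text.lower())
    stems = {crude_stem(t) for t in tokens if t not in STOP_WORDS}
    return [1 if crude_stem(k) in stems else 0 for k in keywords]
```

For example, `keyword_vector("A method of forming microarrays on a substrate", ["microarray", "substrate", "polymer"])` marks the first two keywords as present and the third as absent.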

As shown in FIG. 3, the patent claim elements (the individual claims) are represented as structured hierarchical SAO vectors. These can take various forms depending on the dependency relationships, from a simple tree with only a root and first-level nodes to more complex trees with multiple child nodes at the first and subsequent levels.

The final step compares claims with one another using these vectors to identify possible patent infringement. Referring to FIG. 4, a tree matching algorithm measures the similarity between the claims of two patents in three sub-steps: claim matching and structure adjustment, element similarity measurement, and final similarity derivation. The proposed algorithm may be implemented with the following pseudocode.

    // Cosine similarity between the keyword vectors of two nodes
    CoSim(nodeA, nodeB) {
        return DotProduct(nodeA, nodeB) / (|nodeA| * |nodeB|);
    }

    // Similarity of one level, normalized by the larger node count
    Multi_Sim_Level(setA, setB, simBase) {
        maxNodeCount = MAX(nodeCount(setA), nodeCount(setB));
        return Multi_Sim(setA, setB, simBase) / maxNodeCount;
    }

    Multi_Sim(setA, setB, simBase) {
        patSim = 0;
        foreach (ai in setA) {
            foreach (bj in setB) {
                nodeSim = CoSim(ai, bj) * simBase;
                if (HasChild(ai) or HasChild(bj)) {
                    nodeSim *= 0.5;
                    nodeSim += Multi_Sim_Level(childSet(ai), childSet(bj), simBase * 1/2);
                }
                // keep the best-scoring pairing of the remaining nodes
                patSim = MAX(patSim, nodeSim + Multi_Sim(setA - ai, setB - bj, simBase));
            }
        }
        return patSim;
    }
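The pseudocode above can be turned into runnable form roughly as follows. This is a sketch under stated assumptions, not the inventors' Java implementation: the `Node` class is introduced here for illustration, all-zero (dummy) vectors are given a similarity of 0 to avoid division by zero, and the pairing that maximizes the total score is kept, which is one reading of "most similar matches". The search is exhaustive and only practical for small claim sets.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    vec: list[float]                       # keyword (or SAO) vector of one claim
    children: list["Node"] = field(default_factory=list)

def co_sim(a: Node, b: Node) -> float:
    """Cosine similarity; a zero vector (dummy node) scores 0."""
    dot = sum(x * y for x, y in zip(a.vec, b.vec))
    na = math.sqrt(sum(x * x for x in a.vec))
    nb = math.sqrt(sum(y * y for y in b.vec))
    return dot / (na * nb) if na and nb else 0.0

def multi_sim_level(set_a, set_b, sim_base):
    """Best matching score of one level, normalized by the larger node count."""
    count = max(len(set_a), len(set_b))
    return multi_sim(set_a, set_b, sim_base) / count if count else 0.0

def multi_sim(set_a, set_b, sim_base):
    """Best total score over all pairings of set_a with set_b."""
    if not set_a or not set_b:
        return 0.0                         # leftovers pair with zero-vector dummies
    a, rest_a = set_a[0], set_a[1:]
    best = multi_sim(rest_a, set_b, sim_base)          # option: leave a unmatched
    for j, b in enumerate(set_b):
        node_sim = co_sim(a, b) * sim_base
        if a.children or b.children:
            # a parent keeps half its weight; the children share the other half
            node_sim = 0.5 * node_sim + multi_sim_level(a.children, b.children, sim_base / 2)
        rest_b = set_b[:j] + set_b[j + 1:]
        best = max(best, node_sim + multi_sim(rest_a, rest_b, sim_base))
    return best
```

Calling `multi_sim_level(first_level_a, first_level_b, 1.0)` on the independent claims of two patents gives a score between 0 and 1, with identical trees scoring 1.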

Sub-step 1 (S110) is claim matching and structure adjustment. Different dependency structures among the claims produce different hierarchical SAO vectors, which must be adjusted before the claims can be compared. The independent claims, that is, the first-level claims, are matched first; the dependent claims below them are then matched in order of similarity, followed by the elements further down the tree. To measure the similarity between unstructured documents, this embodiment adopts the following cosine similarity indicator.

$$\mathrm{sim}(X, Y) = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert} \qquad (1)$$

Here, X is the SAO vector of a claim of the first patent, and Y is the SAO vector of a claim of the second patent. Each SAO set SAO_j consists of (subject, action, object), where subject is a keyword related to the actor, action is a keyword related to the act, and object is a keyword related to the target of the act. For example, in English text with the subject "array of biopolymers", the action "form", and the object "microarray", the SAO set is (array of biopolymers, form, microarray). The keyword extraction unit 121 extracts the subject, action, and object keywords from each claim. If L subject keywords, M action keywords, and N object keywords are extracted, the SAO set generation unit can generate a total of L×M×N SAO sets from their combinations.
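The L×M×N combination step can be illustrated with a Cartesian product. The function names below are hypothetical, and the binary encoding over a shared SAO vocabulary is an assumption about how an SAO vector might be laid out rather than a detail stated in the source.

```python
from itertools import product

def generate_sao_sets(subjects, actions, objects):
    """All (subject, action, object) combinations: L x M x N tuples."""
    return list(product(subjects, actions, objects))

def sao_vector(claim_sao, vocabulary):
    """Binary vector: 1 where the claim contains the vocabulary's SAO set."""
    present = set(claim_sao)
    return [1 if sao in present else 0 for sao in vocabulary]

subjects = ["array of biopolymers", "substrate"]   # L = 2
actions = ["form", "bind"]                          # M = 2
objects = ["microarray"]                            # N = 1
vocabulary = generate_sao_sets(subjects, actions, objects)   # 2 x 2 x 1 = 4 sets
```

Two claims encoded against the same `vocabulary` can then be compared with the cosine indicator of Equation (1).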

FIG. 4(a) shows how the hierarchical SAO vectors of patents A and B are used as input data. In FIG. 4(b), the first-level nodes 1, 5, and 6 of patent A are compared with nodes 1 and 4 of patent B in terms of similarity.

The most similar matches are A1:B1 and A6:B4. Because patent B has no further claims at this level, node 5 of patent A is matched to dummy node 7, which is added to the hierarchical SAO vector of patent B. The keyword vector of node 7 is the zero vector {0, 0, …, 0}. The same process is applied at the lower levels until all claims of the two patents have been matched.
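The padding and matching of one level can be sketched as below. The vectors in the usage example are invented for illustration (FIG. 4 is not reproduced here), `match_level` and the `dummy0` label are hypothetical names, and the exhaustive search over permutations is an assumption that only suits small levels with at least one node on side A.

```python
import math
from itertools import permutations

def cosine(x, y):
    """Cosine similarity; returns 0 when either vector is all zeros."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0

def match_level(a_nodes: dict, b_nodes: dict, dummy_prefix: str = "dummy"):
    """Pad the shorter side with zero-vector dummies, then pick the best pairing."""
    dim = len(next(iter(a_nodes.values())))
    a, b = dict(a_nodes), dict(b_nodes)
    for side, other in ((a, b), (b, a)):
        k = 0
        while len(side) < len(other):
            side[f"{dummy_prefix}{k}"] = [0] * dim     # all-zero keyword vector
            k += 1
    a_keys = list(a)
    best = max(
        permutations(b),
        key=lambda perm: sum(cosine(a[i], b[j]) for i, j in zip(a_keys, perm)),
    )
    return list(zip(a_keys, best))

# Invented vectors that reproduce the pairing described in the text.
patent_a = {"A1": [1, 1, 0, 0], "A5": [0, 0, 1, 0], "A6": [0, 0, 0, 1]}
patent_b = {"B1": [1, 1, 0, 0], "B4": [0, 0, 0, 1]}
pairs = match_level(patent_a, patent_b)
```

With these vectors, A1 pairs with B1, A6 with B4, and A5 with the added dummy node.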

Sub-step 2 (S120) is element similarity measurement. FIG. 4(c) shows how the similarity of the matched claim elements is measured. Because each claim element, and each level within an element, represents a different aspect of the invention, the model according to this embodiment reflects the similarities and differences between nodes at each level.

The first-level nodes contain the most general content, and their similarity is given a weight of 0.5. The second-level nodes contain elements that refine the content of the first-level nodes, and their similarity is given a weight of 0.25; nodes at subsequent levels are weighted in the same way. These weights are only one example, and other values may be used. So that the similarity ranges from 0 to a maximum of 1, the nodes at the last level are given the same weight as the nodes at the preceding level. Taking the elements at each level into account, the similarity of the i-th claim element can be derived as follows.

(2)

Here, L is the number of levels in the claim element, and N_j is the number of nodes at the j-th level. w(j) is the similarity weight for the j-th level, and w(N_j) is the similarity weight of a node located at level j. The similarity between corresponding nodes (L_j) is the value of the cosine similarity indicator above for the matched nodes.
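The image for Equation (2) is not reproduced in this text. One form that would be consistent with the definitions above is sketched here as an assumption, not as the patent's exact formula. It takes equal node weights within a level, w(N_j) = 1/N_j, and writes s_{jk} (a symbol introduced only for this sketch) for the cosine similarity of the k-th matched node pair at level j:

```latex
\mathrm{Sim}_i \;=\; \sum_{j=1}^{L} w(j) \sum_{k=1}^{N_j} w(N_j)\, s_{jk},
\qquad w(N_j) = \frac{1}{N_j}
```

Under this reading, each level contributes its weight w(j) times the average similarity of its matched node pairs, so the element similarity stays between 0 and 1 whenever the level weights sum to 1.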

The similarity measure may vary with the analysis situation. When the analysis method according to this embodiment is used to assess the validity of claims, the similarity indicator can be based on the number of identical claims, or the overall similarity can be assessed with a cosine indicator such as Equation (1).

In addition, w(j) may be determined as follows.

(3)
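The image for Equation (3) is not reproduced in this text. The sketch below encodes one weighting that matches the description given earlier (0.5 for the first level, 0.25 for the second, and the last level repeating the weight of the level before it). The function name `level_weights` is hypothetical, and other weight values remain possible, as the description notes.

```python
def level_weights(num_levels: int) -> list[float]:
    """Weights w(1)..w(L): halve per level, with the last level repeating w(L-1)."""
    if num_levels < 1:
        raise ValueError("a claim element has at least one level")
    weights = [0.5 ** j for j in range(1, num_levels)]    # levels 1 .. L-1
    weights.append(0.5 ** (num_levels - 1))               # last level
    return weights
```

Because the last weight duplicates the one before it, the weights always sum to 1, which keeps the element similarity in the range 0 to 1. A single-level element receives the full weight of 1.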

Sub-step 3 (S130) is final similarity derivation. The likelihood that one patent infringes another is derived by aggregating the similarities of the claim elements. How they are aggregated may again depend on the infringement situation.

The final similarity indicator may simply be the total number of identical claims. As another example, this embodiment may use a weighted sum to derive the final similarity, with the reciprocal of the number of first-level nodes as the weight. In FIG. 4(d), each patent has three first-level nodes (including dummy node 7), so the weight is 1/3.

The final similarity indicator, which is the third similarity, may be defined as follows.

(4)

Here, n may be the number of claims (elements) included in the claims. The larger the value of this similarity indicator, the more similar the two technologies are judged to be.
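The image for Equation (4) is not reproduced in this text. Assuming the equal-weight sum described for FIG. 4(d), where each of the n matched first-level elements contributes 1/n, the aggregation could be sketched as follows. The function name and the sample scores are illustrative only.

```python
def final_similarity(element_sims: list[float]) -> float:
    """Equal-weight sum: each matched first-level element contributes 1/n."""
    n = len(element_sims)
    if n == 0:
        return 0.0
    return sum(s / n for s in element_sims)

# Hypothetical element similarities for three matched pairs; the third pair
# involves a dummy node and therefore scores 0.
score = final_similarity([0.9, 0.6, 0.0])
```

With three first-level nodes the weight is 1/3, so an element matched to a dummy node lowers the final score even though it adds nothing to the sum.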

The similarity derived by the proposed method is affected not only by the occurrence of keywords but also by the claim level. The more similar the higher-level claims of two patents are, the higher the final similarity indicator will be.

To illustrate the analysis method according to this embodiment, a comparative analysis is presented for the DNA chip patents at issue in the litigation between Affymetrix Inc. and Synteni Inc.

This case is considered suitable given (1) the scope of the technology, (2) the increasing number of patent infringement cases in this field, (3) the well-documented litigation outcomes, (4) a number of patents appropriate for illustrating the embodiment, and (5) the importance of the technology for developing next-generation products in various industries.

Ninety-nine patent documents related to DNA chips were extracted from the USPTO database and parsed into claims and titles. The compiled data consists of four sets, as shown in FIG. 5.

Set 1 consists of the four core patents involved in the infringement litigation between Synteni Inc. (US-5807522) and Affymetrix Inc. (US-5800992, US-5143853, US-5445934).

Set 2 adds seven patents directly related to the core patents of Set 1, giving 11 extended core patents. Of the added patents, US-5695940, US-5525464, and US-5700637 are involved in two other Affymetrix Inc. lawsuits; US-5763263 and US-5830645 are very similar in content to Synteni's US-5807522; and US-5424186 and US-5510270 belong to an Affymetrix patent family.

Set 3 adds 39 patents on subjects similar to those of the Set 1 core patents, for a total of 50 patents. Set 4 adds 49 randomly selected patents for comparison, for a total of 99 patents. This data was previously presented in Bergmann et al. (2008) (Evaluating the risk of patent infringement by means of semantic patent analysis: The case of DNA chips. R&D Management, 38(5), 550-562.), which describes the DNA chip litigation in more detail.

As noted above, the organizational context determines how the keyword list is defined. For the purpose of validating this embodiment in a real situation, experts and a text mining program were used to extract three keyword lists, one from the patents of each of Sets 1, 2, and 3.

The Set 1 list fits the subject of the core patents most closely, the Set 2 list includes terms from closely related patents, and the Set 3 list includes general terms found in patents on the same subject. Based on these lists, the patent claims are converted automatically into hierarchical SAO vectors by a Java structured text mining program developed by the inventors. FIG. 6 shows a sample of the keyword vectors derived from patent US-5807522.

The 99 patents were compared pairwise. Because the number of possible pairs is large (99×98/2 = 4,851) and the comparison is complex both structurally and procedurally, manual comparison is impractical, so a Java-based program was developed to calculate the similarity between patent pairs automatically.
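The pair count can be checked by enumerating the unordered pairs directly. The sketch below uses placeholder patent IDs and a dummy scoring function; both are assumptions for illustration, and any pairwise similarity function could be passed in instead.

```python
from itertools import combinations

def pairwise_similarities(patent_ids, similarity):
    """Score every unordered pair once; `similarity` is any callable on two IDs."""
    return {(a, b): similarity(a, b) for a, b in combinations(patent_ids, 2)}

patent_ids = [f"P{i:02d}" for i in range(1, 100)]              # 99 placeholder IDs
scores = pairwise_similarities(patent_ids, lambda a, b: 0.0)   # dummy scorer
pair_count = len(scores)                                       # 99 * 98 / 2
```

Since similarity is symmetric, scoring each unordered pair once is enough to fill both halves of the similarity matrix.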

The result is a 99×99 similarity matrix, which is too large to show in full. FIGS. 7, 8, and 9 show portions of the three similarity matrices based on the keyword lists extracted from Sets 1, 2, and 3, respectively.

The similarity between claims was measured with both the method according to this embodiment and a conventional keyword-vector-based technique. The conventional technique extracts keywords from the claims on the basis of TF-IDF (Term Frequency-Inverse Document Frequency) without distinguishing subject, action, and object. It also treats the entire set of claims as a single unit, without considering individual claims, when building the keyword vector that represents the claims.

To control for differences caused by the similarity indicator, the conventional keyword-vector-based technique also uses the cosine similarity indicator to measure the similarity between keyword vectors. Several analyses of the Affymetrix Inc. patents were carried out to compare the performance of the two techniques.

First, in terms of ranking accuracy for the keyword lists extracted from Sets 1, 2, and 3, the method according to this embodiment identified 10, 8, and 5 patents, respectively, from the Set 2 group directly related to the core patents, with similarities above the respective mean threshold values of 0.078, 0.014, and 0.001, whereas the conventional technique identified 8, 6, and 5.

Second, in terms of significance, FIG. 10 shows the results of a two-tailed independent-samples t-test with unequal sample sizes. The test compares the mean similarity between patent US-5807522 and the patents in Set 2 with the mean similarity between US-5807522 and the patents that belong only to Set 3. The method according to this embodiment shows a significant difference between the similarity to the core and directly related patents (Sets 1 and 2) and the similarity to patents on similar subjects (Set 3), whereas the conventional keyword-vector-based technique shows a significant difference only for the Set 1 keyword list.

The purpose of the t-test is to confirm statistically how much more clearly the method according to this embodiment, compared with the conventional keyword-vector-based technique, rates the similarity between the Set 2 patents and US-5807522 above the similarity between the Set 3 patents and US-5807522. To this end, the test checks whether the mean similarities of the two groups differ significantly.

A t-test is a statistical method for testing whether the means of two groups are equal; it comes in two forms, the independent-samples t-test and the paired-samples t-test. The independent-samples t-test treats the two groups as independent, computes the sample mean of each, and tests the difference between the sample means. If the two groups are not independent, a paired-samples t-test gives more accurate results. For example, suppose group 1 is the heart rates of 20 patients taking an existing drug and group 2 is the heart rates of the same 20 patients taking a new drug. The independent-samples t-test computes the mean heart rate of each group and then compares the mean of group 1 with the mean of group 2. The paired-samples t-test computes, for each of the 20 patients, the difference (heart rate on the existing drug minus heart rate on the new drug), then computes the mean of the differences and tests whether that mean equals zero. A paired-samples t-test therefore requires the two groups to have the same number of observations. In the present invention, (1) the numbers of patents in Set 2 and Set 3 differ, so a paired-samples t-test cannot be performed, and (2) the patents are different from an experimental standpoint as well, so it is more appropriate to treat the groups as independent and use the independent-samples t-test.
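The difference between the two tests can be made concrete with their test statistics. The sketch below uses only the standard library and assumes the pooled-variance (equal-variance) form for the independent test; the text does not say which variant was used for FIG. 10, and no p-values are computed here. The function names are illustrative.

```python
import math
from statistics import mean, variance

def independent_t(x, y):
    """Pooled-variance t statistic and degrees of freedom; sample sizes may differ."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def paired_t(x, y):
    """t statistic on per-subject differences; requires equal-length samples."""
    if len(x) != len(y):
        raise ValueError("paired test needs the same number of observations")
    d = [a - b for a, b in zip(x, y)]
    t = mean(d) / math.sqrt(variance(d) / len(d))
    return t, len(d) - 1
```

The paired version raises an error on samples of different sizes, which mirrors the reason given above for choosing the independent-samples test when comparing Set 2 with Set 3.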

In addition, the conventional technique finds higher similarity between the randomly selected Set 4 patents and the Set 2 patents than between the Set 2 and Set 3 patents. This would select the wrong patents for manual reading and indicates the inferiority of the keyword-vector-based technique.

Finally, the hit ratio, that is, the share of correctly classified cases, was analyzed by counting how many of the top n most similar patents are core patents or patents directly related to them. As FIGS. 11, 12, and 13 show, the proposed semantic patent claim analysis method outperformed the keyword-vector-based technique in every case. Taken together, these results indicate that the method according to this embodiment performs well in terms of accuracy and robustness and provides a more accurate indicator for assessing the possibility of patent infringement.
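A top-n hit ratio of this kind can be computed as sketched below. The exact definition behind FIGS. 11 to 13 is not spelled out in the text, so dividing the number of hits by n is an assumption, and the patent labels and scores in the example are invented.

```python
def hit_ratio(similarities: dict[str, float], relevant: set[str], n: int) -> float:
    """Share of the n most similar patents that belong to the relevant set."""
    top_n = sorted(similarities, key=similarities.get, reverse=True)[:n]
    if not top_n:
        return 0.0
    return sum(1 for patent in top_n if patent in relevant) / len(top_n)

# Invented similarity scores against a target patent.
sims = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05}
core_or_related = {"a", "c"}
ratio_top2 = hit_ratio(sims, core_or_related, 2)
```

Here the two most similar patents are "a" and "b", of which only "a" is in the relevant set, so half of the top two are hits.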

As noted above, the number of relevant patents is very large, so reading them manually to detect infringement risk is time-consuming, labor-intensive, and impractical. The method according to this embodiment offers a more efficient, automated way to screen for potentially infringing patents that warrant further investigation.

The method according to this embodiment is based on hierarchical SAO vectors, which capture both the textual information and the semantic (dependency) relationships among patent claims. It provides a similarity indicator derived from claim-by-claim comparison as well as a tree matching algorithm for identifying potential patent infringement.

The comparative analysis of DNA chip technology indicates that the method according to this embodiment has considerable advantages over the conventional technique in accuracy and robustness.

The present invention is notable for its contribution and potential usefulness. First, the method according to this embodiment can contribute to technology management theory as a first attempt to analyze patent infringement risk by comparing patents claim by claim. By taking the semantic relationships among claims into account, the invention overcomes the limitations of prior art that relies on comparing patent keywords.

The invention also performs well in extracting, representing, and analyzing patent infringement information, and its practical efficiency is reinforced by in-house software that helps investigate possible infringement quickly. The method can further be used in various fields such as intellectual property management and R&D screening, and it may serve as a starting point for developing a more general model.

Second, from a methodological standpoint, the embodiment develops hierarchical SAO vectors for transforming semi-structured documents into structured data and presents an algorithm designed to measure the similarity between them. Although the invention was developed with a focus on patent infringement, it is not limited to that purpose and may be applied to other types of documents for other purposes.

A large number of business and technical documents expressed in XML on the Web have not yet been analyzed. Such documents may reveal gaps in business or technology, are important as a potential source for creating new businesses and developing new technologies, and are an area for future study. The hierarchical SAO vectors and tree matching algorithm according to the present invention can help in this effort.

Despite the various possibilities presented by the present invention, this research is still at an early stage. The following points may be considered to improve its performance. Expert opinion could considerably improve the accuracy and reliability of the present invention, but managerial and technical judgment is not yet reflected in it. Accordingly, the accuracy of the present invention could be improved by integrating other types of methods, such as SAO analysis.

In terms of limitations, the following areas need further study. First, a systematic technique needs to be developed to improve the performance of the semantic patent claim analysis of the present invention, and a way of integrating other methods, such as SAO analysis, with expert input can be defined in detail.

Second, to apply the present invention to infringement of patents by products, a method of associating product components with patent information is specifically needed, together with analysis tools such as product/technology trees and quality function deployment, which can usefully represent the characteristics of, and relationships between, products and technologies.

Third, developing other indicators related to different types of patent infringement will help extend and diversify the scope of usefulness of the present invention. Finally, further validation on patents from a wider range of technologies may help increase the external validity of the present invention. Nevertheless, it is clear that the accuracy and robustness offered by the present invention contribute substantially to both research and practice.

In addition, detailed descriptions of the specific system configuration of the patent infringement determination apparatus based on semantic patent claim analysis according to an embodiment of the present invention, of common platform technologies such as embedded systems and operating systems, and of interface standardization technologies such as communication protocols and I/O interfaces are omitted, since they are obvious to those of ordinary skill in the art to which the present invention pertains.

The patent infringement determination method based on semantic patent claim analysis according to the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. That is, the recording medium may be a computer-readable recording medium on which a program for causing a computer to execute each of the steps described above is recorded.

The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

In addition, the components described above may be implemented as a single physically adjacent part or as separate parts. In the latter case, the components may be located adjacent to one another or in different areas and controlled there, and the present invention may include a separate control means or control room that controls each component by wire or wirelessly.

The components and/or functions described in each of the above embodiments may be implemented in combination with one another, and those of ordinary skill in the art will understand that the present invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: patent infringement determination apparatus 110: data collection unit
120: vector construction unit 121: keyword extraction unit
123: SAO set generation unit 125: SAO vector generation unit
127: tree structure generation unit 129: node setting unit
130: infringement determination unit 131: node matching unit
133: similarity measuring unit 135: patent infringement determination unit
140: control unit

Claims

A patent infringement determination apparatus based on semantic patent claim analysis, comprising:
a data collection unit for extracting information on the claims of a patent;
a vector construction unit for converting the information on the claims into hierarchical subject-action-object (SAO) vectors; and
an infringement determination unit for determining a possibility of patent infringement by applying a tree matching algorithm to the converted hierarchical SAO vectors and measuring similarity with a target technology,
wherein the vector construction unit comprises:
a tree structure generation unit for constructing the citation relationships among the plurality of claims included in the claims as a tree structure; and
a node setting unit for setting the level of each node according to the citation level of the corresponding claim among the plurality of claims,
wherein the infringement determination unit determines the possibility of patent infringement by applying the tree matching algorithm to the converted hierarchical SAO vectors and measuring the similarity between the claims of a first patent and the claims of a second patent,
wherein the infringement determination unit further comprises a node matching unit that measures a first similarity between claims whose nodes have the same level and matches them to each other in descending order of the first similarity,
wherein the first similarity is measured by the following cosine similarity index,

$$\cos(X, Y) = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert}$$

where X is the SAO vector of a claim of the first patent and Y is the SAO vector of a claim of the second patent, and
wherein the infringement determination unit further comprises a similarity measuring unit that determines similarity by measuring a second similarity defined by the following formula, using the matched claims as elements,

where L is the number of levels contained in the claim element, N_j is the number of nodes at the j-th level, w(j) is the similarity weight for the j-th level, w(N_j) is the similarity weight of a node at level j, and the similarity between corresponding nodes at level j is the cosine similarity index value between the matched nodes.
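
The node matching and second-similarity steps recited above can be illustrated with a minimal Python sketch. The formula for the second similarity is not reproduced in this text, so the sketch assumes a weighted sum of the form sum over levels j of w(j) times the sum over matched nodes of w(N_j) times their cosine similarity, with w(N_j) = 1/N_j. The function names, the greedy pairing strategy, and the caller-supplied level weights are all assumptions made for illustration, not the patented implementation.

```python
import math


def cosine(x, y):
    """Cosine similarity index between two SAO vectors (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def match_level(nodes_a, nodes_b):
    """Pair same-level nodes greedily, in descending order of cosine similarity.

    Returns a list of (index_in_a, index_in_b, similarity) tuples.
    """
    candidates = sorted(
        ((cosine(x, y), i, k)
         for i, x in enumerate(nodes_a)
         for k, y in enumerate(nodes_b)),
        key=lambda t: -t[0],
    )
    used_a, used_b, matched = set(), set(), []
    for sim, i, k in candidates:
        if i not in used_a and k not in used_b:
            used_a.add(i)
            used_b.add(k)
            matched.append((i, k, sim))
    return matched


def second_similarity(levels_a, levels_b, level_weights):
    """Assumed form: sum_j w(j) * sum_matched (1 / N_j) * cosine.

    levels_a / levels_b: one list of SAO vectors per tree level.
    level_weights: w(j) per level, supplied by the caller (assumption).
    N_j is taken as the larger node count of the two patents at level j (assumption).
    """
    total = 0.0
    for nodes_a, nodes_b, w_j in zip(levels_a, levels_b, level_weights):
        n_j = max(len(nodes_a), len(nodes_b))
        if n_j == 0:
            continue
        matched = match_level(nodes_a, nodes_b)
        total += w_j * sum(sim for _, _, sim in matched) / n_j
    return total
```

For two claim elements whose nodes are identical up to ordering within each level, this sketch returns a value close to the sum of the level weights.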

The apparatus of claim 1,
wherein the vector construction unit comprises:
a keyword extraction unit for extracting SAO-related keywords from the claims;
an SAO set generation unit for generating a set of SAO_j (j = 1 to n) by combining the extracted keywords; and
an SAO vector generation unit for generating an SAO vector by assigning 1 if a claim included in the claims contains the generated SAO_j and 0 if it does not.
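
The binary SAO vector described in this claim can be sketched as follows. The sketch assumes the SAO triples have already been extracted from each claim (the keyword extraction itself would need a natural-language parser and is outside this example). The function names and the DNA-chip triples in the usage note are hypothetical.

```python
def build_sao_set(claims_sao):
    """Collect the distinct SAO triples across all claims, in first-seen order.

    claims_sao: list of claims, each a list of (subject, action, object) tuples.
    """
    sao_set = []
    for triples in claims_sao:
        for triple in triples:
            if triple not in sao_set:
                sao_set.append(triple)
    return sao_set


def sao_vector(claim_triples, sao_set):
    """Assign 1 where the claim contains SAO_j and 0 where it does not."""
    present = set(claim_triples)
    return [1 if triple in present else 0 for triple in sao_set]
```

As a usage illustration with hypothetical triples: if one claim contains ("probe", "bind", "target DNA") and ("substrate", "hold", "probe"), and another contains ("probe", "bind", "target DNA") and ("scanner", "detect", "fluorescence"), the SAO set has three entries and the two claims map to [1, 1, 0] and [1, 0, 1].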

delete

The apparatus of claim 1,
wherein w(N_j) is the reciprocal of the number of nodes located at level j (1/N_j).

The apparatus of claim 1,
wherein w(j) is given by the following formula.

The apparatus of claim 1,
wherein the infringement determination unit further comprises a patent infringement determination unit that determines the possibility of patent infringement between the first patent and the second patent by deriving, from the second similarity, a third similarity defined by the following formula,

where n is the number of elements included in the claims.

The apparatus of claim 1,
wherein the infringement determination unit adds a dummy node to the second patent when none of the claims of the second patent corresponds to the level of a node of the first patent.

The apparatus of claim 12,
wherein the SAO vector of the dummy node is a zero vector.
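
The dummy-node handling in the two claims above can be sketched in Python as follows. The claims state only that a dummy node with a zero SAO vector is added to the second patent when it has no claim at a level of the first patent. Padding each level up to the first patent's node count, as done here, is an assumption made so that every node of the first patent has a counterpart, and `pad_with_dummies` is a hypothetical helper name.

```python
def pad_with_dummies(levels_a, levels_b, dim):
    """Return a padded copy of levels_b with zero-vector dummy nodes added.

    levels_a / levels_b: one list of SAO vectors per tree level for the
    first and second patent. dim: length of each SAO vector.
    The inputs are not modified.
    """
    padded = []
    for j, nodes_a in enumerate(levels_a):
        nodes_b = list(levels_b[j]) if j < len(levels_b) else []
        # Add dummy nodes until the second patent has a counterpart
        # for every node of the first patent at this level (assumption).
        while len(nodes_b) < len(nodes_a):
            nodes_b.append([0] * dim)
        padded.append(nodes_b)
    # Keep any deeper levels of the second patent unchanged.
    padded.extend(list(nodes) for nodes in levels_b[len(levels_a):])
    return padded
```

Because a dummy node carries a zero vector, its cosine similarity with any real node is undefined by the formula; a practical implementation would typically treat it as 0, so that unmatched claims lower the overall similarity.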

delete

A method of determining patent infringement performed by a patent infringement determination apparatus based on semantic patent claim analysis, the method comprising:
extracting information on the claims of a patent;
converting the information on the claims into hierarchical subject-action-object (SAO) vectors; and
determining a possibility of patent infringement by applying a tree matching algorithm to the converted hierarchical SAO vectors and measuring similarity with a target technology,
wherein the converting comprises:
constructing the citation relationships among the plurality of claims included in the claims as a tree structure; and
setting the level of each node according to the citation level of the corresponding claim among the plurality of claims,
wherein the determining of the possibility of patent infringement comprises
applying the tree matching algorithm to the converted hierarchical SAO vectors and measuring the similarity between the claims of a first patent and the claims of a second patent,
wherein a first similarity is measured between claims whose nodes have the same level, and the claims are matched to each other in descending order of the first similarity,
wherein the first similarity is measured by the following cosine similarity index,

$$\cos(X, Y) = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert}$$

where X is the SAO vector of a claim of the first patent and Y is the SAO vector of a claim of the second patent, and
wherein similarity is determined by measuring a second similarity defined by the following formula, using the matched claims as elements.
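
The tree-construction and level-setting steps of the method above can be sketched as follows. The sketch assumes each dependent claim cites exactly one earlier claim and that an independent claim sits at level 1. Both the input format and the function names are assumptions for illustration.

```python
def claim_levels(parents):
    """Compute the tree level of each claim from its citation relationship.

    parents: dict mapping claim number -> the claim it cites,
             or None for an independent claim (level 1).
    """
    levels = {}

    def level_of(claim):
        if claim not in levels:
            parent = parents[claim]
            levels[claim] = 1 if parent is None else level_of(parent) + 1
        return levels[claim]

    for claim in parents:
        level_of(claim)
    return levels


def group_by_level(levels):
    """Group claim numbers by level, so that same-level nodes can be compared."""
    grouped = {}
    for claim, level in sorted(levels.items()):
        grouped.setdefault(level, []).append(claim)
    return grouped
```

For example, with claim 1 independent, claims 2 and 4 citing claim 1, and claim 3 citing claim 2, the levels are 1, 2, 3, and 2 respectively, so claims 2 and 4 become same-level nodes that are compared against the level-2 nodes of the other patent.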

The method of claim 15,
wherein the extracting of the information comprises:
extracting SAO-related keywords from the claims;
generating a set of SAO_j (j = 1 to n) by combining the extracted keywords; and
generating an SAO vector by assigning 1 if a claim included in the claims contains the generated SAO_j and 0 if it does not.

delete

The method of claim 15,
wherein the determining of the possibility of patent infringement comprises
determining the possibility of patent infringement between the first patent and the second patent by deriving, from the second similarity, a third similarity defined by the following formula,

where n is the number of elements included in the claims.

A recording medium readable by a digital processing device, tangibly embodying a program of instructions executable by the digital processing device to perform the patent infringement determination method based on semantic patent claim analysis according to any one of claims 15, 16, and 21.