KR20110018169A

KR20110018169A - Apparatus and method for forecasting patent trouble

Info

Publication number: KR20110018169A
Application number: KR1020090075828A
Authority: KR
Inventors: 김도전
Original assignee: (주)윕스; 사단법인 한국전자정보통신산업진흥회
Priority date: 2009-08-17
Filing date: 2009-08-17
Publication date: 2011-02-23
Also published as: KR101160995B1

Abstract

PURPOSE: A patent dispute prediction apparatus and method are provided to predict the possibility of a dispute using various information about a patent. CONSTITUTION: An input data storage unit (110) stores input data for each analysis unit. An output data storage unit (150) stores output data including classification information about the possibility of a dispute. A supervised learning unit (120) generates a mining model so that the input data is classified in accordance with the output data. A prediction unit (130) inputs the input data of a prediction target patent to the trained neural network and acquires dispute prediction information.

Description

Apparatus and method for forecasting patent trouble

The present invention relates to the use of patent information and, more particularly, to a method and apparatus for predicting the possibility of a future dispute using various information about a patent.

Patent documents are documents published by a patent office as laid-open applications or registered patents. Unlike the general information that proliferates on the Internet, such patent documents are very useful for identifying trends such as a particular technical flow or the research and development direction of a particular applicant.

At present, however, patent trends are merely surveyed, analyzed and provided; the analyzed information is not processed further to predict and provide the possibility of a patent dispute.

The present invention is intended to provide a patent dispute prediction apparatus and method capable of predicting and providing the possibility of a dispute using various information about a patent.

According to one aspect of the present invention, an apparatus capable of predicting the possibility of a dispute over a patent is provided.

According to an embodiment of the present invention, a patent dispute prediction apparatus may be provided that includes: an input data storage unit that stores input data, including patent holder information, technical classification information, or a combination thereof, separately for each analysis unit; an output data storage unit that stores output data, including classification information on the possibility of a dispute, separately for each analysis unit; and a supervised learning unit that generates a mining model so that the input data is classified in accordance with the output data.

The classification information may be the presence or absence of a past dispute, the mining model may be a neural network model, and the technical classification information may include a technical classification, a patent classification and a document vector.

The document vector may be the magnitude of the TF (term frequency)-IDF (inverse document frequency) value of each keyword set extracted from the claims of the patents of each analysis unit.

The keyword set may use, as dimensions, the representative synonyms of the keyword sets extracted from the claims of the patents of each analysis unit.

The analysis unit is one patent or a plurality of patents. When the analysis unit is a plurality of patents, the analysis unit consists of patents owned by the same patent holder and may be any one or a combination of: family patents; patents owned by the same patent holder that are in a citation relationship; and patents owned by the same patent holder whose document vectors have a similarity equal to or greater than a reference value.

When the analysis unit is a plurality of patents, the patents owned by the same patent holder may be clustered, and a representative value of the input data of the clustered patents may be calculated to form the input data corresponding to the analysis unit.

The representative value may be calculated using any one of an average value, a maximum value, a minimum value and a sum.

The input data may include at least two technical classifications.

According to another aspect of the present invention, a method by which a patent dispute prediction apparatus predicts a dispute over a patent is provided.

According to an embodiment of the present invention, a patent dispute prediction method may be provided in which a patent dispute prediction apparatus predicts a dispute over a patent, the method including: (a) generating input data for dispute prediction for the patents of each analysis unit, using one or more of the patent information and bibliographic information of those patents, and registering the input data in a database; (b) inputting, from among the input data, the input data corresponding to a registered patent into a neural network that predicts whether a patent will be disputed, and training the neural network by changing its weights using the dispute prediction information that is output and the dispute information preset for the registered patent; and (c) inputting the input data corresponding to a prediction target patent into the trained neural network to obtain dispute prediction information for the prediction target patent.

Step (a) may include: extracting a keyword set from the patent information of the patent using an association rule algorithm; calculating a TF (term frequency) and an IDF (inverse document frequency) using the extracted keyword set; and constructing a document vector corresponding to the patent using the calculated TF and IDF.

The analysis unit is one patent or multiple patents. When the analysis unit is multiple patents, step (a) may include: clustering similar patents owned by the same patent holder as the patent; and calculating a representative value for the clustered similar patents.

The step of extracting the similar patents is a step of extracting, as similar patents, one or more of: family patents; patents directly or indirectly citing or cited by the set patent; and patents whose document vector similarity is equal to or greater than a reference value.

The input data includes a plurality of technical classifications, including the document vector corresponding to the patent, and steps (b) and (c) may input at least two of the plurality of technical classifications into the neural network.

By providing the patent dispute prediction apparatus and method according to the present invention, the possibility of a dispute can be predicted and provided using various information about a patent.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and it should be understood to include all modifications, equivalents and substitutes falling within the spirit and technical scope of the present invention. In describing the present invention, detailed descriptions of related known technology are omitted where they are judged likely to obscure the gist of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

The terms used in this application are used only to describe particular embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification exists, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram schematically illustrating the internal configuration of a patent dispute prediction apparatus according to an embodiment of the present invention; FIG. 2 is a diagram schematically illustrating the internal configuration of a data registration unit according to an embodiment of the present invention; FIG. 3 is a flowchart illustrating a method of constructing a document vector according to an embodiment of the present invention; and FIG. 4 is a flowchart illustrating a method of clustering similar patents according to an embodiment of the present invention.

Referring to FIG. 1, the patent dispute prediction apparatus 100 according to the present embodiment includes an input data storage unit 110, a supervised learning unit 120, a prediction unit 130, a correction unit 140 and an output data storage unit 150.

The input data storage unit 110 processes input data for dispute prediction using one or more of the patent information and bibliographic information of the patents of each analysis unit, and registers the processed input data in a database (not shown).

For example, when the mining model generated by the patent dispute prediction apparatus 100 according to the present embodiment is a neural network model, the input data may include the filing date, registration date, number of claims, number of independent claims, technical classification, patent classification such as IPC, document vector, patent index, whether the patent has lapsed, rights transfer information, patent holder information, and the like.

In the present embodiment, the technical classification, the patent classification and the document vector are referred to as technical classification information.

Also, in the present embodiment, patent information refers to information that can be obtained from the patent document of a patent, and all other information that is issued by national patent offices or obtained by companies in relation to the patent is collectively referred to as bibliographic information.

The input data may also include patent holder information, and the patent holder information may include the patent holder's name, the patent holder's number of applications and/or registrations, number of technology transfers, number of past disputes, and the like.

As another example, when the mining model is a linear model such as a decision tree model, the input data may further include separate variables that express relationships between variables, such as the application pendency period (that is, the period between the filing date and the registration date) and the IPC-to-technical-classification correspondence probability (the ratio at which a specific IPC, among all IPCs, belongs to a specific technical classification).
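The two derived variables named above can be sketched as follows. This is a minimal illustration only: the function names, the `(ipc, tech_class)` pair layout and the reading of the correspondence probability as "share of patents with a given IPC that fall under a given technical classification" are assumptions, not definitions taken from the specification.

```python
from datetime import date

def pendency_days(filing_date: date, registration_date: date) -> int:
    """Application pendency period: days from filing to registration."""
    return (registration_date - filing_date).days

def ipc_to_tech_ratio(pairs, ipc, tech_class):
    """Share of records carrying `ipc` that are assigned to `tech_class`.

    `pairs` is a hypothetical list of (ipc, tech_class) tuples, one per patent.
    """
    with_ipc = [t for i, t in pairs if i == ipc]
    if not with_ipc:
        return 0.0
    return sum(1 for t in with_ipc if t == tech_class) / len(with_ipc)
```

Both values are plain numbers, so they can be stored alongside the other input data fields.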

Depending on the analysis unit that is set, the input data may be processed using the patent information and bibliographic information of a single patent, or using the patent information and bibliographic information of a plurality of patents. That is, the analysis unit may be set to one patent or a plurality of patents.

The input data storage unit 110 will now be described in more detail with reference to FIG. 2.

As shown in FIG. 2, the input data storage unit 110 includes a setting module 210, a vector calculation module 220 and a registration module 230.

The setting module 210 sets the patent analysis unit. That is, the setting module 210 sets the analysis unit according to information entered by the user. For example, the analysis unit is set to either one patent or multiple patents.

One patent means that input data is processed and registered for a single patent, and multiple patents means that input data is processed and registered using a plurality of patents.

According to the analysis unit that is set, the registration module 230 may register input data corresponding to one patent or to a plurality of patents.

The present embodiment is described on the assumption that the input data is processed and registered using one or more of patent information and bibliographic information, but the input data may also be entered by the user.

The vector calculation module 220 constructs a document vector for each patent. The method by which the vector calculation module 220 constructs the document vector is described with reference to FIG. 3.

To construct the document vector, the vector calculation module 220 extracts keyword sets from the patent information (for example, the claims) of each patent according to a predetermined method (step 310). The keyword sets may of course be extracted from various patent information other than the claims, but for convenience of understanding and explanation this specification assumes that the keyword sets are extracted from the claims of the patent.

For example, the keyword sets are extracted using an association rule algorithm, which may be any one of the Apriori, AprioriTID, AprioriHybrid and DHP algorithms. Association rule algorithms are well known to those skilled in the art, so a separate description is omitted.

The keyword sets may also be extracted differently depending on a setting condition. Here, the setting condition may be either a sentence unit or a clause unit.

If the sentence unit is set, the keyword sets are extracted using each claim of each patent as the basic unit. If the setting condition is the clause unit, each claim is split at predetermined symbols within the claim (for example, commas, semicolons and colons) before the keyword sets are extracted.
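The difference between the two setting conditions can be sketched as follows. The function name `split_claim` and the `unit` parameter are hypothetical; the delimiter set follows the examples given in the text (comma, semicolon, colon).

```python
import re

def split_claim(claim: str, unit: str = "sentence") -> list[str]:
    """Split one claim into the basic units used for keyword-set extraction."""
    if unit == "sentence":
        # Sentence unit: the whole claim is one basic unit.
        return [claim.strip()]
    if unit == "clause":
        # Clause unit: cut the claim at commas, semicolons and colons.
        return [part.strip() for part in re.split(r"[,;:]", claim) if part.strip()]
    raise ValueError(f"unknown unit: {unit!r}")
```

With the clause unit, a claim such as "A device comprising: a storage unit; and a learning unit" yields three separate units, so keyword sets are only formed from words that co-occur within the same clause.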

The keyword sets may also be extracted in units of words or in units of phrases.

Methods of extracting keyword sets using an association rule algorithm are well known to those skilled in the art, so a detailed description of the extraction method is omitted.
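Although the specification leaves the extraction method to the reader, a minimal Apriori-style sketch may help show what "keyword set" means here. It assumes each claim (or clause) has already been reduced to a set of words, and that a keyword set is any word combination appearing in at least `min_support` of those units; both the function name and that threshold are illustrative assumptions.

```python
from itertools import combinations

def frequent_keyword_sets(transactions, min_support):
    """Return {keyword_set: count} for sets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single keywords.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-sets into k-sized candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

The pruning step is what distinguishes Apriori from brute-force counting: a candidate is only counted if every smaller subset of it was already frequent.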

Once the keyword sets are extracted, the vector calculation module 220 calculates the TF-IDF using the extracted keyword sets and uses it to construct the document vector corresponding to the patent (step 320).

That is, the vector calculation module 220 calculates the frequency with which a keyword set appears in the patent information of the patent. This is referred to as the TF (term frequency).

Next, the vector calculation module 220 calculates the number of patents in which the keyword set appears, which is referred to as the DF (document frequency). Here, the vector calculation module 220 calculates the IDF (inverse document frequency) by adjusting the value so that a keyword set with a low document frequency receives a high value and a keyword set with a high document frequency receives a low value.

Once the IDF is calculated, the vector calculation module 220 multiplies the TF by the IDF to calculate the weight of each keyword set.

The vector calculation module 220 may then construct the set of weights as the document vector.
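The TF, DF, IDF and weight steps described above can be sketched as follows. The specification does not give a formula for the IDF, so the common form log(N / DF) is used here as an assumption; keyword sets are represented as plain strings for brevity, and the function name is hypothetical.

```python
import math

def tf_idf_vectors(docs):
    """Build one TF-IDF document vector per patent.

    `docs` maps a patent id to the list of keyword sets found in its claims.
    Returns (dimensions, {patent_id: vector}).
    """
    n = len(docs)
    # DF: number of patents in which each keyword set appears.
    df = {}
    for keywords in docs.values():
        for kw in set(keywords):
            df[kw] = df.get(kw, 0) + 1
    # IDF (assumed form): rare keyword sets get high values, common ones low.
    idf = {kw: math.log(n / d) for kw, d in df.items()}
    dims = sorted(df)
    # Weight = TF * IDF; the ordered weights form the document vector.
    vectors = {pid: [keywords.count(kw) * idf[kw] for kw in dims]
               for pid, keywords in docs.items()}
    return dims, vectors
```

A keyword set that appears in every patent gets an IDF of zero under this form, so it contributes nothing to any document vector.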

Here, if the keyword sets become too numerous, the overall processing speed of the system decreases and the document vector is constructed in a distorted form. To prevent this, the vector calculation module 220 may group and cluster the keyword sets using pre-registered synonym information (step 330).

For example, the vector calculation module 220 may group the keyword sets using the synonym information and construct the document vector using only the keyword sets whose IDF is equal to or greater than a threshold.
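A minimal sketch of this dimension reduction, assuming the synonym information is available as a simple mapping from a keyword to its representative synonym (the mapping format and both function names are hypothetical):

```python
def canonicalize(keywords, synonym_map):
    """Replace each keyword with its representative synonym, if one is registered."""
    return [synonym_map.get(k, k) for k in keywords]

def select_dimensions(idf, threshold):
    """Keep only the keywords whose IDF is at or above the threshold."""
    return sorted(k for k, v in idf.items() if v >= threshold)
```

Canonicalizing before counting merges the frequencies of synonyms into one dimension, and the IDF threshold then drops dimensions that are too common to discriminate between patents.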

The vector calculation module 220 may also cluster the constructed document vectors using a predetermined algorithm. For example, the vector calculation module 220 may cluster the constructed document vectors using any one of the K-means, K-medoids, CLIQUE, PROCLUS and pCluster algorithms. These algorithms are well known to those skilled in the art, so a separate description is omitted.

The registration module 230 processes the input data using the patent information and bibliographic information of the patent according to the analysis unit that is set, and registers the input data in the database.

The registration module 230 may process all of the input data other than the document vector and register it in the database.

For example, when the analysis unit is one patent, the registration module 230 may generate input data using the patent information and bibliographic information of that patent and register it in the database.

When the analysis unit is a plurality of patents, however, the registration module 230 may extract similar patents corresponding to the set patent, cluster them and register them in the database. This is described in more detail below with reference to FIG. 4.

In step 410, the registration module 230 extracts similar patents corresponding to the set patent from the database. For example, the registration module 230 may extract, as similar patents, one or more of: family patents of the set patent; patents of the same patent holder (or inventor); patents directly or indirectly citing or cited by the set patent; and patents whose document vector similarity is equal to or greater than a reference value (for example, 0.9).
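The document-vector criterion in step 410 can be sketched as follows. The specification does not name the similarity measure, so cosine similarity is used here as an assumption; the 0.9 default mirrors the example reference value in the text, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two document vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_patents(target_id, vectors, threshold=0.9):
    """Ids of patents whose document vector is at least `threshold` similar to the target."""
    target = vectors[target_id]
    return [pid for pid, vec in vectors.items()
            if pid != target_id and cosine_similarity(target, vec) >= threshold]
```

In a fuller implementation this result would be combined with the family, same-holder and citation criteria, which require bibliographic lookups rather than vector arithmetic.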

In step 420, the registration module 230 clusters the similar patents.

For example, the registration module 230 extracts the input data corresponding to each of the extracted similar patents. The registration module 230 then reduces the input data extracted for the similar patents to representative values.

For example, the registration module 230 may reduce the extracted input data to representative values using one or more of the minimum, average, maximum and sum. Of course, various other methods, such as the median, may also be used.
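A minimal sketch of this reduction, assuming each similar patent's input data has already been converted to a numeric record with the same field order (the `REDUCERS` table and function name are hypothetical):

```python
from statistics import mean

# The four reductions named in the text.
REDUCERS = {"mean": mean, "max": max, "min": min, "sum": sum}

def representative(records, how="mean"):
    """Collapse the records of a patent cluster into one record, field by field."""
    reduce = REDUCERS[how]
    return [reduce(column) for column in zip(*records)]
```

For instance, for two patents with (claim count, independent claim count) of (10, 2) and (20, 4), `representative(..., "max")` gives `[20, 4]`. Different fields could reasonably use different reductions, such as a sum for dispute counts and a mean for claim counts.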

In step 430, the registration module 230 compares the clustered patents and removes duplicates. That is, when clustered patents are completely identical, the registration module 230 determines that they are duplicates and removes them.

Referring again to FIG. 1, the input data storage unit 110 may process input data using one or more of the patent information and bibliographic information of the patents for each analysis condition and register the input data in the database.

The input data storage unit 110 according to the present embodiment may also normalize the generated input data according to a predetermined method before registering it in the database. For example, the input data may be normalized to values between 0 and 1 and then registered in the database.
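The specification only states that values are scaled into the range 0 to 1. One common way to do that is min-max scaling, sketched below as an assumption; the function name is hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to scale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Normalizing each field this way keeps fields with large raw ranges, such as pendency in days, from dominating fields with small ranges, such as the number of independent claims.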

The supervised learning unit 120 trains a neural network by inputting to it, from among the input data registered by the input data storage unit 110, the input data corresponding to registered patents. The operation of a neural network is well known to those skilled in the art, so a separate description is omitted. The present embodiment is described mainly on the assumption that the supervised learning unit 120 uses a neural network as the mining model, but a linear model such as a decision tree model may also be used.

That is, the supervised learning unit 120 generates a mining model (for example, a neural network) so that the input data registered in the input data storage unit 110 is classified in accordance with the output data registered in the output data storage unit 150 described below, and trains that model.

The supervised learning unit 120 trains the neural network by inputting the input data corresponding to a registered patent into the neural network and changing the weight of each connection of the neural network using the dispute prediction information that is output and the dispute information preset for that registered patent.
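The weight update described above can be illustrated with the smallest possible network: a single sigmoid unit trained by gradient descent. The specification does not state the network architecture or the update rule, so both are assumptions here, as are the function names, learning rate and epoch count.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Dispute prediction (0..1) for one normalized input record."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(samples, labels, epochs=500, lr=0.5):
    """Adjust connection weights from the gap between prediction and known outcome."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # err: predicted dispute probability minus the preset dispute label.
            err = predict(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b
```

Here `labels` plays the role of the preset dispute information (1 for a patent with a past dispute, 0 otherwise), and `predict` on a new record corresponds to obtaining dispute prediction information for a prediction target patent.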

In addition, the supervised learning unit 120 may train the neural network using the input data registered in the input data storage unit 110 so that it corresponds to output data including classification information on the possibility of a dispute. Here, the classification information may be one or more of: the presence or absence of a past dispute; information on warning letters sent or received; standard patents; and promising patents.

The supervised learning unit 120 may also train the neural network using at least two items of technical classification information from the input data. As described above, the technical classification information consists of the technical classification, the patent classification and the document vector, and it is input to the neural network in combinations of at least two. That is, the technical classification and patent classification, the patent classification and document vector, or the document vector and technical classification are input to the neural network together for training.
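The allowed input combinations can be enumerated as follows. The feature names are hypothetical labels for the three items of technical classification information; the text lists the three pairs explicitly, and this sketch also includes the full set of three, reading "at least two" literally.

```python
from itertools import combinations

# Hypothetical labels for the three items of technical classification information.
FEATURES = ("technical_classification", "patent_classification", "document_vector")

def feature_combinations(min_size=2):
    """All subsets of the technical classification features with at least `min_size` members."""
    return [combo
            for size in range(min_size, len(FEATURES) + 1)
            for combo in combinations(FEATURES, size)]
```

Each returned tuple names the feature groups that would be concatenated into one input record for the network.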

After the supervised learning unit 120 has completed training the neural network, the prediction unit 130 inputs the input data corresponding to a prediction target patent into the trained neural network to obtain dispute prediction information.

Here, for the classification data among the input data corresponding to the prediction target patent, the prediction unit 130 inputs a plurality of classification data together into the trained neural network to obtain the dispute prediction information.

The correction unit 140 corrects the dispute prediction information obtained by the prediction unit 130 according to a predetermined method. For example, the correction unit 140 calculates the accuracy of the neural network using a misclassification matrix and corrects the dispute prediction information obtained by the prediction unit 130 by multiplying it by the calculated accuracy. The misclassification matrix is well known to those skilled in the art, so a separate description of how it is used to calculate the accuracy of the neural network is omitted.
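The correction step can be sketched as follows, assuming accuracy is defined in the usual way as the share of correctly classified cases (the diagonal of the misclassification matrix divided by the total). The function names and the matrix layout are assumptions.

```python
def accuracy(confusion):
    """Accuracy from a square matrix where confusion[i][j] counts actual class i predicted as j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0

def corrected_score(raw_score, confusion):
    """Scale a raw dispute prediction by the measured accuracy of the model."""
    return raw_score * accuracy(confusion)
```

With this scheme a model that is only 80% accurate on held-out registered patents can never report a corrected dispute score above 0.8, which damps overconfident predictions.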

출력 데이터 저장부(150)는 분류 단위별 특허에 대한 출력 데이터가 저장된다. 출력 데이터는 분쟁 가능성에 관한 분류 정보를 포함한다. 분류 정보는 과거 분쟁 유무, 과거 소송 유무, 심판 유무, 당사자계 재심사 유무, 경고 유무 등을 포함할 수 있다. The output data storage unit 150 stores the output data for the patent for each classification unit. The output data includes classification information regarding the likelihood of dispute. The classification information may include whether there is a past dispute, a past litigation, a judgment, whether a party re-examination, a warning or the like.

FIG. 5 is a flowchart illustrating a method by which the patent dispute prediction apparatus predicts a patent dispute according to an embodiment of the present invention. Each step described below is performed by the corresponding internal component of the patent dispute prediction apparatus, but for ease of understanding and explanation the steps are described as being performed by the patent dispute prediction apparatus as a whole.

In step 510, the patent dispute prediction apparatus 100 processes input data for dispute prediction using at least one of the patent information and the bibliographic information of the patents for each analysis condition, and registers the input data in a database. Since this is the same as described above, a separate description is omitted.

In step 515, the patent dispute prediction apparatus 100 trains the neural network using the input data for registered patents.

As described above, the neural network may be trained by inputting the input data for a registered patent into the neural network and then changing the weights between the connections of the neural network using the dispute prediction information that is output and the preset dispute information.
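The weight update described above can be illustrated with a deliberately small sketch. It assumes a single sigmoid neuron trained by gradient descent on the difference between the output and the preset dispute label; the source does not specify the network topology, the learning rule, or the learning rate, so all of these, along with the toy feature values, are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Dispute prediction score in (0, 1) for one input vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(samples, labels, epochs=2000, learning_rate=0.5):
    """Adjust the weights from the gap between output and preset label."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            # error = predicted dispute score minus preset dispute information
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical input data for four registered patents (two features each)
# and their preset dispute information (1 = disputed, 0 = not disputed).
samples = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]
weights, bias = train(samples, labels)
```

After training, `predict(weights, bias, features)` plays the role of step 520: it returns the dispute prediction score for a prediction target patent.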

In step 520, the patent dispute prediction apparatus 100 inputs the input data of the prediction target patent into the trained neural network to obtain the dispute prediction information for that patent.

In step 525, the patent dispute prediction apparatus 100 corrects the obtained dispute prediction information using the accuracy of the neural network and outputs the result.

As described above, the accuracy of the neural network may be calculated using the misclassification matrix, and the calculated value may be set as the accuracy of the neural network. The patent dispute prediction apparatus 100 may then correct the dispute prediction information obtained for the prediction target patent by multiplying it by the set accuracy.

Although the present invention has been described above with reference to preferred embodiments, those of ordinary skill in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

FIG. 1 is a block diagram schematically illustrating the internal configuration of a patent dispute prediction apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram schematically illustrating the internal configuration of a data registration unit according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating a method of constructing a document vector according to an embodiment of the present invention.

FIG. 4 is a flowchart illustrating a method of clustering similar patents according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating a method by which the patent dispute prediction apparatus predicts a patent dispute according to an embodiment of the present invention.

Claims

1. A patent dispute prediction apparatus comprising:

an input data storage unit that stores, for each analysis unit, input data including patent holder information, technical classification information, or a combination thereof;

an output data storage unit that stores, separately for each analysis unit, output data including classification information on the possibility of dispute; and

a supervised learning unit that generates a mining model so that the input data is classified according to the output data.

2. The apparatus of claim 1, wherein the classification information indicates whether a past dispute exists.

3. The apparatus of claim 1, wherein the mining model is a neural network model.

4. The apparatus of claim 1, wherein the technical classification information comprises a technical classification, a patent classification, and a document vector.

5. The apparatus of claim 4, wherein the document vector takes as its magnitudes the TF (term frequency)-IDF (inverse document frequency) values of the keyword set extracted from the claims of the patent for each analysis unit.

6. The apparatus of claim 5, wherein the keyword set takes as its dimensions the representative similar words of the keyword set extracted from the claims of the patent for each analysis unit.

7. The apparatus of claim 1, wherein the analysis unit is a single patent or a plurality of patents.

8. The apparatus of claim 7, wherein, when the analysis unit is a plurality of patents, the analysis unit consists of patents owned by the same patent holder that are any one of family patents, patents in a citation relationship, and patents whose document vector similarity is equal to or greater than a reference value, or a combination thereof.

9. The apparatus of claim 8, wherein, when the analysis unit is a plurality of patents, the patents owned by the same patent holder are clustered, and a representative value of the input data of the clustered patents is calculated to form the input data corresponding to the analysis unit.

10. The apparatus of claim 9, wherein the representative value is calculated using any one of an average, a maximum, a minimum, and a sum.
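By way of illustration only, the representative value recited above might be computed per feature across the clustered patents, as in the following sketch. Treating each patent's input data as an equal-length numeric vector and aggregating column by column is an assumption; `representative_input` and the sample data are hypothetical.

```python
from statistics import mean

# The four aggregation options named in the claim.
AGGREGATORS = {"average": mean, "maximum": max, "minimum": min, "sum": sum}


def representative_input(cluster, method="average"):
    """Collapse the input vectors of clustered patents into one vector."""
    if not cluster:
        raise ValueError("cluster must contain at least one patent")
    aggregate = AGGREGATORS[method]
    # zip(*cluster) groups the same feature across all patents in the cluster
    return [aggregate(column) for column in zip(*cluster)]


# Hypothetical input data for three clustered patents, two features each.
cluster = [[1.0, 4.0], [3.0, 2.0], [2.0, 6.0]]
averaged = representative_input(cluster, "average")
summed = representative_input(cluster, "sum")
```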

11. The apparatus of claim 1, wherein the input data includes at least two technical classifications.

12. A method by which a patent dispute prediction apparatus predicts a dispute over a patent, the method comprising:

(a) generating input data for dispute prediction for the patents of each analysis unit using at least one of the patent information and the bibliographic information of those patents, and registering the input data in a database;

(b) inputting, among the input data, the input data corresponding to registered patents into a neural network for patent dispute prediction, and training the neural network by changing its weights using the dispute prediction information that is output and the dispute information preset for the registered patents; and

(c) obtaining dispute prediction information for a prediction target patent by inputting the input data corresponding to the prediction target patent into the trained neural network.

13. The method of claim 12, wherein step (a) comprises:

extracting a keyword set from the patent information of the patent using an association rule algorithm;

calculating a term frequency (TF) and an inverse document frequency (IDF) using the extracted keyword set; and

constructing a document vector corresponding to the patent using the calculated TF and IDF.
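As a rough illustration of the TF and IDF steps recited above, the sketch below builds one document vector per patent. It makes several assumptions: the keyword sets are given as ready-made lists (the association rule extraction is not reproduced), TF is the relative frequency of a keyword within a patent, and IDF is the natural logarithm of the number of patents divided by the number of patents containing the keyword. The source does not fix these formulas, and `tf_idf_vectors` is a hypothetical name.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    n_docs = len(documents)
    vocabulary = sorted({term for doc in documents for term in doc})
    # Number of documents in which each keyword appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty keyword sets
        vectors.append([
            (counts[term] / length) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical keyword sets already extracted from three patents' claims.
claim_keywords = [
    ["neural", "network", "dispute", "patent"],
    ["keyword", "vector", "patent"],
    ["dispute", "litigation", "patent", "patent"],
]
vocabulary, vectors = tf_idf_vectors(claim_keywords)
```

A keyword that appears in every patent, such as "patent" here, receives a weight of zero, so the vector emphasizes the keywords that distinguish one patent from the others.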

14. The method of claim 12, wherein the analysis unit is a single patent or multiple patents.

15. The method of claim 14, wherein, when the analysis unit is multiple patents, step (a) comprises:

clustering, for the patent, similar patents owned by the same patent holder; and

calculating a representative value for the clustered similar patents.

16. The method of claim 15, wherein clustering the similar patents comprises extracting, as a similar patent, at least one of a family patent, a patent that directly or indirectly cites or is cited by the patent, and a patent whose similarity is equal to or greater than a reference value.
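The similarity criterion recited above could be sketched as a threshold test on document vectors. Cosine similarity and the reference value of 0.8 are assumptions made for illustration; this section does not name the similarity measure or the threshold, and the patent identifiers and vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_patents(target_vector, candidates, reference_value=0.8):
    """Identifiers of candidates whose similarity meets the reference value."""
    return [patent_id
            for patent_id, vector in candidates.items()
            if cosine_similarity(target_vector, vector) >= reference_value]


# Hypothetical document vectors of patents owned by the same patent holder.
target = [1.0, 0.0, 1.0]
candidates = {
    "P1": [1.0, 0.0, 0.9],
    "P2": [0.0, 1.0, 0.0],
    "P3": [0.2, 0.9, 0.1],
}
cluster_ids = similar_patents(target, candidates)
```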

17. The method of claim 9, wherein the input data includes a plurality of technical classifications, including the document vector corresponding to the patent, and

steps (b) and (c) input at least two of the plurality of technical classifications into the neural network.