TWI502382B

TWI502382B - A patent search method applied with principal component analysis

Info

Publication number: TWI502382B
Application number: TW102127235A
Authority: TW
Inventors: Ming Yuan Kang; Min Chen Chiu
Original assignee: Ipplus Inc
Priority date: 2013-07-30
Filing date: 2013-07-30
Publication date: 2015-10-01
Also published as: TW201504827A

Description

Patent search method using principal component analysis

The present invention relates to a patent search method and, in particular, to a patent search method using principal component analysis.

Patents contain a wealth of information, and their importance to enterprises has long been recognized worldwide. According to an analysis by the World Intellectual Property Organization (WIPO), among the technical literature found in professional journals, magazines, encyclopedias and similar sources, only patent specifications fully disclose the core of a technology. About 90% to 95% of the world's research and development results can therefore be found in patent specifications, and 80% of them are not recorded in any other journal or magazine, so drawing on patent information is of great help to enterprises in conducting R&D and forecasting technology trends. A further WIPO survey indicates that making good use of patent information can shorten R&D time by 60% and save 40% of R&D expenses.

With the technology industry booming and competitive pressure intense, "innovation" has become synonymous with global competitiveness, and major industries at home and abroad invest large amounts of time and human resources in research and innovation. Before undertaking R&D, however, enterprises commonly run patent searches to understand the patent landscape of the technical field in which the R&D target lies. Doing so helps them avoid infringing others' patents with their own R&D results, and the search results also serve as prior-art references for the enterprise's future patent applications.

In the course of a patent search, however, the search conditions usually have to be adjusted repeatedly: synonyms and broader or narrower terms are combined in different ways, or conditions excluding specific terms are added to narrow the results. In addition, much of today's technical vocabulary differs from what was used in the past, and when a drafter deliberately describes technical features with rare or unusual terms, this intentional evasion raises the barrier to retrieval. Widening the search scope to overcome these difficulties, in turn, lowers the precision of the search results.

The present invention therefore aims to remedy the above problems by broadening the coverage of a patent search, as far as possible without sacrificing its precision, so as to improve the reliability of the search.

The present invention seeks to provide a patent search method using principal component analysis that increases the breadth of a patent search without affecting its accuracy.

The present invention provides a patent search method using principal component analysis, comprising the following steps: (S1) searching a patent database according to a search condition to obtain N retrieved patents; (S2) performing a patent citation search on the N retrieved patents according to a predetermined citation network rule to obtain M cited patents; (S3) performing a patent similarity calculation on the M cited patents and the N retrieved patents to generate a patent similarity matrix; (S4) performing a principal component analysis on the patent similarity matrix to select Q similar patents, and merging the Q similar patents with the N retrieved patents into (Q+N) search results; and (S5) substituting the (Q+N) search results for the N retrieved patents of step (S2) and repeating steps (S2) to (S4) until Q equals zero, wherein N and M are natural numbers greater than one and Q is an integer greater than or equal to zero.

The patent similarity calculation of step (S3) comprises the following sub-steps: (S31) establishing an MxN citation association matrix; (S32) generating an Mx1 relationship strength matrix from the MxN citation association matrix, the Mx1 relationship strength matrix containing a relationship strength value for each cited patent; (S33) normalizing the Mx1 relationship strength matrix to generate a normalized relationship strength value for each cited patent; (S34) selecting, from the M cited patents, P cited patents whose normalized relationship strength value is greater than one, wherein P is a natural number greater than one; and (S35) determining whether P is greater than N and, if so, establishing a PxN patent similarity matrix or, if not, establishing an MxN patent similarity matrix. The patent similarity matrix is built from the relationship between the predetermined patent classification codes to which each cited patent and each of the N retrieved patents belong. The predetermined patent classification code may be an International Patent Classification code or a US patent classification code, and it may be a primary classification code, a secondary classification code, or a combination of the two.

The principal component analysis of step (S4) comprises the following sub-steps: (S41) calculating a covariance matrix from the patent similarity matrix; (S42) performing an eigendecomposition of the covariance matrix to obtain eigenvalues and eigenvectors; (S43) calculating the principal component coefficients and the percentage of variance explained by each principal component; and (S44) selecting the Q similar patents according to the principal component coefficients and the explained-variance percentages.

In addition, the predetermined citation network rule adopted by the present invention may include a forward citation network (Forward Citation) and a backward citation network (Backward Citation), and may cover a one-layer, two-layer or three-layer patent citation network.

Compared with the prior art, the present invention proposes a patent search method that combines principal component analysis with the patent citation network and applies them to patent searching, so that the breadth of a patent search is effectively increased without sacrificing its precision, improving the reliability of the search.

S1~S5‧‧‧Steps

S31~S35‧‧‧Steps

S351~S352‧‧‧Steps

S41~S44‧‧‧Steps

10‧‧‧Citation association matrix

12‧‧‧Relationship strength matrix

14‧‧‧Normalized relationship strength matrix

FIG. 1 is a flow chart of the patent search method of the present invention.

FIG. 2 is a flow chart of the patent similarity calculation of the present invention.

FIG. 3 is a schematic diagram of the citation association matrix and the normalized relationship strength matrix of the present invention.

FIG. 4 is a flow chart of the principal component analysis of the present invention.

FIG. 5 is a schematic diagram of the first principal component coefficients of a specific embodiment of the present invention.

The patent search method using principal component analysis proposed by the present invention is described in detail below. Please refer to FIG. 1, which is a flow chart of the patent search method of the present invention. The method comprises the following steps: (S1) searching a patent database according to a search condition to obtain N retrieved patents, where the patent database may be the patent database of the United States Patent and Trademark Office or a relevant patent database of another country, and the search condition may be a set of keywords, a set of patent numbers, or the like; (S2) performing a patent citation search on the N retrieved patents according to a predetermined citation network rule to obtain M cited patents, where the predetermined citation network rule may search each retrieved patent by forward citation and backward citation, and may cover a one-layer, two-layer or three-layer patent citation network; (S3) performing a patent similarity calculation on the M cited patents and the N retrieved patents to generate a patent similarity matrix; (S4) performing a principal component analysis on the patent similarity matrix to select Q similar patents, and merging the Q similar patents with the N retrieved patents into (Q+N) search results; and (S5) substituting the (Q+N) search results for the N retrieved patents of step (S2) and repeating steps (S2) to (S4) until Q equals zero, wherein N and M are natural numbers greater than one and Q is an integer greater than or equal to zero.
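The loop formed by steps (S2) to (S5) can be sketched as follows. This is only an illustrative outline under stated assumptions: `expand_citations` and `select_similar` are hypothetical callables standing in for the citation search of (S2) and the similarity and principal component steps of (S3) and (S4), and the `max_rounds` cap is a safety limit that the source does not mention.

```python
from typing import Callable, Set


def iterative_search(
    initial: Set[str],
    expand_citations: Callable[[Set[str]], Set[str]],
    select_similar: Callable[[Set[str], Set[str]], Set[str]],
    max_rounds: int = 20,
) -> Set[str]:
    """Repeat steps S2 to S4 until no new similar patent is found (Q == 0)."""
    results = set(initial)                                   # S1: the N retrieved patents
    for _ in range(max_rounds):                              # safety cap (assumption)
        cited = expand_citations(results) - results          # S2: the M cited patents
        similar = select_similar(cited, results) - results   # S3 + S4: the Q similar patents
        if not similar:                                      # S5 stop condition: Q == 0
            break
        results |= similar                                   # merge into (Q + N) results
    return results


# Toy usage with a hypothetical citation graph and a trivial selector.
graph = {"A1": {"B1", "B2"}, "B1": {"B3"}, "B2": set(), "B3": set()}


def expand(patents):
    return set().union(*(graph.get(p, set()) for p in patents))


def select(cited, results):
    # Stand-in for S3/S4: keep every cited patent except "B2".
    return {p for p in cited if p != "B2"}


final = iterative_search({"A1"}, expand, select)
print(sorted(final))
```

In the toy graph the loop stops on the third round, when the only remaining cited patent is rejected by the selector and Q becomes zero.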

Please refer to FIG. 2 and FIG. 4. FIG. 2 is a flow chart of the patent similarity calculation of the present invention, and FIG. 4 is a schematic diagram of the citation association matrix and the normalized relationship strength matrix of the present invention. The patent similarity calculation of step (S3) comprises the following sub-steps:

(S31) establishing an MxN citation association matrix 10 (as shown in FIG. 4), in which A1 to An denote the N retrieved patents and B1 to Bm denote the M cited patents. If retrieved patent A1 and cited patent B3 have a citation relationship (whether forward or backward), the corresponding citation value $C_{ij}$ in the citation association matrix 10 is 1; if retrieved patent A1 and cited patent B2 have no citation relationship, the corresponding citation value $C_{ij}$ is 0. The citation association matrix 10 is therefore a matrix of 0s and 1s;

(S32) generating an Mx1 relationship strength matrix 12 from the MxN citation association matrix 10, the Mx1 relationship strength matrix 12 containing a relationship strength value for each cited patent. The relationship strength (RS) of any cited patent among B1 to Bm is defined as the sum of all of its citation values $C_{ij}$ (as shown in FIG. 4):

$$RS_j = \sum_{i=1}^{n} C_{ij}, \quad i = 1, \dots, n;\ j = 1, \dots, m$$

In other words, the more citation relationships a cited patent has with the N retrieved patents, the larger the sum of its citation values $C_{ij}$, and the higher its relationship strength $RS_j$ with respect to the N retrieved patents;

(S33) normalizing the Mx1 relationship strength matrix 12 to generate a normalized relationship strength value for each cited patent, forming an Mx1 normalized relationship strength matrix 14. The normalized relationship strength is defined as the relationship strength divided by the average relationship strength, $RS_{nol\_j} = RS_j / RS_{ave}$;

(S34) selecting, from the M cited patents, P cited patents whose normalized relationship strength value is greater than one, wherein P is a natural number greater than one;

(S35) determining whether P is greater than N: (S351) if so, establishing a PxN patent similarity matrix; (S352) if not, establishing an MxN patent similarity matrix.

In addition, the present invention uses the International Patent Classification (IPC) as the basis for calculating similarity. An IPC symbol has five hierarchical levels: section, class, subclass, group, and subgroup. Using these levels as the scoring basis, the present invention defines patent similarity as a value between 0 and 1, with the levels compared in order, one level at a time. The patent similarity matrix is built from the relationship between the predetermined patent classification codes to which each cited patent and each of the N retrieved patents belong. The predetermined patent classification code may be an International Patent Classification code or a US patent classification code, and it may be a primary classification code, a secondary classification code, or a combination of the two.
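Sub-steps (S31) to (S35) can be illustrated with a small sketch. The function names and the toy matrix are hypothetical; the matrix is stored with one row per cited patent, so `C[j][i]` in the code corresponds to $C_{ij}$ in the text. The sketch assumes every cited patent has at least one citation link, so the average relationship strength is never zero.

```python
def relationship_strengths(C):
    """C is an M x N list of 0/1 rows; C[j][i] == 1 when cited patent j
    is linked to retrieved patent i. Returns (RS, normalized RS)."""
    rs = [sum(row) for row in C]                  # S32: RS_j = sum of row j
    rs_ave = sum(rs) / len(rs)                    # average relationship strength
    rs_norm = [value / rs_ave for value in rs]    # S33: RS_j / RS_ave
    return rs, rs_norm


def select_rows(C):
    """S34/S35: indices of the cited patents kept for the similarity matrix."""
    n = len(C[0])                                 # N retrieved patents
    _, rs_norm = relationship_strengths(C)
    strong = [j for j, v in enumerate(rs_norm) if v > 1]   # S34: the P patents
    # S35: use only the P filtered rows when P > N, otherwise keep all M rows.
    return strong if len(strong) > n else list(range(len(C)))


# Toy citation association matrix: M = 6 cited patents, N = 2 retrieved patents.
C = [[1, 1], [1, 0], [0, 1], [1, 1], [1, 1], [1, 0]]
rs, rs_norm = relationship_strengths(C)
print(rs, select_rows(C))
```

Here the average relationship strength is 1.5, three cited patents exceed it, and since P = 3 is greater than N = 2 only those three rows are kept.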

Please refer to FIG. 3, which is a flow chart of the principal component analysis of the present invention. The principal component analysis of step (S4) comprises the following sub-steps:

(S41) calculating a covariance matrix from the patent similarity matrix. Given two random variables $x_{1i}$ and $x_{2i}$ with $i = 1, \dots, N$, the covariance between the two variables is defined as the sample covariance

$$s_{12} = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_{1i}-\bar{x}_1\right)\left(x_{2i}-\bar{x}_2\right)$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means. Extending the covariance to a higher-dimensional space gives the covariance matrix:

$$S = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)\left(x_i-\bar{x}\right)^T, \quad x_i = [x_{1i} \dots x_{Mi}]^T$$

(S42) performing an eigendecomposition of the covariance matrix to obtain eigenvalues and eigenvectors. Decomposing the covariance matrix $S$ yields the principal components of the data (the eigenvectors) and their weights (the eigenvalues). The eigenvalues of $S$ are obtained from the characteristic equation $|\lambda I - S| = 0$; sorted by magnitude they give the $p$ eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$, and the eigenvectors $e_1, e_2, \dots, e_p$ corresponding to each $\lambda_i$ are then computed. Each eigenvalue is the variance of the corresponding new variable; the total variance of the original data equals the total variance of the new variables, namely the sum of the eigenvalues;

(S43) calculating the principal component coefficients and the percentage of variance explained by each principal component. A principal component coefficient is the coefficient of the transformation between a new variable and an original variable and represents the influence and importance of the original variable for the new variable, so a larger coefficient indicates a more influential original variable. The coefficient is defined as

$$l_{ij} = \sqrt{\lambda_i}\, e_{ij}, \quad i, j = 1, 2, \dots, p$$

where $l_{ij}$ is the loading of the $j$-th variable on the $i$-th principal component, $\lambda_i$ is the eigenvalue of the $i$-th principal component, and $e_{ij}$ is the $j$-th element of the eigenvector $e_i$. Once the principal components have been extracted, the coefficient of every variable under every principal component is known, and the transformed principal component scores follow. For example, the score of $X_1$ on the $m$-th principal component is $y_{1m} = l_{m1}x_{11} + l_{m2}x_{12} + \dots + l_{mp}x_{1p}$, or in general

$$y_{im} = \sum_{j=1}^{p} l_{mj}\, x_{ij}$$

The explained-variance percentage indicates how much of the whole data set a principal component accounts for. Retaining many components preserves most of the information in the original data but fails to reduce dimensionality; retaining few reduces dimensionality effectively and simplifies analysis, but may lose useful information. With $\lambda_i$ denoting the $i$-th eigenvalue, the explained-variance percentage of the $i$-th principal component and the cumulative explained-variance percentage of the first $i$ principal components are, respectively,

$$\frac{\lambda_i}{\sum_{k=1}^{p}\lambda_k}\times 100\% \quad\text{and}\quad \frac{\sum_{k=1}^{i}\lambda_k}{\sum_{k=1}^{p}\lambda_k}\times 100\%$$

(S44) selecting the Q similar patents according to the principal component coefficients and the explained-variance percentages.
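A compact way to see sub-steps (S41) to (S43) at work is the following standard-library sketch. Several choices here are assumptions rather than the patent's stated procedure: the covariance uses the 1/(N-1) sample form, the loading is taken as the square root of the eigenvalue times the eigenvector element, and power iteration is swapped in for a full eigendecomposition, so only the first principal component is computed. The similarity values are invented for illustration.

```python
import math


def covariance_matrix(rows):
    """rows[v] lists the observations of variable v; returns the sample covariance matrix."""
    n_obs = len(rows[0])
    centered = []
    for row in rows:
        mean = sum(row) / n_obs
        centered.append([x - mean for x in row])
    return [[sum(a * b for a, b in zip(ri, rj)) / (n_obs - 1) for rj in centered]
            for ri in centered]


def first_component(S, iterations=500):
    """Power iteration: largest eigenvalue and a unit eigenvector of the symmetric matrix S."""
    vec = [1.0 / (k + 1) for k in range(len(S))]   # arbitrary non-zero start vector
    for _ in range(iterations):
        product = [sum(s * v for s, v in zip(row, vec)) for row in S]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("power iteration collapsed to the zero vector")
        vec = [x / norm for x in product]
    product = [sum(s * v for s, v in zip(row, vec)) for row in S]
    eigenvalue = sum(p * v for p, v in zip(product, vec))   # Rayleigh quotient
    return eigenvalue, vec


# Hypothetical similarity values: 3 cited patents (variables) x 4 retrieved patents (observations).
similarity = [
    [1.0, 0.8, 0.6, 0.2],
    [0.8, 0.8, 0.4, 0.2],
    [0.2, 0.2, 0.6, 1.0],
]
S = covariance_matrix(similarity)
eigenvalue, eigenvector = first_component(S)
loadings = [math.sqrt(eigenvalue) * e for e in eigenvector]   # l_1j = sqrt(lambda_1) * e_1j
trace = sum(S[k][k] for k in range(len(S)))                   # equals the sum of all eigenvalues
explained_pct = 100.0 * eigenvalue / trace
print(loadings, round(explained_pct, 2))
```

The explained-variance percentage is obtained from the trace of the covariance matrix, which equals the sum of all eigenvalues, so the remaining eigenvalues do not need to be computed for this one figure. The sign of the eigenvector, and therefore of the loadings, is arbitrary.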

A specific embodiment of the present invention is described below. In this example application, the patent search method using principal component analysis is applied to a search whose target is the "Through-Silicon Via" (TSV), using a US granted-patent database as the search database.

In step (S1), an advanced search is first performed in the US granted-patent database on April 16, 2013 with the search condition ABST/(("trough-silicon via$"or TSV or TSVs)and("integrated circuit$")). The search returns N retrieved patents, where N equals 44.

In step (S2), a patent citation search is performed on the 44 retrieved patents according to a predetermined citation network rule. The rule uses both forward citation (Forward Citation) and backward citation (Backward Citation) over a one-layer citation network, and the citation search on the 44 retrieved patents yields M cited patents, where M equals 406.
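A one-layer citation search of the kind used in step (S2) might be sketched as below. The data structure is an assumption: `cites` is a hypothetical mapping from each patent to the set of patents it cites, forward citations are derived by inverting it, and the retrieved patents themselves are excluded from the result, which the source does not state explicitly.

```python
from collections import defaultdict


def one_layer_citations(seeds, cites):
    """Return patents linked to any seed by one forward or backward citation.

    `cites` maps a patent to the set of patents it cites (backward citations).
    Forward citations are obtained by inverting that mapping.
    """
    cited_by = defaultdict(set)
    for citing, cited_set in cites.items():
        for cited in cited_set:
            cited_by[cited].add(citing)

    found = set()
    for seed in seeds:
        found |= cites.get(seed, set())       # backward: what the seed cites
        found |= cited_by.get(seed, set())    # forward: what cites the seed
    return found - set(seeds)                 # keep only newly found patents


# Hypothetical identifiers: A1 cites P1, P2 cites A1, P3 cites P2.
cites = {"A1": {"P1"}, "P2": {"A1"}, "P3": {"P2"}}
print(sorted(one_layer_citations({"A1"}, cites)))
```

P3 is two citation steps away from A1, so a one-layer rule does not reach it; a two-layer or three-layer rule would amount to applying the same function again to the newly found patents.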

In step (S3), a patent similarity calculation is performed on the 406 cited patents and the 44 retrieved patents to generate a patent similarity matrix. The calculation comprises the following sub-steps: (S31) establishing a 406x44 citation association matrix; (S32) generating a 406x1 relationship strength matrix from the 406x44 citation association matrix, the 406x1 relationship strength matrix containing a relationship strength value for each cited patent; that is, the relationship strength (RSj) of each of the 406 cited patents is computed from the citation values (Cij) in the citation association matrix, and averaging the relationship strengths of the 406 cited patents gives an average relationship strength (RSave) of 1.8705; (S33) normalizing the 406x1 relationship strength matrix to generate a normalized relationship strength value for each cited patent; (S34) selecting, from the 406 cited patents, P cited patents whose normalized relationship strength value is greater than one, where P equals 115; (S35) determining whether P is greater than N: (S351) since it is, a PxN patent similarity matrix is established, namely a 115x44 patent similarity matrix. In practice, this embodiment scores similarity on the basis of the International Patent Classification (IPC), defining patent similarity as a value between 0 and 1; the levels are compared in order, one level at a time, and each matching level adds 0.2 to the similarity value.
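The level-by-level IPC scoring can be illustrated with a short sketch. It is an interpretation under assumptions: the comparison is taken to stop at the first level that differs, symbols are assumed to follow the `H01L 21/768` layout, and the helper names are hypothetical. How the scores of patents with several classification codes are combined into one matrix entry is not described in the text and is not covered here.

```python
import re

# Section (A-H), class (two digits), subclass (letter), main group, subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d+)/(\d+)$")


def ipc_levels(code):
    """Split an IPC symbol such as 'H01L 21/768' into its five levels."""
    match = IPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized IPC symbol: {code!r}")
    return match.groups()


def ipc_similarity(code_a, code_b):
    """Add 0.2 for each consecutive matching level, starting at the section."""
    matched = 0
    for level_a, level_b in zip(ipc_levels(code_a), ipc_levels(code_b)):
        if level_a != level_b:
            break                 # deeper levels are not compared (assumption)
        matched += 1
    return matched / 5            # one of 0.0, 0.2, 0.4, 0.6, 0.8, 1.0


print(ipc_similarity("H01L 21/768", "H01L 23/48"))
```

In the example the two symbols share section H, class 01 and subclass L but differ at the main group, so three levels match.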

Please refer to FIG. 5, which is a schematic diagram of the first principal component coefficients of this embodiment. In step (S4), a principal component analysis is performed on the 115x44 patent similarity matrix, comprising the following sub-steps: (S41) calculating a covariance matrix from the 115x44 patent similarity matrix; (S42) performing an eigendecomposition of the covariance matrix to obtain eigenvalues and eigenvectors; (S43) calculating the principal component coefficients and the percentage of variance explained by each principal component. In this embodiment the first principal component is $Y_1 = 4.149506X_1 + 3.432632X_2 + \dots + 3.720075X_{115}$, where the coefficient of each $X$ term is the weight of the corresponding cited patent; the larger the weight, the more similar that cited patent is to the 44 retrieved patents. The coefficients of the first principal component are plotted as a line chart in FIG. 5; (S44) selecting Q similar patents according to the principal component coefficients and the explained-variance percentages. Nine cited patents correspond to the first coefficient peak of 4.590093, one to the second peak of 3.807923, three to the third peak of 3.798306, and eight to the fourth peak of 3.641812. Analysis shows that the eight patents of the fourth peak and one patent of the third peak are not directed to the search target of this embodiment. This embodiment therefore selects the cited patents corresponding to the first and second coefficient peaks, together with the two remaining patents of the third peak, for a total of 12 similar patents (9 + 1 + 2), so that Q equals 12. The 12 similar patents are then merged with the 44 retrieved patents into 56 search results; (S5) the 56 search results are substituted for the 44 retrieved patents of step (S2), and steps (S2) to (S4) are repeated until the number of similar patents equals zero.
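The grouping of cited patents by coefficient peak in sub-step (S44) might be prepared as in the sketch below. It only ranks the peaks; the final decision about which peaks match the search target is made by analysis in the embodiment and is not automated here. Grouping on exactly equal coefficient values is an assumption, and the patent identifiers are hypothetical, although the coefficient values are the peak values quoted in the embodiment.

```python
from collections import defaultdict


def coefficient_peaks(coefficients):
    """Group patents by identical first-component coefficient, largest value first."""
    groups = defaultdict(list)
    for patent, value in coefficients.items():
        groups[value].append(patent)
    return sorted(groups.items(), key=lambda item: item[0], reverse=True)


# Hypothetical patents P1..P4 carrying peak values quoted in the embodiment.
coefficients = {"P1": 4.590093, "P2": 4.590093, "P3": 3.807923, "P4": 3.798306}
peaks = coefficient_peaks(coefficients)
for value, patents in peaks:
    print(value, patents)
```

A reviewer would then walk down the ranked peaks and keep the groups whose patents are judged to concern the search target.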

The above detailed description of the preferred embodiment is intended to describe the features and spirit of the present invention more clearly, not to limit the scope of the invention to the preferred embodiment disclosed above. On the contrary, the intention is to cover various modifications and equivalent arrangements within the scope of the claims of the present invention. The scope of the claims should therefore be given the broadest interpretation in light of the above description, so as to encompass all possible modifications and equivalent arrangements.


Claims

A patent search method using principal component analysis, comprising the following steps: (S1) searching a patent database according to a search condition to obtain N retrieved patents; (S2) performing a patent citation search on the N retrieved patents according to a predetermined citation network rule to obtain M cited patents; (S3) performing a patent similarity calculation on the M cited patents and the N retrieved patents to generate a patent similarity matrix; and (S4) performing a principal component analysis on the patent similarity matrix to select Q similar patents, and merging the Q similar patents with the N retrieved patents into (Q+N) search results; wherein N and M are natural numbers greater than one, Q is an integer greater than or equal to zero, and the predetermined citation network rule may include a forward citation network and a backward citation network.

The patent search method of claim 1, wherein the patent similarity calculation comprises the following sub-steps: (S31) establishing an MxN citation association matrix; (S32) generating an Mx1 relationship strength matrix from the MxN citation association matrix, the Mx1 relationship strength matrix containing a relationship strength value for each cited patent; (S33) normalizing the Mx1 relationship strength matrix to generate a normalized relationship strength value for each cited patent; (S34) selecting, from the M cited patents, P cited patents whose normalized relationship strength value is greater than one, wherein P is a natural number greater than one; and (S35) determining whether P is greater than N and, if not, establishing an MxN patent similarity matrix.

The patent search method of claim 2, wherein in sub-step (S35) it is determined whether P is greater than N and, if so, a PxN patent similarity matrix is established.

The patent search method of claim 1, wherein the patent similarity matrix is established from the relationship between the predetermined patent classification codes to which each cited patent and each of the N retrieved patents belong.

The patent search method of claim 4, wherein the predetermined patent classification code may be an international patent classification code or a US patent classification code.

The patent search method of claim 4, wherein the predetermined patent classification code may be a primary patent classification code, a secondary patent classification code, or a combination of a primary patent classification code and a secondary patent classification code.

The patent search method of claim 1, wherein the principal component analysis comprises the following sub-steps: (S41) calculating a covariance matrix from the patent similarity matrix; (S42) performing an eigendecomposition of the covariance matrix to obtain eigenvalues and eigenvectors; (S43) calculating principal component coefficients and the percentage of variance explained by each principal component; and (S44) selecting the Q similar patents according to the principal component coefficients and the explained-variance percentages.

The patent search method of claim 1, further comprising the step of: (S5) substituting the (Q+N) search results for the N retrieved patents of step (S2) and repeating steps (S2) to (S4) until Q equals zero.

The patent search method of claim 1, wherein the predetermined citation network rule may further cover a one-layer patent citation network, a two-layer patent citation network, or a three-layer patent citation network.