TWI772975B - Automatic similarity comparison and interpretation method of contracts - Google Patents

Automatic similarity comparison and interpretation method of contracts

Info

Publication number
TWI772975B
Authority
TW
Taiwan
Prior art keywords
sentence
parsed
statement
contract
data
Prior art date
Application number
TW109140785A
Other languages
Chinese (zh)
Other versions
TW202221556A (en)
Inventor
蘇豐文
李紀寬
林原逵
胡寶鈺
林學敏
楊善妍
Original Assignee
國立清華大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-11-20
Filing date
2020-11-20
Publication date
2022-08-01
Application filed by 國立清華大學
Priority to TW109140785A
Publication of TW202221556A
Application granted
Publication of TWI772975B

Landscapes

  • Machine Translation (AREA)
  • Executing Machine-Instructions (AREA)
  • Synchronisation In Digital Transmission Systems (AREA)
  • Hardware Redundancy (AREA)

Abstract

An automatic similarity comparison and interpretation method of contracts is executed by a computer system that stores one to-be-parsed contract data and multiple reference contract data. The to-be-parsed contract data includes a plurality of to-be-parsed sentences, each reference contract data includes a plurality of reference sentences, and the method includes the following steps: (A) for each to-be-parsed sentence, the computer system compares the to-be-parsed sentence with the reference sentences of the reference contract data to obtain, from those reference sentences, a target reference sentence that is the most similar to the to-be-parsed sentence; and (B) for each to-be-parsed sentence, the computer system uses a parsing model for parsing the relevance of two sentences to generate, according to the to-be-parsed sentence and its target reference sentence, a parsing result including the relevance of the to-be-parsed sentence and the target reference sentence.

Description

Automatic similarity comparison and interpretation method of contracts

The present invention relates to a parsing method, and in particular to an automatic similarity comparison and interpretation method of contracts.

Legal contracts are very important documents in daily life. A contract is a legal act established when both parties reach agreement in their declarations of intent; it is the main expression of private-law autonomy and stipulates the responsibilities and obligations of both parties. Because the law is involved, drafting and signing a contract without carefully considering its content risks an unfair or disputed situation and later litigation costs.

However, individual users without sufficient legal knowledge find it hard to spot the legally disadvantageous points in a contract and the intent behind its statements. Even professional lawyers may still need a great deal of time to parse a new contract.

Because understanding legal statements and spotting potential flaws requires extensive legal and domain knowledge, it is difficult for existing computers to take the place of human mental activity in parsing contracts.

Therefore, the object of the present invention is to provide an automatic similarity comparison and interpretation method of contracts that can parse a contract automatically.

Accordingly, the automatic similarity comparison and interpretation method of contracts of the present invention is executed by a computer system. The computer system stores one to-be-parsed contract data and multiple reference contract data; the to-be-parsed contract data includes a plurality of to-be-parsed sentences, and each reference contract data includes a plurality of reference sentences. The method includes a step (A) and a step (B).

In step (A), for each to-be-parsed sentence, the computer system compares the to-be-parsed sentence with the reference sentences of the reference contract data for similarity, so as to obtain, from those reference sentences, a target reference sentence that is the most similar to the to-be-parsed sentence.

In step (B), for each to-be-parsed sentence, the computer system uses a parsing model for parsing the relevance of two sentences to generate, according to the to-be-parsed sentence and the target reference sentence corresponding to it, a parsing result including the relevance of the to-be-parsed sentence and the target reference sentence.

The effect of the present invention is as follows: for each to-be-parsed sentence, the computer system compares the to-be-parsed sentence with the reference sentences of the reference contract data for similarity to obtain the target reference sentence, and uses the parsing model to generate the parsing result from the to-be-parsed sentence and the target reference sentence. From the parsing results of the to-be-parsed sentences of the to-be-parsed contract data, the user can understand the relevance between the to-be-parsed contract data and the reference contract data.

Before the present invention is described in detail, it should be noted that similar elements are denoted by the same reference numerals in the following description.

Referring to FIG. 1, a computer system 1 used to implement an embodiment of the automatic similarity comparison and interpretation method of contracts of the present invention includes a storage unit 11 and a processing unit 12 electrically connected to the storage unit 11. In this embodiment, the computer system 1 is implemented as, for example, a personal computer, a server, or a cloud host, but is not limited thereto.

The storage unit 11 stores multiple training data, a synonym data including multiple kinds of synonyms, one to-be-parsed contract data, and multiple reference contract data. Each training data includes a first sentence, a second sentence, and a label related to the relevance between the first sentence and the second sentence. The to-be-parsed contract data includes a plurality of to-be-parsed sentences, and each reference contract data includes a plurality of reference sentences.

It is worth noting that, in this embodiment, the label of each training data falls into, for example, one of six classes: same, first sentence covers second sentence, second sentence covers first sentence, opposite, irrelevant, and other. The reference contract data have been legally verified and can be divided into 13 categories, namely guidelines on user accounts, security clauses on privacy and contract information, clauses on the sufficient conditions for contract formation, defect warranty clauses, risk bearing and the right to rescind the contract together with the claims of both parties, disclaimer clauses, the validity of the contract and third-party suppliers, compensation or reimbursement by the service provider upon return, returns, definition and description of the business, dispute resolution and remedy channels, purchasing agents, jurisdiction and applicable law, and intellectual property rights, but are not limited thereto.
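The stored data described above can be pictured with a small data model. This is a minimal sketch for orientation only: the class names, field names, and English label strings are assumptions chosen for illustration, not identifiers from the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Relation(Enum):
    """The six label classes named in this embodiment (English names are illustrative)."""
    SAME = "same"
    FIRST_COVERS_SECOND = "first sentence covers second sentence"
    SECOND_COVERS_FIRST = "second sentence covers first sentence"
    OPPOSITE = "opposite"
    IRRELEVANT = "irrelevant"
    OTHER = "other"


@dataclass(frozen=True)
class TrainingRecord:
    """One training data: two sentences plus a label describing their relevance."""
    first_sentence: str
    second_sentence: str
    label: Relation


@dataclass
class ContractData:
    """A contract (to-be-parsed or reference) viewed as a list of sentences."""
    sentences: List[str]


# Example record built from the sentence pair used later in the description.
record = TrainingRecord(
    first_sentence="如果您不同意某項「服務」之修訂條款________",
    second_sentence="如_您不同意________本條款之全部或部分內容",
    label=Relation.SECOND_COVERS_FIRST,
)
```

Keeping the label as an enumeration makes it explicit that the parsing model is a six-way classifier over sentence pairs.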

Referring to FIGS. 2 and 3, the embodiment of the automatic similarity comparison and interpretation method of contracts of the present invention includes a model building procedure and a contract parsing procedure.

Referring to FIGS. 1 and 2, the model building procedure includes steps 21 to 23, which are described below.

In step 21, for each training data, the processing unit 12 obtains at least one synonym group corresponding to at least one target word, according to the at least one target word in the first sentence and the second sentence of the training data and according to the synonym data.

In step 22, for each training data, the processing unit 12 generates at least one augmented training data according to the training data and the at least one synonym group.

In step 23, the processing unit 12 uses a machine learning algorithm to build, according to the training data and the augmented training data generated in step 22, a parsing model for parsing the relevance of two sentences.

For example, the first sentence of a training data is 『如果您不同意某項「服務」之修訂條款________』 ("If you do not agree to the revised terms of a certain 'Service'"), the second sentence is 『如_您不同意________本條款之全部或部分內容』 ("If you do not agree to all or part of these terms"), and the label of the training data is "second sentence covers first sentence". In step 21, if the target words are "同意" (agree) and "修訂" (revise), the processing unit 12 may obtain a synonym group for "同意" that includes, for example, "允諾、許可、准許、批准" and a synonym group for "修訂" that includes, for example, only "修改". In step 22, the processing unit 12 may then replace "同意" in both the first sentence and the second sentence with each of "允諾", "許可", "准許", and "批准" to obtain four augmented training data, and may also replace "修訂" in the first sentence with "修改" to obtain one more augmented training data. The labels of the augmented training data are unchanged: all remain "second sentence covers first sentence". Likewise, if the only target word is "修訂", only one augmented training data is generated.
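The synonym-replacement augmentation of steps 21 and 22 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `augment`, the tuple layout of a training data, and the dictionary form of the synonym data are hypothetical, and the sketch replaces every occurrence of a target word in both sentences.

```python
def augment(first, second, label, target_words, synonym_data):
    """Return augmented (first, second, label) tuples by synonym replacement.

    For each target word and each synonym in its synonym group, the target
    word is replaced in both sentences; the label is carried over unchanged.
    A replacement that changes neither sentence produces no new record.
    """
    augmented = []
    for word in target_words:
        for synonym in synonym_data.get(word, []):    # step 21: look up the synonym group
            new_first = first.replace(word, synonym)   # step 22: substitute in each sentence
            new_second = second.replace(word, synonym)
            if (new_first, new_second) != (first, second):
                augmented.append((new_first, new_second, label))
    return augmented


# The example from the description: two target words, five augmented records.
synonym_data = {"同意": ["允諾", "許可", "准許", "批准"], "修訂": ["修改"]}
first = "如果您不同意某項「服務」之修訂條款________"
second = "如_您不同意________本條款之全部或部分內容"
label = "second sentence covers first sentence"

records = augment(first, second, label, ["同意", "修訂"], synonym_data)
print(len(records))  # 4 records from "同意" plus 1 from "修訂"
```

Because each synonym yields one new sentence pair with the original label, the number of added samples grows with the size of each synonym group, which is what lets the sparse label classes be topped up.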

It is worth noting that, in this embodiment, the machine learning algorithm is a neural network (NN) algorithm pre-trained for natural language processing (NLP), such as the BERT (Bidirectional Encoder Representations from Transformers) model, but is not limited thereto.

It should further be noted that, in this embodiment, several of the six label classes have too few samples, so data augmentation by synonym replacement (steps 21 and 22) is used to increase the number of samples. In other embodiments where data is sufficient, data augmentation need not be performed; that is, steps 21 and 22 are skipped and step 23 is executed directly, in which the processing unit 12 builds the parsing model directly from the training data stored in the storage unit 11.

Referring to FIGS. 1 and 3, the contract parsing procedure includes steps 31 and 32, which are described below.

In step 31, for each to-be-parsed sentence, the processing unit 12 compares the to-be-parsed sentence with the reference sentences of the reference contract data for similarity, so as to obtain, from those reference sentences, a target reference sentence that is the most similar to the to-be-parsed sentence.

Referring also to FIG. 4, in this embodiment the processing unit 12 uses a locality-sensitive hashing (LSH) algorithm to obtain, from the reference sentences of the reference contract data, the target reference sentence that is the most similar to the to-be-parsed sentence. Step 31 includes sub-steps 311 and 312, which are described below.

In step 311, for each to-be-parsed sentence, the processing unit 12 uses the locality-sensitive hashing algorithm to calculate a plurality of similarity values, each related to the similarity between the to-be-parsed sentence and a respective one of the reference sentences of the reference contract data.

It should be particularly noted that, in this embodiment, the processing unit 12 may calculate the locality-sensitive hash function values of the to-be-parsed sentence and the reference sentences by randomly projecting the to-be-parsed sentence and the reference sentence to be compared onto various normalized unit vectors to extract their features. Given the randomness of the random projection and the magnitude relationship, the similarity value of two sentences is the similarity between the features looked up through the hash table, but the invention is not limited thereto.

In step 312, for each to-be-parsed sentence, the processing unit 12 obtains, according to the similarity values, the target reference sentence whose corresponding similarity value is the highest among the reference sentences of the reference contract data.
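Sub-steps 311 and 312 can be sketched with a random-projection form of locality-sensitive hashing. The patent does not specify the features, the number of projections, or the hash-table layout, so everything here is an assumption made for illustration: character bigram counts as features, 64 random directions, and a similarity value defined as the fraction of matching signature bits. The hash-table bucketing used to speed up lookup is omitted, and `RandomProjectionLSH` and `most_similar` are hypothetical names.

```python
import random
from collections import Counter


def char_bigrams(sentence):
    """Bag of character bigrams (an assumed feature choice, not from the patent)."""
    return Counter(sentence[i:i + 2] for i in range(len(sentence) - 1))


class RandomProjectionLSH:
    def __init__(self, num_bits=64, seed=0):
        self.num_bits = num_bits
        self.seed = seed
        self._weights = {}  # (feature, bit) -> Gaussian weight, created lazily

    def _weight(self, feature, bit):
        # Each bit corresponds to one random direction; only the sign of the
        # projection is used, so the direction's norm does not matter.
        key = (feature, bit)
        if key not in self._weights:
            rng = random.Random(f"{self.seed}:{bit}:{feature}")
            self._weights[key] = rng.gauss(0.0, 1.0)
        return self._weights[key]

    def signature(self, sentence):
        """One bit per random direction: the sign of the projection."""
        feats = char_bigrams(sentence)
        return tuple(
            sum(count * self._weight(f, b) for f, count in feats.items()) >= 0
            for b in range(self.num_bits)
        )

    def similarity(self, a, b):
        """Step 311: fraction of signature bits on which two sentences agree."""
        sa, sb = self.signature(a), self.signature(b)
        return sum(x == y for x, y in zip(sa, sb)) / self.num_bits


def most_similar(query, references, lsh):
    """Step 312: return the reference with the highest similarity value."""
    scores = [lsh.similarity(query, ref) for ref in references]
    best = max(range(len(references)), key=scores.__getitem__)
    return references[best], scores
```

Sentences that share many bigrams point in similar directions, so their projections tend to have the same sign and their signatures agree on most bits. Comparing short bit signatures instead of full sentences is what makes this family of methods fast on large reference sets.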

In step 32, for each to-be-parsed sentence, the processing unit 12 uses the parsing model to generate, according to the to-be-parsed sentence and the target reference sentence corresponding to it, a parsing result including the relevance of the to-be-parsed sentence and the target reference sentence.

Referring also to FIG. 5, step 32 includes sub-steps 321 and 322, which are described below.

In step 321, for each to-be-parsed sentence, the processing unit 12 uses a sequence alignment comparison algorithm to align the to-be-parsed sentence with the target reference sentence, so as to obtain an aligned to-be-parsed sentence related to the to-be-parsed sentence and an aligned reference sentence related to the target reference sentence.

In step 322, for each to-be-parsed sentence, the processing unit 12 inputs the aligned to-be-parsed sentence and the aligned reference sentence into the parsing model to generate the parsing result.
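Taken together, steps 31, 321, and 322 form a simple per-sentence pipeline. The sketch below only shows the control flow: `parse_contract` is a hypothetical name, and the similarity function, the alignment function, and the parsing model are passed in as plain callables because their concrete forms (LSH, sequence alignment, a trained classifier) are described separately.

```python
from typing import Callable, List, Tuple


def parse_contract(
    sentences: List[str],
    references: List[str],
    similarity: Callable[[str, str], float],
    align: Callable[[str, str], Tuple[str, str]],
    model: Callable[[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Return (to-be-parsed sentence, target reference sentence, parsing result) triples."""
    results = []
    for sentence in sentences:
        # Step 31: pick the reference sentence with the highest similarity value.
        target = max(references, key=lambda ref: similarity(sentence, ref))
        # Step 321: align the two sentences.
        aligned_sentence, aligned_target = align(sentence, target)
        # Step 322: feed the aligned pair to the parsing model.
        results.append((sentence, target, model(aligned_sentence, aligned_target)))
    return results
```

A caller would plug in real components; for a quick check, stand-ins such as a character-overlap similarity, an identity alignment, and a model that returns "same" for identical inputs are enough to exercise the flow.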

In detail, the processing unit 12 applies the sequence alignment algorithm by using the edit operations (insertion, deletion, and substitution) of the following formula to align the to-be-parsed sentence with each of the candidate reference sentences, so as to identify the parts of the to-be-parsed sentence and the compared candidate reference sentence that are identical, missing, or different.

d(i, j) = min [Figure 02_image001]

where [Figure 02_image003] denotes the edit distance function; i and j denote the corresponding positions in the to-be-parsed sentence and the target reference sentence, respectively; S1 and S2 denote the strings of the to-be-parsed sentence and the target reference sentence, respectively; and [Figure 02_image005] and [Figure 02_image007] denote the character of the to-be-parsed sentence string S1 at position i and the character of the target reference sentence string S2 at position j, respectively. S1 < S2 indicates that the string S1 is longer than the string S2.
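For orientation, the textbook edit distance (Levenshtein) recurrence over the same symbols is given below. The cases of the patent's own formula are held in an image that is not reproduced in this text, so this standard form is an assumption and may differ from the patent in its cost terms or boundary conditions. Here $S_1(i)$ and $S_2(j)$ stand for the character of $S_1$ at position $i$ and of $S_2$ at position $j$.

```latex
d(i, 0) = i, \qquad d(0, j) = j,
\qquad
d(i, j) = \min
\begin{cases}
d(i-1, j) + 1 & \text{(deletion)} \\
d(i, j-1) + 1 & \text{(insertion)} \\
d(i-1, j-1) + [\, S_1(i) \neq S_2(j) \,] & \text{(substitution, or match at cost 0)}
\end{cases}
```

The bracket $[\,\cdot\,]$ equals 1 when the two characters differ and 0 when they are equal, so matching characters extend the alignment at no cost.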

For example, the to-be-parsed sentence S1 is 『在您完成線上訂購程序以後,本系統會自動經由電子郵件或其他方式寄給您一封通知,但是該項通知只是通知您本系統已經收到您的訂購訊息,不代表交易已經完成或契約已經成立,PChomeOnline保留是否接受您的訂單的權利。』 ("After you complete the online ordering process, this system will automatically send you a notice by email or other means, but the notice only informs you that this system has received your order information; it does not mean that the transaction has been completed or that the contract has been formed, and PChomeOnline reserves the right to decide whether to accept your order."), and the candidate reference sentence S2 compared against it is 『使用者完成線上訂購程序以後,即表示提出要約,本公司會自動經由電子郵件或其他方式寄發通知,但是該項通知只是代表已經收到使用者訂購訊息。』 ("After the user completes the online ordering process, an offer is deemed to have been made, and the company will automatically send a notice by email or other means, but the notice only indicates that the user's order information has been received."). The result of the sequence alignment algorithm applied by the processing unit 12 is shown in Table 1 below, where "_" marks a gap.

Table 1
S1-p1: _在您完成線上訂購程序以後,
S2-p1: 使用者完成線上訂購程序以後,
S1-p2: ________
S2-p2: 即表示提出要約,
S1-p3: 本系統會自動經由電子郵件或其他方式寄給您一封通知,
S2-p3: 本公司會自動經由電子郵件或其他方式寄___發通知,
S1-p4: 但是該項通知只是通知您本系統已經收到_您的訂購訊息,
S2-p4: 但是該項通知只是____代表已經收到使用者訂購訊息,
S1-p5: 不代表交易已經完成或契約已經成立,PChomeOnline保留是否接受您的訂單的權利。
S2-p5: _____________________________________。

The aligned to-be-parsed sentence is therefore 『_在您完成線上訂購程序以後,_________本系統會自動經由電子郵件或其他方式寄給您一封通知,但是該項通知只是通知您本系統已經收到_您的訂購訊息,不代表交易已經完成或契約已經成立,PChomeOnline保留是否接受您的訂單的權利。』, and the aligned reference sentence is 『使用者完成線上訂購程序以後,即表示提出要約,本公司會自動經由電子郵件或其他方式寄____發通知,但是該項通知只是____代表已經收到使用者訂購訊息,______________________________________________。』.
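The kind of gap-padded alignment shown in this example can be sketched with an edit-distance table followed by a traceback. This is a minimal illustration assuming unit costs for insertion, deletion, and substitution; the function name `align` and the traceback tie-breaking order are assumptions, so where gaps land may differ from the patent's own output when several alignments have the same cost.

```python
def align(s1, s2, gap="_"):
    """Align two strings by edit distance; return (aligned_s1, aligned_s2, distance)."""
    n, m = len(s1), len(s2)
    # d[i][j] = edit distance between the first i chars of s1 and first j chars of s2.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion: s1 char faces a gap
                d[i][j - 1] + 1,         # insertion: s2 char faces a gap
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    # Trace back from the bottom-right corner, padding with the gap symbol.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if s1[i - 1] == s2[j - 1] else 1):
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            a1.append(s1[i - 1])
            a2.append(gap)
            i -= 1
        else:
            a1.append(gap)
            a2.append(s2[j - 1])
            j -= 1
    return "".join(reversed(a1)), "".join(reversed(a2)), d[n][m]


aligned_1, aligned_2, distance = align("寄給您一封通知", "寄發通知")
print(aligned_1)
print(aligned_2)
```

Both returned strings always have the same length, which is what allows the aligned pair to be compared position by position for identical, missing, or different parts.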

It should be particularly noted that, in this embodiment, the first sentence and the second sentence of each training data are likewise sentences that the processing unit 12 has aligned with the sequence alignment algorithm.

In summary, in the automatic similarity comparison and interpretation method of contracts of the present invention, for each to-be-parsed sentence, the computer system 1 uses locality-sensitive hashing to obtain the target reference sentence from the reference sentences of the reference contract data, which greatly shortens the comparison time, and inputs the to-be-parsed sentence and the target reference sentence into the parsing model to generate the parsing result. From the parsing results of the to-be-parsed sentences of the to-be-parsed contract data, the user can understand the relevance between the to-be-parsed contract data and the reference contract data. The object of the present invention is therefore achieved.

The above, however, are merely embodiments of the present invention and shall not limit the scope in which the invention may be practiced; all simple equivalent changes and modifications made according to the claims and the specification of the present invention remain within the scope covered by this patent.

1: computer system
11: storage unit
12: processing unit
21~23: steps
31, 32: steps
311, 312: steps
321, 322: steps

Other features and effects of the present invention will be clearly presented in the embodiments described with reference to the drawings, in which:
FIG. 1 is a block diagram illustrating a computer system used to implement an embodiment of the automatic similarity comparison and interpretation method of contracts of the present invention;
FIG. 2 is a flowchart illustrating a model building procedure of the embodiment;
FIG. 3 is a flowchart illustrating a contract parsing procedure of the embodiment;
FIG. 4 is a flowchart illustrating the sub-steps of step 31 of the contract parsing procedure of FIG. 3; and
FIG. 5 is a flowchart illustrating the sub-steps of step 32 of the contract parsing procedure of FIG. 3.

31, 32: steps

Claims (6)

1. An automatic similarity comparison and interpretation method of contracts, executed by a computer system, the computer system storing one to-be-parsed contract data and multiple reference contract data, the to-be-parsed contract data including a plurality of to-be-parsed sentences, each reference contract data including a plurality of reference sentences, the method comprising the following steps:
(A) for each to-be-parsed sentence, comparing the to-be-parsed sentence with the reference sentences of the reference contract data for similarity, so as to obtain, from the reference sentences of the reference contract data, a target reference sentence that is the most similar to the to-be-parsed sentence; and
(B) for each to-be-parsed sentence, using a parsing model for parsing the relevance of two sentences to generate, according to the to-be-parsed sentence and the target reference sentence corresponding to the to-be-parsed sentence, a parsing result including the relevance of the to-be-parsed sentence and the target reference sentence.

2. The automatic similarity comparison and interpretation method of contracts according to claim 1, wherein, in step (A), for each to-be-parsed sentence, a locality-sensitive hashing algorithm is used, according to the to-be-parsed sentence and the reference sentences of the reference contract data, to obtain from the reference sentences of the reference contract data the target reference sentence that is the most similar to the to-be-parsed sentence.

3. The automatic similarity comparison and interpretation method of contracts according to claim 2, wherein step (A) includes the following sub-steps:
(A-1) for each to-be-parsed sentence, using the locality-sensitive hashing algorithm to calculate a plurality of similarity values, each related to the similarity between the to-be-parsed sentence and a respective one of the reference sentences of the reference contract data; and
(A-2) for each to-be-parsed sentence, obtaining, according to the similarity values, the target reference sentence whose corresponding similarity value is the highest among the reference sentences of the reference contract data.

4. The automatic similarity comparison and interpretation method of contracts according to claim 1, wherein step (B) includes the following sub-steps:
(B-1) for each to-be-parsed sentence, using a sequence alignment algorithm to align the to-be-parsed sentence with the target reference sentence, so as to obtain an aligned to-be-parsed sentence related to the to-be-parsed sentence and an aligned reference sentence related to the target reference sentence; and
(B-2) for each to-be-parsed sentence, inputting the aligned to-be-parsed sentence and the aligned reference sentence into the parsing model to generate the parsing result.

5. The automatic similarity comparison and interpretation method of contracts according to claim 1, wherein the computer system further stores multiple training data, each training data including a first sentence composed of a plurality of words, a second sentence composed of a plurality of words and different from the first sentence, and a label related to the relevance between the first sentence and the second sentence, the method further comprising, before step (B), the following step:
(C) using a machine learning algorithm to build the parsing model according to the training data.

6. The automatic similarity comparison and interpretation method of contracts according to claim 5, wherein the computer system further stores a synonym data including multiple kinds of synonyms, and step (C) includes the following sub-steps:
(C-1) for each training data, obtaining at least one synonym group corresponding to at least one target word, according to the at least one target word in the first sentence and the second sentence of the training data and according to the synonym data;
(C-2) for each training data, generating at least one augmented training data according to the training data and the at least one synonym group; and
(C-3) using the machine learning algorithm to build the parsing model according to the training data and the augmented training data generated in step (C-2).
TW109140785A 2020-11-20 2020-11-20 Automatic similarity comparison and interpretation method of contracts TWI772975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109140785A TWI772975B (en) 2020-11-20 2020-11-20 Automatic similarity comparison and interpretation method of contracts

Publications (2)

Publication Number Publication Date
TW202221556A TW202221556A (en) 2022-06-01
TWI772975B true TWI772975B (en) 2022-08-01

Family

ID=83062454

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109140785A TWI772975B (en) 2020-11-20 2020-11-20 Automatic similarity comparison and interpretation method of contracts

Country Status (1)

Country Link
TW (1) TWI772975B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014169334A1 (en) * 2013-04-15 2014-10-23 Contextual Systems Pty Ltd Methods and systems for improved document comparison
