TWI772975B - Automatic similarity comparison and interpretation method of contracts - Google Patents
Automatic similarity comparison and interpretation method of contracts
- Publication number
- TWI772975B
- Authority
- TW
- Taiwan
- Prior art keywords
- sentence
- parsed
- statement
- contract
- data
Landscapes
- Machine Translation (AREA)
- Executing Machine-Instructions (AREA)
- Synchronisation In Digital Transmission Systems (AREA)
- Hardware Redundancy (AREA)
Description
The present invention relates to an analysis method, and in particular to an automatic similarity comparison and analysis method for contracts.
Legal contracts are very important documents in daily life. A contract is a legal act established when both parties reach agreement in their declarations of intent; it is the principal expression of private autonomy and sets out the responsibilities and obligations of both parties. Because the law is involved, drafting and signing a contract without carefully considering its content risks an unfair or contentious outcome and later litigation costs.
However, individual users without sufficient legal knowledge find it hard to spot legally disadvantageous terms in a contract or the intent behind its statements. Even professional lawyers may still need to spend considerable time parsing a new contract.
Because understanding legal statements and finding potential flaws requires extensive legal and domain knowledge, it is difficult for existing computers to replace human mental activity in parsing contracts.
Therefore, the object of the present invention is to provide an automatic similarity comparison and analysis method for contracts that can parse a contract automatically.
Accordingly, the automatic similarity comparison and analysis method for contracts of the present invention is executed by a computer system. The computer system stores one to-be-parsed contract data item and a plurality of reference contract data items. The to-be-parsed contract data includes a plurality of to-be-parsed sentences, and each reference contract data item includes a plurality of reference sentences. The method includes a step (A) and a step (B).
In step (A), for each to-be-parsed sentence, the computer system compares the similarity between that sentence and the reference sentences of the reference contract data, so as to obtain, from those reference sentences, a target reference sentence that is most similar to the to-be-parsed sentence.
In step (B), for each to-be-parsed sentence, the computer system uses a parsing model for parsing the relationship between two sentences to generate, from the to-be-parsed sentence and its corresponding target reference sentence, a parsing result that includes the relationship between the to-be-parsed sentence and the target reference sentence.
The effect of the present invention is as follows. For each to-be-parsed sentence, the computer system compares the similarity between that sentence and the reference sentences of the reference contract data to obtain the target reference sentence, and then uses the parsing model to generate the parsing result from the to-be-parsed sentence and the target reference sentence. From the parsing results for the to-be-parsed sentences of the to-be-parsed contract data, the user can understand the relationship between the to-be-parsed contract data and the reference contract data.
Before the present invention is described in detail, it should be noted that in the following description similar elements are denoted by the same reference numerals.
Referring to FIG. 1, a computer system 1 used to implement an embodiment of the automatic similarity comparison and analysis method for contracts of the present invention includes a storage unit 11 and a processing unit 12 electrically connected to the storage unit 11. In this embodiment, the computer system 1 is implemented as, for example, a personal computer, a server, or a cloud host, but is not limited thereto.
The storage unit 11 stores a plurality of training data items, synonym data including a plurality of synonyms, one to-be-parsed contract data item, and a plurality of reference contract data items. Each training data item includes a first sentence, a second sentence, and a label relating to the relationship between the first sentence and the second sentence. The to-be-parsed contract data includes a plurality of to-be-parsed sentences, and each reference contract data item includes a plurality of reference sentences.
It is worth noting that, in this embodiment, the label of each training data item falls into one of, for example, six categories: same; first sentence covers second sentence; second sentence covers first sentence; opposite; irrelevant; and other. The reference contract data has been legally verified and can be divided into 13 categories: guidelines on user accounts; security clauses on privacy and contract information; clauses on the sufficient conditions for contract formation; defect warranty clauses; risk bearing, the right to rescind the contract, and the claims of both parties; disclaimers; the validity of the contract and of third-party suppliers, and the service provider's indemnification or compensation on returns; returns; the definition and description of the business; dispute resolution and relief channels; purchasing agents; jurisdiction and applicable law; and intellectual property rights. The categories are not limited to these.
Referring to FIGS. 2 and 3, this embodiment of the automatic similarity comparison and analysis method for contracts of the present invention includes a model building procedure and a contract parsing procedure.
Referring to FIGS. 1 and 2, the model building procedure includes steps 21 to 23, which are described below.
In step 21, for each training data item, the processing unit 12 obtains, according to at least one target word in the first sentence and the second sentence of the training data item and according to the synonym data, at least one synonym group corresponding to the at least one target word.
In step 22, for each training data item, the processing unit 12 generates at least one augmented training data item according to the training data item and the at least one synonym group.
In step 23, the processing unit 12 uses a machine learning algorithm to build, according to the training data and the augmented training data generated in step 22, a parsing model for parsing the relationship between two sentences.
For example, suppose the first sentence of a training data item is 『如果您不同意某項「服務」之修訂條款________』 ("If you do not agree to the revised terms of a 'Service'"), the second sentence is 『如_您不同意________本條款之全部或部分內容』 ("If you do not agree to all or part of these terms"), and the label of the training data item is "second sentence covers first sentence". In step 21, if the target words are "同意" (agree) and "修訂" (revise), the processing unit 12 can obtain a synonym group for "同意" that includes, for example, "允諾", "許可", "准許", and "批准", and a synonym group for "修訂" that includes, for example, only "修改". In step 22, the processing unit 12 can then replace "同意" in both the first sentence and the second sentence with each of "允諾", "許可", "准許", and "批准" in turn to obtain four augmented training data items, and can also replace "修訂" in the first sentence with "修改" to obtain one more augmented training data item. The labels of the augmented training data items are unchanged: all are "second sentence covers first sentence". By the same reasoning, if the only target word is "修訂", only one augmented training data item is generated.
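The synonym replacement of steps 21 and 22 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `TrainingItem` and `augment`, the dictionary layout of the synonym data, and the use of plain substring replacement are all hypothetical. The example data is the one given in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingItem:
    first: str   # first sentence
    second: str  # second sentence
    label: str   # relationship label, left unchanged by augmentation


def augment(item: TrainingItem, synonyms: dict[str, list[str]]) -> list[TrainingItem]:
    """Return one augmented item per (target word, synonym) pair.

    The target word is replaced wherever it occurs in either sentence;
    target words that occur in neither sentence produce nothing.
    """
    augmented = []
    for target, group in synonyms.items():
        if target not in item.first and target not in item.second:
            continue
        for synonym in group:
            augmented.append(
                TrainingItem(
                    item.first.replace(target, synonym),
                    item.second.replace(target, synonym),
                    item.label,
                )
            )
    return augmented


item = TrainingItem(
    first="如果您不同意某項「服務」之修訂條款________",
    second="如_您不同意________本條款之全部或部分內容",
    label="second sentence covers first sentence",
)
synonym_groups = {
    "同意": ["允諾", "許可", "准許", "批准"],
    "修訂": ["修改"],
}
augmented = augment(item, synonym_groups)
print(len(augmented))  # 4 items for 同意 plus 1 item for 修訂
```

Every synonym in this example has the same length as the word it replaces, so the underscore padding of the sentences stays in step; a replacement of a different length would presumably call for re-aligning the pair.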
It is worth noting that in this embodiment the machine learning algorithm is a neural network (NN) algorithm pre-trained for natural language processing (NLP), for example the BERT (Bidirectional Encoder Representations from Transformers) model, but is not limited thereto.
It should also be noted that, in this embodiment, because the number of samples for several of the six label categories is too low, data augmentation by synonym replacement (steps 21 and 22) is used to increase the number of samples. In other embodiments, when the data is sufficient, data augmentation need not be performed; that is, steps 21 and 22 are skipped and step 23 is executed directly, in which case the processing unit 12 builds the parsing model directly from the training data stored in the storage unit 11.
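Referring to FIGS. 1 and 3, the contract parsing procedure includes steps 31 and 32, which are described below.
In step 31, for each to-be-parsed sentence, the processing unit 12 compares the similarity between that sentence and the reference sentences of the reference contract data, so as to obtain, from those reference sentences, a target reference sentence that is most similar to the to-be-parsed sentence.
Referring also to FIG. 4, in this embodiment the processing unit 12 uses a locality-sensitive hashing (LSH) algorithm to obtain, from the reference sentences of the reference contract data, the target reference sentence that is most similar to the to-be-parsed sentence. Step 31 includes sub-steps 311 and 312, which are described below.
In step 311, for each to-be-parsed sentence, the processing unit 12 uses the locality-sensitive hashing algorithm to calculate a plurality of similarity values, each relating to the similarity between the to-be-parsed sentence and a respective one of the reference sentences of the reference contract data.
It should be particularly noted that, in this embodiment, the processing unit 12 may compute the locality-sensitive hash function values of the to-be-parsed sentence and the reference sentences by randomly projecting the to-be-parsed sentence and each reference sentence to be compared onto various normalized unit vectors to extract their features. Owing to the randomness and the magnitudes of the random projections, the similarity value of two sentences is the similarity between the features looked up via the hash table, but the computation is not limited to this approach.
In step 312, for each to-be-parsed sentence, the processing unit 12 obtains, according to the similarity values, the target reference sentence, namely the reference sentence of the reference contract data whose corresponding similarity value is the highest.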
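Sub-steps 311 and 312 can be illustrated with a random-projection (SimHash-style) form of locality-sensitive hashing. The sketch below is only one possible reading and rests on several assumptions that do not come from the patent: sentences are turned into hashed character-bigram count vectors, `DIM` and `N_BITS` are arbitrary sizes, `zlib.crc32` is used as the feature hash, and signatures are compared directly rather than looked up through a hash table.

```python
import math
import random
import zlib

DIM = 256     # length of the hashed bigram feature vector (assumption)
N_BITS = 128  # number of random projections, i.e. signature bits (assumption)


def features(sentence: str) -> list[float]:
    """Hash character bigrams into a fixed-length count vector."""
    vec = [0.0] * DIM
    for a, b in zip(sentence, sentence[1:]):
        vec[zlib.crc32((a + b).encode("utf-8")) % DIM] += 1.0
    return vec


def random_unit_vectors(n: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Draw n random directions and normalize each to unit length."""
    rng = random.Random(seed)
    planes = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        planes.append([x / norm for x in v])
    return planes


PLANES = random_unit_vectors(N_BITS, DIM)


def signature(sentence: str) -> tuple[bool, ...]:
    """One bit per projection: is the projected feature vector non-negative?"""
    vec = features(sentence)
    return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in PLANES)


def similarity(a: str, b: str) -> float:
    """Step 311: fraction of signature bits on which the two sentences agree."""
    sa, sb = signature(a), signature(b)
    return sum(x == y for x, y in zip(sa, sb)) / N_BITS


def most_similar(query: str, references: list[str]) -> str:
    """Step 312: the reference sentence with the highest similarity value."""
    return max(references, key=lambda ref: similarity(query, ref))


references = [
    "The seller shall refund the buyer within seven days.",
    "Disputes are governed by the laws of Taiwan.",
]
query = "The seller shall refund the buyer within ten days."
target = most_similar(query, references)
print(target, similarity(query, target))
```

In a production setting the signatures of the reference sentences would normally be computed once and grouped into hash buckets, so that each query is compared only against the references that share a bucket; that is where the reduction in comparison time comes from.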
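In step 32, for each to-be-parsed sentence, the processing unit 12 uses the parsing model to generate, from the to-be-parsed sentence and its corresponding target reference sentence, a parsing result that includes the relationship between the to-be-parsed sentence and the target reference sentence.
Referring also to FIG. 5, step 32 includes sub-steps 321 and 322, which are described below.
In step 321, for each to-be-parsed sentence, the processing unit 12 uses a sequence alignment comparison algorithm to align the to-be-parsed sentence with the target reference sentence, so as to obtain an aligned to-be-parsed sentence relating to the to-be-parsed sentence and an aligned reference sentence relating to the target reference sentence.
In step 322, for each to-be-parsed sentence, the processing unit 12 inputs the aligned to-be-parsed sentence and the aligned reference sentence into the parsing model to generate the parsing result.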
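How steps 31, 321, and 322 fit together can be sketched as a small driver function. This is an illustrative outline only: `parse_contract` and its three callable parameters are hypothetical names, and the toy stand-ins at the bottom exist solely to make the sketch runnable; they are not the LSH comparison, the sequence alignment, or the trained parsing model of the embodiment.

```python
from typing import Callable


def parse_contract(
    sentences: list[str],
    references: list[str],
    similarity: Callable[[str, str], float],
    align: Callable[[str, str], tuple[str, str]],
    model: Callable[[str, str], str],
) -> list[tuple[str, str, str]]:
    """Return (to-be-parsed sentence, target reference sentence, relationship) triples."""
    results = []
    for sentence in sentences:
        # Step 31: the reference sentence with the highest similarity value.
        target = max(references, key=lambda ref: similarity(sentence, ref))
        # Step 321: align the two sentences.
        aligned_sentence, aligned_target = align(sentence, target)
        # Step 322: the parsing model labels the relationship of the aligned pair.
        results.append((sentence, target, model(aligned_sentence, aligned_target)))
    return results


# Toy stand-ins so that the sketch runs; none of them is the patent's component.
def toy_similarity(a: str, b: str) -> float:
    return len(set(a) & set(b)) / len(set(a) | set(b))


def toy_align(a: str, b: str) -> tuple[str, str]:
    width = max(len(a), len(b))
    return a.ljust(width, "_"), b.ljust(width, "_")


def toy_model(a: str, b: str) -> str:
    return "same" if a == b else "other"


results = parse_contract(
    ["refund within 7 days"],
    ["governing law is Taiwan", "refund within 7 days"],
    toy_similarity,
    toy_align,
    toy_model,
)
print(results)
```

Passing the three components in as callables keeps the driver independent of how similarity, alignment, and classification are actually implemented.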
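In detail, the processing unit 12 applies the sequence alignment comparison algorithm by using the edit operations (insertion, deletion, and substitution) of the following formula to align the to-be-parsed sentence with each of the candidate reference sentences, so as to identify the parts of the to-be-parsed sentence and of the compared candidate reference sentence that are identical, missing, or different.

$$d(i, j) = \min\{\,\dots\,\}$$

where $d$ denotes the edit distance function; $i$ and $j$ denote the corresponding positions in the to-be-parsed sentence and the target reference sentence, respectively; $S1$ and $S2$ denote the strings of the to-be-parsed sentence and the target reference sentence, respectively; $S1_i$ and $S2_j$ denote the character of the to-be-parsed sentence string $S1$ at position $i$ and the character of the target reference sentence string $S2$ at position $j$, respectively; and $S1 < S2$ indicates that string $S1$ is longer than $S2$.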
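The terms inside the min are not spelled out in the text. As a minimal sketch, assuming the standard unit-cost edit distance (Levenshtein) recurrence over insertion, deletion, and substitution, the alignment of step 321 could look like the following. The function name `align`, the `_` gap character, and the unit costs are illustrative assumptions and are not taken from the patent.

```python
def align(s1: str, s2: str, gap: str = "_") -> tuple[str, str]:
    """Align s1 and s2 along a minimum-edit-distance path, padding gaps with `gap`."""
    n, m = len(s1), len(s2)
    # d[i][j] = edit distance between the prefixes s1[:i] and s2[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete s1[i-1]
                d[i][j - 1] + 1,         # insert s2[j-1]
                d[i - 1][j - 1] + cost,  # match or substitute
            )

    # Trace back from d[n][m] to recover one optimal alignment.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diagonal = (
            i > 0
            and j > 0
            and d[i][j] == d[i - 1][j - 1] + (0 if s1[i - 1] == s2[j - 1] else 1)
        )
        if diagonal:
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            a1.append(s1[i - 1])
            a2.append(gap)
            i -= 1
        else:
            a1.append(gap)
            a2.append(s2[j - 1])
            j -= 1
    return "".join(reversed(a1)), "".join(reversed(a2))


print(align("abc", "ac"))  # ('abc', 'a_c')
```

Both returned strings have the same length. Positions where the characters match are identical parts, positions holding the gap character are missing parts, and the remaining positions are differing parts.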
For example, suppose the to-be-parsed sentence S1 is 『在您完成線上訂購程序以後,本系統會自動經由電子郵件或其他方式寄給您一封通知,但是該項通知只是通知您本系統已經收到您的訂購訊息,不代表交易已經完成或契約已經成立,PChomeOnline保留是否接受您的訂單的權利。』 ("After you complete the online ordering process, this system will automatically send you a notification by e-mail or other means, but that notification only informs you that this system has received your order message; it does not mean that the transaction has been completed or that the contract has been formed, and PChomeOnline reserves the right to decide whether to accept your order."), and the candidate reference sentence S2 compared with it is 『使用者完成線上訂購程序以後,即表示提出要約,本公司會自動經由電子郵件或其他方式寄發通知,但是該項通知只是代表已經收到使用者訂購訊息。』 ("After the user completes the online ordering process, an offer is deemed to have been made, and the company will automatically send a notification by e-mail or other means, but that notification only indicates that the user's order message has been received."). The result obtained by the processing unit 12 with the sequence alignment comparison algorithm is then as shown in Table 1 below.
Table 1
The aligned to-be-parsed sentence is 『_在您完成線上訂購程序以後,_________本系統會自動經由電子郵件或其他方式寄給您一封通知,但是該項通知只是通知您本系統已經收到_您的訂購訊息,不代表交易已經完成或契約已經成立,PChomeOnline保留是否接受您的訂單的權利。』, and the aligned reference sentence is 『使用者完成線上訂購程序以後,即表示提出要約,本公司會自動經由電子郵件或其他方式寄____發通知,但是該項通知只是____代表已經收到使用者訂購訊息,______________________________________________。』. Each underscore marks a position at which the other sentence has text with no counterpart.
It should be particularly noted that, in this embodiment, the first sentence and the second sentence of each training data item are likewise sentences that have been aligned by the processing unit 12 using the sequence alignment comparison algorithm.
In summary, in the automatic similarity comparison and analysis method for contracts of the present invention, for each to-be-parsed sentence the computer system 1 uses locality-sensitive hashing to obtain the target reference sentence from the reference sentences of the reference contract data, which greatly shortens the comparison time, and inputs the to-be-parsed sentence and the target reference sentence into the parsing model to generate the parsing result. From the parsing results for the to-be-parsed sentences of the to-be-parsed contract data, the user can understand the relationship between the to-be-parsed contract data and the reference contract data. The object of the present invention is therefore achieved.
The foregoing, however, is merely an embodiment of the present invention and shall not limit the scope in which the invention may be practiced. All simple equivalent changes and modifications made according to the claims and the specification of the present invention remain within the scope covered by the patent.
1: computer system; 11: storage unit; 12: processing unit; 21 to 23: steps; 31, 32: steps; 311, 312: steps; 321, 322: steps
Other features and effects of the present invention will be clearly presented in the embodiments described with reference to the drawings, in which:
FIG. 1 is a block diagram illustrating a computer system used to implement an embodiment of the automatic similarity comparison and analysis method for contracts of the present invention;
FIG. 2 is a flowchart illustrating a model building procedure of the embodiment of the automatic similarity comparison and analysis method for contracts of the present invention;
FIG. 3 is a flowchart illustrating a contract parsing procedure of the embodiment of the automatic similarity comparison and analysis method for contracts of the present invention;
FIG. 4 is a flowchart illustrating the sub-steps of step 31 of the contract parsing procedure of FIG. 3; and
FIG. 5 is a flowchart illustrating the sub-steps of step 32 of the contract parsing procedure of FIG. 3.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109140785A TWI772975B (en) | 2020-11-20 | 2020-11-20 | Automatic similarity comparison and interpretation method of contracts |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202221556A TW202221556A (en) | 2022-06-01 |
TWI772975B (en) | 2022-08-01 |
Family
ID=83062454
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW109140785A TWI772975B (en) | 2020-11-20 | 2020-11-20 | Automatic similarity comparison and interpretation method of contracts |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI772975B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014169334A1 (en) * | 2013-04-15 | 2014-10-23 | Contextual Systems Pty Ltd | Methods and systems for improved document comparison |
- 2020-11-20: TW application TW109140785A, patent TWI772975B (en), active
Also Published As
Publication number | Publication date |
---|---|
TW202221556A (en) | 2022-06-01 |