TW201606534A

TW201606534A - A document analysis system, document analysis method and document analysis program

Info

Publication number: TW201606534A
Application number: TW104109435A
Authority: TW
Inventors: 守本正宏; 白井喜勝; 武田秀樹; 蓮子和巳; 花谷彰晃
Original assignee: Ubic股份有限公司
Priority date: 2014-03-24
Filing date: 2015-03-24
Publication date: 2016-02-16
Also published as: WO2015145524A1

Abstract

The purpose of the present invention is to analyze document information with high precision by means of a computer. The present invention provides a document analysis system that acquires information recorded on a predetermined computer or server and analyzes document information, contained in the acquired information, that comprises a plurality of documents. The document analysis system comprises: a category selection unit that selects a category, which is an index by which each of the plurality of documents can be classified; and a score calculation unit that calculates, for each category selected by the category selection unit, a score indicating the strength with which a document constituting the document information is associated with a classification code that indicates the degree of relevance between the document information and a lawsuit or fraud investigation.

Description

Document analysis system, document analysis method, and document analysis program

The present invention relates to a document analysis system that analyzes document information recorded on a predetermined computer or server.

Conventionally, when a computer-related crime or legal dispute such as unauthorized access or leakage of confidential information occurs, the devices, data, and electronic records needed to determine the cause or to conduct the investigation are collected and analyzed, and means and techniques for establishing their legal evidentiary value have been proposed.

In particular, civil litigation in the United States requires eDiscovery (electronic discovery), and both the plaintiff and the defendant in a lawsuit are responsible for submitting all relevant digital information as evidence. Digital information recorded on computers and servers must therefore be submitted as evidence.

Meanwhile, with the rapid development and spread of information technology, almost all information in today's business world is created on computers, so a large amount of digital information exists even within a single company.

As a result, in the course of preparing evidence for submission to a court, confidential digital information that is not necessarily related to the lawsuit is liable to be submitted as evidence by mistake. Submitting confidential document information unrelated to the lawsuit is itself a problem.

In recent years, techniques relating to document information in forensic systems have been proposed in Patent Documents 1 to 3. However, forensic systems such as those of Patent Documents 1 to 3 collect an enormous amount of user document information using a plurality of computers and servers.

The work of determining whether such an enormous amount of digitized document information is appropriate as evidence for a lawsuit relies on users called reviewers, who must visually check and classify the document information one item at a time, which requires a great deal of labor and expense.

A document classification system intended to solve this problem is presented in Patent Document 4. Patent Document 4 discloses a document classification system that collects digitized document information and then automatically assigns classification codes to that document information by means of a computer system, thereby reducing the burden of classifying the document information used in a lawsuit.

(Prior art documents)

(Patent Document 1) JP-A-2011-209930 (published on October 20, 2011)

(Patent Document 2) JP-A-2011-209931 (published on October 20, 2011)

(Patent Document 3) JP-A-2012-032859 (published on February 16, 2012)

(Patent Document 4) JP-A-2013-182338 (published on September 12, 2013)

However, in the document classification system disclosed in Patent Document 4, when documents belonging to different categories are mixed in with the documents under investigation, the system cannot account for differences in the content of the individual documents, and the accuracy with which the classification code is automatically assigned to a document (that is, with which the document is scored) may be insufficient.

The present invention has been made in view of the above problem, and its object is to provide a document analysis system capable of analyzing document information with high precision.

To solve the above problem, the document analysis system of the present invention acquires information recorded on a predetermined computer or server and analyzes document information, contained in the acquired information, that is composed of a plurality of documents. The system comprises: a category selection unit that selects a category, which is an index by which each of the plurality of documents can be classified; and a score calculation unit that calculates, for each category selected by the category selection unit, a score indicating the strength with which a document constituting the document information is associated with a classification code that indicates the degree of relevance between the document information and a lawsuit or fraud investigation.

In the document analysis system of the present invention, the category selection unit may select categories one after another from among a plurality of categories, and the score calculation unit may calculate the score for each of the categories selected in turn by the category selection unit.

In the document analysis system of the present invention, the category selection unit may select, as the category, at least one of: the type of the lawsuit or fraud investigation; a phase classified according to the progress of a predetermined act that gives rise to the lawsuit or fraud investigation; and an attribute of the document information.

The document analysis system of the present invention may further comprise an investigation category input receiving unit that receives input of the category to which the lawsuit or fraud investigation belongs, and the category selection unit may select the category received by the investigation category input receiving unit.

The document analysis system of the present invention may further comprise an investigation type determination unit that determines the investigation category to be investigated on the basis of the category received by the investigation category input receiving unit, and extracts the necessary types of information from an investigation basic database.

The document analysis system of the present invention may further comprise an information extraction unit that extracts, from the document information, keywords and/or sentences contained in the document information as information related to the lawsuit or fraud investigation.

The document analysis system of the present invention may further comprise a search unit that searches the plurality of documents for keywords and/or sentences contained in the document information.

The document analysis system of the present invention may further comprise a presentation unit that presents the score calculated by the score calculation unit in a form that the user can grasp.

To solve the above problem, the document analysis method of the present invention acquires information recorded on a predetermined computer or server and analyzes document information, contained in the acquired information, that is composed of a plurality of documents. The method includes: a category selection step of selecting a category, which is an index by which each of the plurality of documents can be classified; and a score calculation step of calculating, for each category selected in the category selection step, a score indicating the strength with which a document constituting the document information is associated with a classification code that indicates the degree of relevance between the document information and a lawsuit or fraud investigation.

To solve the above problem, the document analysis program of the present invention acquires information recorded on a predetermined computer or server and analyzes document information, contained in the acquired information, that is composed of a plurality of documents. The program causes a computer to execute: a category selection function of selecting a category, which is an index by which each of the plurality of documents can be classified; and a score calculation function of calculating, for each category selected by the category selection function, a score indicating the strength with which a document constituting the document information is associated with a classification code that indicates the degree of relevance between the document information and a lawsuit or fraud investigation.

The document analysis system, document analysis method, and document analysis program of the present invention calculate a score according to the category to which a document belongs, and can thereby analyze document information with high precision.

1‧‧‧Document analysis system

11‧‧‧Document display screen

20‧‧‧Investigation category input receiving unit

22‧‧‧Investigation type determination unit

24‧‧‧Information extraction unit

26‧‧‧Category selection unit

30‧‧‧Search unit

201‧‧‧First automatic classification unit

301‧‧‧Second automatic classification unit

401‧‧‧Third automatic classification unit

501‧‧‧Quality inspection unit

601‧‧‧Learning unit

701‧‧‧Report creation unit

100‧‧‧Data storage unit

101‧‧‧Digital information storage area

103‧‧‧Investigation basic database

104‧‧‧Keyword database

105‧‧‧Related term database

106‧‧‧Score calculation database

107‧‧‧Report creation database

109‧‧‧Database management unit

116‧‧‧Score calculation unit

118‧‧‧Document analysis unit

120‧‧‧Language determination unit

122‧‧‧Translation unit

124‧‧‧Trend information generation unit

130‧‧‧Presentation unit

131‧‧‧Classification code receiving and assigning unit

133‧‧‧Attorney review receiving unit

Fig. 1 is a functional block diagram showing the main configuration of a document analysis system according to an embodiment of the present invention.

Fig. 2 is a schematic diagram showing how the score calculation unit calculates the score of a document for each category.

Fig. 3 is a flowchart showing the processing flow of a document analysis method according to the embodiment of the present invention.

Fig. 4 is a flowchart showing the detailed processing flow of the document analysis method according to the embodiment of the present invention.

Fig. 5 is a flowchart showing the flow of investigation and classification processing according to the investigation type in the document analysis method according to the embodiment of the present invention.

Fig. 6 is a flowchart showing the flow of predictive coding according to the investigation type in the document analysis method according to the embodiment of the present invention.

Fig. 7 is a flowchart showing the flow of processing at each stage in the embodiment.

Fig. 8 is a flowchart showing the processing flow of the keyword database in the embodiment.

Fig. 9 is a flowchart showing the processing flow of the related term database in the embodiment.

Fig. 10 is a flowchart showing the processing of the first automatic classification unit in the embodiment.

Fig. 11 is a flowchart showing the processing of the second automatic classification unit in the embodiment.

Fig. 12 is a flowchart showing the processing of the classification code receiving and assigning unit in the embodiment.

Fig. 13 is a flowchart showing the processing of the document analysis unit in the embodiment.

Fig. 14 is a graph showing the analysis results of the document analysis unit in the embodiment.

Fig. 15 is a flowchart showing the processing of the third automatic classification unit in one example of the embodiment.

Fig. 16 is a flowchart showing the processing of the third automatic classification unit in another example of the embodiment.

Fig. 17 is a flowchart showing the processing of the quality inspection unit in the embodiment.

Fig. 18 shows the document display screen in the embodiment.

An embodiment of the present invention will be described with reference to Figs. 1 to 18.

[Configuration of document analysis system 1]

Fig. 1 is a functional block diagram showing the main configuration of the document analysis system 1 according to the embodiment of the present invention. The document analysis system 1 acquires information recorded on a predetermined computer or server and analyzes document information, contained in the acquired information, that is composed of a plurality of documents.

As shown in Fig. 1, the document analysis system 1 comprises a data storage unit 100 (a digital information storage area 101, an investigation basic database 103, a keyword database 104, a related term database 105, a score calculation database 106, and a report creation database 107), a database management unit 109, an information extraction unit 24, a search unit 30, a score calculation unit 116, a document analysis unit 118, an investigation category input receiving unit 20, an investigation type determination unit 22, a presentation unit 130, a category selection unit 26, a first automatic classification unit 201, a second automatic classification unit 301, a classification code receiving and assigning unit 131, and a third automatic classification unit 401. The document analysis system 1 may further comprise a trend information generation unit 124, a quality inspection unit 501, a learning unit 601, a report creation unit 701, an attorney review receiving unit 133, a language determination unit 120, and a translation unit 122.

The investigation category input receiving unit 20 receives a category input by the user. When a category is input, the investigation category input receiving unit 20 outputs it to the investigation type determination unit 22 and the category selection unit 26. Here, a category is an index by which each of the documents included in the plurality of documents can be classified.

For example, the category may be the type of lawsuit or fraud investigation (the nature of the case to which the lawsuit or fraud investigation relates, for example antitrust, patent, prohibition of foreign bribery (FCPA), product liability (PL), information leakage, or fictitious billing). Alternatively, the category may be an attribute of the document information (the nature of the information contained in the document information, for example information about competitors, prices, quotations, lists of amounts, or products).

Alternatively, the category may be a phase classified according to the progress of a predetermined act that gives rise to the lawsuit or fraud investigation. Here, a "phase" is an index indicating each stage in the progress of the predetermined act. For example, the "Relationship Building" phase is the stage, preceding the Competition phase, in which relationships with customers and competitors are built. The "Preparation" phase is the stage in which competition-related information is exchanged with competing companies (which may be third parties).

The "Competition" phase is the stage in which a price is presented to a customer, feedback is obtained, and the competitors communicate with one another about that feedback. The "predetermined act" refers to a wrongful act related to a lawsuit or fraud investigation (for example, antitrust, patent, prohibition of foreign bribery, product liability, information leakage, or fictitious billing), such as participating in a price-fixing meeting with competitors, or to an act connected with such a wrongful act. When the category is a phase, the document analysis system 1 can perform the analysis best suited to the predetermined act.
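The three kinds of category described above (case type, phase, and document attribute) can be sketched as a small data structure. This is a minimal sketch, not the data model of the invention: the names `CategoryKind`, `Category`, `PHASES`, and `phase_index` are hypothetical, and placing "Preparation" between the other two phases is an assumption, since the text only states that Relationship Building precedes Competition.

```python
from dataclasses import dataclass
from enum import Enum


class CategoryKind(Enum):
    CASE_TYPE = "case_type"   # e.g. antitrust, patent, FCPA, product liability
    PHASE = "phase"           # stage in the progress of the predetermined act
    ATTRIBUTE = "attribute"   # nature of the information in a document


@dataclass(frozen=True)
class Category:
    kind: CategoryKind
    name: str


# Phases named in the text. The ordering is an assumption for illustration.
PHASES = [
    Category(CategoryKind.PHASE, "Relationship Building"),
    Category(CategoryKind.PHASE, "Preparation"),
    Category(CategoryKind.PHASE, "Competition"),
]


def phase_index(category: Category) -> int:
    """Return the position of a phase category in the assumed progression."""
    return PHASES.index(category)
```

Making `Category` a frozen dataclass keeps categories hashable, so they can serve as dictionary keys when scores are later stored per category.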

The investigation type determination unit 22 determines the category to be investigated on the basis of the category received by the investigation category input receiving unit 20, and extracts the necessary types of information from the investigation basic database 103. For example, when the document information consists of e-mails, presentation materials, spreadsheets, meeting materials, contracts, organization charts, or business plans, the investigation type determination unit 22 outputs each of these documents to the information extraction unit 24 as a necessary type of information. The document analysis system 1 can thus extract the necessary types of information.

The information extraction unit 24 extracts a plurality of documents from the document information. Specifically, from the information input from the investigation type determination unit 22 (for example, e-mails, presentation materials, spreadsheets, meeting materials, contracts, organization charts, or business plans), it extracts the keywords and/or sentences contained in that information as information related to the lawsuit or fraud investigation, and stores the extraction results in the investigation basic database 103. The document analysis system 1 can thus identify information related to the lawsuit or fraud investigation and store it in a database.

The category selection unit 26 selects a category and outputs the selected category to the score calculation unit 116. When a plurality of categories are envisaged, the category selection unit 26 may select one category at a time, in turn, from among them.

When a category is input from the investigation category input receiving unit 20, the category selection unit 26 may select the input category. The document analysis system 1 can thereby reliably select the categories input by the user.
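The two selection behaviors described above (selecting categories one at a time, or selecting the category supplied by the user) can be sketched as a generator. This is a minimal sketch under stated assumptions: the function name `select_categories` is hypothetical, categories are represented as plain strings, and rejecting a user-supplied category that is not in the known list is an added assumption.

```python
from typing import Iterator, Optional, Sequence


def select_categories(
    available: Sequence[str], user_input: Optional[str] = None
) -> Iterator[str]:
    """Yield the categories for which scores should be calculated.

    If the user supplied a category, only that category is selected;
    otherwise every available category is selected in turn.
    """
    if user_input is not None:
        if user_input not in available:
            # Assumption: unknown categories are treated as an error.
            raise ValueError(f"unknown category: {user_input}")
        yield user_input
        return
    yield from available
```

For example, `list(select_categories(["antitrust", "patent"]))` selects both categories in order, while passing `user_input="patent"` selects only that one.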

The score calculation unit 116 calculates, for each category, a score indicating the strength with which a document extracted from the document information is associated with a classification code that indicates the degree of relevance between the document information and the lawsuit or fraud investigation. The score calculation unit 116 may also calculate the score in time series. The method of calculating the score is described in detail later.
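The idea of calculating one score per category can be illustrated as follows. The actual scoring method is described later in the specification and is not reproduced here; this sketch assumes a simple weighted keyword count as a stand-in, and the name `score_by_category` and the per-category weight tables are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping


def score_by_category(
    text: str, weights: Mapping[str, Mapping[str, float]]
) -> Dict[str, float]:
    """Return one score per category for a single document.

    weights maps category name -> {term: weight}. The score of a category
    is the sum of weight * occurrence count over its terms (an assumption,
    standing in for the scoring method of the specification).
    """
    counts = Counter(text.lower().split())
    return {
        category: sum(w * counts[term] for term, w in term_weights.items())
        for category, term_weights in weights.items()
    }
```

Keeping a separate weight table per category is what lets the same document receive a high score in one category and a low score in another.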

The presentation unit 130 presents the score calculated by the score calculation unit 116 in a form that the user can grasp. For example, the presentation unit 130 presents the score to the user by displaying it on a predetermined display unit (not shown). The document analysis system 1 can thereby show the user, in an easily graspable form, which category a target document fits.

The search unit 30 searches the plurality of documents for keywords and/or sentences contained in the document information. The document analysis system 1 can thereby extract the keywords and/or sentences contained in the document information.

The first automatic classification unit 201 searches, via the word search unit 114, for the keywords stored in the keyword database 104. When the document extraction unit 112 extracts documents containing those keywords from the document information, the first automatic classification unit 201 automatically assigns a specific classification code to each extracted document on the basis of keyword correspondence information.
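The keyword-based classification described above can be sketched as follows. This is a minimal sketch, assuming that the keyword correspondence information is a mapping from keyword to classification code and that matching is a case-insensitive substring test; the function name `classify_by_keyword` and the rule that the first matching keyword wins are hypothetical.

```python
from typing import Dict, Mapping


def classify_by_keyword(
    documents: Mapping[str, str], keyword_codes: Mapping[str, str]
) -> Dict[str, str]:
    """Return {document id: code} for documents containing a registered keyword."""
    assigned: Dict[str, str] = {}
    for doc_id, text in documents.items():
        lowered = text.lower()
        for keyword, code in keyword_codes.items():
            if keyword.lower() in lowered:
                assigned[doc_id] = code
                break  # assumption: the first matching keyword decides the code
    return assigned
```

Documents that contain no registered keyword are left out of the result, which corresponds to leaving them uncoded for the later classification stages.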

The second automatic classification unit 301 extracts, from the document information, documents containing the related terms stored in the related term database 105, and calculates a score for each extracted document on the basis of the evaluation values of the related terms it contains and the number of those related terms. Among the documents containing the related terms, it automatically assigns a predetermined classification code to each document whose score exceeds a certain value, on the basis of the score and related term correspondence information.
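The threshold-based classification described above can be sketched as follows. The text states only that the score depends on the evaluation values of the related terms and on how many of them a document contains; summing evaluation value times occurrence count is an assumption made here for illustration, and the name `classify_by_related_terms` and its parameters are hypothetical.

```python
from typing import Dict, Mapping, Tuple


def classify_by_related_terms(
    documents: Mapping[str, str],
    evaluation_values: Mapping[str, float],
    threshold: float,
    code: str,
) -> Dict[str, Tuple[str, float]]:
    """Assign `code` to documents whose related-term score exceeds `threshold`."""
    assigned: Dict[str, Tuple[str, float]] = {}
    for doc_id, text in documents.items():
        words = text.lower().split()
        counts = {term: words.count(term) for term in evaluation_values}
        if not any(counts.values()):
            continue  # only documents containing a related term are considered
        # Assumed scoring rule: evaluation value weighted by occurrence count.
        score = sum(evaluation_values[t] * n for t, n in counts.items())
        if score > threshold:
            assigned[doc_id] = (code, score)
    return assigned
```

Returning the score alongside the code keeps the evidence for each automatic decision available for later review.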

The classification code receiving and assigning unit 131 receives, for a plurality of documents extracted from the document information to which no classification code has been assigned, classification codes assigned by a user on the basis of relevance to the lawsuit, and assigns those classification codes to the documents.

The document analysis unit 118 analyzes the documents to which classification codes have been assigned by the classification code receiving and assigning unit 131. The document analysis unit 118 may also analyze the documents to which classification codes were automatically assigned by the first automatic classification unit 201 and the second automatic classification unit 301 on the basis of keywords, related terms, and scores, and may integrate the documents coded by the user with the automatically coded documents to obtain a comprehensive analysis result. In that case, the third automatic classification unit 401 may automatically assign classification codes on the basis of the comprehensive analysis result.

Classification and investigation work can proceed in various ways: automatic classification by word search, receipt of classification and investigation by a user, automatic classification and investigation using scores, automatic classification and investigation through a learning process, automatic classification and investigation through quality assurance, and so on. A progress history indicating the order and combination in which these various classification and investigation tasks were carried out may be kept, the plurality of documents to which classification codes have been assigned may be analyzed by the document analysis unit 118, and the analysis result may be reported by the report creation unit 701 described later.

The third automatic classification unit 401 automatically assigns classification codes to a plurality of documents extracted from the document information, on the basis of the result of the document analysis unit 118 analyzing the documents to which classification codes were assigned by the classification code receiving and assigning unit 131.

The trend information generation unit 124 generates trend information on the basis of the types of words contained in each document, their numbers of occurrences, and their evaluation values, as analyzed by the document analysis unit 118. The trend information indicates the degree of similarity between each document and the documents to which classification codes have been assigned.
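One way to express such a degree of similarity is sketched below. The specification does not state the similarity measure in this passage; cosine similarity over word counts weighted by evaluation value is an assumption chosen for illustration, and the names `weighted_vector` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Mapping


def weighted_vector(text: str, evaluation_values: Mapping[str, float]) -> Dict[str, float]:
    """Map each evaluated word in the text to (occurrence count * evaluation value)."""
    counts = Counter(text.lower().split())
    return {
        word: count * evaluation_values[word]
        for word, count in counts.items()
        if word in evaluation_values
    }


def cosine_similarity(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Return the cosine similarity of two sparse vectors, or 0.0 if either is empty."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A document would be compared against the vectors of already coded documents, and a high similarity suggests that it should receive the same classification code.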

The quality inspection unit 501 compares the classification code received by the classification code receiving and assigning unit 131 with the classification code that the document analysis unit 118 assigns on the basis of the trend information, and verifies the validity of the classification code received by the classification code receiving and assigning unit 131.
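The comparison performed by the quality inspection unit can be sketched as a check for disagreements between the two sets of codes. This is a minimal sketch, assuming both sets are mappings from document identifier to classification code; the function name `find_disagreements` is hypothetical, and what happens to the flagged documents afterwards is not specified here.

```python
from typing import List, Mapping


def find_disagreements(
    user_codes: Mapping[str, str], predicted_codes: Mapping[str, str]
) -> List[str]:
    """Return the ids of documents whose user-assigned code differs from the predicted one.

    Documents without a predicted code are skipped, since there is
    nothing to compare them against.
    """
    return sorted(
        doc_id
        for doc_id, code in user_codes.items()
        if doc_id in predicted_codes and predicted_codes[doc_id] != code
    )
```

The returned identifiers are the documents whose user-assigned codes would warrant a second look.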

學習部601，係根據將文件加以區別處理的結果，來學習各關鍵字或者是關連用語的加權。學習部601，係根據從第1到第4的處理結果(如後所述)，把各關鍵字或者是關連用語的加權，利用式(2)來加以學習。學習部601，也可以將該當學習結果，反映在關鍵字資料庫104，關連用語資料庫105，或者是評分算出資料庫106中。 The learning unit 601 learns the weighting of each keyword or related term based on the result of distinguishing the files. The learning unit 601 learns the weight of each keyword or related term based on the processing results from the first to fourth (described later) using equation (2). The learning unit 601 may also reflect the learning result in the keyword database 104, the related term database 105, or the score calculation database 106.

報告作成部701中，依照把文件區別處理的結果，因應訴訟案件或者是不實行為調査的調査種類，來輸出最適當的調査報告。還有，如前所述，在訴訟案件中，例如，包含反壟斷，專利，海外賄賂禁止(FCPA)，製造物責任(PL)等等。還有，在不實行為調査中，例如，包含資訊洩漏，虛構的請求等等。 The report creation unit 701 outputs the most appropriate investigation report in accordance with the result of the difference processing of the document, in response to the litigation case or the type of investigation in which the investigation is not performed. Also, as mentioned above, in litigation cases, for example, include antitrust, patents, overseas bribery prohibitions (FCPA), manufacturing liability (PL), and the like. Also, in the absence of investigations, for example, including information disclosure, fictitious requests, and so on.

The lawyer review reception unit 133 accepts corrections from the lead attorney or lead patent attorney, in order to improve the quality of the classification work and the report and to make responsibility for them clear.

The language determination unit 120 determines the language of the extracted documents.

The translation unit 122 translates the extracted documents, either on receiving an instruction from the user or automatically. To handle text in which several languages are mixed within a single sentence, the language determination unit may determine the language over segments smaller than a sentence. Language determination may use predictive coding, character encoding, or both. HTML (Hyper Text Markup Language) headers and the like may be excluded from translation.

The data storage unit 100 stores, in the digital information storage area 101, digital information acquired from a plurality of computers or servers for use in analysis for litigation or fraud investigation. The data storage unit 100 also contains the investigation basic database 103, the keyword database 104, the related term database 105, the score calculation database 106, and the report creation database 107. As shown in FIG. 1, the data storage unit 100 may be a recording medium inside the document analysis system 1, or an external recording medium connected so that it can communicate with the document analysis system 1.

The investigation basic database 103 stores the case attribute, that is, which litigation case (for example antitrust, patent, Foreign Corrupt Practices Act (FCPA), or product liability (PL)) and/or which fraud investigation (for example information leakage or fictitious billing) the matter belongs to, together with the company name, the person in charge, the supervisor, and the layout of the investigation or classification input screens.

The keyword database 104 stores a specific distinguishing symbol for documents contained in the acquired digital information, keywords closely related to that distinguishing symbol, and keyword correspondence information indicating the correspondence between the distinguishing symbol and the keywords.

The related term database 105 stores a given distinguishing symbol, related terms consisting of words that appear frequently in documents assigned that symbol, and related term correspondence information indicating the correspondence between the symbol and the related terms.

The score calculation database 106 stores the weights of the words contained in a document, which are used to calculate a score indicating how strongly the document is linked to a distinguishing symbol.

The report creation database 107 holds the report formats defined according to the category, the supervisor, and the content of the classification work.

The database management unit 109 manages updates to the contents of the investigation basic database 103, the keyword database 104, the related term database 105, the score calculation database 106, and the report creation database 107. It may be connected to an information storage device 902 via a dedicated line or an Internet line 901. In that case, it may update the contents of these five databases based on the data stored in the information storage device 902.

As described above, in a conventional document analysis system, when documents belonging to different categories are mixed among the documents under investigation, the system cannot account for differences in how the documents in each category are written, and the accuracy with which distinguishing symbols are automatically assigned to the documents (that is, the accuracy of scoring them) may be insufficient.

In contrast, the document analysis system 1 can calculate the score according to the category to which a document belongs, and can therefore analyze document information with high accuracy.

The document analysis system 1 also comprises: a category selection unit that selects a first category, a category being an index by which each of the plurality of documents can be classified; and a score calculation unit that calculates, for the first category selected by the category selection unit, a score indicating how strongly a document constituting the document information is linked to a distinguishing symbol, the distinguishing symbol indicating the degree of relevance between the document information and a lawsuit or fraud investigation. After the score calculation unit has calculated the score for the first category, the category selection unit further selects a second category different from the first category. The score calculation unit may then also calculate the score for the second category selected by the category selection unit.

[Explanation of terms]

A "distinguishing symbol" is an identifier used to classify documents; it indicates a document's degree of relevance to a lawsuit so that the document can be used more easily in that lawsuit. For example, when document information is used as evidence in a lawsuit, the symbol may be assigned according to the type of evidence.

A "document" is data containing one or more words, for example e-mail, presentation materials, spreadsheets, meeting materials, contracts, organization charts, and business plans.

A "word" is the smallest string of characters that has meaning. For example, the sentence "A so-called document is data that contains one or more words" contains the words "document", "one", "or more", "word", "contains", "data", and "so-called".

A "keyword" is a string of characters that has a certain meaning in a given language. For example, when keywords are selected from the sentence "classify the documents", "documents" and "classify" can be taken as the keywords. In this embodiment, keywords such as "infringement", "lawsuit", or "Patent Gazette No. ○○" are selected preferentially. A keyword may also include a morpheme.

"Keyword correspondence information" is information indicating the correspondence between a keyword and a specific distinguishing symbol. For example, if the distinguishing symbol "important", which marks documents that are important in a lawsuit, is closely related to the keyword "infringer", the keyword correspondence information may be information that links the symbol "important" to the keyword "infringer" and manages the two together.

A "related term" is a word whose evaluation value is at or above a given value, chosen from among the words that appear frequently in common across the documents assigned a given distinguishing symbol. The appearance frequency may be, for example, the proportion of the total number of words in one document that the related term accounts for.
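The appearance frequency described above can be illustrated with a minimal Python sketch. The function name `appearance_frequency` and the pre-tokenized word list are assumptions made for illustration; the text does not say how documents are split into words.

```python
from collections import Counter


def appearance_frequency(words, term):
    """Share of all word occurrences in one document taken up by `term`.

    `words` is assumed to be the document already split into words.
    """
    if not words:
        return 0.0
    return Counter(words)[term] / len(words)


doc = ["encoder", "japan", "encoder", "image"]
print(appearance_frequency(doc, "encoder"))  # 0.5
```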

An "evaluation value" is a value indicating the amount of information each word contributes within a document. It may be calculated with the amount of transmitted information as the basis. For example, when a given product name is assigned as a distinguishing symbol, the related terms may be the name of the technical field to which the product belongs, the countries where the product is sold, the names of similar products, and so on. Specifically, for a distinguishing symbol that is the product name of a device performing image encoding, related terms might include "encoding", "Japan", and "encoder".

"Related term correspondence information" is information indicating the correspondence between a related term and a distinguishing symbol. For example, if the distinguishing symbol "product A", a product name involved in a lawsuit, has the related term "image encoding", a function of product A, then the related term correspondence information may be information that links the symbol "product A" to the related term "image encoding" and manages the two together.

A "score" is a quantitative evaluation of how strongly a document is linked to a specific distinguishing symbol. In the embodiments of the present invention, the score is calculated, for example, by the following formula (1), from the words that appear in the document and the evaluation value of each word.

Scr: score of the document

m_i: appearance frequency of the i-th keyword or related term

: weight of the i-th keyword or related term
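Formula (1) itself is not reproduced in this text, so the following Python sketch is only an assumption about its general shape: it treats the score as the sum, over keywords and related terms, of appearance frequency times weight. The name `score` and the dictionary layout are illustrative, and the weight of each term stands for the quantity defined in the last line above; the actual formula may normalize or combine the terms differently.

```python
def score(frequencies, weights):
    """Assumed form of the score: sum of (frequency * weight) per term.

    frequencies: {term: appearance frequency m_i in the document}
    weights:     {term: weight of that keyword or related term}
    Terms without a registered weight contribute nothing.
    """
    return sum(m * weights.get(term, 0.0) for term, m in frequencies.items())


freqs = {"infringement": 2, "lawsuit": 1, "meeting": 4}
wts = {"infringement": 1.5, "lawsuit": 1.0}
print(score(freqs, wts))  # 4.0
```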

FIG. 2 is a schematic diagram showing how the score calculation unit 116 calculates the score of a document category by category. As described above, the score calculation unit 116 calculates the score for each category. Specifically, as shown in FIG. 2, the weights (coefficients, parameters) are determined (learned) in advance for each category, and the score calculation unit 116 substitutes, in turn, the weights for the category selected by the category selection unit 26, calculating the document's score for each category.

In other words, the category selection unit selects a category by substituting, in turn, the coefficients determined in advance for each category, these being the coefficients that the score calculation unit uses to calculate the score. The score calculation unit calculates the score for each category, using the coefficients that correspond to the category selected by the category selection unit.

The document analysis system 1 can therefore calculate the score for every conceivable category, one after another. For example, a document may receive low scores for categories A, B, and C but a high score for category D. Because the scores calculated for each category allow a case to be analyzed from multiple angles, the document analysis system 1 can analyze document information with high accuracy.

The document analysis system 1 may also extract words that appear frequently in common across the documents to which the user has assigned a distinguishing symbol. It then analyzes, for each document, tendency information consisting of the types of the extracted words the document contains, the evaluation value of each word, and the number of occurrences. Among the documents for which the distinguishing symbol reception and assignment unit 131 has not received a symbol, those with the same tendency as the analyzed tendency information may be assigned the same distinguishing symbol.

Here, "tendency information" indicates how similar each document is to the documents assigned a distinguishing symbol; it expresses the degree of relevance to a given distinguishing symbol, based on the types of words each document contains, their number of occurrences, and their evaluation values. For example, if a document is similar, in its degree of relevance to a given distinguishing symbol, to a document assigned that symbol, the two documents can be said to have the same tendency information. Even when the types of words differ, documents that contain words with the same evaluation values the same number of times may be treated as having the same tendency.
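The last point, that documents with different words can still share a tendency, can be sketched in Python as follows. This is a simplified illustration, not the method of the embodiment: the profile structure, the name `tendency_profile`, and the use of exact equality instead of a similarity measure are all assumptions.

```python
from collections import Counter


def tendency_profile(words, evaluation):
    """Map each evaluation value to how often words with that value occur.

    words:      the document as a list of words
    evaluation: {word: evaluation value}; unregistered words are ignored
    """
    profile = Counter()
    for w in words:
        if w in evaluation:
            profile[evaluation[w]] += 1
    return profile


evaluation = {"encoder": 0.8, "decoder": 0.8, "japan": 0.3}
a = tendency_profile(["encoder", "encoder", "japan"], evaluation)
b = tendency_profile(["decoder", "decoder", "japan"], evaluation)
print(a == b)  # True: different words, same evaluation values and counts
```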

[Processing performed in the document analysis system 1]

FIG. 3 is a flowchart showing an example of the processing performed in the document analysis system 1 (the document analysis method according to an embodiment of the present invention). In the following description, a "... step" in parentheses denotes a step included in that document analysis method (the control method of the document analysis system 1).

First, the category selection unit 26 selects a category, a category being an index by which each of the plurality of documents can be classified (category selection step, step 41; "step" is abbreviated to "S" below). Next, the score calculation unit 116 calculates, for the category selected by the category selection unit 26, a score indicating how strongly a document constituting the document information is linked to a distinguishing symbol that indicates the degree of relevance between the document information and a lawsuit or fraud investigation (score calculation step, S42). The category selection unit 26 then determines whether all categories have been selected. If an unselected category remains (NO in S43), it selects a category different from those previously selected (S41), and the score calculation unit 116 calculates the score again (S42).
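The S41 to S43 loop can be sketched in Python as below. The function and variable names are hypothetical, the per-category weights are made-up sample values, and the score is again assumed to be a simple sum of frequency times weight, which the text does not confirm.

```python
def score_all_categories(frequencies, weights_by_category):
    """Score one document under every category's pre-learned weights."""
    scores = {}
    remaining = list(weights_by_category)      # categories not yet selected
    while remaining:                           # S43: any category left?
        category = remaining.pop(0)            # S41: select the next category
        weights = weights_by_category[category]
        scores[category] = sum(                # S42: score with its weights
            m * weights.get(term, 0.0) for term, m in frequencies.items()
        )
    return scores


weights_by_category = {
    "antitrust": {"cartel": 2.0, "price": 1.0},
    "patent": {"infringement": 2.0},
}
doc = {"infringement": 3, "price": 1}
print(score_all_categories(doc, weights_by_category))
# {'antitrust': 1.0, 'patent': 6.0}
```

In this sample the same document scores low for "antitrust" and high for "patent", which is the kind of per-category contrast the method relies on.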

Next, the document analysis method of the present invention is described in detail with reference to the drawings. The following is only an example, and the present invention is not limited to it.

FIG. 4 is a detailed flowchart of the document analysis method according to an embodiment of the present invention. The flow shown in FIG. 3 may be executed as a process independent of the flow shown in FIG. 4, or as an internal subroutine at any point in the flow shown in FIG. 4.

In response to a screen shown on the display unit, the system receives an argument specified by the user, for example a litigation case (antitrust, patent, FCPA, or PL) or a fraud investigation (information leakage or fictitious billing), and can identify the category corresponding to that argument (S11).

According to the identified category, the databases to be used, such as the investigation basic database and the document analysis databases, can be identified (S12).

To confirm that the databases to be used are up to date, the system can access an information storage device that stores the latest databases. The information storage device may be located inside the organization that performs the classification, or outside it. When it is outside the organization, it may be located, for example, at a partner law firm or patent firm.

When the information storage device is accessed, authentication by ID and password can be performed for security (S13).

After authentication, access to the information storage device is permitted, and the databases to be used, such as the investigation basic database and the document analysis databases, can be updated to serve as the reference databases (S14).

The updated investigation basic database is searched (S15), and the company name and the names of the person in charge and the supervisor can be presented on the screen of the display device (S16).

If the names of the person in charge and the supervisor shown on the screen differ from the actual ones, the user corrects them on the screen. The document analysis system can receive the user's corrections and identify the actual person in charge and supervisor (S17).

Next, digital document information can be extracted in order to carry out the document analysis work (S18).

The updated document analysis databases, namely the updated keyword database, related term database, and score calculation database, are searched (S19), and distinguishing symbols can be assigned to the extracted document information (S20).

Distinguishing symbols received from a reviewer can also be assigned to the extracted document information (S21).

Using the classification results as training data, the databases are searched and distinguishing symbols can be assigned to the extracted document information (S22).

Corrections made by the lead attorney or patent attorney can be received (S23). This improves the quality of the investigation.

The category is identified from the argument specified by the user (S24), and the report creation database is identified according to that category (S25). The report format is then determined using the identified report creation database, and the report can be output automatically (S26).

FIG. 5 is a flowchart showing the flow of investigation and classification processing performed according to the type of investigation in the document analysis method according to an embodiment of the present invention.

First, the type of investigation can be entered (S31). That is, in response to the display screen, the user can enter the category of the investigation and classification work to be undertaken, for example a litigation case (antitrust, patent, prohibition of foreign bribery (FCPA), or product liability (PL)) or a fraud investigation (information leakage or fictitious billing). The document analysis system can receive the user's category input and identify the category to be investigated.

According to the identified category, the type of investigation and document analysis processing and the type of database to be used can be determined (S32).

According to the identified category, the information stored in the databases to be used, such as the investigation basic database and the document analysis databases, can be accessed (S33).

The investigation basic database is accessed according to the identified category, and an input screen for the keywords corresponding to that category can be displayed (S34).

The investigation basic database is accessed according to the identified category, and an input screen for the sentences corresponding to that category can be displayed (S35).

The investigation basic database is accessed according to the identified category, and keywords or documents can be extracted according to that category (S36).

By performing the above processing, weights can be added to the training data used for automatic assignment of distinguishing symbols (predictive coding) (S37).

A keyword search of the document analysis databases can be used to narrow down the extracted documents and information (S38).

FIG. 6 is a flowchart showing the flow of predictive coding according to the type of investigation in the document analysis method according to an embodiment of the present invention.

In the document analysis method according to an embodiment of the present invention, the document analysis system first asks the user for input according to the type of investigation, and can receive that input. For example, for a cartel matter under antitrust law, it asks for and receives the products concerned, the persons involved (names and e-mail addresses), the organizations involved (names and departments), and the period concerned. It also asks for and receives input on the competitors and customers of the organizations involved (S51).

Next, keywords are entered so that distinguishing symbols can be assigned and weighted (S52). Predictive coding can then be performed (S53).

In an embodiment of the present invention, as an example, registration processing, classification processing, and inspection processing are performed in the first to fifth stages, following the flowchart shown in FIG. 7.

In the first stage, keywords and related terms are updated and registered in advance using the results of past classification processing (S100). At this time, the keyword correspondence information and related term correspondence information, which link distinguishing symbols to keywords and related terms respectively, are updated and registered together with the keywords and related terms.

In the second stage, documents containing the keywords updated and registered in the first stage are extracted from all the document information. When such a document is found, a first classification process is performed: the keyword correspondence information updated in the first stage is consulted and the distinguishing symbol corresponding to the keyword is assigned (S200).
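A minimal Python sketch of this first classification process follows. The name `first_classification`, plain substring matching, and the rule of one symbol per document are assumptions for illustration; the text does not specify how keywords are matched.

```python
def first_classification(documents, keyword_to_symbol):
    """Assign a distinguishing symbol to documents containing a keyword.

    documents:         {doc_id: document text}
    keyword_to_symbol: keyword correspondence information, {keyword: symbol}
    Returns {doc_id: symbol} for the documents that matched.
    """
    assigned = {}
    for doc_id, text in documents.items():
        for keyword, symbol in keyword_to_symbol.items():
            if keyword in text:
                assigned[doc_id] = symbol
                break  # one symbol per document in this sketch
    return assigned


docs = {1: "the infringer was notified", 2: "lunch menu"}
print(first_classification(docs, {"infringer": "important"}))
# {1: 'important'}
```

Documents left out of the result, such as document 2 here, would pass on to the later stages.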

In the third stage, documents containing the related terms updated and registered in the first stage are extracted from the document information that was not assigned a distinguishing symbol in the second stage, and a score is calculated for each document containing the related terms. A second classification process is then performed: a distinguishing symbol is assigned on the basis of the calculated score and the related term correspondence information updated and registered in the first stage (S300).
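The second classification process can be sketched in Python as below. The scoring rule (summing the evaluation values of the related terms that occur) and the threshold comparison are assumptions made for illustration, as are all names and sample values.

```python
def second_classification(documents, related_terms, thresholds):
    """Assign a symbol when a document's related-term score reaches a threshold.

    documents:     {doc_id: list of words}
    related_terms: {symbol: {related term: evaluation value}}
    thresholds:    {symbol: minimum score needed to assign that symbol}
    """
    assigned = {}
    for doc_id, words in documents.items():
        for symbol, terms in related_terms.items():
            s = sum(terms.get(w, 0.0) for w in words)  # assumed scoring rule
            if s >= thresholds[symbol]:
                assigned[doc_id] = symbol
                break
    return assigned


related_terms = {"product A": {"encoding": 1.0, "product a": 2.0}}
thresholds = {"product A": 2.5}
docs = {1: ["encoding", "product a"], 2: ["encoding"]}
print(second_classification(docs, related_terms, thresholds))
# {1: 'product A'}
```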

In the fourth stage, for document information that has still not been assigned a distinguishing symbol by the end of the third stage, the system receives a distinguishing symbol from the user and assigns it to that document information. Next, a third classification process is performed: the document information assigned the symbol received from the user is analyzed, documents without a distinguishing symbol are extracted on the basis of the analysis, and a distinguishing symbol is assigned to the extracted documents. For example, words that appear frequently in common across the documents assigned a symbol by the user are extracted; tendency information consisting of the types of extracted words in each document, their evaluation values, and their numbers of occurrences is analyzed for each document; and documents with the same tendency as that tendency information are assigned the same distinguishing symbol (S400).

In the fifth stage, for the documents assigned a distinguishing symbol by the user in the fourth stage, the distinguishing symbol that should be assigned is determined from the analyzed tendency information, and this symbol is compared with the one assigned by the user to verify the validity of the classification process (S500). Where necessary, learning may also be performed on the basis of the results of the document analysis processing.
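The comparison in the fifth stage can be sketched in Python as a simple disagreement check. The name `find_disagreements` and the dictionary inputs are hypothetical; how the system reports or resolves a mismatch is not described in the text.

```python
def find_disagreements(user_symbols, predicted_symbols):
    """Return ids of documents whose user-assigned symbol differs from the
    symbol determined from tendency information.

    Both arguments map doc_id to a distinguishing symbol.
    """
    return sorted(
        doc_id
        for doc_id, symbol in user_symbols.items()
        if doc_id in predicted_symbols and predicted_symbols[doc_id] != symbol
    )


user = {1: "important", 2: "not relevant", 3: "important"}
predicted = {1: "important", 2: "important", 3: "important"}
print(find_disagreements(user, predicted))  # [2]
```

Documents flagged this way are candidates for a second look by a reviewer.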

The tendency information used in the fourth and fifth stages indicates how similar each document is to the documents assigned a distinguishing symbol, and is based on the types of words each document contains, their number of occurrences, and their evaluation values. For example, if a document is similar, in its degree of relevance to a given distinguishing symbol, to a document assigned that symbol, the two documents can be said to have the same tendency information. Even when the types of words differ, documents that contain words with the same evaluation values the same number of times may be treated as having the same tendency.

The detailed processing flow of each of the first to fifth stages is described below.

<First stage (S100)>

The detailed processing flow for the keyword database 104 in the first stage is described with reference to FIG. 8.

For the keyword database 104, a management table is created for each distinguishing symbol on the basis of the results of classifying documents in past lawsuits, and the keywords corresponding to each distinguishing symbol are identified (S111). In this embodiment, the identification is done by analyzing the documents assigned each distinguishing symbol and using the number of occurrences and evaluation value of each keyword in those documents; a method based on the amount of transmitted information carried by a keyword, or manual selection by the user, may also be used.

In this embodiment, for example, when "infringement" and "patent attorney" are identified as keywords for the distinguishing symbol "important", keyword correspondence information is created indicating that "infringement" and "patent attorney" are keywords closely related to the symbol "important" (S112). The identified keywords are then registered in the keyword database 104. At this time, the identified keywords are associated with the keyword correspondence information and recorded in the management table for the distinguishing symbol "important" in the keyword database 104 (S113).

Next, the detailed processing flow of the related term database 105 is described with reference to FIG. 9. Based on the results of classifying documents in past litigation, the related term database 105 creates a management list for each distinguishing symbol and registers the related terms corresponding to each distinguishing symbol (S121). In this embodiment, for example, "encoding" and "product a" are registered as related terms of "product A", and "decoding" and "product b" are registered as related terms of "product B".

Related term correspondence information, indicating which distinguishing symbol each registered related term corresponds to, is created (S122) and recorded in each management list (S123). At this time, the evaluation value of each related term and the score threshold required to decide on a distinguishing symbol are also recorded in the related term correspondence information.

Before classification work is actually carried out, the keywords and keyword correspondence information, as well as the related terms and related term correspondence information, are updated with the latest information and re-registered (S113, S123).
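
The Phase 1 registration steps can be pictured as building one management list per distinguishing symbol. The following is a minimal sketch only: the `ManagementList` class, its field names, and every numeric evaluation value and threshold are hypothetical and chosen for illustration; the specification does not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementList:
    """Hypothetical per-symbol management list (layout is an assumption)."""
    symbol: str
    keywords: set = field(default_factory=set)
    # related term -> evaluation value (numbers here are illustrative only)
    related_terms: dict = field(default_factory=dict)
    score_threshold: float = 0.0


def build_databases():
    """Register the example keywords and related terms named in the text."""
    important = ManagementList("important")
    important.keywords.update({"infringement", "patent attorney"})  # S111 to S113

    product_a = ManagementList("product A", score_threshold=1.0)
    product_a.related_terms.update({"encoding": 0.8, "product a": 0.6})  # S121 to S123

    product_b = ManagementList("product B", score_threshold=1.0)
    product_b.related_terms.update({"decoding": 0.7, "product b": 0.5})

    return {m.symbol: m for m in (important, product_a, product_b)}


databases = build_databases()
print(sorted(databases["important"].keywords))
```

Keeping the threshold next to the related terms mirrors the statement that the score threshold is recorded together with the related term correspondence information.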

<Phase 2 (S200)>

The detailed processing flow of the first automatic distinguishing unit 201 in the second stage is described with reference to FIG. 10. In this embodiment, in the second stage, the first automatic distinguishing unit 201 performs the process of adding the distinguishing symbol "important" to documents.

The first automatic distinguishing unit 201 extracts, from the document information, the documents that contain the keywords "infringement" and "patent attorney" registered in the keyword database 104 in the first stage (S100) (S211). For each extracted document, it refers, via the keyword correspondence information, to the management list in which the keyword is recorded (S212) and adds the distinguishing symbol "important" (S213).
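
Steps S211 to S213 amount to a keyword lookup followed by tagging. The sketch below is illustrative only: the function name is hypothetical, matching is a plain case-insensitive substring test, and a document is tagged when it contains any one registered keyword, which is an assumption because the text does not say whether one keyword or all keywords are required.

```python
def first_auto_distinguish(documents, keywords, symbol="important"):
    """Return {doc_id: symbol} for documents containing a registered keyword.

    documents: {doc_id: text}; keywords: iterable of registered keywords.
    Assumption: a single matching keyword is enough to tag the document.
    """
    tagged = {}
    for doc_id, text in documents.items():
        lowered = text.lower()
        if any(keyword.lower() in lowered for keyword in keywords):  # S211
            tagged[doc_id] = symbol                                  # S212, S213
    return tagged


documents = {
    "d1": "Our patent attorney reviewed the draft.",
    "d2": "Lunch is at noon on Friday.",
    "d3": "This may constitute infringement of the claims.",
}
tagged = first_auto_distinguish(documents, {"infringement", "patent attorney"})
print(tagged)
```

Documents left out of the returned mapping are the ones passed on to the third stage.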

<Phase 3 (S300)>

The detailed processing flow of the second automatic distinguishing unit 301 in the third stage is described with reference to FIG. 11.

In this embodiment, the second automatic distinguishing unit 301 performs the process of adding the distinguishing symbols "product A" and "product B" to the document information that was not given a distinguishing symbol in the second stage (S200).

From that document information, the second automatic distinguishing unit 301 extracts the documents that contain the related terms "encoding", "product a", "decoding", and "product b" recorded in the related term database 105 in the first stage (S311). For each extracted document, the score calculation unit 116 calculates a score using equation (1), based on the frequency of occurrence and the evaluation values of the four recorded related terms (S312). The score indicates the degree of relevance between each document and the distinguishing symbols "product A" and "product B".

When the score exceeds the threshold, the related term correspondence information is consulted (S313) and the appropriate distinguishing symbol is added (S314).

For example, suppose that in a certain document the related terms "encoding" and "product a" occur frequently and the related term "encoding" has a high evaluation value, so that the score indicating relevance to the distinguishing symbol "product A" exceeds the threshold. The distinguishing symbol "product A" is then added to that document.

If, in the same document, the related term "product b" also occurs frequently and the score indicating relevance to the distinguishing symbol "product B" also exceeds the threshold, "product B" is added to the document together with "product A". Conversely, if "product b" occurs infrequently and the score indicating relevance to "product B" does not exceed the threshold, only the distinguishing symbol "product A" is added.
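
The thresholding in S312 to S314, including the case where one document receives both symbols, can be sketched as follows. This is an illustration under stated assumptions: `relevance_score` is a stand-in that sums occurrences multiplied by evaluation value and is not claimed to be equation (1); the evaluation values and thresholds are made-up numbers; term matching is a simple substring count.

```python
def relevance_score(text, term_values):
    """Stand-in score: sum of (occurrences x evaluation value) per related term."""
    lowered = text.lower()
    return sum(lowered.count(term) * value for term, value in term_values.items())


def second_auto_distinguish(text, symbol_terms, thresholds):
    """Return every symbol whose score exceeds its threshold (S312 to S314)."""
    return [
        symbol
        for symbol, term_values in symbol_terms.items()
        if relevance_score(text, term_values) > thresholds[symbol]
    ]


# Illustrative evaluation values and thresholds (assumptions, not source data).
SYMBOL_TERMS = {
    "product A": {"encoding": 0.8, "product a": 0.6},
    "product B": {"decoding": 0.7, "product b": 0.5},
}
THRESHOLDS = {"product A": 1.0, "product B": 1.0}

only_a = second_auto_distinguish(
    "The encoding step of product a reuses the encoding tables.",
    SYMBOL_TERMS, THRESHOLDS)
both = second_auto_distinguish(
    "Product a encoding, encoding again, plus product b decoding.",
    SYMBOL_TERMS, THRESHOLDS)
print(only_a, both)
```

Scoring each symbol independently is what allows a single document to carry "product A" alone or "product A" together with "product B".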

Using the score calculated in S432 of the fourth stage, the second automatic distinguishing unit 301 recalculates the evaluation values of the related terms by equation (2) shown below, thereby weighting the evaluation values (S315).

wgt_{i,0}: weight of the i-th selected keyword before learning (initial value)

wgt_{i,L}: weight of the i-th selected keyword after the L-th round of learning

γ_L: learning parameter in the L-th round of learning

: threshold of the learning effect

For example, if documents in which "decoding" occurs very frequently but whose score is nevertheless below a set value arise a set number of times or more, the evaluation value of the related term "decoding" is lowered and re-recorded in the related term correspondence information.
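
The "decoding" example describes a simple rule: count the documents where a term is frequent yet the score is low, and lower the term's evaluation value once that count reaches a set number. The sketch below illustrates only that rule. It does not implement equation (2); the multiplicative `decay` factor, the parameter names, and all numbers are assumptions.

```python
def adjust_evaluation_value(evaluation_value, observations, *,
                            high_frequency, low_score, min_cases, decay=0.9):
    """Lower a related term's evaluation value after repeated mismatches.

    observations: iterable of (term_frequency, document_score) pairs.
    A mismatch is a document where the term is frequent but the score is low.
    The multiplicative decay is an assumed update rule, not equation (2).
    """
    mismatches = sum(
        1 for frequency, score in observations
        if frequency >= high_frequency and score < low_score
    )
    if mismatches >= min_cases:
        return evaluation_value * decay
    return evaluation_value


observations = [(10, 0.2), (12, 0.1), (1, 0.9)]
lowered = adjust_evaluation_value(
    0.7, observations, high_frequency=8, low_score=0.5, min_cases=2)
unchanged = adjust_evaluation_value(
    0.7, observations, high_frequency=8, low_score=0.5, min_cases=3)
print(lowered, unchanged)
```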

<Phase 4 (S400)>

In the fourth stage, as shown in FIG. 12, a fixed proportion of documents is sampled from the document information that received no distinguishing symbol in the processing up to the third stage. The distinguishing symbols assigned to those documents by a reviewer are received and added to the document information. Next, as shown in FIG. 13, the document information bearing the distinguishing symbols received from the reviewer is analyzed, and based on the results of that analysis, distinguishing symbols are added to the document information that still has none. In this embodiment, the fourth stage adds, for example, the distinguishing symbols "important", "product A", and "product B". The fourth stage is described further below.

The detailed processing flow of the distinguishing symbol reception and assignment unit 131 in the fourth stage is described with reference to FIG. 12. First, the information sampling unit 24 randomly samples documents from the document information to be processed in the fourth stage and displays them on the document display unit 130. In this embodiment, about 20% of the documents in the document information to be processed are sampled at random and become the targets of classification by the reviewer. Alternatively, documents may be sampled by listing them in order of creation date and time or in order of name and selecting the top 30%.
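
The two sampling options described above can be sketched as follows. This is a minimal illustration: the function names, the dictionary keys `name` and `created`, and the use of rounding to turn a fraction into a document count are all assumptions.

```python
import random


def sample_random(doc_ids, fraction=0.2, seed=None):
    """Randomly pick about `fraction` of the documents for manual review."""
    ids = list(doc_ids)
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(ids, round(len(ids) * fraction))


def sample_sorted(docs, fraction=0.3, key="created"):
    """Sort by creation time (or name) and take the top `fraction`."""
    ordered = sorted(docs, key=lambda doc: doc[key])
    return ordered[:round(len(ordered) * fraction)]


docs = [{"name": f"doc{i}", "created": 10 - i} for i in range(10)]
random_pick = sample_random([d["name"] for d in docs], seed=0)
sorted_pick = [d["name"] for d in sample_sorted(docs)]
print(random_pick, sorted_pick)
```

Random sampling avoids the ordering bias that a sorted selection can introduce, while the sorted variant is deterministic and easy to audit.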

The user views the document display screen 11 shown in FIG. 18, which is displayed on the document display unit 130, and selects the distinguishing symbol to add to each document. The distinguishing symbol reception and assignment unit 131 receives the distinguishing symbol selected by the user (S411) and classifies the documents according to the symbol added (S412).

Next, the detailed processing flow of the document analysis unit 118 is described with reference to FIG. 13. The document analysis unit 118 extracts words that appear frequently and in common in the documents classified under each distinguishing symbol by the distinguishing symbol reception and assignment unit 131 (S421). It analyzes the evaluation values of the extracted common words using equation (2) (S422) and analyzes the frequency with which the common words occur in the documents (S423).

Further, based on the results of the analysis in S422 and S423, the tendency information of the documents bearing the distinguishing symbol "important" is analyzed (S424).

FIG. 14 is a graph showing the results of the analysis, in S424, of words that appear frequently and in common in documents bearing the distinguishing symbol "important".

In FIG. 14, the vertical axis, R_hot, shows the proportion of documents that contain a word selected as being associated with the distinguishing symbol "important", among all documents to which the user added the distinguishing symbol "important". The horizontal axis, R_all, shows the proportion of documents that contain the word extracted in S421, among all documents the user classified through the distinguishing symbol reception and assignment unit 131.

In this embodiment, the distinguishing symbol reception and assignment unit 131 extracts the words plotted above the straight line R_hot = R_all as the common words for the distinguishing symbol "important".
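
Selecting the words that lie above the line R_hot = R_all means keeping words that occur proportionally more often in documents tagged "important" than in the reviewed set as a whole. The following sketch illustrates that comparison; the function name and the representation of each document as a set of words are assumptions.

```python
def select_common_words(reviewed, symbol="important"):
    """Return words with R_hot > R_all for the given symbol.

    reviewed: list of (set_of_words, symbol_assigned_by_reviewer) pairs.
    R_hot: share of documents tagged `symbol` that contain the word.
    R_all: share of all reviewed documents that contain the word.
    """
    hot = [words for words, assigned in reviewed if assigned == symbol]
    if not hot:
        return set()
    vocabulary = set().union(*(words for words, _ in reviewed))
    selected = set()
    for word in vocabulary:
        r_hot = sum(word in words for words in hot) / len(hot)
        r_all = sum(word in words for words, _ in reviewed) / len(reviewed)
        if r_hot > r_all:  # plotted above the line R_hot = R_all
            selected.add(word)
    return selected


reviewed = [
    ({"infringement", "meeting"}, "important"),
    ({"infringement", "price"}, "important"),
    ({"meeting", "lunch"}, None),
    ({"lunch"}, None),
]
common_words = select_common_words(reviewed)
print(sorted(common_words))
```

In this toy data, "meeting" is excluded because it is exactly as common among "important" documents as overall, so it carries no signal for the symbol.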

The processing of S421 to S424 is also performed on the documents bearing the distinguishing symbols "product A" and "product B" to analyze their tendency information.

Next, the detailed processing flow of the third automatic distinguishing unit 401 is described with reference to FIG. 15. The third automatic distinguishing unit 401 processes those documents, among the document information to be processed in the fourth stage, for which the distinguishing symbol reception and assignment unit 131 received no distinguishing symbol in S411. From these documents, the third automatic distinguishing unit 401 extracts the documents whose tendency information matches that, analyzed in S424, of the documents bearing the distinguishing symbols "important", "product A", and "product B" (S431). For each extracted document, it calculates a score using equation (1), based on the tendency information (S432). It then adds the appropriate distinguishing symbol to each document extracted in S431, based on the tendency information (S433).

The third automatic distinguishing unit 401 further reflects the classification results in each database, using the scores calculated in S432 (S434). Specifically, it may lower the evaluation values of the keywords and related terms contained in low-scoring documents and raise the evaluation values of the keywords and related terms contained in high-scoring documents.

Another example of the detailed processing flow of the third automatic distinguishing unit 401 is described with reference to FIG. 16. The third automatic distinguishing unit 401 may classify those documents, among the document information to be processed in the fourth stage, for which the distinguishing symbol reception and assignment unit 131 received no distinguishing symbol in S411. When no argument is given (S441: none), the third automatic distinguishing unit 401 extracts the documents whose tendency information matches that, analyzed in S424, of the documents bearing the distinguishing symbol "important" (S442). For each extracted document, it calculates a score using equation (1), based on the tendency information (S443). It then adds the appropriate distinguishing symbol to each document extracted in S442, based on the tendency information (S444).

The third automatic distinguishing unit 401 further reflects the classification results in each database, using the scores calculated in S443 (S445). Specifically, the evaluation values of the keywords and related terms contained in low-scoring documents are lowered, while the evaluation values of the keywords and related terms contained in high-scoring documents are raised.
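
Reflecting scores back into the databases (S434, S445) can be illustrated as a bounded up-or-down adjustment of evaluation values. The sketch below is hypothetical: the fixed `step`, the `low` and `high` cut-offs, and the floor at zero are assumptions, and the specification's own update is equation (2), which is not reproduced here.

```python
def reflect_scores(evaluation_values, scored_docs, *, low, high, step=0.05):
    """Raise values of terms in high-scoring documents, lower them in low-scoring ones.

    evaluation_values: {term: value}; scored_docs: iterable of (terms, score).
    Returns a new dict and leaves the input untouched.
    """
    updated = dict(evaluation_values)
    for terms, score in scored_docs:
        if score >= high:
            delta = step
        elif score <= low:
            delta = -step
        else:
            continue  # mid-range scores leave the values unchanged
        for term in terms:
            if term in updated:
                updated[term] = max(0.0, updated[term] + delta)
    return updated


values = {"encoding": 0.8, "decoding": 0.7}
updated_values = reflect_scores(
    values,
    [({"encoding"}, 2.5), ({"decoding"}, 0.1), ({"encoding", "decoding"}, 1.0)],
    low=0.5, high=2.0)
print(updated_values)
```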

As described above, both the second automatic distinguishing unit 301 and the third automatic distinguishing unit 401 calculate scores. If the number of score calculations becomes large, the data used for score calculation may be stored together in the score calculation database 106.

<Phase 5 (S500)>

The detailed processing flow of the quality inspection unit 501 in the fifth stage is described with reference to FIG. 17. For each document whose distinguishing symbol was received by the distinguishing symbol reception and assignment unit 131 in S411, the quality inspection unit 501 determines the distinguishing symbol that should be added, based on the tendency information analyzed by the document analysis unit 118 in S424 (S511).

The distinguishing symbol received by the distinguishing symbol reception and assignment unit 131 is compared with the distinguishing symbol determined in S511 (S512) to verify the validity of the distinguishing symbol received in S411 (S513).
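
The comparison in S512 and S513 can be sketched as a check that flags documents where the reviewer's symbol and the symbol derived from tendency information disagree. The function name and the dictionary representation are assumptions made for illustration.

```python
def quality_check(received, predicted):
    """Return the ids of documents whose reviewer symbol differs from the prediction.

    received: {doc_id: symbol from the reviewer (S411)}
    predicted: {doc_id: symbol determined from tendency information (S511)}
    Documents without a prediction are skipped rather than flagged.
    """
    return sorted(
        doc_id
        for doc_id, symbol in received.items()
        if doc_id in predicted and predicted[doc_id] != symbol
    )


received = {"d1": "important", "d2": "product A", "d3": "product B"}
predicted = {"d1": "important", "d2": "product B"}
flagged = quality_check(received, predicted)
print(flagged)
```

Flagged documents are candidates for a second look; the check itself does not decide which of the two symbols is correct.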

The document analysis system 1 according to the embodiment of the present invention may include a learning unit 601. Based on the processing results of the first to fourth stages, the learning unit 601 learns the weighting of each keyword or related term using equation (2). The learning results may be reflected in the keyword database 104, the related term database 105, or the score calculation database 106.

The document analysis system 1 according to the embodiment of the present invention may include a report creation unit 701. Based on the results of the document analysis processing, the report creation unit 701 outputs the investigation report best suited to the type of investigation, whether a litigation matter (for example, cartel, patent, FCPA, or product liability (PL) cases) or a misconduct investigation (for example, information leakage or fictitious billing).

The content of an investigation differs depending on its type.

For example, in a cartel case,

1. When and how was the cartel-related understanding (price coordination) reached with the competitor's person in charge?

2. Who was involved, and in which organizations?

These are the key points to examine.

Likewise, in a patent infringement case,

1. Is the technology alleged to infringe the same in content?

2. Who infringed (or did not infringe), when, and with what intent (or lack of intent)?

These are the points to check.

A document investigation report system, a document investigation report method, and a document investigation report program according to another example of the embodiment of the present invention are described below.

The document investigation report system according to another example of the embodiment analyzes documents that have already been given distinguishing symbols in accordance with similar search information and, based on the analysis results, adjusts the range over which distinguishing symbols are added. Classification work and investigation work are then performed on the adjusted range, and a report is created based on the results of that work.

Methods of adjusting the range over which distinguishing symbols are added in accordance with similar search information include a method that clusters similar search information and then adjusts the range, and a method that learns the classification results and performs predictive classification. In the clustering method, for example, attention may be paid to the commonality of the source material, so that a common distinguishing symbol is added to an original document, the reply to that document, and the reply to that reply. The predictive method consolidates and learns similar search information from the classification results, so that the same or a similar distinguishing symbol is added to similar search information.
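
The reply-chain example, in which an original document, its reply, and the reply to that reply share one distinguishing symbol, can be sketched as label propagation within a thread. This is an illustration under assumptions: the `reply_to` mapping, the function name, and the rule that the first known label in a thread wins are all hypothetical.

```python
def propagate_in_thread(labels, reply_to):
    """Give every document in a reply chain the symbol known for that chain.

    labels: {doc_id: symbol} for documents already classified.
    reply_to: {doc_id: parent_doc_id} linking each reply to what it answers.
    """
    def root(doc_id):
        seen = set()
        while doc_id in reply_to and doc_id not in seen:  # guard against cycles
            seen.add(doc_id)
            doc_id = reply_to[doc_id]
        return doc_id

    thread_label = {}
    for doc_id, symbol in labels.items():
        thread_label.setdefault(root(doc_id), symbol)  # first known label wins

    all_docs = set(reply_to) | set(reply_to.values()) | set(labels)
    return {d: thread_label[root(d)] for d in all_docs if root(d) in thread_label}


reply_to = {"re1": "orig", "re2": "re1", "other_re": "other"}
thread_labels = propagate_in_thread({"orig": "important"}, reply_to)
print(sorted(thread_labels.items()))
```

Threads with no classified member, such as "other" and its reply here, are left unlabeled for later processing.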

In another example of the embodiment of the present invention, the reliability of the analysis results varies with the number of documents analyzed. A statistical method may therefore be applied to the total number of documents to be classified in order to decide at what point, that is, after what proportion of all documents has been classified, the analysis results are used to adjust the range over which distinguishing symbols are added.

In another example of the embodiment of the present invention, the range of documents to which distinguishing symbols are added may be adjusted by the method of adjusting the range in accordance with similar search information, by the method of clustering similar search information and then adjusting the range, or by the method of learning the classification results and performing predictive classification. Two of these methods may also be executed together.

With the document investigation report system, document investigation report method, and document investigation report program according to another example of the embodiment of the present invention, a report is created based on the results of this classification work and investigation.

As a result, the document investigation report system, document investigation report method, and document investigation report program according to another example of the embodiment can create an investigation report quickly and reliably while reducing the burden of classification work and report creation.

Another example of the embodiment of the present invention may include a display screen control unit that controls a display screen presenting to the user the types of information extracted by the investigation type determination unit.

Another example of the embodiment of the present invention may include an input receiving unit that receives keywords and/or sentences entered by the user in accordance with the types of information presented by the display screen control unit.

In the embodiment of the present invention, the category of a litigation matter or misconduct investigation is received as user input, and a program automatically updates the database according to that category. This greatly reduces the clerical burden of entering the names of persons in charge, supervisors, and so on. In addition, the search terms are adjusted using the database that is automatically updated according to category, and distinguishing symbols are automatically assigned to the document information using the adjusted search terms. This greatly reduces the burden of classifying the document information used in litigation or misconduct investigations. In other words, the present invention makes it easier to analyze document information for use in litigation.

[Example of implementation by software]

The control blocks of the document analysis system 1 may be implemented by logic circuits (hardware) formed in an integrated circuit (IC chip) or the like, or by software using a CPU (Central Processing Unit). In the latter case, the document analysis system 1 includes: a CPU that executes the instructions of a program (control program), which is the software implementing each function; a ROM (Read Only Memory) or storage device (these are referred to as a "recording medium") in which the program and various data are stored so as to be readable by a computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and so on. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. The recording medium may be a "non-transitory tangible medium" such as a tape, disk, card, semiconductor memory, or programmable logic circuit. The program may also be supplied to the computer via any transmission medium capable of transmitting it (a communication network, broadcast wave, or the like). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

Specifically, the document analysis program according to the present invention is a program that acquires information recorded in a given computer or server and analyzes document information, composed of a plurality of documents, contained in the acquired information. The program causes a computer to implement: a category selection function that selects a category, that is, an index by which each of the plurality of documents can be classified; and a score calculation function that calculates, for each category selected by the category selection function, a score indicating how strongly a document constituting the document information is linked to a distinguishing symbol, where the distinguishing symbol indicates the degree of relevance between the document information and the litigation or misconduct investigation.

The category selection function can be implemented by the category selection unit 26 described above, and the score calculation function can be implemented by the score calculation unit 116 described above. The details of both are as described above.

[Supplementary notes]

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

The present invention is a document analysis system that acquires digital information recorded in a plurality of computers or servers, analyzes document information composed of a plurality of documents contained in the acquired digital information, and makes that information easier to use in litigation or misconduct investigations. The system comprises: an investigation basic database that stores information related to the litigation or misconduct investigation; an investigation category input receiving unit that receives input of the category of the litigation or misconduct investigation; and an investigation type determination unit that, based on the category received by the investigation category input receiving unit, determines the investigation category to be investigated and extracts the types of information required from the investigation basic database.

The document analysis system is further characterized by comprising a display screen control unit that controls a display screen presenting to the user the types of information extracted by the investigation type determination unit.

The document analysis system is further characterized by comprising an input receiving unit that receives keywords and/or sentences entered by the user in accordance with the types of information presented by the display screen control unit.

The document analysis system is further characterized by comprising an information sampling unit that extracts, from the investigation basic database, keywords and/or sentences corresponding to the types of information extracted by the investigation type determination unit.

The document analysis system is further characterized by comprising a search unit that searches the documents for the keywords and/or sentences.

The document analysis system is further characterized by comprising an automatic distinguishing symbol assignment unit that automatically adds a distinguishing symbol to the documents, the distinguishing symbol being added using the keywords and/or sentences.

The present invention further provides a document analysis method that acquires information recorded in a plurality of computers or servers, analyzes document information composed of a plurality of documents contained in the acquired information, and makes that information easier to use in litigation or misconduct investigations. The method comprises: an investigation category receiving step of receiving input of the category of the litigation or misconduct investigation; and an investigation type determination step of determining, based on the category received in the investigation category receiving step, the investigation category to be investigated, and extracting the types of information required from an investigation basic database that stores information related to the litigation or misconduct investigation.

The present invention further provides a document analysis program that acquires information recorded in a plurality of computers or servers, analyzes document information composed of a plurality of documents contained in the acquired information, and makes that information easier to use in litigation or misconduct investigations. The program causes a computer to implement: an investigation category receiving function of receiving input of the category of the litigation or misconduct investigation; and an investigation type determination function of determining, based on the category received by the investigation category receiving function, the investigation category to be investigated, and extracting the types of information required from an investigation basic database that stores information related to the litigation or misconduct investigation.

The present invention further provides a document analysis system that acquires information recorded in a predetermined computer or server and analyzes document information contained in the acquired information and composed of a plurality of documents. The system comprises: an investigation basic database that stores, phase by phase, a generation process model of a predetermined act that gives rise to a lawsuit or fraud investigation, the phases being classified according to the progress of the predetermined act; that further stores information related to the lawsuit or fraud investigation for each category to which the lawsuit or fraud investigation belongs and for each generation process model; and that further stores time-series information indicating the temporal order of the phases, together with the relationships among a plurality of persons related to the lawsuit or fraud investigation; and an identification unit that analyzes the document information on the basis of the information related to the lawsuit or fraud investigation, the generation process model, the time-series information, and the relationships among the plurality of persons, and identifies the current phase.
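As a rough illustration of phase identification, the following Python sketch treats the generation process model as a list of phases in time-series order, each with indicator keywords, and reports the latest phase for which any document contains an indicator. The phase names, keywords and matching rule are assumptions made for this example; the text does not specify how the identification unit works, and the relationships among persons are not modelled here.

```python
# Hypothetical generation process model: phases in time-series order,
# each with indicator keywords. All names and data are illustrative.
PHASES = [
    ("relationship building", ["introduce", "dinner"]),
    ("preparation", ["draft", "confidential"]),
    ("execution", ["transfer", "send files"]),
]


def identify_current_phase(documents, phases=PHASES):
    """Return the latest phase (in time-series order) whose indicator
    keywords appear in at least one document, or None if no phase matches."""
    current = None
    for name, keywords in phases:
        if any(kw in doc.lower() for doc in documents for kw in keywords):
            current = name  # a later matching phase overrides an earlier one
    return current


docs = [
    "Let me introduce you over dinner.",
    "Please keep this draft confidential.",
]
phase = identify_current_phase(docs)
```

With these sample documents the first two phases match and the third does not, so `phase` is the second phase in the list.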

1‧‧‧Document analysis system

11‧‧‧Document display screen

24‧‧‧Information extraction unit

26‧‧‧Category selection unit

30‧‧‧Search unit

501‧‧‧Quality inspection unit

601‧‧‧Learning unit

701‧‧‧Report creation unit

100‧‧‧Data storage unit

101‧‧‧Digital information storage area

103‧‧‧Investigation basic database

104‧‧‧Keyword database

105‧‧‧Related term database

106‧‧‧Score calculation database

107‧‧‧Report creation database

109‧‧‧Database management unit

116‧‧‧Score calculation unit

118‧‧‧Document analysis unit

120‧‧‧Language determination unit

122‧‧‧Translation unit

130‧‧‧Presentation unit

Claims

A document analysis system that acquires information recorded in a predetermined computer or server and analyzes document information that is contained in the acquired information and composed of a plurality of documents, the system comprising: a category selection unit that selects a category, a category being an index by which each document included in the plurality of documents can be classified; and a score calculation unit that calculates, for each category selected by the category selection unit, a score indicating the strength of association between a document constituting the document information and a classification code indicating the degree of relevance between the document information and a lawsuit or fraud investigation.

The document analysis system according to claim 1, wherein the category selection unit sequentially selects the category from among a plurality of categories, and the score calculation unit calculates the score for each of the categories sequentially selected by the category selection unit.

The document analysis system according to claim 1 or 2, wherein the category selection unit selects, as the category, at least one of: the type of the lawsuit or fraud investigation; a phase classified according to the progress of a predetermined act that gives rise to the lawsuit or fraud investigation; and an attribute of the document information.

The document analysis system according to claim 1, further comprising an investigation category input reception unit that receives input of the category to which the lawsuit or fraud investigation belongs, wherein the category selection unit selects the category received by the investigation category input reception unit.

The document analysis system according to claim 1, further comprising an investigation type determination unit that determines the investigation category to be investigated on the basis of the category received by the investigation category input reception unit, and extracts the necessary types of information from the investigation basic database.

The document analysis system according to claim 1, further comprising an information extraction unit that extracts, from the document information, keywords and/or sentences contained in the document information as information related to the lawsuit or fraud investigation.

The document analysis system according to claim 1, further comprising a search unit that searches the plurality of documents for keywords and/or sentences contained in the document information.

The document analysis system according to claim 1, further comprising a presentation unit that presents the score calculated by the score calculation unit to the user in a form that the user can readily grasp.

A document analysis method that acquires information recorded in a predetermined computer or server and analyzes document information that is contained in the acquired information and composed of a plurality of documents, the method comprising: a category selection step of selecting a category, a category being an index by which each document included in the plurality of documents can be classified; and a score calculation step of calculating, for each category selected in the category selection step, a score indicating the strength of association between a document constituting the document information and a classification code indicating the degree of relevance between the document information and a lawsuit or fraud investigation.

A document analysis program that acquires information recorded in a predetermined computer or server and analyzes document information that is contained in the acquired information and composed of a plurality of documents, the program causing a computer to execute: a category selection function of selecting a category, a category being an index by which each document included in the plurality of documents can be classified; and a score calculation function of calculating, for each category selected by the category selection function, a score indicating the strength of association between a document constituting the document information and a classification code indicating the degree of relevance between the document information and a lawsuit or fraud investigation.