WO2015173894A1 - Document analysis system, control method for document analysis system, and control program for document analysis system - Google Patents

Document analysis system, control method for document analysis system, and control program for document analysis system

Info

Publication number
WO2015173894A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
unit
keyword
analysis system
sentence
Prior art date
Application number
PCT/JP2014/062743
Other languages
English (en)
Japanese (ja)
Inventor
守本 正宏
秀樹 武田
和巳 蓮子
Original Assignee
株式会社Ubic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Ubic
Priority to JP2015510547A (JP5815911B1)
Priority to PCT/JP2014/062743 (WO2015173894A1)
Publication of WO2015173894A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to a document analysis system for analyzing a document.
  • Patent Document 1 discloses a document separation system that analyzes a digital document collected for submission as evidence in a lawsuit and separates it so as to facilitate use in a lawsuit.
  • JP2013-214152A released on October 17, 2013
  • the score is calculated based on the evaluation value of the related term included in the extracted document and the number of the related term.
  • the present invention has been made in view of the above problems, and an object of the present invention is to provide a document analysis system or the like that can accurately calculate a score that correctly reflects the sentence meaning.
  • A document analysis system according to one aspect of the present invention is a document analysis system for analyzing a document, and includes: a generation unit that generates, for each sentence, a keyword vector indicating whether or not a predetermined keyword is included in a sentence included in the document; a multiplication unit that obtains a correlation vector for each sentence by multiplying the keyword vector generated by the generation unit by a correlation matrix indicating a correlation between the predetermined keyword and another keyword different from the predetermined keyword; and a calculation unit that calculates, based on the sum of all correlation vectors obtained by the multiplication unit, a score indicating the strength with which a classification code indicating the relevance between the document and a predetermined event is associated with the document.
  • The keyword vector is, for example, a vector in which each element takes a value of “0” or “1” to indicate whether or not the predetermined keyword associated with that element is included in the corresponding sentence of the document.
  • The correlation matrix is a square matrix in which each element represents, for example, how readily another keyword (for example, “adjustment”) appears in a sentence when the keyword “price” appears in that sentence (that is, the correlation between the two keywords).
  • Because the document analysis system generates a keyword vector for each sentence, the keyword vector has a structure (representation) that can correctly reflect the meaning of each sentence. The score can therefore be calculated accurately, so that a significant difference arises between documents.
  • The calculation unit may calculate the score by calculating an inner product of the summed value and a weight vector indicating a weight for the predetermined keyword.
  • The document analysis system may further include an extraction unit that extracts the sentence corresponding to the keyword vector indicating that the predetermined keyword is included the greatest number of times in the document.
  • The document analysis system may further include a summarizing unit that generates a summary of the document by enumerating the sentences corresponding to keyword vectors indicating that the predetermined keyword is included in the document.
  • The document analysis system may further include a phase identification unit that identifies, based on the score calculated by the calculation unit, the phase into which a predetermined action that causes the predetermined case is classified according to the progress of that action.
  • the document analysis system may further include a change estimation unit that estimates a change in the phase identified by the phase identification unit based on a temporal transition of the phase.
  • the document analysis system may further include a code assigning unit that assigns a classification code to the document based on the score calculated by the calculation unit.
  • A control method for a document analysis system according to one aspect of the present invention is a control method for a document analysis system for analyzing a document, and includes: a generation step of generating, for each sentence, a keyword vector indicating whether or not a predetermined keyword is included in a sentence included in the document; a multiplication step of obtaining a correlation vector for each sentence by multiplying the keyword vector generated in the generation step by a correlation matrix indicating a correlation between the predetermined keyword and another keyword different from the predetermined keyword; and a calculation step of calculating, based on the sum of all correlation vectors obtained in the multiplication step, a score indicating the strength with which a classification code indicating the degree of relevance between the document and a predetermined event is associated with the document.
  • control method of the document analysis system has the same effect as the document analysis system.
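  • A control program for a document analysis system according to one aspect of the present invention is a control program for a document analysis system that analyzes a document, and causes a computer to realize: a generation function of generating, for each sentence, a keyword vector indicating whether or not a predetermined keyword is included in a sentence included in the document; a multiplication function of obtaining a correlation vector for each sentence by multiplying the keyword vector generated by the generation function by a correlation matrix indicating a correlation between the predetermined keyword and another keyword different from the predetermined keyword; and a calculation function of calculating, based on the sum of all correlation vectors obtained by the multiplication function, a score indicating the strength with which a classification code indicating the degree of relevance between the document and a predetermined event is associated with the document.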
  • the document analysis system according to each aspect of the present invention may be realized by a computer.
  • In this case, a control program of the document analysis system that realizes the document analysis system on the computer by causing the computer to operate as each unit included in the document analysis system, and a computer-readable recording medium storing the control program, also fall within the scope of the present invention.
  • control program of the document analysis system has the same effect as the document analysis system.
  • The document analysis system, the document analysis system control method, and the document analysis system control program according to one aspect of the present invention generate a keyword vector for each sentence, so that the keyword vector has a structure (representation) capable of correctly reflecting the meaning of each sentence. As a result, the score can be calculated accurately so that there is a significant difference between two documents having different properties.
  • FIG. 5 is a flowchart illustrating an example of predictive coding according to a survey type in the example of the process illustrated in FIG. 4.
  • Embodiment 1 A first embodiment (Embodiment 1) of the present invention will be described with reference to FIGS.
  • FIG. 1 is a block diagram showing a main configuration of a document analysis system 100 according to the first embodiment of the present invention.
  • the document analysis system 100 is a system for analyzing a document (document analysis system).
  • the document analysis system 100 only needs to be a device that can execute the processing described below, and can be realized using an arbitrary computer.
  • the document analysis system 100 includes a reception unit 21, a control unit 10 (acquisition unit 11, generation unit 12, multiplication unit 13, calculation unit 14, extraction unit 15, summarization unit 16, phase identification unit 17, and change estimation unit 18), and a display unit 50.
  • the receiving unit 21 receives the document data 1 from an external computer by communicating with the outside through a communication network according to a predetermined communication method.
  • the receiving unit 21 only needs to have an essential function for realizing communication with the computer, and a communication line, a communication method, a communication medium, and the like are not limited.
  • the receiving unit 21 can be configured by a device such as an Ethernet (registered trademark) adapter, for example.
  • the receiving unit 21 can use a communication method or a communication medium such as IEEE802.11 wireless communication or Bluetooth (registered trademark).
  • the control unit 10 comprehensively controls various functions of the document analysis system 100.
  • the control unit 10 includes an acquisition unit 11, a generation unit 12, a multiplication unit 13, a calculation unit 14, an extraction unit 15, a summarization unit 16, a phase identification unit 17, and a change estimation unit 18.
  • the acquisition unit 11 acquires the document data 1 received by the reception unit 21 and outputs the document data 1 to the generation unit 12.
  • the generation unit 12 generates, for each sentence, a keyword vector 2 indicating whether or not a predetermined keyword (morpheme) is included in the sentence included in the document data (document) 1.
  • The keyword vector 2 is a vector in which each element takes a value of “0” or “1” to indicate whether or not the predetermined keyword associated with that element is included in the document data 1.
  • For example, when a sentence includes the keyword “price”, the generation unit 12 changes the element of the keyword vector 2 corresponding to “price” from “0” to “1”.
  • The generation unit 12 outputs the generated keyword vector 2 to each of the multiplication unit 13, the extraction unit 15, the summarization unit 16, and the phase identification unit 17.
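  • The per-sentence keyword vector described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the keyword list `KEYWORDS`, the regex-based sentence splitter, and the whole-word matching are all assumptions made for English text (the source speaks of morphemes, which would need a morphological analyzer).

```python
import re

# Hypothetical keyword list; the position of each keyword fixes the meaning
# of the corresponding vector element.
KEYWORDS = ["price", "adjustment", "meeting"]


def split_sentences(document: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def keyword_vector(sentence: str, keywords: list[str] = KEYWORDS) -> list[int]:
    """Return 1 for each keyword that occurs as a whole word in the sentence, else 0."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return [1 if kw in words else 0 for kw in keywords]


def keyword_vectors(document: str) -> list[list[int]]:
    """One binary keyword vector per sentence of the document."""
    return [keyword_vector(s) for s in split_sentences(document)]


doc = "We discussed the price. An adjustment of the price was agreed. See you soon."
print(keyword_vectors(doc))  # [[1, 0, 0], [1, 1, 0], [0, 0, 0]]
```

  • Building one vector per sentence, rather than one per document, keeps the information about which keywords occur together in the same sentence.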
  • The multiplication unit 13 obtains a correlation vector 3 for each sentence by multiplying the keyword vector 2 generated by the generation unit 12 by a correlation matrix that indicates the correlation between the predetermined keyword and another keyword different from the predetermined keyword.
  • The correlation matrix is a square matrix in which each element represents, for example, the likelihood that another keyword (for example, “adjustment”) appears in a sentence when the keyword “price” appears in that sentence (that is, the correlation between the two keywords).
  • the multiplication unit 13 outputs the correlation vector 3 to the calculation unit 14.
  • The correlation matrix is optimized in advance using a learning data set including a predetermined number of predetermined document data. For example, each element is the number of times another keyword appears when the keyword “price” appears in a certain sentence, normalized to a value between 0 and 1 (that is, a maximum likelihood estimate); the elements in each column of the correlation matrix therefore sum to 1. Thereby, the document analysis system 100 can calculate the correlation vector 3 optimally.
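  • One way to read the normalization above is as column-wise normalized co-occurrence counts. The sketch below assumes that reading; the function names are hypothetical, and counting a keyword's co-occurrence with itself on the diagonal is an assumption the source does not settle.

```python
def estimate_correlation_matrix(vectors: list[list[int]]) -> list[list[float]]:
    """Estimate C from binary per-sentence keyword vectors of a training set.

    counts[i][j] = number of sentences containing both keyword j and keyword i
    (the diagonal counts keyword j itself; this is an assumption).
    Each non-empty column is then divided by its sum, so it sums to 1.
    """
    n = len(vectors[0])
    counts = [[0.0] * n for _ in range(n)]
    for v in vectors:
        for j in range(n):
            if v[j]:
                for i in range(n):
                    if v[i]:
                        counts[i][j] += 1.0
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        if col_sum > 0:
            for i in range(n):
                counts[i][j] /= col_sum
    return counts


def correlation_vector(C: list[list[float]], s: list[int]) -> list[float]:
    """Matrix-vector product C s for one sentence's keyword vector s."""
    return [sum(C[i][j] * s[j] for j in range(len(s))) for i in range(len(C))]


# Toy training set over three keywords (e.g. price, adjustment, meeting).
training = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
C = estimate_correlation_matrix(training)
print(C)
print(correlation_vector(C, [1, 0, 0]))
```

  • With this toy data, a sentence that contains only the first keyword still receives a non-zero value for the second keyword, because the two co-occurred in the training sentences.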
  • The calculation unit 14 calculates, for each document data 1, a score 4 indicating the strength with which a classification code indicating the degree of association between the document data 1 and a predetermined case is associated with the document data 1, based on the sum of all the correlation vectors 3 obtained by the multiplication unit 13. More specifically, as shown in the following [Equation 1], the calculation unit 14 calculates the score 4 for each document by calculating the inner product of the summed value (a column vector) and a weight vector W (a row vector) indicating the weight for the predetermined keyword.
  • Here, C represents the correlation matrix, and s_s represents the s-th keyword vector 2.
  • TFnorm (the above summed value) is calculated as shown in [Equation 2] below.
  • TF_i represents the appearance frequency (Term Frequency) of the i-th keyword, and s_js represents the j-th element of the s-th keyword vector 2.
  • The calculation unit 14 calculates the above score 4 for each document by calculating the following [Equation 3].
  • w_i is the i-th element of the weight vector W.
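  • Read literally, the prose above describes the score as the inner product of W with the sum of the per-sentence correlation vectors, i.e. $W \cdot \sum_s C\,s_s$. The sketch below follows only that prose description; any additional normalization in [Equation 2] is not assumed, and the matrix, weights, and function names are illustrative values chosen for this example.

```python
def matvec(C: list[list[float]], s: list[int]) -> list[float]:
    """Correlation vector C s for one sentence."""
    return [sum(c_ij * s_j for c_ij, s_j in zip(row, s)) for row in C]


def document_score(keyword_vectors: list[list[int]],
                   C: list[list[float]],
                   W: list[float]) -> float:
    """Sum the correlation vectors over all sentences, then take the inner product with W."""
    total = [0.0] * len(W)
    for s in keyword_vectors:
        total = [t + x for t, x in zip(total, matvec(C, s))]
    return sum(w * t for w, t in zip(W, total))


# Illustrative (hypothetical) correlation matrix and keyword weights.
C = [[0.6, 0.5, 0.0],
     [0.4, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
W = [1.0, 2.0, 0.5]
sentences = [[1, 0, 0], [1, 1, 0]]  # keyword vectors of a two-sentence document

print(document_score(sentences, C, W))  # approximately 4.3
```

  • In this example the summed correlation vector is approximately (1.7, 1.3, 0), so the score is 1.0 × 1.7 + 2.0 × 1.3 = 4.3.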
  • The calculation unit 14 outputs the calculated score 4 to the phase identification unit 17, the change estimation unit 18, and the display unit 50.
  • The extraction unit 15 extracts the sentence (maximum sentence 5) corresponding to the keyword vector 2 indicating that the predetermined keyword is included the greatest number of times in the document data 1. For example, in the sentence “The price of product a sold by company A is higher than the price of product b sold by company B, we adjusted the price of both products.”, the keyword “price” appears three times.
  • The extraction unit 15 outputs that sentence to the display unit 50 as the maximum sentence 5.
  • The predetermined keyword (in the above example, the keyword “price”) may be given to the document analysis system 100 via a predetermined input device.
  • The summarization unit 16 generates a summary of the document data 1 by enumerating the sentences corresponding to keyword vectors 2 indicating that the predetermined keyword is included in the document data 1. For example, the summarization unit 16 generates the summary by listing the sentences in the document data 1 that include the keyword “price”, and outputs summary information 6, which includes information on the summary, to the display unit 50. As described above, the predetermined keyword may be given to the document analysis system 100 via a predetermined input device.
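  • The extraction and summarization described above can be sketched together. The sentence splitter, the whole-word counting, and the function names are assumptions for illustration only; the example sentence is the one quoted in the text.

```python
import re


def split_sentences(document: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", document.strip()) if p]


def count_keyword(sentence: str, keyword: str) -> int:
    """Number of whole-word, case-insensitive occurrences of keyword in sentence."""
    pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
    return len(re.findall(pattern, sentence.lower()))


def maximum_sentence(document: str, keyword: str):
    """Sentence containing the keyword most often, or None if it never appears."""
    best = max(split_sentences(document),
               key=lambda s: count_keyword(s, keyword), default=None)
    if best is None or count_keyword(best, keyword) == 0:
        return None
    return best


def summarize(document: str, keyword: str) -> list[str]:
    """Enumerate, in document order, every sentence that contains the keyword."""
    return [s for s in split_sentences(document) if count_keyword(s, keyword) > 0]


doc = ("The price of product a sold by company A is higher than the price of "
       "product b sold by company B, we adjusted the price of both products. "
       "The meeting is on Monday. Please confirm the price.")

print(maximum_sentence(doc, "price"))
print(summarize(doc, "price"))
```

  • Here the first sentence is returned as the maximum sentence because “price” appears in it three times, and the summary lists the first and third sentences.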
  • The phase identification unit 17 identifies, based on the score 4 calculated by the calculation unit 14, the phase into which a predetermined action (an action performed by an individual or by an organization composed of a plurality of persons) that causes a predetermined case (for example, a lawsuit, fraud investigation, collusion, information leakage, or fictitious billing) is classified according to the progress of that action.
  • the predetermined event may be given to the document analysis system 100 via a predetermined input device.
  • The phase is an index indicating each stage through which the predetermined action progresses (phases are classified according to the progress of the predetermined action). For example, if “collusion” is specified as the predetermined case, phases such as “Relationship Building” (a phase for building relationships with customers and competitors), “Preparation” (a phase for exchanging information about competitors with third parties), and “Competition” (a phase in which a price is presented to a customer, feedback is obtained, and that feedback is communicated to competitors) can be assumed.
  • For example, when the score 4 falls within a predetermined value range, the phase identification unit 17 may identify the phase associated with that value range and output phase information 7, which includes information on the phase, to the change estimation unit 18.
  • Alternatively, the phase identification unit 17 may calculate, as the score, the likelihood of each phase under a model (observation process, likelihood function) representing the process by which a predetermined actor (an individual or an organization composed of a plurality of persons) arrives at the predetermined action, identify the phase that maximizes this value (the maximum likelihood phase), and output phase information 7 including information on that phase to the change estimation unit 18.
  • Alternatively, suppose that the keyword vector 2 input from the generation unit 12 indicates that predetermined keywords (for example, “price” and “adjustment”) are included.
  • In this case, the phase identification unit 17 may identify the phase corresponding to those keywords (for example, “Competition” when the keywords “price” and “adjustment” are included) and output phase information 7 including information on the phase to the change estimation unit 18.
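  • Two of the phase identification strategies above (score ranges and keyword combinations) can be sketched as simple lookups. The score boundaries in `PHASE_RANGES` and the keyword sets in `PHASE_KEYWORDS` are hypothetical values chosen for illustration; the source gives only the phase names and the “price”/“adjustment” example.

```python
# Hypothetical score ranges [low, high) associated with each phase.
PHASE_RANGES = [
    (0.0, 1.0, "Relationship Building"),
    (1.0, 3.0, "Preparation"),
    (3.0, float("inf"), "Competition"),
]

# Hypothetical keyword sets; a phase matches when all of its keywords are present.
PHASE_KEYWORDS = {
    "Competition": {"price", "adjustment"},
}


def phase_from_score(score: float):
    """Return the phase whose value range contains the score, or None."""
    for low, high, phase in PHASE_RANGES:
        if low <= score < high:
            return phase
    return None


def phase_from_keywords(present: set):
    """Return the first phase whose required keywords are all present, or None."""
    for phase, required in PHASE_KEYWORDS.items():
        if required <= present:
            return phase
    return None


print(phase_from_score(2.0))                                     # Preparation
print(phase_from_keywords({"price", "adjustment", "meeting"}))  # Competition
```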
  • The change estimation unit 18 estimates the change of the phase identified by the phase identification unit 17 based on the temporal transition of the phase. For example, suppose it is known (from time-series information indicating the order) that phases develop in the sequence “Relationship Building”, then “Preparation”, then “Competition”. When the phase identification unit 17 identifies the current phase as “Preparation”, the change estimation unit 18 estimates that it will next develop into the “Competition” phase.
  • the change estimation unit 18 outputs change information 8 including information related to the change of the phase to the display unit 50.
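  • When the order of phases is known in advance, the estimation in the example above reduces to looking up the successor of the current phase. A minimal sketch, assuming the three-phase sequence from the text is stored in a hypothetical list `PHASE_ORDER`:

```python
# Known temporal order of phases (taken from the example in the text).
PHASE_ORDER = ["Relationship Building", "Preparation", "Competition"]


def estimate_next_phase(current: str):
    """Return the phase expected to follow `current`, or None if it is the last one.

    Raises ValueError if `current` is not a known phase.
    """
    idx = PHASE_ORDER.index(current)
    return PHASE_ORDER[idx + 1] if idx + 1 < len(PHASE_ORDER) else None


print(estimate_next_phase("Preparation"))  # Competition
```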
  • The change estimation unit 18 may estimate the phase change by calculating the correlation between the moving average of the score 4 calculated by the calculation unit 14 and a predetermined pattern.
  • The predetermined pattern may be a pattern showing how a score calculated in another case, different from the predetermined case (for example, a lawsuit, fraud investigation, collusion, information leakage, or fictitious billing), changes with the passage of time.
  • In this case, the change estimation unit 18 calculates the correlation between the predetermined pattern and the moving average of the score 4 for the document data 1 analyzed this time.
  • That is, the change estimation unit 18 calculates the degree of coincidence (correlation) between the two while shifting the elapsed time and/or the score. If the correlation between the two is high, the change estimation unit 18 estimates that the current score will, in the future, follow the predetermined pattern and take the same values.
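  • The pattern matching above can be sketched as a sliding Pearson correlation between the moving average of the current scores and a reference pattern. This is one possible realization under stated assumptions: the source does not name the correlation measure, and the window size, threshold, pattern values, and function names here are hypothetical. Because the Pearson correlation is unaffected by adding a constant to the scores, only the time shift is searched explicitly.

```python
from math import sqrt


def moving_average(values: list[float], window: int) -> list[float]:
    """Simple moving average over a fixed-size window."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_alignment(current: list[float], pattern: list[float]):
    """Slide `current` along `pattern`; return (offset, correlation) of the best match."""
    best = (0, -1.0)
    for offset in range(len(pattern) - len(current) + 1):
        r = pearson(current, pattern[offset:offset + len(current)])
        if r > best[1]:
            best = (offset, r)
    return best


def predict_next(current: list[float], pattern: list[float], threshold: float = 0.9):
    """If the best match is strong enough, return the pattern value that follows it."""
    offset, r = best_alignment(current, pattern)
    nxt = offset + len(current)
    if r >= threshold and nxt < len(pattern):
        return pattern[nxt]
    return None


# Hypothetical score history from another case, and scores of the current case.
pattern = [1.0, 1.0, 2.0, 4.0, 7.0, 9.0, 9.0, 8.0]
scores = [1.0, 1.0, 3.0, 5.0]

smoothed = moving_average(scores, window=2)   # [1.0, 2.0, 4.0]
print(best_alignment(smoothed, pattern))
print(predict_next(smoothed, pattern))        # 7.0
```

  • In this example the smoothed scores align best with the pattern at offset 1, so the value that follows in the pattern (7.0) is taken as the estimate of the next score.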
  • The display unit 50 is a display device (for example, a liquid crystal display) capable of displaying the score 4 input from the calculation unit 14, the maximum sentence 5 input from the extraction unit 15, the summary information 6 input from the summarization unit 16, and the change information 8 input from the change estimation unit 18.
  • Although FIG. 1 shows a configuration example in which the document analysis system 100 includes the display unit 50, the display unit 50 only needs to be able to present each of the above items of information to the user, and may be an external display device communicably connected to the document analysis system 100.
  • FIG. 2 is a flowchart illustrating an example of processing executed by the document analysis system 100.
  • In the following description, a parenthesized “... step” denotes a step included in the control method of the document analysis system 100 (the control method of the document analysis system).
  • the acquisition unit 11 acquires the document data 1 (Step 1, hereinafter “Step” is abbreviated as “S”).
  • the generation unit 12 generates, for each sentence, a keyword vector 2 indicating whether or not a predetermined keyword is included in the sentence included in the document data 1 (S2, generation step).
  • The multiplication unit 13 obtains a correlation vector 3 for each sentence by multiplying the keyword vector 2 generated in S2 by a correlation matrix indicating the correlation between the predetermined keyword and another keyword different from the predetermined keyword (S3, multiplication step).
  • The calculation unit 14 calculates, based on the sum of all the correlation vectors 3 obtained in S3, the score 4 indicating the strength with which the classification code indicating the degree of association between the document data 1 and the predetermined event is associated with the document (S4, calculation step).
  • In addition to the processing described above with reference to FIG. 2, the above control method may optionally include processing executed by the acquisition unit 11, the extraction unit 15, the summarization unit 16, the phase identification unit 17, and/or the change estimation unit 18.
  • Embodiment 2 A second embodiment (Embodiment 2) of the present invention will be described with reference to FIGS. In the present embodiment, only the configuration added to the first embodiment and the configuration different from the configuration of the first embodiment will be described. That is, all the configurations described in the first embodiment can be included in the second embodiment. Moreover, the definition of the term described in Embodiment 1 is the same also in Embodiment 2.
  • FIG. 3 is a block diagram showing a main configuration of the document analysis system 101 according to Embodiment 2 of the present invention.
  • the document analysis system 101 is a system that acquires information recorded in a predetermined computer or server and analyzes document information including a plurality of documents included in the acquired information.
  • The document analysis system 101 includes the control unit 10 described in the first embodiment (acquisition unit 11, generation unit 12, multiplication unit 13, calculation unit 14, extraction unit 15, summarization unit 16, phase identification unit 17, and change estimation unit 18), and further includes a data storage unit 108 (digital information storage area 102, survey basic database 103, keyword database 104, related term database 105, score calculation database 106, and report creation database 107), a database management unit 109, an information extraction unit 24, a search unit 30, a document analysis unit 118, a survey category input reception unit 20, a survey type determination unit 22, a presentation unit 130, a category selection unit 26, a first automatic classification unit 201, a second automatic classification unit 301, a classification code reception/giving unit 131, and a third automatic classification unit 401.
  • the document analysis system 101 may further include a trend information generation unit 124, a quality inspection unit 501, a learning unit 601, a report creation unit 701, a lawyer review reception unit 133, a language determination unit 120, and a translation unit 122.
  • the survey category input receiving unit 20 receives a category input by the user. When a category is input, the survey category input reception unit 20 outputs the category to the survey type determination unit 22 and the category selection unit 26.
  • the category is an index that can classify each document included in a plurality of documents.
  • The category may represent the type of litigation or fraud investigation, that is, the nature of the case relating to the litigation or fraud investigation (for example, antitrust, patents, foreign bribery prohibition (FCPA), product liability (PL), information leakage, or fictitious billing).
  • the category may be an attribute of document information (representing the nature of information included in the document information, such as competing opponent information, price, estimate sheet, price list, product, etc.).
  • the category may be a phase classified according to the progress of a predetermined action that causes a lawsuit or fraud investigation.
  • The survey type determination unit 22 determines a category to be surveyed based on the category received by the survey category input reception unit 20 and extracts the necessary information types from the survey basic database 103. For example, when the document information consists of e-mails, presentation materials, spreadsheets, meeting materials, contracts, organization charts, or business plans, the survey type determination unit 22 outputs each of these to the information extraction unit 24 as a necessary information type. Therefore, the document analysis system 101 can extract the necessary information types.
  • The information extraction unit 24 extracts a plurality of documents from the document information. Specifically, using the information input from the survey type determination unit 22 (for example, e-mails, presentation materials, spreadsheets, meeting materials, contracts, organization charts, and business plans), the information extraction unit 24 extracts the keywords and/or sentences included in that information as information related to the lawsuit or fraud investigation, and stores the extracted results in the survey basic database 103. In addition, the information extraction unit 24 outputs the extracted result as document data 1 to the control unit 10. Therefore, the document analysis system 101 can specify information related to the lawsuit or fraud investigation and hold it in the database.
  • the category selection unit 26 selects the category and outputs the selected category to the control unit 10. When a plurality of categories are assumed, the category selection unit 26 can sequentially select one category from the plurality of categories.
  • When the user has input a category, the category selection unit 26 can select that category. Thereby, the document analysis system 101 can reliably select the category input by the user.
  • The presentation unit 130 presents the score 4 calculated by the control unit 10 (calculation unit 14) in a form the user can grasp.
  • For example, the presentation unit 130 can present the score 4 to the user by displaying it on the display unit 50 (not shown in FIG. 3).
  • Thereby, the document analysis system 101 can make the user grasp the score 4.
  • the search unit 30 searches a plurality of documents for keywords and / or sentences included in the document information (document data 1). Thereby, the document analysis system 101 can extract keywords and / or sentences included in the document information.
  • When the search unit 30 searches for a keyword stored in the keyword database 104 and the information extraction unit 24 extracts a document including that keyword from the document information, the first automatic classification unit 201 automatically assigns a specific classification code to the extracted document based on keyword correspondence information.
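  • The keyword-based assignment above amounts to a lookup from keyword to classification code. In the sketch below, the table `KEYWORD_TO_CODE`, the code names, and the rule that the first matching table entry wins are all hypothetical; the source does not describe the contents of the keyword correspondence information.

```python
import re

# Hypothetical keyword correspondence information: keyword -> classification code.
KEYWORD_TO_CODE = {
    "price": "CODE_A",
    "adjustment": "CODE_A",
    "invoice": "CODE_C",
}


def first_auto_classify(document_text: str):
    """Return the code of the first table keyword found in the document, else None."""
    words = set(re.findall(r"[a-z]+", document_text.lower()))
    for keyword, code in KEYWORD_TO_CODE.items():
        if keyword in words:
            return code
    return None


print(first_auto_classify("Please send the invoice by Friday."))  # CODE_C
print(first_auto_classify("Lunch at noon?"))                      # None
```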
  • The second automatic classification unit 301 extracts, from the document information, documents that include related terms stored in the related term database 105, and calculates a score based on the evaluation values of the related terms included in each extracted document and the number of those related terms. It then automatically assigns a predetermined classification code, based on the score and related term correspondence information, to each document that includes the related terms and whose score exceeds a certain value.
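  • A minimal sketch of the related-term scoring above, assuming the score is the sum of each related term's evaluation value over all of its occurrences. The source does not state how the evaluation values and counts are combined, so this combination rule, the table `RELATED_TERMS`, the threshold, and the code name are all assumptions.

```python
import re

# Hypothetical related term database: related term -> evaluation value.
RELATED_TERMS = {"discount": 1.5, "quote": 1.0, "competitor": 2.0}


def related_term_score(text: str) -> float:
    """Sum of evaluation values over every occurrence of a related term (assumed rule)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(RELATED_TERMS[w] for w in words if w in RELATED_TERMS)


def second_auto_classify(text: str, threshold: float = 3.0, code: str = "CODE_B"):
    """Assign `code` only when the related-term score exceeds the threshold."""
    return code if related_term_score(text) > threshold else None


doc = "The competitor asked for a quote and a discount on the quote."
print(related_term_score(doc))    # 5.5
print(second_auto_classify(doc))  # CODE_B
```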
  • The classification code reception/giving unit 131 accepts a classification code assigned by the user, based on relevance to the lawsuit, for a plurality of documents that are extracted from the document information and to which no classification code has yet been assigned, and assigns that classification code to those documents.
  • The document analysis unit 118 analyzes the documents to which the classification code reception/giving unit 131 has assigned classification codes. In addition to the documents to which the user assigned classification codes based on their relevance to the lawsuit, the document analysis unit 118 may also analyze the documents to which the first automatic classification unit 201 and the second automatic classification unit 301 automatically assigned classification codes based on keywords, related terms, and scores, and may obtain a comprehensive analysis result by integrating the analysis of the user-coded documents with that of the automatically coded documents. In this case, the third automatic classification unit 401 can automatically assign a classification code based on the comprehensive analysis result.
  • The classification and investigation work can proceed in various ways, such as automatic classification by keyword search, acceptance of classification and investigation by users, automatic classification and investigation using scores, automatic classification and investigation through the learning process, and automatic classification and investigation through quality assurance.
  • The document analysis unit 118 may analyze a plurality of documents assigned classification codes together with a progress history indicating in what order and in what combination the various classification and investigation operations were carried out, and the report creation unit 701 (described later) may report the analysis result.
  • The third automatic classification unit 401 automatically assigns classification codes to a plurality of documents extracted from the document information, based on the result of the document analysis unit 118 analyzing the documents to which the classification code reception/giving unit 131 assigned classification codes.
  • So that the document analysis unit 118 can perform its analysis, the trend information generation unit 124 generates trend information indicating the degree to which each document is similar to documents to which a classification code has been assigned, based on the type, number of occurrences, and evaluation value of the words included in each document.
  • The quality inspection unit 501 compares the classification code received by the classification code reception/giving unit 131 with the classification code assigned by the document analysis unit 118 based on the trend information, and verifies the validity of the classification code received by the classification code reception/giving unit 131.
  • The learning unit 601 learns the weighting of each keyword or related term based on the result of classifying the documents.
  • The learning unit 601 learns the weighting of each keyword or related term using Expression (3) based on the first to fourth processing results (described later).
  • The learning unit 601 may reflect the learning result in the keyword database 104, the related term database 105, or the score calculation database 106.
  • The report creation unit 701 outputs an optimal investigation report according to the type of lawsuit or the type of fraud investigation, based on the result of classifying the documents.
  • Lawsuit types include, for example, antitrust, patent, Foreign Corrupt Practices Act (FCPA), and product liability (PL) cases.
  • Fraud investigation types include, for example, information leakage and fictitious billing.
  • The lawyer review reception unit 133 receives reviews by the chief attorney or chief patent attorney in order to improve the quality of the classification, the investigation, and the report, and to clarify responsibility for them.
  • The language determination unit 120 determines the language type of the extracted document.
  • The translation unit 122 translates the extracted document, either upon designation by the user or automatically.
  • The unit of text on which the language determination unit 120 operates is preferably smaller than one sentence, so that a single sentence mixing multiple languages can be handled.
  • Predictive coding, character encoding, or both may be used for language determination.
  • A process of excluding HTML (Hyper Text Markup Language) headers and the like from the translation targets may also be performed.
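  • As an illustration only, the following sketch shows one way to split text into runs shorter than a sentence using character information, and to drop an HTML head element before translation. The function names, the script labels ("cjk", "latin", "other"), and the use of Unicode character names as the "character coding" signal are assumptions made for this example; they are not taken from the embodiment.

```python
import re
import unicodedata

def strip_html_head(html):
    """Remove the <head>...</head> element so it is not sent to translation."""
    return re.sub(r"<head\b.*?</head\s*>", "", html, flags=re.IGNORECASE | re.DOTALL)

def script_of(ch):
    """Rough script label for one character, derived from its Unicode name."""
    if not ch.isalpha():
        return None  # digits, spaces, and punctuation are script-neutral
    name = unicodedata.name(ch, "")
    if name.startswith(("HIRAGANA", "KATAKANA", "CJK")):
        return "cjk"
    if name.startswith("LATIN"):
        return "latin"
    return "other"

def split_by_script(text):
    """Split text into (label, segment) runs; a run may be a fragment of a sentence."""
    runs = []  # each item is [label, characters]
    for ch in text:
        label = script_of(ch)
        if runs and (label is None or runs[-1][0] is None or label == runs[-1][0]):
            if runs[-1][0] is None:
                runs[-1][0] = label  # a neutral prefix adopts the first real label
            runs[-1][1] += ch
        else:
            runs.append([label, ch])
    return [(label, chars.strip()) for label, chars in runs if chars.strip()]
```

  • Each run could then be passed to a language detector or translator of the implementer's choice; a script label is only a coarse proxy for language and would need refinement in practice.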
  • The data storage unit 108 stores digital information acquired from a plurality of computers or servers in the digital information storage area 102 for use in the analysis of lawsuits or fraud investigations.
  • The data storage unit 108 includes a survey basic database 103, a keyword database 104, a related term database 105, a score calculation database 106, and a report creation database 107.
  • The data storage unit 108 may be a recording medium included in the document analysis system 101 or an external recording medium communicably connected to the document analysis system 101.
  • The survey basic database 103 holds, for example, a case attribute indicating which category the case belongs to (a litigation case such as antitrust, patent, Foreign Corrupt Practices Act (FCPA), or product liability (PL), or a fraud investigation such as information leakage or fictitious billing), the company name, the person in charge, the custodian, and the structure of the investigation or classification input screen.
  • The keyword database 104 holds specific classification codes of documents, keywords that are included in the acquired digital information and are closely related to a specific classification code, and keyword correspondence information indicating the correspondence between the specific classification code and the keywords.
  • The related term database 105 holds predetermined classification codes, related terms consisting of words that appear frequently in documents assigned the predetermined classification code, and related term correspondence information indicating the correspondence between the predetermined classification code and the related terms.
  • The score calculation database 106 holds the weights of the words included in a document, which are used to calculate a score indicating the strength of the connection between the document and a classification code.
  • The report creation database 107 holds report formats determined according to the category, the custodian, and the contents of the classification work.
  • The database management unit 109 manages updates to the data contents of the survey basic database 103, the keyword database 104, the related term database 105, the score calculation database 106, and the report creation database 107.
  • The database management unit 109 may be connected to the information storage device 902 via a dedicated connection line or the Internet line 901. In this case, the database management unit 109 may update the data contents of the survey basic database 103, the keyword database 104, the related term database 105, the score calculation database 106, and the report creation database 107 based on the contents of the data stored in the information storage device 902.
  • A “classification code” is an identifier used for classifying documents; it indicates the degree of relevance to the lawsuit so that the documents can be easily used in the lawsuit. For example, when document information is used as evidence in a lawsuit, the code may be assigned according to the type of evidence.
  • A “document” is data including one or more words, and may be, for example, an e-mail, presentation material, spreadsheet material, meeting material, a contract, an organization chart, a business plan, or the like.
  • A “word” is the smallest group of character strings having meaning. For example, the sentence “document means data including one or more words” includes words such as “document”, “one”, “more”, “word”, “include”, and “data”.
  • A “keyword” is a group of character strings having a certain meaning in a certain language. For example, if a keyword is selected from the sentence “classify a document”, it can be “document” or “classify”. In the present embodiment, keywords such as “infringement”, “lawsuit”, or “patent publication XX” are selected with priority.
  • A “keyword” may include a morpheme.
  • “Keyword correspondence information” is information representing the correspondence between a keyword and a specific classification code. For example, when the classification code “important”, which represents an important document in a lawsuit, is closely related to the keyword “infringer”, the keyword correspondence information may be information that manages the classification code “important” and the keyword “infringer” in association with each other.
  • A “related term” is a word whose evaluation value is at or above a certain value, among the words that appear frequently in common in documents assigned a predetermined classification code.
  • The appearance frequency may be, for example, the ratio of occurrences of the related term to the total number of words appearing in one document.
  • The “evaluation value” is a value indicating the amount of information that each word contributes in a document.
  • The “evaluation value” may be calculated based on the amount of transmitted information.
  • For example, when a product name is assigned as a classification code, the “related terms” may be the name of the technical field to which the product belongs, the country where the product is sold, the names of similar products, and the like.
  • Specifically, when the product name of an apparatus that performs image encoding is assigned as a classification code, the “related terms” include “encoding process”, “Japan”, “encoder”, and the like.
  • “Related term correspondence information” is information indicating the correspondence between related terms and classification codes. For example, when the classification code “product A”, which is the name of a product involved in the lawsuit, has the related term “image encoding”, which is a function of product A, the related term correspondence information may be information that manages the classification code “product A” and the related term “image encoding” in association with each other.
  • A “score” is a value that quantitatively evaluates the strength of a document's association with a specific classification code. In each embodiment of the present invention, the score is calculated according to, for example, the above-described [Equation 1] to [Equation 3].
  • The document analysis system 101 may extract words that frequently appear in documents to which the user has assigned a common classification code. It may then analyze, for each document, trend information consisting of the types of the extracted words, the evaluation value of each word, and the number of appearances, and may assign the common classification code to those documents that did not receive a classification code through the classification code reception / giving unit 131 and that show the same tendency as the analyzed trend information.
  • The “trend information” represents the degree of similarity between each document and the documents assigned a classification code. It is expressed as the degree of relevance to a predetermined classification code, based on the types of words included in each document, their numbers of occurrences, and their evaluation values. For example, when a document is similar, in its degree of relevance to a predetermined classification code, to a document assigned that classification code, the two documents are said to have the same trend information.
  • Even if the types of words they contain differ, documents containing words with the same evaluation values and the same numbers of occurrences may be treated as documents having the same tendency.
  • FIG. 4 is a flowchart illustrating an example of processing executed by the document analysis system 101.
  • The flow shown in FIG. 2 may be executed as a process independent of the flow shown in FIG. 4, or as a process included in any part of the flow shown in FIG. 4.
  • A category corresponding to a lawsuit (including antitrust, patent, FCPA, and PL cases) or to a fraud investigation (including information leakage and fictitious billing) is displayed and can be specified (S11).
  • The databases to be used, such as the survey basic database and the document analysis database, can be identified (S12).
  • The information storage device 902 that stores the latest database can be accessed.
  • The information storage device 902 may be installed inside the organization that performs the classification or outside it. When installed outside the organization, it may be located, for example, in a partner law firm or patent office.
  • Authentication by ID and password can be performed to maintain security (S13).
  • The databases in use, such as the survey basic database and the document analysis database, can be updated to the guideline database (S14).
  • The updated survey basic database is searched (S15), and the names of the company, the person in charge, and the custodian can be presented on the screen of the display device (S16). If the displayed names of the person in charge and the custodian differ from the actual names, the user corrects them on the screen of the display device.
  • The document analysis system can accept the user's correction input and specify the names of the actual person in charge and custodian (S17).
  • Digital document information can be extracted in order to perform the document analysis work (S18).
  • Within the updated document analysis database, the updated keyword database, related term database, and score calculation database are searched (S19), and a classification code can be assigned to the extracted document information (S20).
  • A classification code from the reviewer can be received and assigned to the extracted document information (S21).
  • The database is searched using the classification result as teacher data, and a classification code can be assigned to the extracted document information (S22).
  • A review by the chief attorney or chief patent attorney can be accepted (S23). This can improve the quality of the investigation.
  • The category is specified by the user's argument designation (S24), and the report creation database can be identified according to the specified category (S25).
  • The format of the report can be determined from the identified report creation database, and the report can be output automatically (S26).
  • FIG. 5 is a flowchart showing an example of the investigation and classification process according to the investigation type in the example of the process shown in FIG. 4.
  • The investigation type can be input (S31).
  • The user enters the category corresponding to the investigation and classification work to be carried out, selected from litigation cases (including antitrust, patent, Foreign Corrupt Practices Act (FCPA), and product liability (PL) cases) and fraud investigations (including information leakage and fictitious billing).
  • The document analysis system can accept the user's category input and specify the category to be investigated.
  • The type of investigation and document analysis processing and the type of database to be used can be determined (S32).
  • The stock of information stored in the databases in use, such as the survey basic database or the document analysis database, may be accessed (S33).
  • The survey basic database is accessed according to the specified category, and a keyword input screen corresponding to the specified category can be displayed (S34).
  • The survey basic database is accessed according to the specified category, and a text input screen corresponding to the specified category can be displayed (S35).
  • The survey basic database is accessed according to the specified category, and keywords or documents can be extracted according to the specified category (S36).
  • Weighting can be added to the teacher data for automatic classification code assignment (predictive coding) (S37).
  • The extracted documents and information can be narrowed down by performing a keyword search in the document analysis database (S38).
  • FIG. 6 is a flowchart showing an example of predictive coding according to the investigation type in the example of the process shown in FIG. 4.
  • The document analysis system can request input from the user according to the investigation type and accept the user's input in response. For example, for a cartel case under antitrust law, the system can request and accept user input on the target products, the parties (name and e-mail address), the related organizations (name and department), and the time period. For related organizations, it can also request and accept user input on competitor companies and customer companies (S51).
  • In the first stage, the keywords and related terms are updated and registered in advance using the results of past classification processes (S100).
  • At this time, the keywords and related terms are updated and registered together with the keyword correspondence information and the related term correspondence information, which indicate the correspondence between the classification codes and the keywords or related terms.
  • In the second stage, documents including the keywords updated and registered in the first stage are extracted from all the document information. The updated keyword correspondence information recorded in the first stage is then referred to, and a first classification process that assigns the classification code corresponding to each keyword is performed (S200).
  • In the third stage, documents including the related terms updated and registered in the first stage are extracted from the document information that was not assigned a classification code in the second stage, the scores of the documents including the related terms are calculated, and a second classification process that assigns classification codes is performed (S300).
  • In the fourth stage, classification codes given by the user are accepted for the document information that was not assigned a classification code up to the third stage, and the accepted classification codes are assigned to that document information.
  • Next, the document information assigned the classification codes received from the user is analyzed, documents without a classification code are extracted based on the analysis result, and a third classification process that assigns classification codes to the extracted documents is performed. For example, words that frequently appear in documents to which the user assigned a common classification code are extracted; the types of the extracted words, the evaluation value of each word, and the trend information on the number of appearances are analyzed for each document; and the common classification code is assigned to documents having the same tendency as the trend information (S400).
  • In the fifth stage, for the documents to which the user assigned classification codes in the fourth stage, the classification code to be assigned is determined based on the analyzed trend information, and the validity of the classification process is verified by comparing the determined classification code with the classification code assigned by the user (S500). A learning process may also be performed as needed, based on the result of the document analysis process.
  • The trend information used in the fourth- and fifth-stage processing represents the degree of similarity between each document and the documents assigned a classification code, and is based on the types of words included in each document, their numbers of occurrences, and their evaluation values. For example, when a document is similar, in its degree of relevance to a predetermined classification code, to a document assigned that classification code, the two documents have the same trend information. In addition, even if the types of words they contain differ, documents containing words with the same evaluation values and the same numbers of occurrences may be treated as documents having the same tendency.
  • The keyword database 104 creates a management table for each classification code based on the results of classifying documents in past lawsuits, and specifies the keywords corresponding to each classification code (S111).
  • To specify the keywords, a method of analyzing the documents assigned each classification code and using the number of occurrences and the evaluation value of each keyword in those documents, a method of manual selection by the user, or the like may be used.
  • Keyword correspondence information indicating that the identified keywords are closely related to the classification code is created (S112). The identified keywords are then registered in the keyword database 104. At this time, the identified keywords are associated with the keyword correspondence information and recorded, for example, in the management table of the classification code “important” in the keyword database 104 (S113).
  • The related term database 105 creates a management table for each classification code based on the results of classifying documents in past lawsuits, and registers the related terms corresponding to each classification code (S121). For example, in S121, “encoding process” and “product a” are registered as related terms of the classification code “product A”, and “decoding” and “product b” are registered as related terms of the classification code “product B”.
  • Related term correspondence information indicating which classification code each registered related term corresponds to is created (S122) and recorded in each management table (S123). At this time, the related term correspondence information also records the evaluation value of each related term and the score threshold required to determine the classification code.
  • The keywords and the keyword correspondence information, and the related terms and the related term correspondence information, are thus updated and registered (S113, S123).
  • The first automatic classification unit 201 extracts, from the document information, documents including the keywords “infringement” and “patent attorney” registered in the keyword database 104 in the first stage (S100) (S211).
  • For the extracted documents, the management table in which the keywords are recorded is referred to via the keyword correspondence information (S212), and the classification code “important” is assigned (S213).
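  • Steps S211 to S213 can be pictured with the minimal sketch below. The dictionary standing in for the keyword correspondence information, the function name, and the case-insensitive substring match are all assumptions made for illustration; the embodiment does not specify how the match is performed.

```python
# Hypothetical stand-in for the keyword correspondence information:
# classification code -> keywords closely related to that code.
KEYWORD_CORRESPONDENCE = {"important": ["infringement", "patent attorney"]}

def first_classification(documents, correspondence=KEYWORD_CORRESPONDENCE):
    """Assign a classification code to each document containing a registered keyword.

    documents: dict mapping document id -> text.
    Returns a dict mapping document id -> classification code for matched
    documents only; unmatched documents are left for the later stages.
    """
    assigned = {}
    for doc_id, text in documents.items():
        lowered = text.lower()
        for code, keywords in correspondence.items():
            if any(keyword in lowered for keyword in keywords):
                assigned[doc_id] = code
                break  # one code per document in this simplified sketch
    return assigned
```

  • Documents that do not appear in the returned mapping correspond to the document information handed on to the third stage.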
  • In the third stage (S300), the second automatic classification unit 301 performs a process of assigning the classification codes “product A” and “product B” to the document information that was not assigned a classification code in the second stage (S200).
  • The second automatic classification unit 301 extracts, from that document information, documents including the related terms “encoding process”, “product a”, “decoding”, and “product b” recorded in the related term database 105 in the first stage (S311). For each extracted document, the score calculation unit 116 calculates a score using Expression (1), based on the appearance frequencies and evaluation values of the four recorded related terms (S312). The score represents the degree of association between each document and the classification codes “product A” and “product B”.
  • For example, when the appearance frequencies of the related terms “encoding process” and “product a” and the evaluation value of the related term “encoding process” are high, and the score indicating the degree of association with the classification code “product A” exceeds the threshold, the document is assigned the classification code “product A”.
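  • The following sketch illustrates the threshold decision. Expression (1) is not reproduced in this section, so the score used here, a sum over related terms of appearance frequency multiplied by evaluation value, is an assumption, as are the table layout, the numeric evaluation values, and the thresholds.

```python
# Hypothetical related term correspondence information: for each classification
# code, the related terms with their evaluation values and a score threshold.
RELATED_TERMS = {
    "product A": {"terms": {"encoding process": 2.0, "product a": 1.0}, "threshold": 0.2},
    "product B": {"terms": {"decoding": 2.0, "product b": 1.0}, "threshold": 0.2},
}

def score(text, terms):
    """Assumed score: sum of (occurrences / total words) * evaluation value."""
    lowered = text.lower()
    total_words = len(lowered.split())
    if total_words == 0:
        return 0.0
    # Naive substring counting; a real system would tokenize properly.
    return sum(lowered.count(term) / total_words * value for term, value in terms.items())

def second_classification(text, table=RELATED_TERMS):
    """Return the code whose score exceeds its threshold by the most, or None."""
    best_code, best_score = None, 0.0
    for code, entry in table.items():
        s = score(text, entry["terms"])
        if s > entry["threshold"] and s > best_score:
            best_code, best_score = code, s
    return best_code
```

  • A document for which no score exceeds its threshold receives no code here and would remain for the fourth stage.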
  • The second automatic classification unit 301 recalculates the evaluation value of each related term by [Equation 4], using the score calculated in S432 in the fourth stage, and weights the evaluation value (S315).
  • Here, w_{i,L} represents the weight of the i-th selected keyword after the L-th learning, γ_L represents the learning parameter in the L-th learning, and θ represents the learning effect threshold value. For example, if there are more than a certain number of documents in which the appearance frequency of “decoding” is very high but the score is lower than a certain value, the evaluation value of the related term “decoding” is lowered and the related term correspondence information is recorded again.
  • In the fourth stage, classification codes are received from the reviewer for a certain percentage of the documents extracted from the document information not yet assigned a classification code, and the received classification codes are assigned to those documents.
  • Next, the document information assigned the classification codes received from the reviewer is analyzed, and, based on the analysis result, classification codes are assigned to the document information that has not been assigned one.
  • For example, a process of assigning the classification codes “important”, “product A”, and “product B” to the document information is performed. The fourth stage is further described below.
  • The information extraction unit 24 randomly samples documents from the document information to be processed in the fourth stage and displays them on the display unit 50.
  • For example, 20% of the document information to be processed is extracted at random and set as the target of classification by the reviewer.
  • Alternatively, sampling may use an extraction method in which the documents are arranged in order of creation date and time or in order of name, and the top 30% of the documents are selected.
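  • The two sampling options just described can be sketched as follows. The function names, the dictionary-based document records, and the rounding of the sample size are assumptions made for this example.

```python
import random

def sample_random(doc_ids, fraction=0.2, seed=None):
    """Pick a random fraction of the documents for manual review."""
    rng = random.Random(seed)  # a seed makes the selection reproducible
    k = round(len(doc_ids) * fraction)
    return rng.sample(list(doc_ids), k)

def sample_top(docs, fraction=0.3, key="created"):
    """Sort documents by a field (e.g. creation time or name) and take the top fraction.

    docs: list of dicts; `key` names the hypothetical field to sort on.
    """
    ordered = sorted(docs, key=lambda d: d[key])
    k = round(len(ordered) * fraction)
    return ordered[:k]
```

  • Random sampling avoids the ordering bias that a top-of-list selection can introduce, for example when older documents differ systematically from newer ones.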
  • The user browses the document display screen shown in FIG. 18 displayed on the display unit 50, and selects the classification code to be assigned to each document.
  • The classification code reception / giving unit 131 receives the classification code selected by the user (S411) and classifies the documents based on the assigned classification code (S412).
  • The document analysis unit 118 extracts words that frequently appear in common in the documents classified by classification code by the classification code reception / giving unit 131 (S421).
  • The evaluation value of each extracted common word is analyzed using Expression (2) (S422), and the appearance frequency of the common word in the documents is analyzed (S423).
  • FIG. 14 is a graph showing the result of analyzing, in S424, words that frequently appear in the documents assigned the classification code “important”.
  • The vertical axis R_hot shows, among all documents to which the user assigned the classification code “important”, the proportion of documents that contain a word selected as being associated with the classification code “important”.
  • The horizontal axis shows, among all documents classified by the user through the classification code reception / giving unit 131, the proportion of documents that contain the word extracted in S421.
  • The processing from S421 to S424 is also executed for the documents assigned the classification codes “product A” and “product B”, and the trend information of those documents is analyzed.
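  • The two proportions plotted in FIG. 14 can be computed along the following lines. R_hot follows the definition given above; the name `r_all` for the horizontal-axis value, the function name, and the whitespace tokenization are assumptions introduced for this sketch.

```python
def word_ratios(word, reviewed, code="important"):
    """Compute the proportions of documents containing `word`.

    reviewed: list of (text, classification_code) pairs classified by the user.
    Returns (r_hot, r_all): the share of documents carrying `code` that contain
    the word, and the share of all reviewed documents that contain it.
    """
    def contains(text):
        return word in text.lower().split()

    hot = [text for text, assigned in reviewed if assigned == code]
    r_hot = sum(contains(text) for text in hot) / len(hot) if hot else 0.0
    r_all = sum(contains(text) for text, _ in reviewed) / len(reviewed) if reviewed else 0.0
    return r_hot, r_all
```

  • A word whose R_hot is clearly larger than its overall share is more characteristic of the documents carrying the classification code than of the reviewed set as a whole.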
  • The third automatic classification unit 401 processes those documents, among the document information to be processed in the fourth stage, that were not assigned a classification code by the classification code reception / giving unit 131 in S411.
  • From these documents, documents having the same trend information as that of the documents assigned the classification codes “important”, “product A”, and “product B”, as analyzed in S424, are extracted (S431), and a score is calculated for each extracted document using Expression (1) based on the trend information (S432). An appropriate classification code is then assigned to the documents extracted in S431 based on the trend information (S433).
  • The third automatic classification unit 401 further reflects the classification result in each database using the score calculated in S432 (S434). Specifically, it may lower the evaluation values of the keywords and related terms included in documents with a low score and raise the evaluation values of the keywords and related terms included in documents with a high score.
  • The third automatic classification unit 401 may also perform the classification process as follows on the document information, among the document information to be processed in the fourth stage, for which no classification code was received by the classification code reception / giving unit 131 in S411.
  • From these documents, documents having the same trend information as that of the documents assigned the classification code “important”, as analyzed in S424, are extracted (S442), and the score of each extracted document is calculated using Expression (1) based on the trend information (S443). An appropriate classification code is then assigned to the documents extracted in S442 based on the trend information (S444).
  • The third automatic classification unit 401 further reflects the classification result in each database using the score calculated in S443 (S445). Specifically, the evaluation values of the keywords and related terms included in documents with a low score are lowered, while the evaluation values of the keywords and related terms included in documents with a high score are raised.
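  • The feedback step that raises or lowers evaluation values can be pictured with the sketch below. It is not the update rule of the embodiment: the score cut-offs `low` and `high`, the multiplicative `step`, and the function name are all hypothetical choices made to show the direction of the adjustment only.

```python
def adjust_evaluation_values(values, scored_docs, low=0.2, high=0.8, step=0.1):
    """Raise values of terms found in high-score documents and lower those in low-score ones.

    values: dict mapping term -> current evaluation value.
    scored_docs: list of (terms_in_document, score) pairs.
    Returns a new dict; the input mapping is left unchanged.
    """
    updated = dict(values)
    for terms, doc_score in scored_docs:
        if doc_score >= high:
            factor = 1 + step
        elif doc_score <= low:
            factor = 1 - step
        else:
            continue  # mid-range scores give no feedback in this sketch
        for term in terms:
            if term in updated:
                updated[term] *= factor
    return updated
```

  • The returned mapping would then be written back to the keyword database, the related term database, or the score calculation database, as the text describes.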
  • The data for score calculation may be stored collectively in the score calculation database 106.
  • <Fifth stage (S500)> A detailed processing flow of the quality inspection unit 501 in the fifth stage will be described with reference to FIG.
  • The classification code reception / giving unit 131 determines, for the documents whose classification codes were received in S411, the classification code to be assigned based on the trend information analyzed by the document analysis unit 118 in S424 (S511).
  • The classification code received by the classification code reception / giving unit 131 is compared with the classification code determined in S511 (S512), and the validity of the classification code received in S411 is verified (S513).
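  • The comparison in S512 and S513 can be sketched as a simple agreement check. The function name, the dictionary inputs, and the use of an agreement rate as the validity measure are assumptions; the embodiment does not state how validity is quantified.

```python
def verify_codes(user_codes, predicted_codes):
    """Compare user-assigned codes with codes determined from trend information.

    Both arguments map document id -> classification code.
    Returns (agreement_rate, mismatches), where mismatches maps a document id
    to the pair (user code, predicted code) for every disagreement.
    """
    mismatches = {
        doc_id: (code, predicted_codes.get(doc_id))
        for doc_id, code in user_codes.items()
        if predicted_codes.get(doc_id) != code
    }
    agreement = 1 - len(mismatches) / len(user_codes) if user_codes else 1.0
    return agreement, mismatches
```

  • Documents listed in the mismatches could be sent back to a reviewer, which is one way the check might support quality assurance.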
  • The document analysis system 101 may include a learning unit 601.
  • The learning unit 601 learns the weighting of each keyword or related term based on the first to fourth processing results using Expression (2).
  • The learning result may be reflected in the keyword database 104, the related term database 105, or the score calculation database 106.
  • The document analysis system 101 may include a report creation unit 701 that, based on the result of the document analysis process, outputs an optimal investigation report according to the investigation type (e.g., fictitious billing).
  • The contents of the investigation vary depending on the investigation type. For example, in a cartel (price adjustment) case, the points are: 1. when and how the personnel of the competing companies communicated, and 2. which persons and organizations were involved.
  • A document survey report system, a document survey report method, and a document survey report program according to another example of the embodiment of the present invention will be described below.
  • In this example, documents that have already been assigned a classification code are analyzed in correspondence with similar search information, and the range to which the classification code is assigned is adjusted based on the analysis result. The classification work and the investigation work are then performed based on the adjusted range, and a report is created based on the results of that work.
  • Methods of adjusting the range to which the classification code is assigned include a method of clustering the search information corresponding to the similar search information and a method of performing predictive classification by learning the classification results.
  • A common classification code may also be given to a reply to a reply to the original document.
  • By learning to integrate similar search information across the classification results, the same or similar classification codes are given to similar search information.
  • The reliability of the analysis result varies depending on the number of documents analyzed.
  • A statistical method may be applied to the total number of documents to be classified in order to determine at what point, that is, after what percentage of all documents has been processed, the range to which the classification code is assigned should be adjusted based on the analysis results.
  • The range of documents to which the classification code is assigned may be adjusted by executing both the method of clustering the search information corresponding to the similar search information and the method of performing predictive classification by learning the classification results.
  • A report is created based on the results of this classification and investigation work.
  • A display screen control unit that controls a display screen presenting to the user the type of information extracted by the survey type determination unit may be provided.
  • An input receiving unit that receives keywords and/or sentences input by the user in response to the type of information presented by the display screen control unit may be provided.
  • The embodiment of the present invention accepts user input of the category of a litigation case or fraud investigation case and automatically updates the database according to that category.
  • This reduces the clerical burden of inputting the names of persons in charge, custodians, and the like.
  • The search words are adjusted using the database automatically updated according to the category, and classification codes are automatically assigned to the document information using the adjusted search words. This reduces the burden of classifying the document information used in litigation or fraud investigation cases. That is, according to the present invention, it becomes easy to analyze document information used in a lawsuit.
  • the control blocks (particularly the control unit 10) of the document analysis system 100 and the document analysis system 101 may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or a CPU (Central Processing). Unit) and may be realized by software.
  • When realized by software, the document analysis systems 100 and 101 include a CPU that executes instructions of a program (the control program for the document analysis systems 100 and 101), which is software that realizes each function; a ROM (Read Only Memory) or storage device (referred to as a “recording medium”) on which the program and various data are recorded so as to be readable by a computer (or CPU); and a RAM (Random Access Memory) into which the program is loaded.
  • The computer reads the program from the recording medium and executes it.
  • As the recording medium, a “non-transitory tangible medium” such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.
  • The program may be supplied to the computer via any transmission medium (such as a communication network or a broadcast wave) capable of transmitting the program.
  • The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
  • A control program for a document analysis system is a control program for a document analysis system that analyzes a document, and causes a computer (the document analysis system 100) to realize a generation function, a multiplication function, and a calculation function.
  • The generation function, multiplication function, and calculation function can be realized by the generation unit 12, the multiplication unit 13, and the calculation unit 14, respectively. Details are as described above.
  • The present invention can be widely applied to arbitrary computers such as personal computers, workstations, and mainframes.
  • 1: document data (document), 2: keyword vector, 3: correlation vector, 4: score, 5: most frequent sentence (the sentence whose keyword vector indicates that it contains the most predetermined keywords), 6: summary information (summary), 7: phase information (phase), 8: change information (change in phase), 12: generation unit, 13: multiplication unit, 14: calculation unit, 15: extraction unit, 16: summary unit, 17: phase identification unit, 18: change estimation unit, 100: document analysis system, 101: document analysis system
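
The generation, multiplication, and calculation functions named above, together with the "most frequent sentence" defined in the list of reference signs, can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration only: the keyword lists, the correlation values, the sentence splitting, and the choice to reduce the summed correlation vector to a score by adding its components. The description here does not specify these details, so this should not be read as the patented implementation.

```python
import re
from typing import List, Sequence

# Hypothetical data: these keywords and correlation values are invented
# for illustration and do not come from the patent.
KEYWORDS = ["price", "meeting", "competitor"]   # predetermined keywords
RELATED = ["agreement", "secret"]               # keywords other than the predetermined ones
# CORRELATION[i][j]: assumed correlation between KEYWORDS[i] and RELATED[j].
CORRELATION = [
    [0.5, 0.25],
    [0.25, 0.5],
    [0.75, 0.5],
]


def split_sentences(document: str) -> List[str]:
    """Naive sentence splitter (an assumption; real text needs more care)."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]


def generate_keyword_vector(sentence: str, keywords: Sequence[str]) -> List[int]:
    """Generation function: 1 if the keyword occurs in the sentence, else 0."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return [1 if kw in words else 0 for kw in keywords]


def multiply(keyword_vector: Sequence[int],
             correlation: Sequence[Sequence[float]]) -> List[float]:
    """Multiplication function: keyword vector (row) times correlation matrix."""
    n_cols = len(correlation[0])
    return [
        sum(keyword_vector[i] * correlation[i][j] for i in range(len(keyword_vector)))
        for j in range(n_cols)
    ]


def calculate_score(correlation_vectors: Sequence[Sequence[float]]) -> float:
    """Calculation function: sum all correlation vectors, then (by assumption)
    add up the components of the summed vector to obtain a single score."""
    totals = [sum(column) for column in zip(*correlation_vectors)]
    return sum(totals)


def score_document(document: str, keywords: Sequence[str],
                   correlation: Sequence[Sequence[float]]) -> float:
    """Run generation, multiplication, and calculation over every sentence."""
    vectors = [
        multiply(generate_keyword_vector(s, keywords), correlation)
        for s in split_sentences(document)
    ]
    return calculate_score(vectors)


def most_frequent_sentence(sentences: Sequence[str],
                           keywords: Sequence[str]) -> str:
    """Sentence whose keyword vector marks the most predetermined keywords."""
    return max(sentences, key=lambda s: sum(generate_keyword_vector(s, keywords)))
```

Calling `score_document(text, KEYWORDS, CORRELATION)` on a short text returns one number per document. A sentence that mentions several predetermined keywords contributes more to that number because several rows of the matrix are added into its correlation vector.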

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention accurately calculates a score that correctly reflects the meaning of a sentence. The present invention relates to a document analysis system comprising: a generation unit that generates, for each sentence in a document, a keyword vector indicating whether predetermined keywords are present in the sentence; a multiplication unit that obtains a correlation vector for each sentence by multiplying the generated keyword vector by a correlation matrix indicating the correlation between the predetermined keywords and keywords other than the predetermined keywords; and a calculation unit that, based on a value aggregating all of the correlation vectors, calculates a score indicating the strength of the association between the document and a classification code, the classification code indicating the degree of relevance between a document and a predetermined event.
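
The computation summarized in the abstract can be written compactly as follows. The symbols and the final reduction function $f$ are assumptions introduced here for illustration; the abstract states only that the score is calculated from a value aggregating all correlation vectors, and does not fix the form of $f$.

```latex
% Assumed notation: a document has sentences s = 1, ..., S,
% n predetermined keywords, and m other keywords.
\begin{align*}
  \mathbf{k}_s &\in \{0,1\}^{n}
    && \text{keyword vector of sentence } s \\
  C &\in \mathbb{R}^{n \times m}
    && \text{correlation matrix} \\
  \mathbf{c}_s &= C^{\top}\mathbf{k}_s \in \mathbb{R}^{m}
    && \text{correlation vector of sentence } s \\
  \mathbf{v} &= \sum_{s=1}^{S} \mathbf{c}_s
    && \text{value aggregating all correlation vectors} \\
  \text{score} &= f(\mathbf{v}),
    \quad \text{e.g. } f(\mathbf{v}) = \sum_{j=1}^{m} v_j
    && \text{(one possible choice of } f\text{)}
\end{align*}
```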
PCT/JP2014/062743 2014-05-13 2014-05-13 Système d'analyse de document, procédé de commande destiné à un système d'analyse de document et programme de commande destiné à un système d'analyse de document WO2015173894A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2015510547A JP5815911B1 (ja) 2014-05-13 2014-05-13 文書分析システム、文書分析システムの制御方法、および、文書分析システムの制御プログラム
PCT/JP2014/062743 WO2015173894A1 (fr) 2014-05-13 2014-05-13 Système d'analyse de document, procédé de commande destiné à un système d'analyse de document et programme de commande destiné à un système d'analyse de document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/062743 WO2015173894A1 (fr) 2014-05-13 2014-05-13 Système d'analyse de document, procédé de commande destiné à un système d'analyse de document et programme de commande destiné à un système d'analyse de document

Publications (1)

Publication Number Publication Date
WO2015173894A1 (fr) 2015-11-19

Family

ID=54479466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/062743 WO2015173894A1 (fr) 2014-05-13 2014-05-13 Système d'analyse de document, procédé de commande destiné à un système d'analyse de document et programme de commande destiné à un système d'analyse de document

Country Status (2)

Country Link
JP (1) JP5815911B1 (fr)
WO (1) WO2015173894A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102615420B1 (ko) * 2022-11-16 2023-12-19 에이치엠컴퍼니 주식회사 인공지능 기반의 법률 문서에 대한 자동 분석 장치

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003016106A (ja) * 2001-06-29 2003-01-17 Fuji Xerox Co Ltd 関連度値算出装置
JP2009098811A (ja) * 2007-10-15 2009-05-07 Toshiba Corp 文書分類装置およびプログラム
WO2013129548A1 (fr) * 2012-02-29 2013-09-06 株式会社Ubic Système de classification de documents, procédé de classification de documents et programme de classification de documents
WO2014057965A1 (fr) * 2012-10-09 2014-04-17 株式会社Ubic Système médicolégal, procédé médicolégal et programme médicolégal

Also Published As

Publication number Publication date
JP5815911B1 (ja) 2015-11-17
JPWO2015173894A1 (ja) 2017-04-20

Similar Documents

Publication Publication Date Title
JP5627820B1 (ja) 文書分析システム及び文書分析方法並びに文書分析プログラム
JP5596213B1 (ja) 文書分析システム及び文書分析方法並びに文書分析プログラム
JP5627750B1 (ja) 文書分析システム及び文書分析方法並びに文書分析プログラム
JP5683749B1 (ja) 文書分析システム、文書分析方法、および、文書分析プログラム
JP5986687B2 (ja) データ分別システム、データ分別方法、データ分別のためのプログラム、及び、このプログラムの記録媒体
JP5622969B1 (ja) 文書分析システム、文書分析方法、および、文書分析プログラム
WO2015118619A1 (fr) Système, procédé et programme d'analyse de documents
JP5815911B1 (ja) 文書分析システム、文書分析システムの制御方法、および、文書分析システムの制御プログラム
JP5669904B1 (ja) 事前情報を提供する文書調査システム、文書調査方法、及び文書調査プログラム
KR101658890B1 (ko) 온라인 특허 평가 방법
WO2015145524A1 (fr) Système d'analyse de document, procédé d'analyse de document et programme d'analyse de document
JP5851007B2 (ja) 文書分析システム及び文書分析方法並びに文書分析プログラム
JP2015056185A (ja) 文書分析システム及び文書分析方法並びに文書分析プログラム
JP5829768B2 (ja) 電子メール分析システム、電子メール分析方法、および、電子メール分析プログラム
KR20150015424A (ko) 온라인 특허 평가 방법
JP5990562B2 (ja) 事前情報を提供する文書調査システム、文書調査方法、及び文書調査プログラム
JP5745676B1 (ja) 文書分析システム、文書分析方法、および、文書分析プログラム

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2015510547

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14891798

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14891798

Country of ref document: EP

Kind code of ref document: A1