USRE49576E1 - Standard exact clause detection - Google Patents
- Publication number
- USRE49576E1, US17/086,288
- Authority
- US
- United States
- Prior art keywords
- clause
- clauses
- documents
- standard
- policy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Definitions
- the disclosure generally relates to the field of natural language processing, and in particular, to identifying and extracting information from documents.
- a contract is a document that defines legally enforceable agreements between two or more parties. During the negotiation process, parties to the contract often agree to make multiple amendments or addendums, and these amendments or addendums can be stored in random formats in different locations.
- a contract and amendments may include the clauses that contain wording such as “net 30 days,” “within 30 days,” “30 day's notice,” and “2% penalty.”
- one of the amendments may include the non-standard clauses such as “5 working days” with “60% penalty.”
- FIG. 1 illustrates one embodiment of a standard exact clause detection system for a contractual document.
- FIG. 2 illustrates an input processor of the standard exact clause detection system configured to process input data.
- FIG. 3 illustrates a discovery engine of the standard exact clause detection system to properly structure and to normalize the input data.
- FIG. 4 illustrates a representation of data stored as discrete database documents with different indexes.
- FIG. 5 illustrates an analysis engine of the standard exact clause detection system to define standard exact clauses in contractual documents.
- FIG. 6 illustrates a flow chart of a method of obtaining standard exact clauses and non-standard clauses.
- FIG. 7 illustrates a process for determining a policy to specify clauses for extraction.
- FIG. 8 illustrates components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller).
- a standard exact clause herein refers to a clause including words and an order of words matching those of a predefined clause example.
- a non-standard clause herein refers to a clause semantically related to a predefined clause example, but including words or an order of words different from those of the predefined clause example.
- the system includes an input processor to configure raw input data into a format that can be structurally analyzed by a discovery engine.
- the discovery engine generates a predefined policy to be applied in a search engine.
- the discovery engine prepares initial search results to allow an administrator to select items to build and test a new custom policy along with all the predefined policies in a format that can be viewed by an end user.
- the end user can view the initial search results, and also customize the predefined policy to define a primary policy.
- the analysis engine and the semantic language evaluator perform semantic language analysis, and first determine the standard clauses. Among the standard clauses, standard exact clauses with words and an order of the words exactly matching clause examples are identified. Furthermore, the analysis engine and the semantic language evaluator perform another semantic language analysis with a less restrictive secondary policy to extract the non-standard clauses.
- FIG. 1 illustrates one exemplary embodiment of a standard exact clause detection system 10 including one or more input processors (generally an input processor 110 ), a discovery engine 120 , an analysis engine 130 , a semantic language evaluator 140 , and an analysis database 150 .
- Each of these components may be embodied as hardware, software, firmware or a combination thereof.
- engines or modules include software (or firmware) structured to operate with processing components of a computing system to form a machine that operates as described herein for the corresponding engines or modules. Further, two or more engines may interoperate to form a machine that operates as described herein. Examples of the processing components of the computing system are described with respect to FIG. 8 .
- the system 10 also comprises a discovery database 160 to store data for identifying standard exact clauses and non-standard clauses.
- the input processor 110 aggregates one or more raw data 100(0), 100(1) . . . 100(N) (generally 100) and processes them into an appropriate format.
- the discovery engine 120 is communicatively coupled to the input processor 110 .
- the analysis engine 130 is coupled to the discovery engine 120 .
- the discovery engine 120 develops a predefined policy and initial search results.
- the analysis engine 130 performs a semantic language analysis by applying policies to the semantic language evaluator 140 , and determines the non-standard clauses, standard clauses, and standard exact clauses used in the raw data 100 .
- the discovery database 160 stores the initial search results, metadata, and the predefined policy.
- the discovery database 160 is communicatively coupled to the input processor 110 , the discovery engine 120 , and the analysis engine 130 .
- the analysis engine 130 is coupled to the analysis database 150 and stores information for performing semantic language evaluation.
- the discovery database 160 and the analysis database 150 can be combined into one database.
- FIG. 2 illustrates an exemplary embodiment of an input processor 110 that may aggregate the raw data 100 and refine them into acceptable formats in the following stages.
- the input processor 110 includes a file import system module 212 , a correction module 213 , and a format standardization module 214 .
- the file import system module 212 receives the raw data 100 from any one of file systems, emails, Content Management Systems and physical document scanning devices. The file import system module 212 also detects potential contracts and checks whether any duplicates of documents already exist in the discovery database 160. In addition, the file import system module 212 can convert a physical document into another electronic format, for example Portable Document Format (PDF), Microsoft Office format, Tagged Image File Format (TIFF), Graphics Interchange Format (GIF), Joint Photographic Experts Group (JPEG), etc. Moreover, the file import system module 212 may include an image file processor module with an optical character recognition (OCR) engine 218. The OCR engine 218 may be an ABBYY FineReader engine or a standard iFilter OCR engine.
- the file import system module 212 detects the language of the contractual document and how many words exist within.
- the OCR engine 218 of the file import system module 212 determines a quality of the OCR performed for each character or each word, and generates a quality score indicating a quality of the OCR performed for each character or each word.
- the correction module 213 in the input processor 110 receives the data imported from the file import system module 212 .
- the correction module 213 also is configured to apply typographical corrections or OCR corrections.
- the format standardization module 214 tailors the format of the data imported from the file import system module 212 for further processing.
- the format standardization module 214 applies filters to extract textual information.
- the input processor 110 may remove passwords to access a protected contractual document only when the owners of the documents agree to remove such passwords.
- the format standardization module 214 includes a file protection function that creates copies of potential contractual documents identified. These identified contractual documents are stored in the discovery database 160 with security access attributes.
- FIG. 3 illustrates an embodiment of the discovery engine 120 that structurally analyzes an input data from the input processor 110 and generates the predefined policy.
- the predefined policy includes, but is not limited to, predefined rules, predefined features, and predefined clause examples.
- the discovery engine 120 also applies the predefined policy to the search engine (not shown) and prepares initial search results along with the predefined policy and metadata in a format that the end user can view.
- the discovery engine 120 includes a pre-normalization module 321 , a language detection module 322 , a processing queue module 323 , a structuration function module 324 , a rules processing module 325 , a post processing and reduction module 326 , and a high level processing module 327 .
- the pre-normalization module 321 receives the imported data in the standardized format obtained from the input processor 110, and converts the imported data into a standard XML or HyperText Markup Language (HTML) document. Also, the language detection module 322 can identify the language used in the XML or HTML converted document (e.g., English, German, etc.), and place the document in the processing queue module 323.
- the structuration function module 324 structurally analyzes the XML or HTML converted document into a plurality of hierarchical levels.
- FIG. 4 illustrates a representation of data stored as discrete database documents: a sentence level 401, a paragraph level 402, a section level 403, and a document level 404. Analyzing the documents or data in this structure allows terminologies and clauses used in the contractual documents to be located.
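As a rough illustration of the four index levels, the following sketch splits plain text into sections, paragraphs and sentences and emits one record per unit. The splitting rules (a `===` line between sections, blank lines between paragraphs, punctuation-based sentence breaks) and the names `Record` and `index_document` are assumptions made for this example; the disclosure does not specify them.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    level: str      # "document", "section", "paragraph" or "sentence"
    doc_id: int
    section: int    # -1 when the index does not apply at this level
    paragraph: int
    sentence: int
    text: str


def index_document(doc_id, text):
    """Return one record per document, section, paragraph and sentence."""
    records = [Record("document", doc_id, -1, -1, -1, text.strip())]
    # Assumption: a line containing only "===" separates sections.
    sections = [s for s in text.split("\n===\n") if s.strip()]
    for s_idx, section in enumerate(sections):
        records.append(Record("section", doc_id, s_idx, -1, -1, section.strip()))
        # Assumption: blank lines separate paragraphs.
        paragraphs = [p for p in re.split(r"\n\s*\n", section) if p.strip()]
        for p_idx, paragraph in enumerate(paragraphs):
            records.append(
                Record("paragraph", doc_id, s_idx, p_idx, -1, paragraph.strip())
            )
            # Assumption: a sentence ends at '.', '!' or '?' followed by space.
            sentences = [
                s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s
            ]
            for t_idx, sentence in enumerate(sentences):
                records.append(
                    Record("sentence", doc_id, s_idx, p_idx, t_idx, sentence)
                )
    return records
```

Each record carries the indexes of its enclosing units, so a clause found at the sentence level can be traced back to its paragraph, section and document.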
- the rules processing module 325 applies the predefined rules to generate the predefined features.
- the predefined rules determine the logic or sequence of words, sentences, phrases, NLP (natural language processing) features, or terminologies.
- the rules processing module 325 generates the predefined features from the predefined rules for the end user to customize in the analysis engine 130 .
- the predefined features can be a key reference or a descriptive verb that can describe the document and the information held within. For instance, the predefined features can be a start date, a termination date, a contract type, etc.
- the post processing and reduction module 326 reduces and normalizes the predefined features from the rules processing module 325. It is to be noted that in addition to sentence and paragraph boundaries, the discovery engine 120 can identify contractual section boundaries such as the termination, limitation of liability, and indemnity sections of a contract. Moreover, the post processing and reduction module 326 prepares the predefined features for the end user to customize in the analysis engine 130.
- Normalization in the post processing and reduction module 326 reduces the common notations into a standard format.
- the same date can be expressed in multiple ways (e.g., Oct. 23, 1992, 10/23/1992, 23/10/1992, 1992/10/23, etc.), and the normalization can convert the various formats into the standard ISO format. Normalizing to the standard format can eliminate confusion and improve processing speed.
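A minimal sketch of such date normalization, assuming a fixed list of candidate input formats tried in order; the list `FORMATS` and the function `to_iso` are illustrative names, not part of the disclosure. Month abbreviations are parsed with Python's default (English) locale.

```python
from datetime import datetime
from typing import Optional

# Assumed candidate formats, tried in order. For ambiguous numeric dates
# such as 03/04/1992, the month-first reading wins because it comes first.
FORMATS = ["%b. %d, %Y", "%B %d, %Y", "%Y/%m/%d", "%m/%d/%Y", "%d/%m/%Y"]


def to_iso(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no format matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None
```

With this ordering, `to_iso("Oct. 23, 1992")`, `to_iso("10/23/1992")`, `to_iso("23/10/1992")` and `to_iso("1992/10/23")` all yield `1992-10-23`. A production system would need a policy for genuinely ambiguous dates, which this sketch resolves only by format order.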
- the discovery engine 120 can reduce any duplicate terms in different formats.
- the high level processing module 327 creates metadata and stores them in the discovery database 160 . Additionally, the search engine communicatively coupled to the discovery database 160 obtains initial search results to determine the eligibility for analytics processing. Moreover, the high level processing module 327 prepares the predefined policy as well as the initial search results in a format that the end user can view. Furthermore, either one or both of an internal search engine and an external search engine may perform a search function.
- the analysis engine 130 identifies standard exact clauses, standard clauses, and non-standard clauses.
- the analysis engine 130 includes an analysis engine queue module 531 , a variable detection module 570 , a custom feature generation module 532 , a document parsing module 533 , a policy definition module 534 , a standard clause detection module 535 , a standard exact clause detection module 536 , a non-standard clause detection module 537 , and an update discovery database module 538 .
- the discovery engine 120 transfers a data set including the predefined policy, search indexes, and the initial search results to the analysis engine queue module 531 .
- the custom feature generation module 532 allows the end user to customize the predefined features obtained from the discovery engine 120 and to define primary features.
- the variable detection module 570 receives search indexes or the initial search results and provides available variations of clauses to the custom feature generation module 532 .
- the variable detection module 570 may receive the search indexes or the initial search results from the discovery engine 120 directly or from the analysis engine queue module 531 .
- the variable detection module 570 may detect allowable variations of clauses according to examples stored in the discovery engine 120 and provide the detected allowable variations of clauses with associated variables to the custom feature generation module 532 .
- the custom feature generation module 532 receives the predefined features from the analysis engine queue module 531 to define primary features to be used in semantic language evaluation.
- the custom feature generation module 532 may also receive detected allowable variations from the variable detection module 570 to define the primary features.
- the custom feature generation module 532 presents to a user a list of clauses or features within a template. The user may select which clauses are to be considered as standard clauses. In addition, the user may select which clauses or words in the standard clauses can be varied. In one approach, the user may assign a variable to each set of selected clauses or words allowed to be varied. Alternatively, the custom feature generation module 532 may assign a variable to a set of clauses or words allowed to be changed.
- the custom feature generation module 532 provides the primary features comprising selected clauses examples and variables associated with allowable variations to a document parsing module 533 .
- the document parsing module 533 replaces the actual text, phrases or clauses with the primary features.
- the document parsing module 533 replaces words or clauses with allowed variations with corresponding variables.
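The replacement of allowed variations by variables can be sketched as follows. The mapping `ALLOWED_VARIATIONS`, the variable names in braces, and the case-insensitive matching are all assumptions chosen for illustration; the disclosure does not fix a variable syntax.

```python
import re

# Hypothetical mapping from a variable to the surface forms it may replace.
ALLOWED_VARIATIONS = {
    "{NOTICE_PERIOD}": ["30 days", "thirty (30) days", "one month"],
    "{PARTY}": ["the Supplier", "the Vendor"],
}


def replace_variations(text, variations):
    """Replace every allowed surface form in `text` with its variable."""
    # Longest forms first, so a long phrase wins over a phrase it contains.
    pairs = sorted(
        ((form, var) for var, forms in variations.items() for form in forms),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    lookup = {form.lower(): var for form, var in pairs}
    alternatives = "|".join(re.escape(form) for form, _ in pairs)
    # The lookarounds keep "30 days" from matching inside "130 days".
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

Two clauses that differ only in an allowed variation, such as "the Vendor shall give 30 days notice" and "the Supplier shall give one month notice", map to the same variable-replaced text, so a single comparison against the clause example covers both.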
- the semantic language evaluator 140 formed with the primary features replaced data set ensures the accuracy and quality of the data. That is, the semantic language evaluator 140 accounts for minor anomalies within the clauses, allowing the analysis engine 130 to locate and group clauses based on the core semantics.
- the document parsing module 533 transfers clause examples to the semantic language evaluator 140 , and the semantic language evaluator assesses the similarity to each of the examples.
- the semantic language evaluator 140 may be a Latent Semantic Indexing (LSI) module, which may provide a cosine vector score based on the similarity and classify clauses accordingly. For instance, a cosine vector score of 1 indicates a high degree of similarity, whereas a score of 0 indicates a low degree of similarity.
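To make the cosine score concrete, the sketch below computes cosine similarity over plain word-count vectors. This is a deliberate simplification and not LSI itself: a real LSI module would first project the term vectors into a reduced latent space. The function name `cosine` and the whitespace tokenization are assumptions.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity of the word-count vectors of two clauses."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty clause is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

Because the vectors only count words, two clauses containing the same words in a different order score the same as identical clauses. A score of this kind can therefore find near-identical clauses but cannot by itself confirm an exact, word-for-word match.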
- the policy definition module 534 allows the end user to define the primary policy that includes primary rules, primary features or clause examples (herein also referred to as “primary clause examples”) and a first threshold.
- a recommended value for the first threshold is ‘95’ or between ‘90’ and ‘99,’ when the semantic language evaluator is the LSI module.
- the standard clause detection module 535 obtains standard clauses based on the primary policy.
- the standard clause detection module 535 applies the primary policy with the first threshold to the semantic language evaluator 140 to obtain the standard clauses.
- the primary policy with the first threshold allows the analysis engine 130 to locate clauses that are almost identical to the primary clause examples.
- the standard clause detection module 535 may provide a standard feature data set comprising the standard clauses to the custom feature generation module 532 .
- the custom feature generation module 532 may modify clause examples based on the standard clauses or present the standard clauses detected to a user to allow a list of clause examples or allowable variations of clauses to be re-selected.
- the standard clause detection module 535 may also store the standard feature data set in the discovery database 160 .
- the standard exact clause detection module 536 obtains the standard exact feature data set comprising standard exact clauses based on the clause examples. In one embodiment, the standard exact clause detection module 536 replaces words or clauses allowed to be changed with corresponding variables instead of the document parsing module 533 .
- the standard exact clause detection module 536 compares each word and an order of words from a document with each word and an order of words from clause examples to obtain standard exact clauses exactly matching the clause examples.
- the textual matching is performed word by word, and in this example a word can be seen as a token.
- a token can be made from any contiguous textual items, such as numbers, text, or symbols.
- Each token is compared against the clause examples provided within the primary policy, in the exact word order in which it appears within the clause, with the system rejecting an item as soon as the first token is found not to match.
- the LSI module may not consider the order of the words; thus, the standard exact clause detection module 536 obtains N-grams of different words or tokens to compare the order of words. By replacing words or clauses allowed to be changed with their corresponding variables, the standard exact clause detection module 536 can reduce the number of comparisons performed to identify standard exact clauses while accounting for each variation of the clause examples.
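The token-by-token comparison with early rejection, and the use of N-grams to capture word order, might look like the sketch below. The tokenization rule and the function names are assumptions; matching here is case-sensitive, which is one possible reading of "exactly matching".

```python
import re


def tokenize(text):
    # Assumption: a token is a run of word characters or one symbol.
    return re.findall(r"\w+|[^\w\s]", text)


def matches_exactly(clause, example):
    """Compare tokens in order and reject at the first mismatch."""
    clause_tokens, example_tokens = tokenize(clause), tokenize(example)
    if len(clause_tokens) != len(example_tokens):
        return False
    for clause_token, example_token in zip(clause_tokens, example_tokens):
        if clause_token != example_token:
            return False  # stop as soon as one token differs
    return True


def ngrams(tokens, n):
    """Return the consecutive n-token windows, which encode word order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def is_standard_exact(clause, examples):
    return any(matches_exactly(clause, example) for example in examples)
```

A reordered clause has the same set of single tokens as the example but a different set of bigrams, which is why N-grams can expose an order difference that a bag-of-words similarity score would miss.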
- the standard exact clause detection module 536 also identifies a candidate standard exact clause including an obscure word (or a character of the word) with poor optical character recognition based on the quality score provided from the OCR engine 218 . Responsive to determining the quality of the OCR performed on the obscure word is poor (e.g., the quality score of the obscure word is below a quality threshold value), the standard exact clause detection module 536 determines whether qualities of the OCR performed on a preceding word and a succeeding word of the obscure word are acceptable.
- the standard exact clause detection module 536 determines whether any of the clause examples and the variables include the preceding word, a candidate word, and the succeeding word in that sequence. If a clause example including the preceding word, the candidate word, and the succeeding word in that sequence is found, a clause including the preceding word, the obscure word, and the succeeding word is determined to be a candidate standard exact clause.
- the standard exact clause detection module 536 may add the candidate standard exact clause to the standard exact feature data set.
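One way to read the OCR handling above is sketched here: a poorly recognized word is tolerated only if both of its neighbors were recognized with acceptable quality and match the clause example at the neighboring positions. The threshold value, the 0-to-1 score scale, the rejection of obscure words at clause boundaries, and the name `is_candidate_exact` are assumptions for illustration.

```python
from typing import List, Tuple

QUALITY_THRESHOLD = 0.8  # hypothetical value on an assumed 0-to-1 scale


def is_candidate_exact(
    ocr_tokens: List[Tuple[str, float]],
    example: List[str],
    threshold: float = QUALITY_THRESHOLD,
) -> bool:
    """True if the clause matches `example` except for obscure words that
    are each flanked by well-recognized, matching neighbors."""
    if len(ocr_tokens) != len(example):
        return False
    saw_obscure = False
    last = len(ocr_tokens) - 1
    for i, ((word, score), expected) in enumerate(zip(ocr_tokens, example)):
        if score >= threshold:
            if word != expected:
                return False  # a clearly read word must match exactly
            continue
        # Obscure word. Assumption: it needs a neighbor on both sides.
        if i == 0 or i == last:
            return False
        if ocr_tokens[i - 1][1] < threshold or ocr_tokens[i + 1][1] < threshold:
            return False
        # The neighbors are compared with the example in their own
        # iterations, so the example word at position i is the candidate.
        saw_obscure = True
    return saw_obscure
```

A clause with no obscure word returns `False` here because it is either a plain standard exact clause or not a match at all, and neither case needs the candidate path.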
- the standard exact clause detection module 536 may provide the standard exact feature data set to the custom feature generation module 532 .
- the custom feature generation module 532 may modify clause examples based on the standard exact clauses or candidate standard exact clauses.
- the custom feature generation module 532 may also present the standard exact clauses (or candidate standard exact clauses) detected to a user to allow a list of clause examples or allowable variations of clauses to be re-selected.
- the standard exact clause detection module 536 may also store the standard exact feature data set in the discovery database 160 .
- the non-standard clause detection module 537 may create a secondary policy, which is a copy of the primary policy that does not contain any rules, but includes a second threshold lower than the first threshold.
- a recommended value for the second threshold is ‘60’ or between ‘50’ and ‘70,’ when the semantic language evaluator 140 is the LSI module.
- the non-standard clause detection module 537 extracts a mirror feature data set with the secondary policy.
- the secondary policy allows the analysis engine 130 to locate all clauses that are semantically similar to the primary search examples. It is to be noted that the mirror feature data set not only contains more data, but also contains the exact matches from the standard feature data set. That is, the mirror feature data set encompasses the standard feature data set, and the standard feature data set in turn encompasses the standard exact feature data set.
- the non-standard clause detection module 537 subtracts the standard exact feature data set from the mirror feature data set to obtain the non-standard clauses. In this embodiment, standard clauses that are not standard exact clauses would be identified as non-standard clauses.
- the non-standard clause detection module 537 subtracts the standard feature data set from the mirror feature data set to obtain the non-standard clauses.
- the non-standard clause detection module 537 may obtain the non-standard clauses after the standard clauses are obtained in the standard clause detection module 535 but before the standard exact clauses are obtained in the standard exact clause detection module 536 .
- the non-standard clause detection module 537 can obtain the non-standard clauses after the standard exact clauses are obtained in the standard exact clause detection module 536 .
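The relationship between the three data sets and the two subtraction variants can be sketched with ordinary set operations. The sketch assumes similarity scores on a 0-to-100 scale (so that the recommended thresholds of 95 and 60 apply directly), ignores policy rules entirely, and uses hypothetical names such as `classify` and the `subtract` parameter.

```python
def classify(scores, exact_ids, first_threshold=95, second_threshold=60,
             subtract="exact"):
    """Split clause ids into standard, standard exact and non-standard.

    scores    -- mapping of clause id to similarity score (assumed 0-100)
    exact_ids -- ids that passed the word-by-word exact comparison
    subtract  -- "exact" or "standard": which set is removed from the
                 mirror set to obtain the non-standard clauses
    """
    # Secondary policy: lower threshold, so the mirror set is the largest.
    mirror = {cid for cid, score in scores.items() if score >= second_threshold}
    # Primary policy: higher threshold, so this is a subset of the mirror set.
    standard = {cid for cid, score in scores.items() if score >= first_threshold}
    # Standard exact clauses are identified among the standard clauses.
    standard_exact = set(exact_ids) & standard
    removed = standard_exact if subtract == "exact" else standard
    return {
        "standard": standard,
        "standard_exact": standard_exact,
        "non_standard": mirror - removed,
    }
```

With `subtract="exact"`, a clause that is standard but not exact ends up in the non-standard set, matching the first variant described above. With `subtract="standard"`, only clauses below the first threshold are reported as non-standard.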
- the update discovery database module 538 may update the discovery database 160 with the standard clauses, standard exact clauses and the non-standard clauses obtained.
- FIG. 6 illustrates a process of obtaining standard exact clauses and non-standard clauses.
- the process may be performed by the variable detection module 570, the policy definition module 534, the standard clause detection module 535, the standard exact clause detection module 536, and the non-standard clause detection module 537 of the analysis engine 130.
- the steps of FIG. 6 may be performed by different or additional components.
- Other embodiments can perform the steps of FIG. 6 in different orders.
- other embodiments can include different and/or additional steps than the ones described here.
- the variable detection module 570 receives an input document 610 .
- the variable detection module 570 obtains 620 allowable variations of standard clauses and a corresponding variable for the variations.
- the policy definition module 534 obtains 630 the primary policy including the primary rules, the primary features, the primary clause examples and the first threshold for determining similarities.
- the policy definition module 534 obtains 640 the secondary policy which is a copy of the primary policy that does not contain any rules but includes a second threshold lower than the first threshold.
- the primary policy, the secondary policy and the allowable variations of standard clauses may be obtained in different orders.
- the standard clause detection module 535 obtains 650 a standard feature data set comprising standard clauses based on the primary policy from the input document.
- the standard exact clause detection module 536 generates 660 a mirror document by replacing allowable variations with corresponding variables, and obtains 670 a standard exact feature data set comprising standard exact clauses exactly matching the clause examples from the mirror document.
- the non-standard clause detection module 537 obtains 680 a mirror feature data set comprising related clauses based on the secondary policy from the input document.
- the non-standard clause detection module 537 obtains 690 a difference between the mirror feature data set and the standard exact feature data set to obtain non-standard clauses.
- FIG. 7 illustrates an example process of determining the policy.
- the primary policy provides guidance on how and where to look for contract specific terminologies.
- the primary policy may include the primary rules, the primary features, the primary clause examples and the first threshold for determining similarities.
- Other embodiments can perform the steps of FIG. 7 in different orders.
- the steps of FIG. 7 may be performed by custom feature generation module 532 , document parsing module 533 , and policy definition module 534 of the analysis engine 130 .
- the steps of FIG. 7 may be performed by different or additional components (e.g., variable detection module 570 ).
- other embodiments can include different and/or additional steps than the ones described here.
- the discovery engine 120 provides a discovery search index to the analysis engine 130 to perform a clause example search 710 , and presents the predefined clause examples to the end user.
- the end user may search for the primary clause examples in the clause selection 720 , either under a section or a paragraph. If the end user decides to look for a clause under the section, the custom feature generation module 532 loads the feature replaced data in the section selection 721 . In a find similar section 723 , the document parsing module 533 requests the semantic language evaluator 140 to query if similar features exist already within the index. Likewise, if the end user decides to look for a clause under the paragraph, the custom feature generation module 532 loads the feature replaced data from the analysis database 150 in a paragraph selection 722 . In a find similar paragraph 724 , document parsing module 533 requests the semantic language evaluator 140 to query if similar features exist already within the index.
- the policy definition module 534 enables the end user to select the primary clause examples from the search results in a clause example selection 730 . Additionally, the end user may repeat the clause selection 720 , and select new clauses.
- the policy definition module 534 enables the end user to select the primary rules to determine the logic or sequence of words, sentences, phrases, or terminologies to be searched in a rule selection 740 , and to evaluate the selected rule in a sentence rule evaluated 750 .
- the end user may repeat the clause selection 720 to select new clauses to be applied or repeat the rule selection 740 to modify selected rules or add additional rules.
- the policy definition module 534 updates the primary policy as well as the analysis database 150 in the nested policy definition 760 .
- the secondary policy may be generated based on the primary policy, or through the similar steps described above.
- each data set associated with a clause may contain a unique identification number or the following indexes: a sentence level 401, a paragraph level 402, a section level 403, and a document level 404.
- each data set may include actual text or features replaced for the clause, and the position of the clause.
- the discovery engine 120 and the analysis engine 130 communicate frequently with the discovery database 160 and the analysis database 150, which serve as the core processing repository and metadata storage location.
- both databases contain information related to policies and the analysis database 150 may reside in the same hardware with the discovery database 160 .
- the data structures in the analysis database 150 provide for two differing data sets for each sentence, paragraph, and section: one for exact text and another for features. Therefore, the storage requirement of the analysis database 150 may be demanding, but the analysis engine 130 can achieve advanced functionality including feature replacements. To reduce the extra storage requirement, the analysis database 150 may use a pointer, instead of creating copies of the entire data set.
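The pointer idea can be illustrated as follows: the exact text is stored once, and the feature view is derived from a reference to it plus a list of replacement spans, rather than being stored as a second full copy. The class `ClauseRecord`, the span representation and the function `feature_view` are hypothetical; the disclosure does not describe the storage layout in this detail.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ClauseRecord:
    text_id: int  # pointer into the shared store of exact text
    # Non-overlapping (start, end, variable) spans to replace in the text.
    replacements: List[Tuple[int, int, str]] = field(default_factory=list)


def feature_view(record: ClauseRecord, store: Dict[int, str]) -> str:
    """Build the feature-replaced text on demand from the exact text."""
    text = store[record.text_id]
    parts, position = [], 0
    for start, end, variable in sorted(record.replacements):
        parts.append(text[position:start])  # unchanged text before the span
        parts.append(variable)              # the variable replaces the span
        position = end
    parts.append(text[position:])
    return "".join(parts)
```

This trades storage for computation: only the spans are stored per record, and the feature-replaced text is rebuilt whenever it is needed.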
- FIG. 8 is a block diagram illustrating components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller).
- FIG. 8 shows a diagrammatic representation of a machine in the example form of a computer system 800 within which instructions 824 (e.g., software or program code) for causing the machine to perform (execute) any one or more of the methodologies described with FIGS. 1-7 may be executed. That is, the methodologies illustrated and described through FIGS. 1-7 can be embodied as instructions 824 that are stored and executable by the computer system 800.
- the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
- the machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions 824 (sequential or otherwise) that specify actions to be taken by that machine.
- the example computer system 800 includes a processor 802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 804 , and a static memory 806 , which are configured to communicate with each other via a bus 808 .
- the processing components are the processor 802 and memory 804 . These components can be configured to operate the engines or modules with the instructions that correspond with the functionality of the respective engines or modules.
- the computer system 800 may further include graphics display unit 810 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)).
- the computer system 800 may also include alphanumeric input device 812 (e.g., a keyboard), a cursor control device 814 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 816 , a signal generation device 818 (e.g., a speaker), and a network interface device 820 , which also are configured to communicate via the bus 808 .
- the storage unit 816 includes a machine-readable medium 822 on which is stored instructions 824 (e.g., software) embodying any one or more of the methodologies or functions described herein.
- the storage unit 816 may be implemented as volatile memory (static RAM (SRAM) or dynamic RAM (DRAM)) and/or non-volatile memory (read-only memory (ROM), flash memory, magnetic computer storage devices (e.g., hard disks, floppy discs and magnetic tape), optical discs, etc.).
- the instructions 824 may also reside, completely or at least partially, within the main memory 804 or within the processor 802 (e.g., within a processor's cache memory) during execution thereof by the computer system 800 , the main memory 804 and the processor 802 also constituting machine-readable media.
- the instructions 824 (e.g., software) may be transmitted or received over a network 826 via the network interface device 820 .
- While machine-readable medium 822 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 824 ).
- the term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 824 ) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein.
- the term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
- Modules may constitute either software modules (e.g., program code embodied as instructions 824 stored on a machine-readable medium, e.g., memory 804 and/or storage unit 816 , and executable by one or more processors (e.g., processor 802 )) or hardware modules.
- a hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner.
- one or more computer systems may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
- a hardware module may be implemented mechanically or electronically.
- a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations.
- a hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
- processors may be temporarily configured (e.g., by software) or permanently configured to perform the relevant operations.
- processors may constitute processor-implemented modules that operate to perform one or more operations or functions.
- the modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
- the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
- the performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines.
- the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
- any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment.
- the appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
- Coupled and “connected” along with their derivatives.
- some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact.
- the term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
- the embodiments are not limited in this context.
- the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion.
- a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
- “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Embodiments relate to a system and a method for identifying, from contractual documents, (i) standard exact clauses matching clause examples and (ii) non-standard clauses semantically related to but not matching the clause examples. A standard feature data set comprising standard exact clauses matching clause examples is obtained. In addition, a mirror feature data set comprising semantically related clauses of the clause examples is obtained using semantic language analysis, where the mirror feature data set encompasses the standard feature data set. Non-standard clauses are obtained by extracting a difference between the mirror feature data set and the standard exact feature data set.
Description
This application is a reissue of U.S. Pat. No. 10,185,712, which was filed as U.S. application Ser. No. 15/723,023 on Oct. 2, 2017, which is a continuation of U.S. application Ser. No. 14/797,959, filed Jul. 13, 2015, now U.S. Pat. No. 9,805,025, which are incorporated herein by reference in their entirety. More than one reissue application has been filed for the reissue of U.S. Pat. No. 10,185,712. The reissue applications are application Ser. Nos. 17/086,288 (the present application) filed Oct. 30, 2020 and 17/588,656 filed Jan. 31, 2022, which is both a continuation reissue of application Ser. No. 17/086,288 and a reissue of application Ser. No. 15/723,023.
1. Field of Art
The disclosure generally relates to the field of natural language processing, and in particular, to identifying and extracting information from documents.
2. Description of the Related Art
A contract is a document that defines legally enforceable agreements between two or more parties. During the negotiation process, parties to the contract often agree to make multiple amendments or addendums, and these amendments or addendums can be stored in random formats in different locations.
Frequent changes in contracts often present challenges to conventional approaches for finding contracts and amendments, as conventional approaches typically focus on the unstructured text only and are not able to extract relevant and important information correctly. For example, a contract and amendments may include the clauses that contain wording such as “net 30 days,” “within 30 days,” “30 day's notice,” and “2% penalty.” On the other hand, one of the amendments may include the non-standard clauses such as “5 working days” with “60% penalty.” Without the ability to discover the clauses and types of the clauses accounting for their semantic variations, any party not keeping track of the amendments or the addendums is vulnerable to a significant amount of risk of overlooking unusual contractual terminologies.
The disclosed embodiments have advantages and features which will be more readily apparent from the detailed description, the appended claims, and the accompanying figures (or drawings). A brief introduction of the figures is below.
The Figures (FIGS.) and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
Configuration Overview
One embodiment of a disclosed configuration is a system (or a method or a non-transitory computer readable medium) for identifying standard exact clauses and non-standard clauses used in contractual documents. A standard exact clause herein refers to a clause including words and an order of words matching those of a predefined clause example. A non-standard clause herein refers to a clause semantically related to a predefined clause example, but including words or an order of words different from those of the predefined clause example. By identifying standard exact clauses and non-standard clauses from a large corpus of contractual documents, exact clauses and semantically related clauses can be identified promptly to improve the contract review process. It is noted that although described in a context of contracts, the principles described herein can apply to other structured documents.
In one embodiment, the system includes an input processor to configure raw input data into a format that can be structurally analyzed by a discovery engine. The discovery engine generates a predefined policy to be applied in a search engine. With the predefined policy, the discovery engine prepares initial search results to allow an administrator to select items to build and test a new custom policy along with all the predefined policies in a format that can be viewed by an end user. In the analysis engine, the end user can view the initial search results, and also customize the predefined policy to define a primary policy. With the primary policy, the analysis engine and the semantic language evaluator perform semantic language analysis, and first determine the standard clauses. Among the standard clauses, standard exact clauses with words and an order of the words exactly matching clause examples are identified. Furthermore, the analysis engine and the semantic language evaluator perform another semantic language analysis with a less restrictive secondary policy to extract the non-standard clauses.
Non-Standard and Standard Clause Detection System
As illustrated in FIG. 1 , the input processor 110 aggregates one or more raw data 100(0), 100(1) . . . 100(N) (generally 100) and processes them in an appropriate format. Also, the discovery engine 120 is communicatively coupled to the input processor 110. In addition, the analysis engine 130 is coupled to the discovery engine 120. The discovery engine 120 develops a predefined policy and initial search results. Additionally, the analysis engine 130 performs a semantic language analysis by applying policies to the semantic language evaluator 140, and determines the non-standard clauses, standard clauses, and standard exact clauses used in the raw data 100. Throughout the process the discovery database 160 stores the initial search results, metadata, and the predefined policy. In addition, the discovery database 160 is communicatively coupled to the input processor 110, the discovery engine 120, and the analysis engine 130. Additionally, the analysis engine 130 is coupled to the analysis database 150 and stores information for performing semantic language evaluation. In one embodiment, the discovery database 160 and the analysis database 150 can be combined into one database.
Turning to FIG. 2 , it illustrates an exemplary embodiment of an input processor 110 that may aggregate the raw data 100, and refine them into acceptable formats in the following stages. As shown in FIG. 2 , the input processor 110 includes a file import system module 212, a correction module 213, and a format standardization module 214.
The file import system module 212 receives the raw data 100 from any one of file systems, emails, Content Management Systems and physical document scanning devices. The file import system module 212 also detects potential contracts and checks if any duplicates of documents exist in the discovery database 160 already. In addition, the file import system module 212 can convert a physical document into another electronic format, for example Portable Document Format (PDF), Microsoft Office format, Tagged Image File Format (TIFF), Graphics Interchange Format (GIF), Joint Photographic Experts Group (JPEG), etc. Moreover, the file import system module 212 may include an image file processor module with an optical character recognition (OCR) engine 218. The OCR engine 218 may be an ABBYY FineReader engine or a standard iFilter OCR engine. It is to be noted that other types of OCR engine or any combination of OCR engines may be implemented. Furthermore, the file import system module 212 detects the language of the contractual document and how many words exist within. In one aspect, the OCR engine 218 of the file import system module 212 determines a quality of the OCR performed for each character or each word, and generates a quality score indicating that quality.
The correction module 213 in the input processor 110 receives the data imported from the file import system module 212. The correction module 213 also is configured to apply typographical corrections or OCR corrections.
In an exemplary embodiment, the format standardization module 214 tailors the format of the data imported from the file import system module 212 for further processing. The format standardization module 214 applies filters to extract textual information. In addition, the input processor 110 may remove passwords to access a protected contractual document only when the owners of the documents agree to remove such passwords. Furthermore, the format standardization module 214 includes a file protection function that creates copies of potential contractual documents identified. These identified contractual documents are stored in the discovery database 160 with security access attributes.
Next, FIG. 3 illustrates an embodiment of the discovery engine 120 that structurally analyzes input data from the input processor 110 and generates the predefined policy. The predefined policy includes, but is not limited to, predefined rules, predefined features, and predefined clause examples.
The discovery engine 120 also applies the predefined policy into the search engine (not shown) and prepares initial search results along with the predefined policy and metadata in a format that allows the end user to view. As shown in FIG. 3 , the discovery engine 120 includes a pre-normalization module 321, a language detection module 322, a processing queue module 323, a structuration function module 324, a rules processing module 325, a post processing and reduction module 326, and a high level processing module 327.
The pre-normalization module 321 receives the imported data in the standardized format obtained from the input processor 110, and converts the imported data into a standard XML or HyperText Markup Language (HTML) document. Also, the language detection module 322 can identify the language used in the XML or HTML converted document (e.g., English, German, etc.), and place the document in the processing queue module 323.
Once the XML or HTML converted document is out of the processing queue module 323, the structuration function module 324 structurally analyzes the XML or HTML converted document into a plurality of hierarchical levels. In FIG. 4 , illustrated is a representation of data stored as discrete database documents: a sentence level 401, a paragraph level 402, a section level 403, and a document level 404. Analyzing the documents or data in the structure mentioned above allows locating of terminologies and clauses used in the contractual documents.
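The hierarchical levels described above can be pictured with a minimal Python sketch. It is an illustration only, not the disclosed implementation: the `SentenceRecord` type, the `structure_document` function, and the conventions that sections are separated by a line containing `---` and paragraphs by a blank line are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceRecord:
    """One sentence-level record carrying indexes for every higher level (hypothetical type)."""
    document_id: str
    section_index: int
    paragraph_index: int
    sentence_index: int
    text: str


def structure_document(document_id, text):
    """Split text into section > paragraph > sentence records.

    Assumptions for this sketch: a line containing only '---' separates
    sections, a blank line separates paragraphs, and a sentence ends at
    '.', '!' or '?' followed by whitespace.
    """
    records = []
    for s_idx, section in enumerate(text.split("\n---\n")):
        paragraphs = [p for p in re.split(r"\n\s*\n", section) if p.strip()]
        for p_idx, paragraph in enumerate(paragraphs):
            sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
            for t_idx, sentence in enumerate(sentences):
                if sentence:
                    records.append(
                        SentenceRecord(document_id, s_idx, p_idx, t_idx, sentence)
                    )
    return records
```

Because every record keeps its section, paragraph, and sentence indexes, a clause found at the sentence level can be traced back to its position in the document.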
Referring back to FIG. 3 , following the structuration function module 324 is the rules processing module 325. In this stage, the discovery engine 120 applies the predefined rules to generate the predefined features. The predefined rules determine the logic or sequence of words, sentences, phrases, NLP (natural language processing) features, or terminologies. In addition, the rules processing module 325 generates the predefined features from the predefined rules for the end user to customize in the analysis engine 130. The predefined features can be a key reference or a descriptive verb that can describe the document and the information held within. For instance, the predefined features can be a start date, a termination date, a contract type, etc.
In addition, the post processing and reduction module 326 reduces and normalizes the predefined features from the rules processing module 325. It is to be noted that in addition to sentence and paragraph boundaries, the discovery engine 120 can identify contractual section boundaries such as termination, limitation of liability, and indemnity sections of a contract. Moreover, the post processing and reduction module 326 prepares the predefined features for the end user to customize in the analysis engine 130.
Normalization in the post processing and reduction module 326 reduces the common notations into a standard format. For instance, the same date can be expressed in multiple ways (e.g., Oct. 23, 1992, 10/23/1992, 23/10/1992, 1992/10/23, etc.), and the normalization can convert the various formats into a standard ISO format. Normalizing to the standard format can eliminate confusion and improve processing speed. Most importantly, by consolidating into the same notation, the discovery engine 120 can reduce duplicate terms in different formats.
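As an illustration of the date normalization described above, the following Python sketch converts several notations to ISO 8601 (`YYYY-MM-DD`). The list of accepted formats and the `normalize_date` function are assumptions for the example; the specification does not state which formats are handled or how ambiguous dates are resolved. The sketch also assumes the default English (C) locale for month abbreviations.

```python
from datetime import datetime

# Hypothetical list of accepted notations, tried in order.
# Month-first is tried before day-first, so an ambiguous value such as
# 03/04/1992 resolves to March 4 in this sketch.
_DATE_FORMATS = (
    "%b. %d, %Y",  # Oct. 23, 1992
    "%B %d, %Y",   # October 23, 1992
    "%m/%d/%Y",    # 10/23/1992
    "%d/%m/%Y",    # 23/10/1992
    "%Y/%m/%d",    # 1992/10/23
)


def normalize_date(raw):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Once every notation maps to the same string, duplicate terms that differ only in format collapse into one value.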
After the feature creation and normalization, the high level processing module 327 creates metadata and stores them in the discovery database 160. Additionally, the search engine communicatively coupled to the discovery database 160 obtains initial search results to determine the eligibility for analytics processing. Moreover, the high level processing module 327 prepares the predefined policy as well as the initial search results in a format that the end user can view. Furthermore, either one or both of an internal search engine and an external search engine may perform a search function.
Referring now to FIG. 5 , illustrated is one embodiment of the analysis engine 130, which identifies standard exact clauses, standard clauses, and non-standard clauses. As illustrated, the analysis engine 130 includes an analysis engine queue module 531, a variable detection module 570, a custom feature generation module 532, a document parsing module 533, a policy definition module 534, a standard clause detection module 535, a standard exact clause detection module 536, a non-standard clause detection module 537, and an update discovery database module 538.
The discovery engine 120 transfers a data set including the predefined policy, search indexes, and the initial search results to the analysis engine queue module 531. Following the analysis engine queue module 531, the custom feature generation module 532 allows the end user to customize the predefined features obtained from the discovery engine 120 and to define primary features.
The variable detection module 570 receives search indexes or the initial search results and provides available variations of clauses to the custom feature generation module 532. The variable detection module 570 may receive the search indexes or the initial search results from the discovery engine 120 directly or from the analysis engine queue module 531. The variable detection module 570 may detect allowable variations of clauses according to examples stored in the discovery engine 120 and provide the detected allowable variations of clauses with associated variables to the custom feature generation module 532.
The custom feature generation module 532 receives the predefined features from the analysis engine queue module 531 to define primary features to be used in semantic language evaluation. The custom feature generation module 532 may also receive detected allowable variations from the variable detection module 570 to define the primary features. In one approach, the custom feature generation module 532 presents to a user a list of clauses or features within a template. The user may select which clauses are to be considered as standard clauses. In addition, the user may select which clauses or words in the standard clauses can be varied. In one approach, the user may assign a variable to each set of selected clauses or words allowed to be varied. Alternatively, the custom feature generation module 532 may assign a variable to a set of clauses or words allowed to be changed. The custom feature generation module 532 provides the primary features comprising selected clauses examples and variables associated with allowable variations to a document parsing module 533.
Following is an example passage of a document with clause examples replaced with associated variables.
“absText”: “WatchtowerNumber. Code of Conduct. WatchtowerPartyDescriptors acknowledges the terms of WatchtowerLocation WatchtowerPartyDescriptors Code of Business Conduct\nand Ethics WatchtowerPartySubjectVerb WatchtowerPartyDescriptors (i) that all of WatchtowerPartyDescriptors dealings with WatchtowerPartyDescriptors WatchtowerLocation, whether pursuant to\nthis Agreement or otherwise, shall be in general alignment with the requirements of the Code, and (ii)\nnot to induce or otherwise cause any WatchtowerPartyDescriptors WatchtowerLocation associate to violate the Code, with Code Number WatchtowerNumber. Should this be violated, the WatchtowerContractingParties agree to pay WatchtowerSealMoney within WatchtowerDuration.\n”, “offsetStart”: 12704, “offsetEnd”: 13085.
In the example passage above, various clauses are replaced with corresponding variables. Specifically, variations of a contract number, a party involved in the contract, another party involved in the contract, a specific location, a specific act, amount and duration can be replaced with a variable “WatchtowerNumber,” “WatchtowerPartyDescriptors,” “WatchtowerContractingParties,” “WatchtowerLocation,” “WatchtowerPartySubjectVerb,” “WatchtowerSealMoney,” and “WatchtowerDuration” respectively.
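The replacement of variable text with named variables can be sketched in Python as follows. The variable names `WatchtowerSealMoney` and `WatchtowerDuration` come from the passage above; the regular expressions and the `replace_with_variables` function are hypothetical, since the specification does not say how the allowable variations are detected.

```python
import re

# Hypothetical detection patterns, one per variable. A real system would
# derive the allowable variations from the user's selections.
VARIABLE_PATTERNS = [
    ("WatchtowerSealMoney", re.compile(r"\$\d[\d,]*(?:\.\d+)?")),
    ("WatchtowerDuration",
     re.compile(r"\b\d+\s+(?:working\s+)?days?\b", re.IGNORECASE)),
]


def replace_with_variables(text):
    """Replace each detected variation with the name of its variable."""
    for name, pattern in VARIABLE_PATTERNS:
        text = pattern.sub(name, text)
    return text
```

After replacement, clauses that differ only in an amount or a duration produce identical text, which is what lets later comparison steps treat them as the same clause.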
With the user-defined primary features, the document parsing module 533 replaces the actual text, phrases or clauses with the primary features. In one embodiment, the document parsing module 533 replaces words or clauses with allowed variations with corresponding variables. The semantic language evaluator 140, formed with the data set in which the primary features have been replaced, ensures the accuracy and quality of the data. That is, the semantic language evaluator 140 accounts for minor anomalies within the clauses, allowing the analysis engine 130 to locate and group clauses based on the core semantics. The document parsing module 533 transfers clause examples to the semantic language evaluator 140, and the semantic language evaluator 140 assesses the similarity to each of the examples. In one exemplary embodiment, the semantic language evaluator 140 may be a Latent Semantic Indexing (LSI) module, which may provide a cosine vector score based on the similarity and classify clauses accordingly. For instance, a cosine vector score of 1 indicates a high degree of similarity, whereas a score of 0 indicates a low degree of similarity.
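To make the cosine score concrete, the sketch below computes cosine similarity over plain word-count vectors. This is a deliberate simplification and not LSI: an LSI module would first project the vectors into a reduced latent space, whereas this example compares raw term counts. The `cosine_similarity` function is an assumption for illustration.

```python
import math
from collections import Counter


def cosine_similarity(clause_a, clause_b):
    """Cosine of the angle between the word-count vectors of two clauses.

    Returns a value from 0.0 (no shared words) to 1.0 (same word counts).
    Word order is ignored, as in any bag-of-words comparison.
    """
    vec_a = Counter(clause_a.lower().split())
    vec_b = Counter(clause_b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm = (math.sqrt(sum(c * c for c in vec_a.values()))
            * math.sqrt(sum(c * c for c in vec_b.values())))
    return dot / norm if norm else 0.0
```

Because the score ignores word order, two clauses containing the same words in a different sequence score as identical, which is why a separate word-order comparison is needed to find exact matches.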
The policy definition module 534 allows the end user to define the primary policy that includes primary rules, primary features or clause examples (herein also referred to as “primary clause examples”) and a first threshold. In one exemplary embodiment, a recommended value for the first threshold is ‘95’ or between ‘90’ and ‘99,’ when the semantic language evaluator is the LSI module.
The standard clause detection module 535 obtains standard clauses based on the primary policy. In one implementation, the standard clause detection module 535 applies the primary policy with the first threshold to the semantic language evaluator 140 to obtain the standard clauses. The primary policy with the first threshold allows the analysis engine 130 to locate clauses that are almost identical to the primary clause examples. The standard clause detection module 535 may provide a standard feature data set comprising the standard clauses to the custom feature generation module 532. The custom feature generation module 532 may modify clause examples based on the standard clauses or present the standard clauses detected to a user to allow a list of clause examples or allowable variations of clauses to be re-selected. The standard clause detection module 535 may also store the standard feature data set in the discovery database 160.
The standard exact clause detection module 536 obtains the standard exact feature data set comprising standard exact clauses based on the clause examples. In one embodiment, the standard exact clause detection module 536 replaces words or clauses allowed to be changed with corresponding variables instead of the document parsing module 533. The standard exact clause detection module 536 compares each word and an order of words from a document with each word and an order of words from clause examples to obtain standard exact clauses exactly matching the clause examples. The textual matching is performed word by word, and in this example a word can be seen as a token. A token can be made from any contiguous textual items: numbers, text, or symbols. Each token is compared against the clause examples provided within the primary policy, in the exact word order in which it appears within the clause, with the system rejecting an item as soon as the first token is found not to match. In one implementation, the LSI module may not consider an order of the words, thus the standard exact clause detection module 536 obtains N-grams of different words or tokens to compare an order of words. By replacing words or clauses allowed to be changed with their corresponding variables, the standard exact clause detection module 536 can reduce the number of comparisons performed to identify standard exact clauses while accounting for each variation of the clause examples.
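The token-by-token comparison with early rejection can be sketched in Python as below. Whitespace tokenization and the `tokenize` and `is_standard_exact` names are assumptions for this example; the sketch also assumes that variable replacement has already been applied to both the clause and the clause example.

```python
def tokenize(text):
    """Split on whitespace; each contiguous run of characters is one token."""
    return text.split()


def is_standard_exact(clause, clause_example):
    """Return True only if both token sequences match in content and order."""
    clause_tokens = tokenize(clause)
    example_tokens = tokenize(clause_example)
    for clause_token, example_token in zip(clause_tokens, example_tokens):
        if clause_token != example_token:
            return False  # reject as soon as the first mismatch is found
    # All shared positions matched; the clause is exact only if neither
    # sequence has tokens left over.
    return len(clause_tokens) == len(example_tokens)
```

Since a variable such as `WatchtowerDuration` stands for every allowed duration, one comparison against the variable form covers all variations that would otherwise need separate clause examples.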
In one embodiment, the standard exact clause detection module 536 also identifies a candidate standard exact clause including an obscure word (or a character of the word) with poor optical character recognition based on the quality score provided from the OCR engine 218. Responsive to determining the quality of the OCR performed on the obscure word is poor (e.g., the quality score of the obscure word is below a quality threshold value), the standard exact clause detection module 536 determines whether qualities of the OCR performed on a preceding word and a succeeding word of the obscure word are acceptable. If the qualities of the OCR performed on the preceding word and the succeeding word are acceptable, the standard exact clause detection module 536 determines whether any of the clause examples and the variables include the preceding word, a candidate word, and the succeeding word in that sequence. If a clause example including the preceding word, the candidate word, and the succeeding word in that sequence is found, a clause including the preceding word, the obscure word, and the succeeding word is determined to be a candidate standard exact clause. The standard exact clause detection module 536 may add the candidate standard exact clause to the standard exact feature data set.
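The neighbor-based check for a poorly recognized word might look like the following Python sketch. The 0-to-1 score scale, the `quality_threshold` default of 0.8, and the `find_candidate_word` function are all hypothetical; the specification states only that a quality score is compared against a quality threshold value.

```python
def find_candidate_word(words, scores, index, example_tokens,
                        quality_threshold=0.8):
    """Suggest a replacement for an obscure word from a clause example.

    words / scores: OCR output tokens and their per-word quality scores.
    index: position of the word under inspection.
    Returns the example word found between the same preceding and
    succeeding words, or None if the check does not apply or fails.
    """
    if scores[index] >= quality_threshold:
        return None  # the word was recognized well enough; nothing to do
    if index == 0 or index == len(words) - 1:
        return None  # no preceding or no succeeding word to anchor on
    if (scores[index - 1] < quality_threshold
            or scores[index + 1] < quality_threshold):
        return None  # the neighbors themselves are unreliable
    preceding, succeeding = words[index - 1], words[index + 1]
    for i in range(1, len(example_tokens) - 1):
        if (example_tokens[i - 1] == preceding
                and example_tokens[i + 1] == succeeding):
            return example_tokens[i]
    return None
```

A clause for which this function returns a word would be treated as a candidate standard exact clause rather than a confirmed one, since the match rests on the surrounding words and not on the obscure word itself.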
The standard exact clause detection module 536 may provide the standard exact feature data set to the custom feature generation module 532. The custom feature generation module 532 may modify clause examples based on the standard exact clauses or candidate standard exact clauses. The custom feature generation module 532 may also present the standard exact clauses (or candidate standard exact clauses) detected to a user to allow a list of clause examples or allowable variations of clauses to be re-selected. The standard exact clause detection module 536 may also store the standard exact feature data set in the discovery database 160.
The non-standard clause detection module 537 may create a secondary policy, which is a copy of the primary policy that does not contain any rules, but includes a second threshold lower than the first threshold. In one exemplary embodiment, a recommended value for the second threshold is ‘60’ or between ‘50’ and ‘70’ when the semantic language evaluator 140 is the LSI module. In addition, the non-standard clause detection module 537 extracts a mirror feature data set with the secondary policy. The secondary policy allows the analysis engine 130 to locate all clauses that are semantically similar to the primary search examples. It is to be noted that the mirror feature data set not only contains more data, but also contains the exact matches from the standard feature data set. That is, the mirror feature data set encompasses the standard feature data set, where the standard feature data set encompasses the standard exact feature data set.
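The relationship between the primary and secondary policies can be sketched in Python as below. The `Policy` type, its field names, and the function name are assumptions for illustration; only the default second threshold of 60 is taken from the text, and any first threshold used with this sketch is a hypothetical value.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Policy:
    clause_examples: Tuple[str, ...]
    rules: Tuple[str, ...]
    threshold: int  # similarity threshold used by the semantic language evaluator


def make_secondary_policy(primary: Policy, second_threshold: int = 60) -> Policy:
    """Copy the primary policy, drop its rules, and lower the threshold."""
    if second_threshold >= primary.threshold:
        raise ValueError("second threshold must be lower than the first threshold")
    return replace(primary, rules=(), threshold=second_threshold)
```

Because the secondary policy keeps the same clause examples but relaxes the threshold and removes the rules, every clause found under the primary policy is also expected to be found under the secondary policy.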
In one embodiment, the non-standard clause detection module 537 subtracts the standard exact feature data set from the mirror feature data set to obtain the non-standard clauses. In this embodiment, standard clauses that are not standard exact clauses would be identified as non-standard clauses.
In another embodiment, the non-standard clause detection module 537 subtracts the standard feature data set from the mirror feature data set to obtain the non-standard clauses. In this embodiment, the non-standard clause detection module 537 may obtain the non-standard clauses after the standard clauses are obtained in the standard clause detection module 535 but before the standard exact clauses are obtained in the standard exact clause detection module 536. Alternatively, the non-standard clause detection module 537 can obtain the non-standard clauses after the standard exact clauses are obtained in the standard exact clause detection module 536.
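The two subtraction embodiments above reduce to a set difference. The following is a minimal sketch that assumes each data set is represented as a set of clause identifiers; the function name and the `subtract_exact` flag are hypothetical.

```python
from typing import Set


def detect_non_standard(
    mirror: Set[str],
    standard: Set[str],
    standard_exact: Set[str],
    subtract_exact: bool = True,
) -> Set[str]:
    """Subtract either the standard exact set or the standard set from the mirror set."""
    # The mirror set encompasses the standard set, which encompasses the exact set.
    if not (standard_exact <= standard <= mirror):
        raise ValueError("expected standard_exact within standard within mirror")
    return mirror - (standard_exact if subtract_exact else standard)
```

With `subtract_exact=True`, standard clauses that are not standard exact clauses remain in the result and are reported as non-standard, as in the first embodiment; with `subtract_exact=False`, they are excluded, as in the second.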
Once the analysis engine 130 obtains the standard clauses, standard exact clauses and non-standard clauses, the update discovery database module 538 may update the discovery database 160 with the standard clauses, standard exact clauses and the non-standard clauses obtained.
Standard Exact Clause and Non-Standard Clause Detection Process
The variable detection module 570 receives an input document 610. The variable detection module 570 obtains 620 allowable variations of standard clauses and a corresponding variable for the variations. The policy definition module 534 obtains 630 the primary policy including the primary rules, the primary features, the primary clause examples and the first threshold for determining similarities. The policy definition module 534 obtains 640 the secondary policy which is a copy of the primary policy that does not contain any rules but includes a second threshold lower than the first threshold. In one embodiment, the primary policy, the secondary policy and the allowable variations of standard clauses may be obtained in different orders.
The standard clause detection module 535 obtains 650 a standard feature data set comprising standard clauses based on the primary policy from the input document. The standard exact clause detection module 536 generates 660 a mirror document by replacing allowable variations with corresponding variables, and obtains 670 a standard exact feature data set comprising standard exact clauses exactly matching the clause examples from the mirror document. Moreover, the non-standard clause detection module 537 obtains 680 a mirror feature data set comprising related clauses based on the secondary policy from the input document. Furthermore, the non-standard clause detection module 537 obtains 690 a difference between the mirror feature data set and the standard exact feature data set to obtain non-standard clauses.
Policy Definition Process
In this example, the discovery engine 120 provides a discovery search index to the analysis engine 130 to perform a clause example search 710, and presents the predefined clause examples to the end user. The end user may search for the primary clause examples in the clause selection 720, either under a section or a paragraph. If the end user decides to look for a clause under the section, the custom feature generation module 532 loads the feature replaced data in the section selection 721. In a find similar section 723, the document parsing module 533 requests the semantic language evaluator 140 to query whether similar features already exist within the index. Likewise, if the end user decides to look for a clause under the paragraph, the custom feature generation module 532 loads the feature replaced data from the analysis database 150 in a paragraph selection 722. In a find similar paragraph 724, the document parsing module 533 requests the semantic language evaluator 140 to query whether similar features already exist within the index.
The policy definition module 534 enables the end user to select the primary clause examples from the search results in a clause example selection 730. Additionally, the end user may repeat the clause selection 720, and select new clauses.
Following the clause example selection 730, the policy definition module 534 enables the end user to select the primary rules to determine the logic or sequence of words, sentences, phrases, or terminologies to be searched in a rule selection 740, and to evaluate the selected rule in a sentence rule evaluated 750. In addition, the end user may repeat the clause selection 720 to select new clauses to be applied or repeat the rule selection 740 to modify selected rules or add additional rules. The policy definition module 534 updates the primary policy as well as the analysis database 150 in the nested policy definition 760.
In one embodiment, the secondary policy may be generated based on the primary policy, or through steps similar to those described above.
Data Storage Method
Referring back to FIG. 4, illustrated is a representation of data stored as discrete database documents. To enable the detection of the standard exact clauses and non-standard clauses, each data set associated with a clause may contain a unique identification number or the following indexes: a sentence level 401, a paragraph level 402, a section level 403, and a document level 404. In addition to the identification number, each data set may include actual text or features replaced for the clause, and the position of the clause.
During the process of defining policies and determining the non-standard and standard exact clauses, the discovery engine 120 and the analysis engine 130 communicate frequently with the discovery database 160 and the analysis database 150, which serve as the core processing repository and metadata storage location. In one exemplary embodiment, both databases contain information related to policies, and the analysis database 150 may reside on the same hardware as the discovery database 160. However, the data structures in the analysis database 150 provide for two differing data sets for each sentence, paragraph, and section: one for exact text and another for features. Therefore, the storage requirement of the analysis database 150 may be demanding, but the analysis engine 130 can achieve advanced functionality including feature replacements. To reduce the extra storage requirement, the analysis database 150 may use a pointer, instead of creating copies of the entire data set.
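One way to read the pointer-based storage described above is sketched below in Python. The specification does not say how the pointer is realized, so this is an assumption: the feature data set is stored only when feature replacement changed the text, and otherwise refers back to the exact text. The `ClauseRecord` and `AnalysisStore` names and fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ClauseRecord:
    record_id: str  # unique identification number
    level: str      # "sentence", "paragraph", "section", or "document"
    position: int   # position of the clause
    exact_text: str
    # None acts as a pointer to exact_text (assumption: no separate copy is
    # stored when feature replacement left the text unchanged).
    feature_text: Optional[str] = None


class AnalysisStore:
    """In-memory stand-in for the two data sets kept per clause."""

    def __init__(self) -> None:
        self._records: Dict[str, ClauseRecord] = {}

    def add(self, record: ClauseRecord) -> None:
        self._records[record.record_id] = record

    def exact_for(self, record_id: str) -> str:
        return self._records[record_id].exact_text

    def features_for(self, record_id: str) -> str:
        record = self._records[record_id]
        # Follow the pointer to the exact text when no feature copy exists.
        return record.exact_text if record.feature_text is None else record.feature_text
```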
Computing Machine Architecture
Turning now to FIG. 8, illustrated is a block diagram showing components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller). Specifically, FIG. 8 shows a diagrammatic representation of a machine in the example form of a computer system 800 within which instructions 824 (e.g., software or program code) for causing the machine to perform (execute) any one or more of the methodologies described with FIGS. 1-7 may be executed. That is, the methodologies illustrated and described through FIGS. 1-7 can be embodied as instructions 824 that are stored and executable by the computer system 800. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
The machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions 824 (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute instructions 824 to perform any one or more of the methodologies discussed herein.
The example computer system 800 includes a processor 802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 804, and a static memory 806, which are configured to communicate with each other via a bus 808. The processing components are the processor 802 and memory 804. These components can be configured to operate the engines or modules with the instructions that correspond with the functionality of the respective engines or modules. The computer system 800 may further include a graphics display unit 810 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)). The computer system 800 may also include an alphanumeric input device 812 (e.g., a keyboard), a cursor control device 814 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 816, a signal generation device 818 (e.g., a speaker), and a network interface device 820, which also are configured to communicate via the bus 808.
The storage unit 816 includes a machine-readable medium 822 on which are stored instructions 824 (e.g., software) embodying any one or more of the methodologies or functions described herein. The storage unit 816 may be implemented as volatile memory (static RAM (SRAM) or dynamic RAM (DRAM)) and/or non-volatile memory (read-only memory (ROM), flash memory, magnetic computer storage devices (e.g., hard disks, floppy discs and magnetic tape), optical discs, etc.). The instructions 824 (e.g., software) may also reside, completely or at least partially, within the main memory 804 or within the processor 802 (e.g., within a processor's cache memory) during execution thereof by the computer system 800, the main memory 804 and the processor 802 also constituting machine-readable media. The instructions 824 (e.g., software) may be transmitted or received over a network 826 via the network interface device 820.
While the machine-readable medium 822 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 824). The term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 824) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein. The term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
Additional Configuration Considerations
It is noted that although the configurations as disclosed are in the context of contracts, the principles disclosed can apply to analysis of other documents that can include data corresponding to standard exact clauses and non-standard clauses. Advantages of the disclosed configurations include promptly identifying (i) exact clauses, (ii) semantically related terminologies and (iii) unusual variations of the semantically related terminologies in a large volume of documents. Moreover, while the examples herein are in the context of a contract document, the principles described herein can apply to other documents, for example web pages, having various clauses.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Certain embodiments are described herein as including logic or a number of components, engines, modules, or mechanisms, for example, as illustrated in FIGS. 1-7. Modules may constitute either software modules (e.g., program code embodied as instructions 824 stored on a machine-readable medium, e.g., memory 804 and/or storage unit 816, and executable by one or more processors (e.g., processor 802)) or hardware modules. A hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors (generally, e.g., processor 802)) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
The various operations of example methods described herein may be performed, at least partially, by one or more processors, e.g., processor 802, that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
The one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
Some portions of this specification are presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory 804). These algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.
As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. For example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
In addition, the terms “a” and “an” are employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one, and the singular also includes the plural unless it is obvious that it is meant otherwise.
Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a system and a process for detecting standard exact clauses and non-standard clauses through the disclosed principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.
Claims (20)
1. A non-transitory computer readable medium storing program code for determining a presence of a type of clause within a plurality of documents, the program code comprising instructions that when executed by a processor cause the processor to:
receive a clause example corresponding to the type of clause;
generate a primary policy based upon the received clause example for use in a semantic language evaluator configured to assess a level of semantic similarity between received clauses, the primary policy comprising one or more policy rules and associated with a first threshold value indicating a level of semantic similarity of a clause to the clause example;
analyze, using the semantic language evaluator, the plurality of documents according to the primary policy to automatically provide a first set of clauses corresponding of the plurality of documents, each clause of the first set corresponding to a standard clause matching the clause example in accordance with the first threshold;
generate a mirror document based upon the plurality of documents by automatically replacing one or more portions of the plurality of documents having allowable variations with corresponding variables;
parse the mirror document to generate a second set of clauses corresponding to a standard exact feature data set;
generate a secondary policy based upon the primary policy and the clause example for use in the semantic language evaluator, the secondary policy associated with a second threshold value indicating a level of semantic similarity of a clause to the clause example that is lower than the first threshold value;
analyze, using the semantic language evaluator, the plurality of documents according to the secondary policy to automatically provide a third set of clauses comprising non-standard clauses semantically related to but not matching the clause example in accordance with the second threshold, wherein the third set of clauses corresponds to a mirror feature data set;
obtain a difference between the mirror feature data set and the standard exact feature data set, the difference corresponding to non-standard clauses of the plurality of documents;
update, automatically, a database to identify the standard and non-standard clauses of the plurality of documents associated with the type of clause based upon the obtained difference, for subsequent usage in analyzing the plurality of documents.
2. The non-transitory computer readable medium of claim 1, further comprising instructions when executed by the processor cause the processor to:
receive one or more features associated with the type of clause; and
generate, using a semantic language evaluator, a plurality of feature replaced clauses by automatically replacing one or more of a plurality of original clauses in the plurality of documents with the one or more features.
3. The non-transitory computer readable medium of claim 1, further comprising instructions when executed by the processor cause the processor to:
identify a portion of the clause example as corresponding to an available variation of the clause example; and
replace the available variation with a variable.
4. The non-transitory computer readable medium of claim 3, further comprising instructions when executed by the processor cause the processor to:
parse the plurality of documents to generate the second set of clauses corresponding to the standard exact feature data set containing clauses matching the clause example based upon the available variation.
5. The non-transitory computer readable medium of claim 1, further comprising instructions when executed by the processor cause the processor to:
replace one or more clauses of the plurality of documents with one or more features, each feature of the one or more features corresponding to a reference or description of a portion of the plurality of documents,
wherein the first policy is generated based upon at least one feature of the one or more features.
6. The non-transitory computer readable medium of claim 1, further comprising instructions when executed by the processor cause the processor to:
obtain a difference between the first set of clauses and the third set of clauses corresponding to one or more non-standard clauses.
7. A computer implemented method for determining a presence of a type of clause within a plurality of documents, the method comprising:
receiving a clause example corresponding to the type of clause;
generating a primary policy based upon the received clause example for use in a semantic language evaluator configured to assess a level of semantic similarity between received clauses, the primary policy comprising one or more policy rules and associated with a first threshold value indicating a level of semantic similarity of a clause to the clause example;
analyzing, using the semantic language evaluator, the plurality of documents according to the primary policy to automatically provide a first set of clauses corresponding of the plurality of documents, each clause of the first set corresponding to a standard clause matching the clause example in accordance with the first threshold;
generating a mirror document based upon the plurality of documents by automatically replacing one or more portions of the plurality of documents having allowable variations with corresponding variables;
parsing the mirror document to generate a second set of clauses corresponding to a standard exact feature data set;
generating a secondary policy based upon the primary policy and the clause example for use in the semantic language evaluator, the secondary policy associated with a second threshold value indicating a level of semantic similarity of a clause to the clause example that is lower than the first threshold value;
analyzing, using the semantic language evaluator, the plurality of documents according to the secondary policy to automatically provide a third set of clauses comprising non-standard clauses semantically related to but not matching the clause example in accordance with the second threshold, wherein the third set of clauses corresponds to a mirror feature data set;
obtaining a difference between the mirror feature data set and the standard exact feature data set, the difference corresponding to non-standard clauses of the plurality of documents; and
automatically updating a database to identify the standard and non-standard clauses of the plurality of documents associated with the type of clause based upon the obtained difference, for subsequent usage in analyzing the plurality of documents.
8. The method of claim 7, further comprising:
receiving one or more features associated with the type of clause; and
generating, using a semantic language evaluator, a plurality of feature replaced clauses by automatically replacing one or more of a plurality of original clauses in the plurality of documents with the one or more features.
9. The method of claim 7, further comprising:
identifying a portion of the clause example as corresponding to an available variation of the clause example; and
replacing the available variation with a variable.
10. The method of claim 9, further comprising:
parsing the plurality of documents to generate the second set of clauses corresponding to the standard exact feature data set containing clauses matching the clause example based upon the available variation.
11. The method of claim 7, further comprising:
replacing one or more clauses of the plurality of documents with one or more features, each feature of the one or more features corresponding to a reference or description of a portion of the plurality of documents,
wherein the first policy is generated based upon at least one feature of the one or more features.
12. The method of claim 7, further comprising:
obtaining a difference between the first set of clauses and the third set of clauses corresponding to one or more non-standard clauses.
13. A system for determining a presence of a type of clause within a plurality of documents, comprising:
a document parsing module configured to receive a clause example corresponding to the type of clause;
a policy definition module configured to:
generate a primary policy based upon the received clause example for use in a semantic language evaluator configured to assess a level of semantic similarity between received clauses, the primary policy comprising one or more policy rules and associated with a first threshold value indicating a level of semantic similarity of a clause to the clause example; and
generate a secondary policy based upon the primary policy and the clause example for use in the semantic language evaluator, the secondary policy associated with a second threshold value indicating a level of semantic similarity of a clause to the clause example that is lower than the first threshold value;
an analysis engine configured to:
analyze, using the semantic language evaluator, the plurality of documents according to the primary policy to automatically provide a first set of clauses corresponding of the plurality of documents, each clause of the first set corresponding to a standard clause matching the clause example in accordance with the first threshold;
generate a mirror document based upon the plurality of documents by automatically replacing one or more portions of the plurality of documents having allowable variations with corresponding variables;
parse the mirror document to generate a second set of clauses corresponding to a standard exact feature data set;
analyze, using the semantic language evaluator, the plurality of documents according to the secondary policy to automatically provide a third set of clauses comprising non-standard clauses semantically related to but not matching the clause example in accordance with the second threshold, wherein the third set of clauses corresponds to a mirror feature data set;
obtain a difference between the mirror feature data set and the standard exact feature data set, the difference corresponding to non-standard clauses of the plurality of documents; and
update, automatically, a database to identify the standard and non-standard clauses of the plurality of documents associated with the type of clause based upon the obtained difference, for subsequent usage in analyzing the plurality of documents.
14. The system of claim 13, wherein the document parsing module is further configured to:
receive one or more features associated with the type of clause; and
generate, using a semantic language evaluator, a plurality of feature replaced clauses by automatically replacing one or more of a plurality of original clauses in the plurality of documents with the one or more features.
15. The system of claim 13, wherein the document parsing module is further configured to:
identify a portion of the clause example as corresponding to an available variation of the clause example; and
replace the available variation with a variable.
16. The system of claim 15 , wherein the analysis engine is further configured to:
parse the plurality of documents to generate the second set of clauses corresponding to the standard exact feature data set containing clauses matching the clause example based upon the available variation.
17. The system of claim 13 , wherein the document parsing module is further configured to:
replace one or more clauses of the plurality of documents with one or more features, each feature of the one or more features corresponding to a reference or description of a portion of the plurality of documents, and
wherein the policy definition module is configured to generate the first policy based upon at least one feature of the one or more features.
18. The system of claim 13 , wherein the analysis engine is further configured to:
obtain a difference between the first set of clauses and the third set of clauses corresponding to one or more non-standard clauses.
19. A non-transitory computer readable medium comprising stored program code for determining a presence of a type of clause within a plurality of documents, the program code comprising instructions that when executed by a processor cause the processor to:
receive a clause example corresponding to the type of clause;
generate a primary policy based upon the received clause example for use in a semantic language evaluator configured to assess a level of semantic similarity between received clauses, the primary policy comprising one or more policy rules and associated with a first threshold value indicating a level of semantic similarity of a clause to the clause example;
analyze, using the semantic language evaluator, the plurality of documents according to the primary policy to automatically provide a first set of clauses of the plurality of documents, each clause of the first set corresponding to a standard clause matching the clause example in accordance with the first threshold;
generate a mirror document based upon the plurality of documents by automatically replacing one or more portions of the plurality of documents having allowable variations with corresponding variables;
generate a secondary policy based upon the primary policy and the clause example for use in the semantic language evaluator, the secondary policy associated with a second threshold value indicating a level of semantic similarity of a clause to the clause example that is lower than the first threshold value;
analyze, using the semantic language evaluator, the mirror document according to the secondary policy to automatically provide a second set of clauses corresponding to a mirror feature data set comprising non-standard clauses semantically related to but not matching the clause example in accordance with the second threshold;
obtain a difference between the mirror feature data set and the first set of clauses, the difference corresponding to non-standard clauses of the plurality of documents; and
update, automatically, a database to identify the standard and non-standard clauses of the plurality of documents associated with the type of clause based upon the obtained difference, for subsequent usage in analyzing the plurality of documents.
20. The non-transitory computer readable medium of claim 19, further comprising instructions that when executed by the processor cause the processor to:
parse the mirror document to generate a set of standard exact clauses corresponding to a standard exact feature data set based on the clause example; and
update, automatically, the database to identify the standard exact clauses of the plurality of documents.
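The steps recited in claims 19 and 20 can be illustrated with a minimal sketch. It is not the patented implementation: the `similarity` function is a simple word-overlap (Jaccard) stand-in for the semantic language evaluator, and the `VARIABLES` pattern, the threshold values, the clause identifiers, and the function names are all hypothetical choices made for illustration only.

```python
import re

# Hypothetical allowable variation: a notice period such as "30 days".
VARIABLES = [("{DAYS}", re.compile(r"\b\d+ days\b"))]


def mirror(text):
    """Replace every allowable variation in the text with its variable."""
    for name, pattern in VARIABLES:
        text = pattern.sub(name, text)
    return text


def similarity(a, b):
    """Stand-in evaluator: Jaccard overlap of the two word sets, in [0, 1]."""
    tokens_a = set(re.findall(r"[\w{}]+", a.lower()))
    tokens_b = set(re.findall(r"[\w{}]+", b.lower()))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def detect(clauses, example, first_threshold=0.8, second_threshold=0.5):
    """Classify clauses (a dict of id -> text) against a clause example."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be lower than the first")
    mirror_example = mirror(example)
    mirrored = {cid: mirror(text) for cid, text in clauses.items()}

    # Primary policy, applied to the original documents.
    standard = {cid for cid, text in clauses.items()
                if similarity(text, example) >= first_threshold}
    # Secondary (lower-threshold) policy, applied to the mirror document.
    mirror_set = {cid for cid, text in mirrored.items()
                  if similarity(text, mirror_example) >= second_threshold}
    # Standard exact: identical to the example once variables are substituted.
    standard_exact = {cid for cid, text in mirrored.items()
                      if text == mirror_example}

    # The difference between the two sets yields the non-standard clauses.
    return {
        "standard": standard,
        "standard_exact": standard_exact,
        "non_standard": mirror_set - standard,
    }


EXAMPLE = "Either party may terminate this agreement on 30 days written notice."
CLAUSES = {
    "c1": "Either party may terminate this agreement on 30 days written notice.",
    "c2": "Either party may terminate this agreement on 60 days written notice.",
    "c3": "Either party may terminate this agreement immediately without notice.",
    "c4": "This agreement is governed by the laws of Ireland.",
}

result = detect(CLAUSES, EXAMPLE)
for label in ("standard", "standard_exact", "non_standard"):
    print(label, sorted(result[label]))
```

With this sample data, `c1` and `c2` differ only in the notice period, so both become identical to the example once the period is replaced by `{DAYS}`. `c3` clears only the lower threshold and is reported as non-standard, and `c4` clears neither threshold. The returned dictionary stands in for the database update step.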
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/086,288 USRE49576E1 (en) | 2015-07-13 | 2020-10-30 | Standard exact clause detection |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/797,959 US9805025B2 (en) | 2015-07-13 | 2015-07-13 | Standard exact clause detection |
US15/723,023 US10185712B2 (en) | 2015-07-13 | 2017-10-02 | Standard exact clause detection |
US17/086,288 USRE49576E1 (en) | 2015-07-13 | 2020-10-30 | Standard exact clause detection |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/723,023 Reissue US10185712B2 (en) | 2015-07-13 | 2017-10-02 | Standard exact clause detection |
Publications (1)
Publication Number | Publication Date |
---|---|
USRE49576E1 (en) | 2023-07-11 |
Family
ID=57776060
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/797,959 Active 2035-08-27 US9805025B2 (en) | 2015-07-13 | 2015-07-13 | Standard exact clause detection |
US15/723,023 Ceased US10185712B2 (en) | 2015-07-13 | 2017-10-02 | Standard exact clause detection |
US17/086,288 Active USRE49576E1 (en) | 2015-07-13 | 2020-10-30 | Standard exact clause detection |
Country Status (1)
Country | Link |
---|---|
US (3) | US9805025B2 (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2360271A1 (en) | 1998-06-24 | 2011-08-24 | Illumina, Inc. | Decoding of array sensors with microspheres |
US6429027B1 (en) | 1998-12-28 | 2002-08-06 | Illumina, Inc. | Composite arrays utilizing microspheres |
WO2000063437A2 (en) | 1999-04-20 | 2000-10-26 | Illumina, Inc. | Detection of nucleic acid reactions on bead arrays |
WO2000075373A2 (en) | 1999-05-20 | 2000-12-14 | Illumina, Inc. | Combinatorial decoding of random nucleic acid arrays |
EP2246438B1 (en) | 2001-07-12 | 2019-11-27 | Illumina, Inc. | Multiplex nucleic acid reactions |
US20040259100A1 (en) | 2003-06-20 | 2004-12-23 | Illumina, Inc. | Methods and compositions for whole genome amplification and genotyping |
JP2008528040A (en) | 2005-02-01 | 2008-07-31 | アジェンコート バイオサイエンス コーポレイション | Reagents, methods and libraries for bead-based sequencing |
EP2233582A1 (en) | 2005-02-01 | 2010-09-29 | AB Advanced Genetic Analysis Corporation | Nucleic acid sequencing by performing successive cycles of duplex extension |
AU2014364180B2 (en) | 2013-12-09 | 2021-03-04 | Illumina, Inc. | Methods and compositions for targeted nucleic acid sequencing |
US10511653B2 (en) * | 2015-10-12 | 2019-12-17 | Roman KISIN | Discussion-based document collaboration |
US11023656B2 (en) | 2017-10-20 | 2021-06-01 | Heretik Inc. | Method and system for dynamically configuring a user interface for a specified document review task |
US10467344B1 (en) | 2018-08-02 | 2019-11-05 | Sas Institute Inc. | Human language analyzer for detecting clauses, clause types, and clause relationships |
US10915710B2 (en) | 2018-09-27 | 2021-02-09 | International Business Machines Corporation | Clause analysis based on collection coherence in legal domain |
US11176271B1 (en) | 2018-12-04 | 2021-11-16 | Eightfold AI Inc. | System, method, and computer program for enabling a candidate to anonymously apply for a job |
US11030583B1 (en) | 2018-12-04 | 2021-06-08 | Eightfold AI Inc. | System, method, and computer program for automatically removing data from candidate profiles that may influence bias |
WO2021141567A1 (en) * | 2020-01-06 | 2021-07-15 | Eightfold AI Inc. | System, method, and computer program for using machine learning to calibrate job description based on diversity criteria |
US11783439B2 (en) | 2019-01-16 | 2023-10-10 | LAINA Pro, Inc. | Legal document analysis platform |
US11803706B2 (en) | 2020-01-24 | 2023-10-31 | Thomson Reuters Enterprise Centre Gmbh | Systems and methods for structure and header extraction |
US11494720B2 (en) * | 2020-06-30 | 2022-11-08 | International Business Machines Corporation | Automatic contract risk assessment based on sentence level risk criterion using machine learning |
US11972255B2 (en) * | 2021-06-25 | 2024-04-30 | International Business Machines Corporation | Compliance content generation |
US12118309B2 (en) * | 2021-09-28 | 2024-10-15 | Intuit Inc. | Converting from compressed language to natural language |
US20230409823A1 (en) * | 2022-06-16 | 2023-12-21 | The Bank Of Nova Scotia | System and Method for Reviewing and Evaluating Discrepancies Between Two or More Documents |
Patent Citations (74)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3662400A (en) | 1970-04-28 | 1972-05-09 | Hinderstein & Silber | Subsidiary document identification system |
US20040107088A1 (en) | 1994-09-30 | 2004-06-03 | Budzinski Robert L. | Memory system for storing and retrieving experience and knowledge with natural language utilizing state representation data, word sense numbers, function codes, directed graphs and/or context memory |
US5577241A (en) | 1994-12-07 | 1996-11-19 | Excite, Inc. | Information retrieval system and method with implementation extensible query architecture |
US6263335B1 (en) | 1996-02-09 | 2001-07-17 | Textwise Llc | Information extraction system and method using concept-relation-concept (CRC) triples |
US20030046307A1 (en) | 1997-06-02 | 2003-03-06 | Rivette Kevin G. | Using hyperbolic trees to visualize data generated by patent-centric and group-oriented data processing |
US6154579A (en) | 1997-08-11 | 2000-11-28 | At&T Corp. | Confusion matrix based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique |
US5977972A (en) | 1997-08-15 | 1999-11-02 | International Business Machines Corporation | User interface component and method of navigating across a boundary coupled to a scroll bar display element |
US20010018698A1 (en) | 1997-09-08 | 2001-08-30 | Kanji Uchino | Forum/message board |
US6295529B1 (en) | 1998-12-24 | 2001-09-25 | Microsoft Corporation | Method and apparatus for indentifying clauses having predetermined characteristics indicative of usefulness in determining relationships between different texts |
US6654731B1 (en) | 1999-03-01 | 2003-11-25 | Oracle Corporation | Automated integration of terminological information into a knowledge base |
US6675170B1 (en) | 1999-08-11 | 2004-01-06 | Nec Laboratories America, Inc. | Method to efficiently partition large hyperlinked databases by hyperlink structure |
US20020103818A1 (en) | 2000-05-04 | 2002-08-01 | Kirkfire, Inc. | Information repository system and method for an internet portal system |
US20020053064A1 (en) | 2000-10-27 | 2002-05-02 | Nec Usa, Inc. | Dynamic detection and removal of inactive clauses in sat with application in image computation |
US7885981B2 (en) | 2000-10-31 | 2011-02-08 | Michael Philip Kaufman | System and method for generating automatic user interface for arbitrarily complex or large databases |
US20030204396A1 (en) | 2001-02-01 | 2003-10-30 | Yumi Wakita | Sentence recognition device, sentence recognition method, program, and medium |
US7171415B2 (en) | 2001-05-04 | 2007-01-30 | Sun Microsystems, Inc. | Distributed information discovery through searching selected registered information providers |
US20030023539A1 (en) | 2001-07-27 | 2003-01-30 | Wilce Scot D. | Systems and methods for facilitating agreement definition via an agreement modeling system |
US20030135520A1 (en) * | 2002-01-11 | 2003-07-17 | Mitchell Fred C. | Dynamic legal database providing historical and current versions of bodies of law |
US20030195885A1 (en) | 2002-04-12 | 2003-10-16 | Microsoft Corporation | System and method for XML based content management |
US20040019578A1 (en) * | 2002-07-23 | 2004-01-29 | Michael Kalmes | Method for collecting and storing data regarding terms and conditions of contractual agreements |
US8818793B1 (en) * | 2002-12-24 | 2014-08-26 | At&T Intellectual Property Ii, L.P. | System and method of extracting clauses for spoken language understanding |
US8849648B1 (en) * | 2002-12-24 | 2014-09-30 | At&T Intellectual Property Ii, L.P. | System and method of extracting clauses for spoken language understanding |
US20050060140A1 (en) | 2003-09-15 | 2005-03-17 | Maddox Paul Christopher | Using semantic feature structures for document comparisons |
US20050108630A1 (en) * | 2003-11-19 | 2005-05-19 | Wasson Mark D. | Extraction of facts from text |
US20050182736A1 (en) | 2004-02-18 | 2005-08-18 | Castellanos Maria G. | Method and apparatus for determining contract attributes based on language patterns |
US20050210040A1 (en) | 2004-03-18 | 2005-09-22 | Zenodata Corporation | Document organization and formatting for display |
US20060069545A1 (en) * | 2004-09-10 | 2006-03-30 | Microsoft Corporation | Method and apparatus for transducer-based text normalization and inverse text normalization |
US20110093771A1 (en) | 2005-04-18 | 2011-04-21 | Raz Gordon | System and method for superimposing a document with date information |
US7853472B2 (en) | 2005-07-15 | 2010-12-14 | Saudi Arabian Oil Company | System, program product, and methods for managing contract procurement |
US8001144B2 (en) * | 2005-09-20 | 2011-08-16 | International Business Machines Corporation | Detecting relationships in unstructured text |
US20070073678A1 (en) * | 2005-09-23 | 2007-03-29 | Applied Linguistics, Llc | Semantic document profiling |
US20070174766A1 (en) | 2006-01-20 | 2007-07-26 | Microsoft Corporation | Hidden document data removal |
US20090281931A1 (en) | 2006-05-08 | 2009-11-12 | Peter Axilrod | Data Storage and Processor for Storing and Processing Data Associated with Derivative Contracts and Trades Related to Derivative Contracts |
US8024173B1 (en) | 2006-09-11 | 2011-09-20 | WordRake Holdings, LLC | Computer processes for detecting and correcting writing problems associated with nominalizations |
US20080154873A1 (en) | 2006-12-21 | 2008-06-26 | Redlich Ron M | Information Life Cycle Search Engine and Method |
US20080168135A1 (en) * | 2007-01-05 | 2008-07-10 | Redlich Ron M | Information Infrastructure Management Tools with Extractor, Secure Storage, Content Analysis and Classification and Method Therefor |
US20080178076A1 (en) * | 2007-01-18 | 2008-07-24 | Barry Alan Kritt | Method and apparatus for spellchecking electronic documents |
US20080189249A1 (en) * | 2007-02-05 | 2008-08-07 | Google Inc. | Searching Structured Geographical Data |
US20080306784A1 (en) * | 2007-06-05 | 2008-12-11 | Vijay Rajkumar | Computer-implemented methods and systems for analyzing clauses of contracts and other business documents |
US8327414B2 (en) * | 2007-06-21 | 2012-12-04 | Motorola Solutions, Inc. | Performing policy conflict detection and resolution using semantic analysis |
US20090228777A1 (en) | 2007-08-17 | 2009-09-10 | Accupatent, Inc. | System and Method for Search |
US20090076799A1 (en) | 2007-08-31 | 2009-03-19 | Powerset, Inc. | Coreference Resolution In An Ambiguity-Sensitive Natural Language Processing System |
US20090132667A1 (en) | 2007-11-20 | 2009-05-21 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Adaptive filtering of annotated messages or the like |
US20090132235A1 (en) | 2007-11-20 | 2009-05-21 | Fuji Xerox Co., Ltd. | Translation device, computer readable medium, computer data signal, and information processing method |
US20090157385A1 (en) | 2007-12-14 | 2009-06-18 | Nokia Corporation | Inverse Text Normalization |
US20090204596A1 (en) | 2008-02-08 | 2009-08-13 | Xerox Corporation | Semantic compatibility checking for automatic correction and discovery of named entities |
US8249856B2 (en) | 2008-03-20 | 2012-08-21 | Raytheon Bbn Technologies Corp. | Machine translation |
US8359191B2 (en) | 2008-08-01 | 2013-01-22 | International Business Machines Corporation | Deriving ontology based on linguistics and community tag clouds |
US20100088338A1 (en) | 2008-10-03 | 2010-04-08 | Pavoni Jr Donald Gordon | Red flag identification verification system and method |
US20150324338A1 (en) * | 2009-01-02 | 2015-11-12 | Apple Inc. | Identification of Layout and Content Flow of an Unstructured Document |
US8892992B2 (en) * | 2009-01-02 | 2014-11-18 | Apple Inc. | Methods for efficient cluster analysis |
US20130311490A1 (en) * | 2009-01-02 | 2013-11-21 | Apple Inc. | Efficient Data Structures for Parsing and Analyzing a Document |
US8346752B2 (en) | 2009-02-03 | 2013-01-01 | Bmc Software, Inc. | Software title discovery |
US8335754B2 (en) | 2009-03-06 | 2012-12-18 | Tagged, Inc. | Representing a document using a semantic structure |
US8346795B2 (en) | 2010-03-10 | 2013-01-01 | Xerox Corporation | System and method for guiding entity-based searching |
US20110231414A1 (en) | 2010-03-19 | 2011-09-22 | International Business Machines Corporation | Managing Processes in a Repository |
US20120209876A1 (en) | 2010-11-05 | 2012-08-16 | Gilbert Allan Thomas | Systems and methods for searching for and translating real estate descriptions from diverse sources utilizing a consumer-based product definition |
US9792277B2 (en) * | 2010-12-09 | 2017-10-17 | Rage Frameworks, Inc. | System and method for determining the meaning of a document with respect to a concept |
US20120266063A1 (en) | 2011-04-13 | 2012-10-18 | Bushnell Christopher G | Systems and Methods for Creating and Maintaining a Customized Version of a Master Document |
US8352405B2 (en) * | 2011-04-21 | 2013-01-08 | Palo Alto Research Center Incorporated | Incorporating lexicon knowledge into SVM learning to improve sentiment classification |
US20130006973A1 (en) | 2011-06-28 | 2013-01-03 | Microsoft Corporation | Summarization of Conversation Threads |
US20130007578A1 (en) | 2011-06-30 | 2013-01-03 | Landon Ip, Inc. | Method and apparatus for displaying component documents of a composite document |
US20130006611A1 (en) * | 2011-06-30 | 2013-01-03 | Palo Alto Research Center Incorporated | Method and system for extracting shadow entities from emails |
US20130204877A1 (en) | 2012-02-08 | 2013-08-08 | International Business Machines Corporation | Attribution using semantic analyisis |
US20130332164A1 (en) | 2012-06-08 | 2013-12-12 | Devang K. Nalk | Name recognition system |
US20140222415A1 (en) | 2013-02-05 | 2014-08-07 | Milan Legat | Accuracy of text-to-speech synthesis |
US20140337345A1 (en) | 2013-05-09 | 2014-11-13 | Ricoh Company, Ltd. | System for processing data received from various data sources |
US20150106378A1 (en) * | 2013-10-14 | 2015-04-16 | Barracuda Networks, Inc. | Document Categorization By Rules and Clause Group Scores Associated with Type Profiles Apparatus and Method |
US8781815B1 (en) * | 2013-12-05 | 2014-07-15 | Seal Software Ltd. | Non-standard and standard clause detection |
US20150248881A1 (en) | 2014-03-03 | 2015-09-03 | General Motors Llc | Dynamic speech system tuning |
US10140295B2 (en) * | 2014-03-29 | 2018-11-27 | Camelot Uk Bidco Limited | Method, system and software for searching, identifying, retrieving and presenting electronic documents |
US20150347390A1 (en) | 2014-05-30 | 2015-12-03 | Vavni, Inc. | Compliance Standards Metadata Generation |
US20160026620A1 (en) * | 2014-07-24 | 2016-01-28 | Seal Software Ltd. | Advanced clause groupings detection |
US9626358B2 (en) * | 2014-11-26 | 2017-04-18 | Abbyy Infopoisk Llc | Creating ontologies by analyzing natural language texts |
Non-Patent Citations (2)
Title |
---|
International Search Report and Written Opinion, Patent Cooperation Treaty Application No. PCT/US2014/057893, dated Jan. 2, 2015, twenty-one pages. |
United States Office Action, U.S. Appl. No. 14/797,959, dated Jan. 9, 2017, 14 pages. |
Also Published As
Publication number | Publication date |
---|---|
US9805025B2 (en) | 2017-10-31 |
US10185712B2 (en) | 2019-01-22 |
US20170017641A1 (en) | 2017-01-19 |
US20180024992A1 (en) | 2018-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
USRE49576E1 (en) | Standard exact clause detection | |
US8781815B1 (en) | Non-standard and standard clause detection | |
US10402496B2 (en) | Advanced clause groupings detection | |
US8468167B2 (en) | Automatic data validation and correction | |
US8892579B2 (en) | Method and system of data extraction from a portable document format file | |
US20180181646A1 (en) | System and method for determining identity relationships among enterprise data entities | |
US20220342921A1 (en) | Systems and methods for parsing log files using classification and a plurality of neural networks | |
US20200342059A1 (en) | Document classification by confidentiality levels | |
RU2491622C1 (en) | Method of classifying documents by categories | |
US9098487B2 (en) | Categorization based on word distance | |
US11537795B2 (en) | Document processing device, document processing method, and document processing program | |
US11544306B2 (en) | System and method for concept-based search summaries | |
US20240028650A1 (en) | Method, apparatus, and computer-readable medium for determining a data domain associated with data | |
Colavizza et al. | The references of references: a method to enrich humanities library catalogs with citation data | |
US11886477B2 (en) | System and method for quote-based search summaries | |
US11941565B2 (en) | Citation and policy based document classification | |
US10140289B2 (en) | Identifying propaganda in global social media | |
Zoya et al. | Assessing Urdu Language Processing Tools via Statistical and Outlier Detection Methods on Urdu Tweets | |
JP5550959B2 (en) | Document processing system and program | |
CN110083817B (en) | Naming disambiguation method, device and computer readable storage medium | |
Vieira et al. | A distantly supervised approach for recognizing product mentions in user-generated content | |
Deshpande | INSERT from Reality: A Schema-driven Approach to Image Capture of Structured Information | |
US20150178867A1 (en) | Linked Addendum Detection | |
JP6476638B2 (en) | Specific term candidate extraction device, specific term candidate extraction method, and specific term candidate extraction program | |
CN117273451A (en) | Enterprise risk information processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: DOCUSIGN INTERNATIONAL (EMEA) LIMITED, IRELAND. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: SEAL SOFTWARE LIMITED; REEL/FRAME: 055102/0447. Effective date: 20210129 |