US11301633B2 - Technical document issues scanner - Google Patents
- Publication number
- US11301633B2 (Application No. US16/313,337)
- Authority
- US
- United States
- Prior art keywords
- named entities
- model
- lstm
- entities
- named
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G06K9/00442—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G06K2209/504—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/13—Type of disclosure document
- G06V2201/134—Technical report or standard
Definitions
- Implementations described herein disclose a technical document scanner that determines and categorizes various common issues among a large number of documents.
- An implementation of the technical document scanner uses various computer process instructions, including scanning a technical document to extract content, applying named entity recognition to the extracted content to extract named entities from the technical document, applying relation extraction to the extracted entities to extract relations between the entities, and analyzing the relations between the named entities to compose lists of high-relevance entities for issue checking.
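The four operations above can be sketched as a pipeline. Everything below (the function names and the toy label lookup) is an illustrative stand-in for the trained models described later, not the patent's implementation:

```python
# Hypothetical sketch of the four-stage pipeline:
# scan -> named entity recognition -> relation extraction -> analysis.

def scan_document(text):
    """Extract raw textual content from a technical document (stub);
    a real reader would parse DOCX/PDF/HTML here."""
    return text

def recognize_entities(content):
    """Stand-in NER: tag each token with a coarse label via lookup."""
    labels = {"Hdr": "FieldDef", "TS_RAIL_PDU_HEADER": "StructureRef"}
    tokens = content.replace(",", " ").replace(":", " ").split()
    return [(tok, labels.get(tok, "O")) for tok in tokens]

def extract_relations(entities):
    """Stand-in relation extraction: relate the first non-'O' entity
    to every other recognized entity."""
    tagged = [(t, l) for t, l in entities if l != "O"]
    if not tagged:
        return []
    head = tagged[0]
    return [(head, other) for other in tagged[1:]]

def analyze(relations):
    """Compose a high-relevance entity list for issue checking."""
    return sorted({tok for pair in relations for tok, _ in pair})

content = scan_document("Hdr, Type: TS_RAIL_PDU_HEADER, Size: 4 bytes")
entities = recognize_entities(content)
relations = extract_relations(entities)
print(analyze(relations))
```

In the patent, the lookup table is replaced by a trained NER model and the head-to-rest pairing by an LSTM relation extractor; the composition of stages is the point of the sketch.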
- FIG. 1 illustrates an example implementation of a system for technical document issues (TDI) scanner.
- TDI technical document issues
- FIG. 2 illustrates an example implementation of natural language processing (NLP) operations used by the TDI scanner disclosed herein.
- NLP natural language processing
- FIG. 3 illustrates an example implementation of a relation extraction model of the TDI scanner disclosed herein.
- FIG. 4 illustrates example operations for description language classification using machine learning (ML) according to implementations disclosed herein.
- FIG. 5 illustrates example operations for definition detection using ML according to implementations disclosed herein.
- FIG. 6 illustrates an example computing system that may be useful in implementing the described technology.
- the technical document scanner may use natural language processing (NLP) and machine learning (ML) approaches to scan the documents to categorize various common issues among a large number of documents.
- NLP natural language processing
- ML machine learning
- an implementation of the technology may use named entity recognition (NER) and relation extraction NLP processes to extract relations between named entities and to analyze the relations between the named entities.
- NER named entity recognition
- some of the common technical document issues (TDIs) may include missing definitions, inconsistent naming, wrong references, inconsistent values, conflicting descriptions, etc.
- the technology disclosed herein solves a technical problem of identifying issues in technical documents using technological solutions that include use of machine learning models.
- the technology disclosed herein uses a NER ML model and a relation extraction model that includes a long short-term memory (LSTM) ML model.
- the LSTM model includes representations of one or more named entities using bidirectional LSTM recurrent neural networks (RNNs).
- RNNs bidirectional LSTM recurrent neural networks
- An implementation of the ML model includes a feature extraction operation using term frequency-inverse document frequency (TF-IDF) on unigrams scanned from the technical document and a classifier training operation using a support vector machine (SVM) classifier to classify the extracted features.
- TF-IDF term frequency-inverse document frequency
- SVM support vector machine
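The TF-IDF feature extraction step can be sketched as follows. This is a minimal pure-Python version using the smoothed-IDF variant; in the described system, these vectors would then be fed to an SVM classifier:

```python
import math
from collections import Counter

def tfidf_features(documents):
    """Compute TF-IDF weights for unigrams (smoothed IDF variant):
    weight = (term count / doc length) * log((1 + N) / (1 + df))."""
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]
    # document frequency of each unigram
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks)
        vectors.append({
            term: (count / total) * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in tf.items()
        })
    return vectors

docs = ["the packet header size", "the packet type", "unsigned integer type"]
vecs = tfidf_features(docs)
```

Terms that occur in fewer documents (e.g. "header") receive a higher weight than terms that occur in most documents (e.g. "packet"), which is what makes the vectors useful as classifier features.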
- the technical document scanner disclosed herein may use ML models such as supervised learning models (e.g., a support vector machine (SVM) model), deep learning LSTM models, or other deep learning models.
- SVM support vector machine
- the technology disclosed herein does not rely on hard-coded validation rules; instead, it extracts information with self-defined named entities and their relations using NLP and ML methodologies. Because a coded-rules approach is hard to maintain and can only check hard-coded problems, the technical document scanner disclosed herein provides a better solution.
- While the technology disclosed herein is described in the context of scanning and analyzing technical documents, it may also be used to scan and analyze other types of documents.
- an alternative implementation of the document scanner disclosed herein may be used to scan and analyze legal documents, medical documents, contracts, product descriptions, etc.
- the technology disclosed herein may be used by researchers/engineers in other communities.
- the technology disclosed herein may also assist human reviewers of documents; in contrast with laborious and expensive manual inspection approaches, it provides advantages in both document-checking efficiency and accuracy.
- the document scanner technology disclosed herein is an extendable solution in that over time its performance can be improved by training better ML models.
- the document scanner technology disclosed herein may be deployed on a cloud environment.
- An implementation of the TDI scanner system disclosed herein is a collect-and-feedback system that operates by imitating a human being who has the background knowledge of the technical documents.
- FIG. 1 illustrates an example implementation of a system for technical document issues (TDI) scanner 100 .
- the TDI scanner 100 may be implemented on a computing device such as a laptop, a desktop, a server, or a mobile computing device.
- An example of a computing device and its components is further disclosed in FIG. 6 below.
- each component of the TDI scanner 100 may be implemented on a separate computing device on a cloud.
- the implementation of the TDI scanner 100 may be understood to be divided into three modules as disclosed in FIG. 1.
- three modules are a reader module 104 , an information scanning module 110 , and a checking module 140 .
- the reader module 104 ingests technical documents 102 , reads the technical documents 102 , and stores the content in self-defined structures.
- the information scanning module 110 extracts information from the content stored by the reader module 104, in both natural languages and description languages. Specifically, the information scanning module 110 uses NLP models for scanning natural language content such as descriptions of structures, implementation details, etc. In one implementation, the NLP models extract important information from technical documents, which is then used for completeness and consistency checking. On the other hand, the information scanning module 110 uses description language processing (DLP) models for scanning description language content, such as code.
- DLP description language processing
- Examples of the NLP models may include a named entity recognition (NER) model 122 that is configured to retrieve the entities of interest which represent information (e.g. size, type, etc.) and a relation extraction (RE) model ( 124 ) to associate the retrieved entities with the ones which represent object definitions or object references (e.g. field definition, structure reference, etc.).
- NER named entity recognition
- RE relation extraction
- Examples of the DLP models may include a description language (DL) type prediction module 132 that may be implemented using a support vector machine (SVM) classifier to predict the type of DL and a parsing module 134 to parse the content with regular expressions according to the type of DL.
- DL description language
- SVM support vector machine
- all the objects with the associated information from both NL and DL are inserted into either a definition list or a reference list.
- An example of a definition list may include FieldDefName1, StructureDefName1, etc.
- an example of a reference list may include FieldRefName1, FieldRefName2, StructureRefName1, etc.
- the checking module 140 may include a definition detection module 142 that is implemented using an SVM classifier to locate the definition from the definition list for each entity in the reference list output by the information scanning module 110.
- a consistency checking module 144 may compare the extracted information contained in referred entities with related definitions for a consistency check and generate the identified issues.
- Various modules of the TDI scanner 100 are disclosed in further detail below in FIGS. 2-5.
- the checking module 140 analyzes the relations between the entities to compose lists of high-relevance entities for issue checking. Such analysis may include inserting an entity into one of the lists and finding related entities based on entity relations to compose a record for that list. Subsequently, information between the various lists is compared according to the named entity.
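The list-based checking step above can be sketched as a set comparison that flags the "missing definition" issue; the field names below are hypothetical examples, not from the patent:

```python
def find_missing_definitions(definition_list, reference_list):
    """Flag the 'missing definition' issue: every entity in the
    reference list should have a match in the definition list."""
    defined = set(definition_list)
    return [ref for ref in reference_list if ref not in defined]

# hypothetical field names extracted from a document
definitions = ["Hdr", "length"]
references = ["Hdr", "length", "padLen"]
print(find_missing_definitions(definitions, references))  # padLen has no definition
```

In the full system the match is fuzzy (see the edit-distance based definition detection of FIG. 5) rather than exact set membership.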
- FIG. 2 illustrates an example implementation of natural language processing (NLP) operations 200 used by the TDI scanner disclosed herein.
- NLP operations 200 illustrate extracting information from a document 202 .
- the document 202 includes the following content:
- Such content may be from a technical document such as a blog, user manual, online instructions, a protocol specification document, etc.
- An operation 204 tokenizes the content to generate a row of a content table 208. Specifically, the tokenizer breaks down each part of the content A into tokens 0 to 9.
- an NER operation 206 categorizes the tokens 0 to 9 into various entities. For example, the token 0, “Hdr,” is categorized as FieldDef, whereas the token 7, “TS_RAIL_PDU_HEADER,” is categorized in the StructureRef category. For technical documents, the important entities could be field name, structure name, size, type, etc.
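The tokenization and categorization steps can be illustrated as follows. The regex tokenizer and the label lookup are stand-ins for the patent's tokenizer and trained NER model, so the token indices need not match the figure exactly:

```python
import re

def tokenize(text):
    """Split content into word and punctuation tokens (tokens 0..n)."""
    return re.findall(r"\w+|[^\w\s]", text)

# Illustrative label lookup standing in for a trained NER model;
# unknown tokens fall back to the "O" (others) tag.
LABELS = {"Hdr": "FieldDef", "TS_RAIL_PDU_HEADER": "StructureRef"}

def ner_tag(tokens):
    return [(tok, LABELS.get(tok, "O")) for tok in tokens]

content = "Hdr, Type: TS_RAIL_PDU_HEADER, Size: 4 bytes"
for idx, (tok, label) in enumerate(ner_tag(tokenize(content))):
    print(idx, tok, label)
```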
- an NER model used by operation 206 may be trained using a generally available named entity recognizer model such as the Stanford NER model. For example, the following seventeen (17) customized named entity labels may be used by the NER model:
- the NER operation 206 may use a Conditional Random Field (CRF) sequence model.
- CRF Conditional Random Field
- a relation extraction operation 210 extracts the relations between various tokens 0 to 9 to generate the extracted information 220 .
- the relation extraction operation 210 retrieves the relation between entities recognized in NER operation 206 so that the information can be associated with the corresponding objects.
- an ML classifier may be used to predict relations between two entities.
- the relation extraction operation 210 is described in further detail in FIG. 3 below.
- the extracted information 220 suggests that content A provides a definition as follows:
- FIG. 3 illustrates an example implementation of a relation extraction model 300 of the TDI scanner disclosed herein.
- the relation extraction model 300 uses a long short-term memory (LSTM) layer that is capable of exploiting longer-range temporal dependencies in sequences and avoiding vanishing or exploding gradients.
- the relation extraction model 300 consists of three layers, an input layer 302 , an LSTM layer 304 , and an output layer 306 .
- the input layer 302 generates a representation of each named entity, such as FieldDef, Size, etc., received from previous operations.
- the LSTM layer represents the named entity sequence of the sentence with bidirectional LSTM recurrent neural networks (RNNs).
- RNNs bidirectional LSTM recurrent neural networks
- σ denotes the logistic sigmoid function
- i, f, o, c and h are respectively the input gate, forget gate, output gate, cell activation vectors, and hidden state vector.
- W are weight matrices and b are bias vectors.
- the output layer 306 outputs a relation label sequence that represents the relations between a current entity and a first named entity.
- a named entity sequence may include entities A, B, C, and D, and an output relation sequence may include relations E, F, G, and H, where E represents a relation of entity A with itself, F represents a relation between the entity A and the entity B, G represents a relation between the entity A and the entity C, and H represents a relation between the entity A and the entity D.
- the relation extraction model 300 extracts relations between the first named entity in input and a current entity
- the named entities are removed from the start of the sequence so that several relations can be predicted with different inputs, yielding all the relations in an input sentence.
- that is, the relation is predicted several times with different named-entity inputs to extract all relations in a sentence.
- no relations exist between "O" (others) and other named entities, so the relation extraction model 300 ignores entities that are tagged with "O."
- the relation extraction model 300 needs to predict the relations with the following four input sequences:
- FIG. 4 illustrates example operations 400 for description language (DL) processing using machine learning (ML) according to implementations disclosed herein.
- the DL processing operations 400 may include DL type prediction and DL parsing.
- the operation 400 predicts the type of DL 404 using an ML model 410 that may be trained on training data 402 .
- the feature extraction module 406 of the ML model may use term frequency-inverse document frequency (TF-IDF) on unigrams scanned from the DL 404 to identify features from the DL 404.
- TF-IDF term frequency-inverse document frequency
- the feature extraction module 406 also extracts features with conjunctions of characters, such as "[ ]", "{ }", "[STRING", . . . , etc.
- a classifier training module 408 using an SVM allows generating a prediction 420 of the type of the DL 404.
- the classifier training module 408 may be implemented using a library of SVM (LibSVM); however, other ML classifier models may also be used.
- the DL classification operations 400 predict that the type of the DL 404 is JSON.
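The DL-type prediction step can be illustrated with a toy classifier. The patent trains an SVM (LibSVM) on TF-IDF and character-conjunction features; the sketch below substitutes a simple nearest-prototype scorer over the same kinds of features, purely for illustration:

```python
import re
from collections import Counter

def dl_features(snippet):
    """Unigram counts plus character-conjunction features
    (e.g. '{', '[', '\":') that characterise a description language."""
    feats = Counter(re.findall(r"[A-Za-z_]+", snippet))
    for conj in ['{', '}', '[', ']', '":', '</', ';', '=']:
        feats["conj" + conj] = snippet.count(conj)
    return feats

def predict_dl_type(snippet, prototypes):
    """Toy nearest-prototype classifier standing in for the SVM:
    the prototype with the highest feature overlap wins."""
    feats = dl_features(snippet)
    def score(proto):
        return sum(min(feats[k], v) for k, v in prototypes[proto].items())
    return max(prototypes, key=score)

# one tiny labeled example per DL type (hypothetical training data)
prototypes = {
    "JSON": dl_features('{"name": "value", "size": 4}'),
    "XML": dl_features('<field name="size">4</field>'),
    "C struct": dl_features('struct hdr { int size; };'),
}
print(predict_dl_type('{"type": "TS_RAIL_PDU_HEADER"}', prototypes))
```

The braces and `":` conjunctions dominate the score here, which is why punctuation conjunctions are useful features alongside unigrams.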
- FIG. 5 illustrates alternative example operations 500 for definition detection using ML according to implementations disclosed herein.
- the operations 500 may use an ML model 510 with a feature extraction module 514 and a classifier training module 516 .
- a list of candidate definition items 504 is selected from a set of definition items 502 .
- an edit-distance algorithm may be used to generate the candidate definition items 504 from the definition items 502.
- the edit-distance algorithm may include the following considerations:
- a set of definition items 506 is generated from the candidate definition items 504 .
- the definition items 506 are input into the ML model 510 together with reference items 508.
- the feature extraction module 514 may use similarities between the definition items 506 and the reference items 508 to extract the features from the definition items.
- the ML model 510 generates a prediction 520 and a result selection module 522 selects the results of the prediction 520 to find the definition item 524 .
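The edit-distance based candidate selection mentioned above can be sketched as follows; the threshold and the case-insensitive matching are illustrative assumptions, not specified by the patent:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def candidate_definitions(reference, definitions, max_distance=2):
    """Select candidate definition items for a reference item,
    ranked by edit distance (case-insensitive)."""
    scored = [(edit_distance(reference.lower(), d.lower()), d)
              for d in definitions]
    return [d for dist, d in sorted(scored) if dist <= max_distance]

defs = ["length", "packetSize", "hdrType"]
print(candidate_definitions("Length", defs))
```

The surviving candidates would then go to the SVM-based definition detection model together with the reference items.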
- FIG. 6 illustrates an example system 600 that may be useful in implementing the described technology.
- the example hardware and operating environment of FIG. 6 for implementing the described technology includes a computing device, such as a general-purpose computing device in the form of a computer 20 , a mobile telephone, a personal data assistant (PDA), a tablet, smart watch, gaming remote, or other type of computing device.
- the computer 20 includes a processing unit 21 , a system memory 22 , and a system bus 23 that operatively couples various system components including the system memory to the processing unit 21 .
- the computer 20 may be a conventional computer, a distributed computer, or any other type of computer; the implementations are not so limited.
- the system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, a switched fabric, point-to-point connections, and a local bus using any of a variety of bus architectures.
- the system memory may also be referred to as simply the memory, and includes read only memory (ROM) 24 and random access memory (RAM) 25.
- ROM read only memory
- RAM random access memory
- a basic input/output system (BIOS) 26 containing the basic routines that help to transfer information between elements within the computer 20 , such as during start-up, is stored in ROM 24 .
- the computer 20 further includes a hard disk drive 27 for reading from and writing to a hard disk, not shown, a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29 , and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD ROM, DVD, or other optical media.
- the hard disk drive 27 , magnetic disk drive 28 , and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32 , a magnetic disk drive interface 33 , and an optical disk drive interface 34 , respectively.
- the drives and their associated tangible computer-readable media provide non-volatile storage of computer-readable instructions, data structures, program modules and other data for the computer 20 . It should be appreciated by those skilled in the art that any type of tangible computer-readable media may be used in the example operating environment.
- a number of program modules may be stored on the hard disk drive 27 , magnetic disk 28 , optical disk 30 , ROM 24 , or RAM 25 , including an operating system 35 , one or more application programs 36 , other program modules 37 , and program data 38 .
- a user may generate reminders on the personal computer 20 through input devices such as a keyboard 40 and pointing device 42 .
- Other input devices may include a microphone (e.g., for voice input), a camera (e.g., for a natural user interface (NUI)), a joystick, a game pad, a satellite dish, a scanner, or the like.
- NUI natural user interface
- these and other input devices are often connected through a serial port interface 46 that is coupled to the system bus 23, but may be connected by other interfaces, such as a parallel port, game port, or a universal serial bus (USB) (not shown).
- a monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48 .
- computers typically include other peripheral output devices (not shown), such as speakers and printers.
- the computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as remote computer 49 . These logical connections are achieved by a communication device coupled to or a part of the computer 20 ; the implementations are not limited to a particular type of communications device.
- the remote computer 49 may be another computer, a server, a router, a network PC, a client, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 20 .
- the logical connections depicted in FIG. 6 include a local-area network (LAN) 51 and a wide-area network (WAN) 52.
- LAN local-area network
- WAN wide-area network
- Such networking environments are commonplace in office networks, enterprise-wide computer networks, intranets and the Internet, which are all types of networks.
- When used in a LAN-networking environment, the computer 20 is connected to the local network 51 through a network interface or adapter 53, which is one type of communications device.
- When used in a WAN-networking environment, the computer 20 typically includes a modem 54, a network adapter, or another type of communications device for establishing communications over the wide area network 52.
- the modem 54 which may be internal or external, is connected to the system bus 23 via the serial port interface 46 .
- program engines depicted relative to the personal computer 20 may be stored in the remote memory storage device. It is appreciated that the network connections shown are examples and other means of communications devices for establishing a communications link between the computers may be used.
- software or firmware instructions for the TDI scanner disclosed herein may be stored in memory 22 and/or storage devices 29 or 31 and processed by the processing unit 21.
- One or more ML, NLP, or DLP models disclosed herein may be stored in memory 22 and/or storage devices 29 or 31 as persistent data stores.
- a TDI scanner 602 may be implemented on the computer 20 (alternatively, the TDI scanner 602 may be implemented on a server or in a cloud environment). The TDI scanner 602 may utilize one or more of the processing unit 21, the memory 22, the system bus 23, and other components of the personal computer 20.
- intangible computer-readable communication signals may embody computer readable instructions, data structures, program modules or other data resident in a modulated data signal, such as a carrier wave or other signal transport mechanism.
- modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- intangible communication signals include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media.
- the implementations described herein are implemented as logical steps in one or more computer systems.
- the logical operations may be implemented (1) as a sequence of processor-implemented steps executing in one or more computer systems and (2) as interconnected machine or circuit modules within one or more computer systems.
- the implementation is a matter of choice, dependent on the performance requirements of the computer system being utilized. Accordingly, the logical operations making up the implementations described herein are referred to variously as operations, steps, objects, or modules.
- logical operations may be performed in any order, unless explicitly claimed otherwise or a specific order is inherently necessitated by the claim language.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Machine Translation (AREA)
- Multimedia (AREA)
Abstract
Description
-
- StructureDef, StructureRef, FieldDef, FieldRef, TypeBasic, TypeModifier, FieldModifier, EnumOrFlag, Value, ValueModifierLevel, ValueModifierRestriction, Size, CollectionLength, SectionName, SectionNum, ReferredDoc, OperationRef, “O” (others).
-
- “Hdr, Type: TS_RAIL_PDU_HEADER, Size: 4 bytes”
i_t = σ(W_xi x_t + W_hi h_(t-1) + W_ci c_(t-1) + b_i)
f_t = σ(W_xf x_t + W_hf h_(t-1) + W_cf c_(t-1) + b_f)
c_t = f_t c_(t-1) + i_t tanh(W_xc x_t + W_hc h_(t-1) + b_c)
o_t = σ(W_xo x_t + W_ho h_(t-1) + W_co c_t + b_o)
h_t = o_t tanh(c_t)
h_t^(r) = tanh(W_rh [y_(t-1); h_t] + b_rh)
y_t = softmax(W_ry h_t^(r) + b_y)
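The equations above define a peephole LSTM cell followed by a label-prediction layer. As a minimal illustration of the gate equations only, one step of the cell can be written with scalar weights (all values hypothetical, for readability):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, w):
    """One step of the (peephole) LSTM cell defined by the equations
    above; all quantities are scalars here instead of vectors/matrices."""
    i_t = sigmoid(w["xi"] * x_t + w["hi"] * h_prev + w["ci"] * c_prev + w["bi"])
    f_t = sigmoid(w["xf"] * x_t + w["hf"] * h_prev + w["cf"] * c_prev + w["bf"])
    c_t = f_t * c_prev + i_t * math.tanh(w["xc"] * x_t + w["hc"] * h_prev + w["bc"])
    o_t = sigmoid(w["xo"] * x_t + w["ho"] * h_prev + w["co"] * c_t + w["bo"])
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t

# hypothetical weights, all set to 0.5 purely for illustration
w = {k: 0.5 for k in ["xi", "hi", "ci", "bi", "xf", "hf", "cf", "bf",
                      "xc", "hc", "bc", "xo", "ho", "co", "bo"]}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, w=w)
```

Note the peephole terms (W_ci, W_cf, W_co), which let the gates see the cell state, and that the output gate uses the freshly updated c_t, matching the fourth equation.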
-
- [FieldDef, O, Size, O, Size, TypeModifier, TypeBasic],
- [Size, O, Size, TypeModifier, TypeBasic],
- [Size, TypeModifier, TypeBasic],
- [TypeModifier, TypeBasic]
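The four input sequences listed above can be generated mechanically from the tagged sentence: predict relations for the first entity, drop it, skip any "O" tags, and repeat until fewer than two entities remain. A minimal sketch:

```python
def relation_input_sequences(entity_sequence):
    """Generate the successive relation-extraction inputs: drop the
    first entity each round, skipping 'O' (others) tags, until fewer
    than two elements remain."""
    seqs = []
    seq = list(entity_sequence)
    while seq and seq[0] == "O":          # ignore leading 'O' tags
        seq = seq[1:]
    while len(seq) >= 2:
        seqs.append(seq)
        seq = seq[1:]                     # drop the current head entity
        while seq and seq[0] == "O":
            seq = seq[1:]
    return seqs

entities = ["FieldDef", "O", "Size", "O", "Size", "TypeModifier", "TypeBasic"]
for s in relation_input_sequences(entities):
    print(s)
```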
-
- length (2 bytes): A 16-bit, unsigned integer that specifies the packet size. This field MUST be set to 0x0008 (8 bytes).
-
- Field_Size: length, 2 bytes
- Field_Size: length, 16-bit
- Field_Type: length, unsigned integer
- Field_Value: length, 0x0008
-
- length, [Size: 2 bytes], [Size: 16-bit], [Type: unsigned integer], [Value: 0x0008]
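A consistency check over the extracted facts above must recognise that "2 bytes" and "16-bit" describe the same size. A minimal sketch of such a unit-normalising comparison (the function names are illustrative, not from the patent):

```python
import re

def bits(size_text):
    """Normalise a size expression such as '2 bytes' or '16-bit' to bits."""
    m = re.match(r"(\d+)[\s-]*(bytes?|bits?)", size_text.strip())
    if not m:
        return None
    value, unit = int(m.group(1)), m.group(2)
    return value * 8 if unit.startswith("byte") else value

def sizes_consistent(a, b):
    """True when two extracted Size facts agree after unit conversion."""
    return bits(a) is not None and bits(a) == bits(b)

print(sizes_consistent("2 bytes", "16-bit"))  # consistent: both are 16 bits
```

A mismatch (e.g. a field described as "4 bytes" in prose but "16-bit" in the packet layout) would be reported as an inconsistent-values issue.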
Claims (20)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/657,405 US11763088B2 (en) | 2018-12-25 | 2022-03-31 | Technical document issues scanner |
US18/365,504 US20230376692A1 (en) | 2018-12-25 | 2023-08-04 | Technical document issues scanner |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/123351 WO2020132850A1 (en) | 2018-12-25 | 2018-12-25 | Technical document issues scanner |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/123351 A-371-Of-International WO2020132850A1 (en) | 2018-12-25 | 2018-12-25 | Technical document issues scanner |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/657,405 Continuation US11763088B2 (en) | 2018-12-25 | 2022-03-31 | Technical document issues scanner |
Publications (2)
Publication Number | Publication Date |
---|---|
US20210312131A1 US20210312131A1 (en) | 2021-10-07 |
US11301633B2 true US11301633B2 (en) | 2022-04-12 |
Family
ID=71126801
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/313,337 Active US11301633B2 (en) | 2018-12-25 | 2018-12-25 | Technical document issues scanner |
US17/657,405 Active US11763088B2 (en) | 2018-12-25 | 2022-03-31 | Technical document issues scanner |
US18/365,504 Pending US20230376692A1 (en) | 2018-12-25 | 2023-08-04 | Technical document issues scanner |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/657,405 Active US11763088B2 (en) | 2018-12-25 | 2022-03-31 | Technical document issues scanner |
US18/365,504 Pending US20230376692A1 (en) | 2018-12-25 | 2023-08-04 | Technical document issues scanner |
Country Status (2)
Country | Link |
---|---|
US (3) | US11301633B2 (en) |
WO (1) | WO2020132850A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022187215A1 (en) * | 2021-03-01 | 2022-09-09 | Schlumberger Technology Corporation | System and method for automated document analysis |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105468605A (en) | 2014-08-25 | 2016-04-06 | 济南中林信息科技有限公司 | Entity information map generation method and device |
CN105512197A (en) | 2015-11-27 | 2016-04-20 | 广州宝钢南方贸易有限公司 | Digitized archiving device of documents and archiving and searching device thereof |
US20170017897A1 (en) * | 2015-07-17 | 2017-01-19 | Knoema Corporation | Method and system to provide related data |
US20170060835A1 (en) | 2015-08-27 | 2017-03-02 | Xerox Corporation | Document-specific gazetteers for named entity recognition |
US20180218284A1 (en) * | 2017-01-31 | 2018-08-02 | Xerox Corporation | Method and system for learning transferable feature representations from a source domain for a target domain |
-
2018
- 2018-12-25 US US16/313,337 patent/US11301633B2/en active Active
- 2018-12-25 WO PCT/CN2018/123351 patent/WO2020132850A1/en active Application Filing
-
2022
- 2022-03-31 US US17/657,405 patent/US11763088B2/en active Active
-
2023
- 2023-08-04 US US18/365,504 patent/US20230376692A1/en active Pending
Non-Patent Citations (2)
Title |
---|
"International Search Report and Written Opinion Issued in PCT Application No. PCT/CN18/123351", dated Aug. 28, 2019, 9 Pages. |
"Open Specifications", Retrieved From: http://web.archive.org/web/20180726104704/https://msdn.microsoft.com/en-us/library/dd208104.aspx, Retrieved Date: Jul. 26, 2018, 2 Pages. |
Also Published As
Publication number | Publication date |
---|---|
US20220222443A1 (en) | 2022-07-14 |
US11763088B2 (en) | 2023-09-19 |
US20210312131A1 (en) | 2021-10-07 |
US20230376692A1 (en) | 2023-11-23 |
WO2020132850A1 (en) | 2020-07-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, YING;LI, MIN;LU, MENGYAN;REEL/FRAME:047853/0548 Effective date: 20181211 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |