WO2015148410A1 - Image interface for extracting patent features - Google Patents
- Publication number
- WO2015148410A1 (PCT/US2015/022084)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- elements
- narrative
- nodes
- generation
- structural
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services; Handling legal documents
- G06Q50/184—Intellectual property management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- The invention relates to the field of semantic networks, specifically to extracting syntactic and semantic content to derive a semantic network from a patent, to comparing established patent documents with one or more patent submissions whose claims require validation, and to relating additional information to the functional as well as the structural graph. The additional information takes the form of pictures and images that can be associated with the structural representation of the patent elements.
- The process and method described herein extract the relevant syntactic and semantic relationships needed to establish a difference in graph nodes between patents and patent applications. The invention further relates to methods for searching structural as well as functional relations in graphs stored in databases or memory using pictures through a user interface.
- The present invention relates to the field of analytics, in particular to patent overlap identification and analysis or, more precisely, to assessing obviousness when comparing a new submission with prior art.
- This concept is integrated into the "restricted" narrative order and format of a patent which has additional form that provides the functionality of a patent document which the examiner uses to evaluate the proposed invention.
- This second level of "function follows form" materializes through section restriction, order of presentation, and restriction of syntax.
- the prior art can be established in one of several categories.
- the first category establishes statistical processes (frequency of words) to discriminate relevance of prior art and establish if a submission is similar in content.
- This category may employ basic statistical mechanisms, weighted scores, statistical co-occurrences and latent semantic analysis, among other techniques, to establish relevance.
- US 8,060,505 B2; US 2008/0195568 A1; US 2008/0235220 A1; US 2013/0132154 A1; US 2013/0124515 A1; US 2008/0288489 A1; US 2010/0114587 A1.
- The second category pertains to clustering and network analysis on established criteria to try to differentiate previous work from new work (US 8,412,659 B2).
- a node structure of elements is shown in US 8,423,489 B2.
- A related field analyzes patent blocks based on queries to show the relationship between patent portfolios in graph mode (US 2011/0246473 A1).
- a combination of the first category with the second category is given in US 8,504,560 B2.
- The third category uses search criteria based on regular expressions and query languages such as Boolean expressions to search for relevant matches (US 2013/0198182 A1) and to compare a target sequence with a sequence stored in a database.
- The fourth category uses an ontology to categorize patents (US 2013/0086070 A1; US 2010/0131513 A1; US 2013/0086045 A1; US 2013/0086047 A1).
- A fifth category creates ontologies automatically from data. The ontology-based method starts by creating a lexical graph, then targets prominent terms, and finally performs clustering on the lexical graph (8,620,964).
- the prior art can be divided into three main areas.
- The first relates to the methods and processes of devising a user interface to select portions of an image (8,559,732 B2; US 8,571,326 B2).
- The second group relates to searches to retrieve images based on particular characteristics such as shape properties (6,801,661 B1; 8,027,549 B2; US 6,834,288 B2).
- The third category relates to specific algorithms designed to target image similarities (US 7,706,612 B2).
- The prior art in the second and third categories relies on automated algorithms to determine the relevant features of the images submitted to the systems and processes of the prior art.
- the prior art does not relate the success of these algorithms to the relevance given by a user of the system.
- the methods in the first category provide a useful interface to get the user to provide the relevance to a picture element but do not go into the details of what to do with the picture after the image is submitted for processing.
- The current submission aims to provide a complete methodology where the user submits the image segment which is relevant to the query.
- the image is then segmented to extract the element and match it to an image stored that is related to or is a figure of a submitted patent.
- the matched figure of the patent has an associated number which will be matched to a patent element which will then be used to extract the nodes of the patent.
- Those nodes will be characterized by functional as well as structural descriptions that are located in the narrative of the patent. This process is not present in any of the previous art.
- The aim is to extract relevant structural as well as functional descriptions of the submitted image query and to provide a relevance feedback system that the user of the system can interact with to refine the performance of the system.
- An object of the present disclosure is to provide a method for creating computer representations of structural and functional elements in patent documents.
- the structural and functional representations are used to determine relative closeness of a patent, patent submission or existing product against the previous art in the form of structural and functional elements of other existing patent narratives that conform to a given structure.
- Another object of the present invention is to provide a method for determining the novelty, obviousness and correctness of the narrative of a patent submission with regard to the existing previous art.
- Yet another object of the present invention is to provide a method for creating computer representations of structural and functional elements in patent documents further comprising: semantic relationships by exploiting the semi structured elements of the patent document to determine structural elements by parsing the body of the patent by searching descriptive elements within the narrative to form the nodes of the graph that are stored in an adjacency matrix or adjacency lists.
- Yet another object of the present invention is to provide a method for creating computer representations of structural and functional elements in patent documents further comprising: semantic relationships by exploiting the semi structured elements of the patent document to determine structural elements by parsing the body of the patent by searching descriptive elements within the narrative to form the edges of the graph that are stored in an adjacency matrix or adjacency lists.
- Another object of the present invention is to provide a method for creating computer representations of structural and functional elements in patent documents further comprising: parsing the document using parsing techniques to determine the standard sections of the patent (header information, background of the invention, brief description of drawings, detailed description, and a claims section) as well as non-standard sections of the document, using previous patent patterns encoded into the parsing of the patent document.
- Another object of the present invention is to provide a complete methodology where the user submits the image segment which is relevant to the query.
- In one exemplary embodiment of the present invention, the user submits the image segment which is relevant to the query, wherein said image is then segmented to extract the element and match it to a stored image that is related to or is a figure of a submitted patent.
- the matched figure of the patent has an associated number which will be matched to a patent element which will then be used to extract the nodes of the patent. Those nodes will be characterized by functional as well as structural descriptions that are located in the narrative of the patent.
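- The adjacency matrix and adjacency lists named in the objects above can be illustrated with a minimal sketch. The function name `build_adjacency`, the (source, label, target) edge tuples and the sample node names are assumptions made for illustration; the source does not prescribe a concrete layout.

```python
def build_adjacency(nodes, edges):
    """Store the same graph as an adjacency list and as an adjacency matrix.

    `edges` holds hypothetical (source, label, target) tuples.
    """
    index = {name: i for i, name in enumerate(nodes)}
    adjacency_list = {name: [] for name in nodes}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, label, target in edges:
        # The list keeps the edge label; the matrix only records that an edge exists.
        adjacency_list[source].append((label, target))
        matrix[index[source]][index[target]] = 1
    return adjacency_list, matrix


nodes = ["terminal", "processing unit", "database"]
edges = [
    ("terminal", "sends query to", "processing unit"),
    ("processing unit", "is connected to", "database"),
]
adjacency_list, matrix = build_adjacency(nodes, edges)
```

- An adjacency list suits sparse graphs with labelled edges, while a matrix gives constant-time lookup of whether two elements are linked.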
- The term "invention" includes "inventions", that is, the plural of "invention".
- By stating "invention", the Applicant does not in any way admit that the present application does not include more than one patentably and non-obviously distinct invention, and the Applicant maintains that the present application may include more than one patentably and non-obviously distinct invention.
- the Applicant hereby asserts, that the disclosure of the present application may include more than one invention, and, in the event that there is more than one invention, that these inventions may be patentable and non-obvious one with respect to the other.
- FIG. 1 shows a conceptual representation of ideas in a graph format on a conceptual plane that serves to illustrate the underlying concepts that will be reduced to a system and method of patent analysis.
- FIG. 2 shows a graphical representation of what an inventor has to do in the narrative of a patent document, which is to provide known relationships of elements within the document.
- FIG. 3 shows the actual system that implements the analysis of the concepts presented in FIG. 1 and FIG. 2.
- FIG. 4 shows the process of assigning preliminary category labels to a patent document.
- FIG. 5 shows the steps to get the node elements from the patent submission or existing patent document.
- FIG. 6 shows the storage format of the processed nodes and edges.
- FIG. 7 shows the process of extracting edges from the patent submission or patent document.
- FIG. 8 shows the process of normalizing word elements from nodes and edges.
- FIG. 9 shows additional discrimination in the remaining text of the patent submission or patent document.
- FIG. 10 shows the process of claim analysis against the detailed description section.
- FIG. 11 shows processing to determine disconnected nodes, disconnected claims, novelty and non-obviousness of the document against existing prior art.
- FIG. 12 shows a typical graphical interface of the proposed embodiments.
- FIG. 13 shows a brief overview of the system and process of image determination and processing.
- FIG. 14 shows the steps where the user selects the relevant image segments and the system encodes the information to make the query.
- FIG. 15 shows the process of matching the submitted query with the stored images and extracting the structural as well as functional elements and order it into a narrative for the user.
- FIG. 16 shows the process of displaying the information to the user and providing feedback to the system.
- The embodiments of the invention disclosed herein may be implemented through the use of general-purpose programming languages (such as C or C++).
- The program code can be disposed in any known computer-readable medium including semiconductor, magnetic disk, or optical disk (such as CD-ROM, DVD-ROM). As such, the code can be transmitted over communication networks including the Internet.
- In the present disclosure, the terms "computer program", "computer program medium" and "computer-usable medium" are used to generally refer to media such as a removable storage unit or a hard disk drive. Computer program medium and computer-usable medium can also refer to memories, such as system memory and graphics memory, which can be memory semiconductors (e.g., DRAMs, etc.). These products are examples of how to provide software to a computer system.
- the embodiments are also directed to computer products comprising software stored on any computer-usable medium.
- The software, when executed in one or more data processing devices, causes the data processing device(s) to operate as described herein or allows for the synthesis and/or manufacture of computing devices (e.g., ASICs or processors) to perform embodiments described herein.
- Embodiments employ any computer-usable or -readable medium, and any computer-usable or -readable storage medium known now or in the future.
- Examples of computer-usable or computer-readable mediums may include, but are not limited to, primary storage devices (e.g., any type of random access memory or read-only memory), secondary storage devices (e.g., hard drives, floppy disks, CD-ROMs, ZIP disks, tapes, magnetic storage devices, optical storage devices, MEMS, nanotechnological storage devices, etc.), and communication mediums (e.g., wired and wireless communications networks, local area networks, wide area networks, intranets, etc.).
- The present disclosure provides an example embodiment of a method and system for extracting patent features for comparison, in order to determine similarities, the novelty of the invention and the non-obviousness relation between the relevant art and the invention.
- an invention is physically described as an invention document in a tangible medium, such as searchable data/document medium.
- the invention comprises several distinctive elements such as structure, compounds, steps, material and other significant features.
- A search for documents related to the subject matter of the invention is performed using computer programs or personnel dedicated to completing the search based on the invention description. Different search methods may apply during the search process.
- the relevant art comprises preferably searchable data, such as body, claims, description and background of the relevant art.
- FIG. 1 shows conceptual representations 3 of the similarities between the invention and the relevant art; the novelty of the invention and its obviousness may be represented by using squares.
- a first set of elements 2 from the invention are grouped in a first conceptual representation 1 represented by a first square.
- The first square represents similitudes between the relevant art and the invention, wherein said invention includes but is not limited to a device, method, process, composition and/or structure detailed or described in a digital manner or by means of searchable data.
- A plurality of similar elements between said relevant art and said invention is extracted from said searchable data, wherein said process of extraction is defined through a set of instructions.
- The elements include but are not limited to a compound, structure, temperature, particular step, material and other significant elements, wherein said significant element can be represented as a node element 2 of a graph in the conceptual representation plane.
- These node elements 2 are usually introduced into the conceptual representations 3 as elements that have a specific numbering that is usually shown in the relevant art drawings and in the narrative of the relevant art body description.
- at least some different elements 4 result from the comparison between the relevant art and the invention.
- The different elements 4 are part of an invention 1 but lie in a different conceptual area 5 of knowledge from the one the relevant art is familiar with.
- FIG. 2 shows a graphical representation of what an inventor has to do in the narrative of a patent document which is to provide known relationships of the elements represented as node element 2.
- This relationship can be, for example, an "is-a" relationship, which provides a descriptive relationship between node elements 2.
- An alternate description of a relationship 6 can be a "functions-as" relationship described by a verb or another relevant relationship described by another part of the searchable data.
- These descriptive and functional relationships can be analyzed in a combined graph structure or separate graph structure depending on the number of nodes elements 2 and relationships 6 in a relevant art document.
- a new conceptual representation 7 can be drawn from an element 2 to an element 4 which lies in a second conceptual representation area 5 which represents a new idea domain or novelty.
- the element 4 may also be linked to a second different element 9 through a relationship 8 which lies on the second conceptual representation area 5 but does not have any link to first conceptual representation 1.
- the second different element 9 can also be connected through a relationship 10 to a node 11 which would lie on conceptual representations 3 which would still be another distinct or more particularly a third conceptual representation area 12.
- the jumps from one conceptual representation to another, the edges that linked them with specific relations and functions, and the radius of the graph will determine the scope of the invention.
- first conceptual representation 1 which is the scope of the relevant art
- second conceptual representation area 5 which is outside the scope of the relevant art
- this represents a new domain that can be considered new subject matter within that domain.
- As the distance from the first conceptual representation 1 is increased by a relationship 8, element 9, relationship 10 and a node 11, the content of the second conceptual representation 5 and a third conceptual representation 12 becomes less obvious to someone skilled in the art.
- the number of nodes 4, 9 falling outside the previous art would represent the novelty while the distance made by relationships would represent the non-obviousness.
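- The two measures just described can be sketched as follows, assuming novelty is counted as the nodes absent from the prior art and non-obviousness is approximated by the number of relationships crossed (hop distance) from a prior-art node. The function name `novelty_and_distance` and the sample labels are hypothetical; the source does not fix a distance metric.

```python
from collections import deque


def novelty_and_distance(edges, prior_art_nodes, start):
    """Return (novel nodes, hop distance from `start` for every reachable node)."""
    # Build an undirected adjacency list from (source, target) pairs.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search counts how many relationships separate each node from `start`.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    # Nodes reachable from the prior art but not part of it count towards novelty.
    novel = sorted(n for n in distance if n not in prior_art_nodes)
    return novel, distance


edges = [("element 2", "element 4"), ("element 4", "element 9"), ("element 9", "node 11")]
novel, distance = novelty_and_distance(edges, {"element 2"}, "element 2")
```

- In this toy graph three nodes fall outside the prior art, and node 11 sits three relationships away from element 2.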
- the graph structure represented by node elements 2 and relationship 6 on the first conceptual representation 1 have useful analytic properties such as frequency of occurrence in a document and can have a high "in degree” and "out-degree” of occurrence of the edges. This high frequency of occurrences can be a possible measure of centrality in a relevant art document and can help in clarifying classifications of the invention and patent documents. These can then be compared to terms in the claims to determine claim structure appropriateness. Other measures could include providing weights to the node elements 2 and relationship 6 represented by the edges of a graph into a combined scoring for the node elements 2 of the graph. The weight for the node elements and relationship is accomplished by different methods, such as probability programs.
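- As a rough illustration of the in-degree and out-degree measure, the sketch below counts edge occurrences per node and takes the node with the highest combined degree as the most central one. The (source, label, target) tuple layout and the function names are assumptions, and combined degree is only one possible centrality score.

```python
from collections import Counter


def degree_counts(edges):
    """Count incoming and outgoing edges per node for (source, label, target) tuples."""
    in_degree = Counter()
    out_degree = Counter()
    for source, _label, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


def most_central(edges):
    """Return the node with the highest combined in-degree and out-degree."""
    in_degree, out_degree = degree_counts(edges)
    total = in_degree + out_degree
    return total.most_common(1)[0][0]


edges = [
    ("terminal", "sends query to", "processing unit"),
    ("processing unit", "is connected to", "database"),
    ("processing unit", "uses", "lexicon"),
]
in_degree, out_degree = degree_counts(edges)
central = most_central(edges)
```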
- FIG. 3 shows the actual system to perform the analysis of FIG. 1 and FIG. 2.
- the system is accessed through a computer terminal 13 comprising a computer-usable medium that sends a query to a processing unit 15 of the invention document.
- the processing unit 15 is connected through a communications channel 16 with the database 17.
- the database 17 contains a client database 18 that stores information about the individuals that submit information to the system.
- the information pertaining to each individual case is stored in the client cases database 20.
- a relation 19 links the information from the individuals stored in the client database 18 with their cases stored in the client cases database 20.
- The client cases database 20 contains the previous queries 21 that were made to a search index 22 and were relevant to the case.
- the search index provides rapid access to a searchable document, such as a relevant art text repository 24 through an indexing process 23.
- The database 17 contains an index 25 that provides a reference 26 to the lexicon and semantic repository 27, which is used in the part-of-speech tagging of the relevant art text repository 24.
- a graph storage 29 contains extracted graphs that were processed using lexicon and semantic repository 27 on relevant art text repository 24.
- the relevant art text 25 contains references 29 to a repository of relevant art images 31.
- a graph repository 28 stores the graph information and also stores of the processed relevant art.
- the graph repository 28 also has stored references 30 to repository of relevant art/patent images 31.
- FIG. 4 shows the initial flow of document submission to the processing unit 15 through terminal 13 of the proposed system in a step 32.
- the process first determines the sections of the invention document in a step 33.
- Step 33 also checks that the submitted document complies with the standard sections of a patent application document, such as containing a background of invention section, a claims section, and so on.
- the step 33 determines the background of invention section.
- the background of invention section is then passed to a step 34 that determines the paragraph boundaries in the background of invention section.
- Each of the paragraphs determined in step 34 is then passed to a step 35 that determines phrase and word boundaries within the paragraphs using the lexicon and semantic repository 27.
- The words and phrases identified in step 35 are used in a step 36 to match edges and nodes extracted from the previous or relevant art in categories, which are then matched against the term frequencies of the submission.
- The final step of FIG. 4 is a step 37, which assigns a tentative category to the submitted invention based on step 36.
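- The flow of FIG. 4 can be approximated with a short sketch, assuming paragraphs are separated by blank lines, words are found with a regular expression rather than the lexicon and semantic repository 27, and each category is represented by a hypothetical set of terms. The names `term_frequencies`, `tentative_category` and `category_terms` are illustrative only.

```python
import re
from collections import Counter


def term_frequencies(section_text):
    """Count words in a section, paragraph by paragraph."""
    counts = Counter()
    # Step 34 (simplified): paragraph boundaries are blank lines.
    for paragraph in re.split(r"\n\s*\n", section_text):
        # Step 35 (simplified): word boundaries come from a regular expression.
        counts.update(re.findall(r"[a-z]+", paragraph.lower()))
    return counts


def tentative_category(section_text, category_terms):
    """Steps 36-37 (simplified): pick the category whose terms occur most often."""
    counts = term_frequencies(section_text)
    scores = {
        category: sum(counts[term] for term in terms)
        for category, terms in category_terms.items()
    }
    return max(scores, key=scores.get)


background = (
    "The invention relates to image retrieval.\n\n"
    "Images are segmented and each image is indexed."
)
category_terms = {
    "imaging": {"image", "images", "segmented"},
    "chemistry": {"compound", "polymer"},
}
category = tentative_category(background, category_terms)
```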
- FIG. 5 shows the steps to get the node elements 2 from the description of preferred embodiment section.
- the process begins with a step 38 which is the document submission to the processing unit 15.
- the document to be processed in step 38 may be a new document submission or an existing patent in the relevant art text repository 24.
- a step 39 is designed to segregate the sections of the patent submitted in step 38 and then a step 40 selects from the sections identified in step 39 the description of preferred embodiment.
- a step 41 removes meta information such as xml, html or document processor information or other format description tags that are format dependent from the processed document of step 40.
- a step 42 removes numbers from the document processed in step 41 that make reference to figures within the description of preferred embodiment section.
- the step 42 is followed by step 43 that extracts the claims section from the description of preferred embodiment section.
- a step 44 continues the processing of step 43 by splitting each paragraph of the description of preferred embodiment section and the claims section.
- A step 45 splits each paragraph into sentences and identifies the sentences that contain numbers. These numbers represent the patent elements that correspond to elements in figures that are important to the narrative of the patent.
- The sentences identified with numbers in step 45 are then selected for further processing in a step 46 that identifies where the numbers are located within the sentence.
- The placement of numbers in step 46 then goes into a loop described by a step 47 and a step 48.
- Step 47 selects the word preceding the number, and step 48 decides whether a tag word or the beginning of the sentence has been reached. If step 48 decides that neither a tag word nor the beginning of the sentence has been reached, it redirects to step 47.
- The tag words can be words such as "an", "a", "at", "the" and "said", which mark the introduction of a new element.
- A step 49 pushes into a memory array of processing unit 15 the sentence segments that were identified in steps 47 and 48. The process carried out in steps 45 through 49 is repeated until a step 50 determines that all sentences have been processed. Once all the selected numbered elements are in the array of step 49, a step 51 selects the least common denominator of each element that has the same number. This selected sentence fragment of step 51 will then be the node element 2 and the patent element that is also described in the figures and narrative. The selected least common denominator of step 51 will then be pushed into a memory array of processing unit 15 by a step 52.
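- The loop of steps 45 to 52 might be sketched as below. Two assumptions are made loudly: the "least common denominator" of step 51 is interpreted here as the longest run of trailing words shared by every segment that carries the same number, and sentence and word boundaries come from simple regular expressions. All function names are hypothetical.

```python
import re
from collections import defaultdict

TAG_WORDS = {"a", "an", "at", "the", "said"}


def candidate_segments(sentence):
    """Yield (number, words) for every reference numeral in the sentence."""
    tokens = re.findall(r"[A-Za-z]+|\d+", sentence)
    for index, token in enumerate(tokens):
        if not token.isdigit():
            continue
        words = []
        position = index - 1
        # Steps 47-48: walk backwards until a tag word, another number,
        # or the beginning of the sentence is reached.
        while position >= 0 and tokens[position].lower() not in TAG_WORDS:
            if tokens[position].isdigit():
                break
            words.insert(0, tokens[position].lower())
            position -= 1
        if words:
            yield token, words


def common_suffix(segments):
    """Assumed reading of step 51: trailing words shared by all segments."""
    shortest = min(len(segment) for segment in segments)
    suffix = []
    for offset in range(1, shortest + 1):
        words = {segment[-offset] for segment in segments}
        if len(words) != 1:
            break
        suffix.insert(0, words.pop())
    return suffix


def extract_nodes(text):
    text = re.sub(r"<[^>]+>", " ", text)            # step 41: drop markup tags
    text = re.sub(r"\bFIGS?\.?\s*\d+", " ", text)   # step 42: drop figure references
    by_number = defaultdict(list)
    for sentence in re.split(r"(?<=[.!?])\s+", text):  # step 45: sentence split
        for number, words in candidate_segments(sentence):
            by_number[number].append(words)            # step 49: collect segments
    return {n: " ".join(common_suffix(s)) for n, s in by_number.items()}


sample = (
    "<p>FIG. 3 shows the processing unit 15 connected through a communications "
    "channel 16 with the database 17.</p> A query reaches processing unit 15. "
    "The database 17 contains a client database 18."
)
nodes = extract_nodes(sample)
```

- In the sample, the second sentence yields the longer segment "query reaches processing unit" for number 15, and the shared trailing words reduce it to "processing unit".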
- FIG. 6 shows node elements 2 in their structural representation as a storage format 53 that results from step 52.
- Storage format 53 can remain in memory of processing unit 15 or be stored in graph repository 28.
- the storage format 53 contains a node reference entry identifier 54 that identifies the element as a node and a referring number 55.
- a separating element 56 segregates the node reference entry identifier 54 from a node identifier 57.
- the node identifier 57 in turn is also separated by separating element 56 from the reference id 58.
- Reference id 58 links storage format 53 of graph repository 28 with repository of patent images 31.
- A storage format 59 has the same elements as storage format 53 and in addition contains a section separating element 60 that separates the elements of format 53 from a different section 61; the section 61 contains the same elements as format 53.
- the storage format 59 is stored in graph repository 28 as distinct entries 62.
- The specific entry identifiers 54 are described by a list of entries 67.
- List 67 describes each entry 54 by an individual list entry 63.
- List entry 63 has a key identifier 64.
- Entry 64 has different identifiers, which in a typical embodiment can be "c" for conjunction, "p" for preposition, and so on for each part of speech, and "n" for a node element 2.
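- One way to picture storage format 53 is a delimited text record. In the sketch below the "|" character stands in for separating element 56, and the class name `Entry`, its field names and the sample reference id are all assumptions; the source does not specify concrete characters or field widths.

```python
from dataclasses import dataclass

SEPARATOR = "|"  # assumed stand-in for separating element 56


@dataclass(frozen=True)
class Entry:
    key: str           # key identifier 64, e.g. "n" for a node element
    number: str        # referring number 55
    identifier: str    # node identifier 57
    reference_id: str  # reference id 58, linking to the image repository

    def encode(self):
        """Serialise as <key><number>|<identifier>|<reference id>."""
        return SEPARATOR.join([self.key + self.number, self.identifier, self.reference_id])

    @classmethod
    def decode(cls, text):
        """Parse a record produced by encode(); the key is a single character."""
        head, identifier, reference_id = text.split(SEPARATOR)
        return cls(head[0], head[1:], identifier, reference_id)


entry = Entry("n", "15", "processing unit", "img-0031")
record = entry.encode()
restored = Entry.decode(record)
```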
- FIG. 7 shows the process of determining edge elements.
- A step 68 selects the array of node elements of step 52.
- the step 68 is followed by a step 69 that matches node elements 2 with the sentences in which they occur.
- the matching sentences of step 69 are passed to a step 70 that identifies coordinating conjunctions within the sentences.
- the placement of coordinating conjunctions allows for set operations to be carried out on either nodes or edges.
- Step 70 is followed by a step 71 that matches correlative conjunctions in the sentences of step 69.
- the use of correlative conjunctions allows for additional set operations and conditionals to be applied to nodes and edges.
- a step 72 follows step 71 where the objective of step 72 is to match prepositional phrase elements within the sentences of step 69.
- Step 72 is complemented by step 73 that matches the remaining prepositions of the sentences of step 69.
- Step 73 gives way to a step 74 that matches the remaining parts of speech such as nouns and adjectives, articles, etc.
- the remaining words that are not matched are then flagged for manual intervention.
- The elements identified in steps 70 through 75 are then subjected to a step 76 that generates candidate parse trees of the sentence syntax structure.
- the parse trees identified in step 76 are then analyzed in a step 77 to identify the parse tree that represents the most probable sentence syntactic structure.
- step 77 will provide prepositions, verbs and adjectives which can then be used in a step 78 to determine the edges that will be used to link the node elements 2.
- a step 79 will store the edges of step 78 in graph repository 28. Alternate embodiments to edge assignment include scoring elements, special tags to sort important from trivial elements among others.
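- A much-simplified sketch of edge determination follows. It replaces the parse-tree selection of steps 76 and 77 with a lookup in small hand-written word lists, which is an assumption and not the described method; words found in no list are flagged, mirroring the manual-intervention path. The word lists and the function name `extract_edges` are hypothetical.

```python
import re

# Tiny illustrative lexicons; a real system would use a full part-of-speech resource.
COORDINATING = {"and", "or", "but", "nor", "for", "yet", "so"}
PREPOSITIONS = {"through", "with", "to", "from", "in", "on", "of", "by"}
VERBS = {"is", "connected", "contains", "stores", "sends"}
ARTICLES = {"a", "an", "the", "said"}


def extract_edges(sentence, nodes):
    """Link consecutive numbered elements using the words found between them.

    `nodes` maps a reference numeral (as a string) to its node name.
    """
    tokens = re.findall(r"[A-Za-z]+|\d+", sentence.lower())
    positions = [i for i, token in enumerate(tokens) if token in nodes]
    edges, flagged = [], []
    for left, right in zip(positions, positions[1:]):
        target_words = set(nodes[tokens[right]].split())
        label = []
        for word in tokens[left + 1:right]:
            if word in target_words or word in ARTICLES:
                continue  # part of the target node name, not of the relation
            if word in COORDINATING or word in PREPOSITIONS or word in VERBS:
                label.append(word)
            else:
                flagged.append(word)  # unmatched word, left for manual review
        edges.append((nodes[tokens[left]], " ".join(label), nodes[tokens[right]]))
    return edges, flagged


nodes = {"15": "processing unit", "16": "communications channel", "17": "database"}
sentence = (
    "The processing unit 15 is connected through a communications channel 16 "
    "with the database 17."
)
edges, flagged = extract_edges(sentence, nodes)
```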
- FIG. 8 shows the normalization of the word elements of nodes and edges for proper comparison between invention documents and patents, which are among the relevant art documents.
- The process starts with a step 80 that selects the tagged elements consisting of the node elements of step 52 and the edge elements of step 78.
- Step 80 is followed by a step 81 that initializes a variable x that will keep track of the number of nodes and edges and a variable y that will keep track of the number of words in each node and edge.
- A step 82 checks whether the current element is the end of the array of nodes and edges of step 80. If the total x is less than the total number of nodes and edges, step 82 gives way to a step 83, which takes the current element and splits the node or edge into individual words.
- Each of the individual words of step 83 is then assigned a value and taken one at a time, controlled by a condition 84 that determines whether all the elements have been analyzed. If all the elements have been analyzed, condition 84 moves the routine to step 82. While elements remain to be analyzed, condition 84 passes each individual word to a step 85, which compares the word against the category entry in the lexicon and semantic repository 27 to see if there is a synonym entry for that word in that specific category. If the word is in the category, step 85 proceeds to a step 86, which changes the word to the synonym entry for that category and then redirects the flow to condition 84.
- Otherwise, step 85 gives way to a step 87 that keeps the word and then redirects the flow of the process to condition 84. If the total x is greater than the total number of nodes and edges, step 82 moves to a step 88 that returns the array of words.
- The array of words of step 88 is then compared in a step 89 with words selected from other patents in the same category and, in a step 90, the words are assigned a corresponding weight in the category of form or function depending on their frequency. Step 90 also stores the results in the array.
- The steps carried out in FIG. 8 also represent a subprocess carried out in step 36 and step 37.
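- The normalization loop can be condensed into a few lines, assuming the category's synonym entries are available as a plain dictionary and that the frequency-based weight of step 90 is a relative frequency. Both choices, and the names `normalise` and `frequency_weights`, are assumptions for illustration.

```python
from collections import Counter


def normalise(elements, synonyms):
    """Replace each word by its category synonym when one exists (steps 82-88)."""
    normalised = []
    for element in elements:              # outer loop over nodes and edges
        words = element.lower().split()   # step 83: split into individual words
        # Steps 85-87: substitute the synonym entry, otherwise keep the word.
        normalised.append(" ".join(synonyms.get(word, word) for word in words))
    return normalised


def frequency_weights(words, category_words):
    """Assumed form of steps 89-90: weight = relative frequency in the category."""
    counts = Counter(category_words)
    total = sum(counts.values()) or 1
    return {word: counts[word] / total for word in words}


synonyms = {"repository": "database", "linked": "connected"}
normalised = normalise(["graph repository", "is linked to"], synonyms)
weights = frequency_weights(["database"], ["database", "database", "graph", "unit"])
```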
- FIG. 9 shows additional discrimination in the remaining text.
- Step 91 selects the array of node elements 2 from step 52 and edges from step 79.
- The nodes from step 91 are then used in a step 92 to discriminate the sentences that do not include nodes in the sentence body and therefore have not been processed.
- the sentences of step 92 are then passed to a step 93 which extracts individual words from the nodes for matching against the text of the selected sentences of step 92.
- the step 93 is followed by a step 94 which will then try synonyms of node elements against the words of selected sentences of step 92.
- Step 94 is followed by a step 95, which uses the words from the sentences of step 92 and compares them against selected edges from step 79 for a match.
- Step 95 gives way to a step 96, which tries to match edge synonyms against the words of the selected sentences of step 92.
- the extracted matches of prepositions or other meaningful text that have been identified in the relationships extracted in steps 93 to 96 that are not in the existing arrays will then be added as nodes to the array of node elements 2 or edge array of step 79.
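The discrimination of FIG. 9 can be sketched as follows. This is an illustrative simplification: the names `tokenize` and `find_partial_matches`, the lowercase synonym dictionary, and the plain word-set comparison are assumptions, not the matching procedure of the description.

```python
import re

def tokenize(text):
    """Split text into a set of lowercase alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def find_partial_matches(sentences, nodes, edges, synonyms):
    """Return (sentence, matched_words) pairs for unprocessed sentences.

    A sentence counts as unprocessed when it contains no complete node
    name (step 92). Its words are compared against the individual words
    of nodes and edges (steps 93 and 95) and against their synonyms
    (steps 94 and 96). synonyms maps a lowercase word to lowercase synonyms.
    """
    candidates = set()
    for term in list(nodes) + list(edges):
        for word in tokenize(term):
            candidates.add(word)
            candidates.update(synonyms.get(word, ()))
    results = []
    for sentence in sentences:
        lowered = sentence.lower()
        if any(node.lower() in lowered for node in nodes):
            continue                         # already processed via a full node name
        hits = tokenize(sentence) & candidates
        if hits:
            results.append((sentence, sorted(hits)))
    return results
```

The matched words returned for each sentence are the candidates that would be reviewed and, where not already present, added to the node or edge arrays.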
- FIG. 10 shows the process of analyzing the claims against the detailed description section.
- the step 99 selects the array of node elements 2 and edges from step 79.
- the step 99 is followed by step 100 that selects the claims section from the invention document submission or patent document narrative.
- the selected claims section from step 100 is then processed in step 101 to match phrases or words in the claims narrative.
- the matched phrases or words of step 101 are then used in step 102 alongside array of node elements 2 to determine an exact match or synonymy between them.
- the step 102 is followed by step 103 that does exact matching of edges with words in claims section from step 100.
- the step 103 is followed by a step 104 that does synonym matching of edges with words in claims section from step 100.
- step 105 isolates the claim elements for manual intervention. Extracted elements from steps 101 and steps 102 are then stored into claim node array in step 106. Extracted elements from steps 103 and steps 104 are then stored into edge array in step 107.
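The exact and synonym matching of FIG. 10 can be sketched as a single classification pass. The function name, the synonym dictionary, and the choice to route unmatched terms to manual review are assumptions for illustration; the description does not state which claim elements step 105 isolates.

```python
import re

def _contains(text, phrase):
    """True if phrase occurs in text as a whole word or phrase."""
    return re.search(r"\b" + re.escape(phrase.lower()) + r"\b", text) is not None

def match_claim_terms(claim_text, terms, synonyms):
    """Classify each node or edge term as an exact or synonym match.

    Returns (matched, needs_review): matched maps a term to "exact" or
    "synonym"; needs_review lists terms found by neither test, which this
    sketch assumes are the ones isolated for manual intervention.
    """
    text = claim_text.lower()
    matched, needs_review = {}, []
    for term in terms:
        if _contains(text, term):
            matched[term] = "exact"
        elif any(_contains(text, s) for s in synonyms.get(term, ())):
            matched[term] = "synonym"
        else:
            needs_review.append(term)
    return matched, needs_review
```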
- FIG. 11 shows the processing to determine disconnected nodes, disconnected claims, novelty and non obviousness of document against existing prior art.
- Step 108 selects the array of node elements 2 and edges from step 79 from the processed invention document submission.
- the step 108 is followed by a step 109 that determines by the edges from step 79 if there are any numbers of nodes that are disconnected and raises the flag for manual intervention.
- the disconnected node flag signifies that the invention document narrative is incomplete since all nodes must be connected to another node through either form or functional description.
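The disconnected node check of step 109 reduces to finding nodes that appear in no edge. A minimal sketch, assuming edges are stored as hypothetical (source, label, target) triples:

```python
def disconnected_nodes(nodes, edges):
    """Return the nodes that are not an endpoint of any edge.

    edges is an iterable of (source, label, target) triples; this triple
    layout is an assumption made for illustration.
    """
    connected = set()
    for source, _label, target in edges:
        connected.add(source)
        connected.add(target)
    return [node for node in nodes if node not in connected]
```

A non-empty result corresponds to raising the flag for manual intervention.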
- the step 109 is followed by step 110 that extracts the chosen samples of existing patents for comparison and makes an array of nodes 2 and an array of edges from step 79.
- Step 110 is followed by a step 111 where the corresponding array of nodes and array of edges for both the invention document submission and the existing patent documents are then compared to determine the nodes and edges from the document submission that are not part of the existing patent documents.
- Step 111 is followed by a step 112 that determines the total number of nodes and edges in the invention document submission.
- Step 113 takes the number of nodes and edges not in existing patents determined in step 111 and divides the number by the number of nodes and edges determined in step 112 to determine the novelty of the document submission.
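The ratio of steps 111 to 113 can be written out directly. In this sketch the nodes and edges are represented as hashable values in sets, which is an assumption about the data layout rather than something the description specifies.

```python
def novelty_ratio(sub_nodes, sub_edges, prior_nodes, prior_edges):
    """Fraction of the submission's nodes and edges absent from prior art.

    Step 111: find elements of the submission not in the existing patents.
    Step 112: count all nodes and edges of the submission.
    Step 113: divide the first count by the second.
    """
    sub_nodes, sub_edges = set(sub_nodes), set(sub_edges)
    new_count = len(sub_nodes - set(prior_nodes)) + len(sub_edges - set(prior_edges))
    total = len(sub_nodes) + len(sub_edges)
    return new_count / total if total else 0.0
```

A submission with three nodes and one edge, of which one node and the edge are new, yields a ratio of 2 / 4 = 0.5.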
- Step 113 is followed by a step 114 that calculates the distance of the nodes from the document submission that are not in the existing patents.
- the nodes and edges identified in step 114 are then assigned a weight in step 115 based on the distance assigned in step 114.
- Step 115 is followed by a step 116 that takes the nodes and edges identified in step 114 and then assigns a weight based on importance of the words.
- the importance score assigned in step 116 is based on a stored weight in lexicon and semantic repository 27.
- Step 117 derives a composite score from step 116 and 115.
- Step 118 takes the composite score from step 117 and divides it by the total number of nodes and edges in the patent category to obtain a novelty measurement score.
- Step 118 is followed by a step 119 that displays the graphical representation of the analysis and the results of the computation.
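The scoring of steps 115 to 118 can be sketched under stated assumptions. The description does not say how the distance weight and the importance weight are combined, so this example assumes the composite score is the sum of their products over the new elements; the function name and argument layout are likewise hypothetical.

```python
def novelty_measurement(weighted_elements, category_total):
    """Combine per-element weights into a novelty measurement score.

    weighted_elements: list of (distance_weight, importance_weight) pairs,
    one per node or edge not found in existing patents (steps 115 and 116).
    category_total: total number of nodes and edges in the patent category.
    The sum-of-products composite (step 117) is an assumption.
    """
    if category_total <= 0:
        raise ValueError("category_total must be positive")
    composite = sum(d * i for d, i in weighted_elements)   # step 117 (assumed form)
    return composite / category_total                      # step 118
```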
- FIG. 12 shows the graphical interface of step 119.
- the interface is composed of a graphical interface 120 that displays the information to the user.
- Graphical interface 120 presents the user with the option 121 of analysis of function, form or both.
- the option will control underlying presentation of the processing to just show edges and nodes connected by verbs and other elements that constitute function or adjectives and other parts of speech that represent form or both.
- the graphical interface 120 will display the graphical representation of the nodes and edges in a graph 122.
- the graphical interface 120 will have a statistics section 123 that will display relevant information of the invention document submission with regards to the relevant art.
- the graphical interface 120 will have a section 124 that will link the nodes and edges of graph 122 with the invention document submission fragments where the nodes and edges appear.
- FIG. 13 through FIG. 16 are directed to a system where the user submits the image segment which is relevant.
- the system then converts the image into an appropriate encoding that can be submitted as a query.
- the query is then used to match the descriptors of the image segment with those figures or images related to those of a submitted patent.
- the matched figure of the patent has an associated number which will be matched to a patent element which will then be used to extract the nodes of the patent. Those nodes will be characterized by functional as well as structural descriptions that are located in the narrative of the patent.
- the system will then use the relevant segments extracted from the patent to construct a narrative that describes the functionality and structure of the element of the image that is described by a patent. For example FIG.
- the image grabbing device 201 that captures images, photos or other pictorial representations.
- the image grabbing device 201 is used to capture a physical object 202.
- the physical object 202 is converted into an image representation 203 by image grabbing device 201.
- the image representation 203 contains the object of interest 204.
- the object of interest 204 is selected by the user of the system by an outline 205 that can be drawn on the image grabbing device 201.
- the image grabbing device 201 makes a transmission to a processing unit 206.
- the processing unit submits a query 207 of the image descriptor to a database 208.
- the database 208 stores a graph 209 of patent elements.
- the graph 209 is extracted from a patent document 210 which also contains a patent figure 211 that has been processed into descriptors by processing unit 206 and stored into database 208.
- the processing unit 206 receives, in response to the query 207, a result set 212 that matches the descriptor extracted from the outline 205 of image 203 with descriptors extracted from patent figure 211.
- the processing unit 206 returns the results as a narrative in a message 213 to device 201.
- the processing unit 206 displays the narrative in a structured format narrative 214.
- FIG. 2 shows the step of converting both the object of interest 204 and the representation of patent figure 211 into a description to be submitted to processing unit 206.
- the process starts with a step 215 which represents the selection of the outline of the object from the picture.
- step 215 is then followed by a step 216 that normalizes the image to a standard size resolution.
- step 217 takes the normalized image from step 216 and applies a threshold to clean the image from noise.
- step 218 extracts the edges through an algorithm such as Sobel.
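Steps 216 to 218 can be illustrated with a small pure-Python sketch operating on a grayscale image stored as a list of rows. Nearest-neighbour resampling, a fixed binary threshold, and the helper names are assumptions chosen for brevity; a practical system would more likely rely on an image processing library.

```python
import math

def resize_nearest(image, new_h, new_w):
    """Step 216: normalize the image to a standard size (nearest neighbour)."""
    h, w = len(image), len(image[0])
    return [[image[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]

def threshold(image, level):
    """Step 217: binarize the image to suppress noise."""
    return [[255 if p >= level else 0 for p in row] for row in image]

def sobel_magnitude(image):
    """Step 218: gradient magnitude from the 3x3 Sobel kernels.

    Border pixels are left at 0.0 because the kernel does not fit there.
    """
    kx = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
    ky = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out
```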
- Step 219 is a condition where a flag is evaluated whether the request is a query or a submission to expand the database. If the decision of step 219 is a query then the edges of step 218 are passed to step 220 where a descriptor such as compactness, chain codes, Fourier descriptors, or other algorithm is utilized to extract robust features for matching.
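Two of the descriptors named for step 220, chain codes and compactness, can be sketched as follows. The sketch assumes the outline is already available as an ordered, closed list of (x, y) points, which the description does not detail; the direction numbering follows the common Freeman 8-direction convention with the image y axis pointing down.

```python
import math

# Freeman 8-direction codes: list index is the code, value is (dx, dy)
# in image coordinates (y grows downward), starting east and turning
# counter-clockwise as seen on screen.
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1),
              (-1, 0), (-1, 1), (0, 1), (1, 1)]

def chain_code(boundary):
    """Encode a closed boundary whose consecutive points are 8-adjacent."""
    codes = []
    n = len(boundary)
    for k in range(n):
        x0, y0 = boundary[k]
        x1, y1 = boundary[(k + 1) % n]       # wrap around to close the outline
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return codes

def compactness(polygon):
    """Return 4*pi*area / perimeter**2 for a closed polygon.

    The value is 1.0 for a circle and decreases for elongated or ragged shapes.
    """
    n = len(polygon)
    twice_area, perimeter = 0.0, 0.0
    for k in range(n):
        x0, y0 = polygon[k]
        x1, y1 = polygon[(k + 1) % n]
        twice_area += x0 * y1 - x1 * y0      # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return 4 * math.pi * (abs(twice_area) / 2) / perimeter ** 2
```

Compactness is invariant to scale and rotation, which is one reason such descriptors suit matching a photographed outline against a patent figure drawn at a different size.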
- Step 221 tries to match the descriptors of step 220 that belong to object of interest 204 and matches it against a basic representation of patent figure 211.
- the results of the matches of step 221 are in a typical embodiment passed through a second refined search 222 that may include cross correlation or dynamic time warping or other algorithm to fit the candidate further with 3 dimensional representations cutouts or exploded views in a patent image.
- An alternate embodiment can consist of a second refined 222 search consisting of matching color and texture of the object of interest 204 against a picture of patent element if it is available.
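Dynamic time warping, one of the algorithms named for the refined search 222, can be sketched for two one-dimensional descriptor sequences. Treating the descriptors as plain numeric sequences and using the absolute difference as the local cost are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the minimal accumulated cost of aligning the first
    i items of a with the first j items of b.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because the alignment may stretch either sequence, a candidate outline sampled at a different rate than the stored figure can still score a distance of zero.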
- once step 222 is completed, the picture elements are matched against an interpretation tree, where the matched features are matched against the patent elements within the figure if it contains multiple patent elements.
- For example, a patent image may contain ten patent elements referenced by the numbers in the figure. Such numbers are held in the interpretation tree, and the matched sub elements are cross referenced against that tree to extract the element numbers that are relevant to the partial or complete match.
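The cross referencing of matched sub elements to element numbers can be illustrated with a deliberately simplified structure. Here the interpretation tree is flattened to a mapping from element number to a set of feature identifiers, and a minimum matched fraction decides relevance; both choices, and all names, are assumptions rather than the structure used by the described system.

```python
def matched_elements(tree, matched_features, min_fraction=0.5):
    """Return element numbers whose features are sufficiently matched.

    tree: dict mapping a patent element number to the set of feature
    identifiers stored for it (each set must be non-empty).
    matched_features: set of feature identifiers found by the search.
    The result maps each relevant element number to its matched fraction,
    so partial matches (fraction < 1.0) are distinguishable from complete ones.
    """
    results = {}
    for element_number, features in tree.items():
        fraction = len(features & matched_features) / len(features)
        if fraction >= min_fraction:
            results[element_number] = fraction
    return results
```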
- the step 219 condition can also be a direct submission to the database to further expand the images associated with a patent with either two dimensional representations or three dimensional representations.
- Step 219 in submission mode goes to a step 224 that is a decision based on the two dimensional or three dimensional representation of the submitted image. If step 224 is answered as a two dimensional representation it will go to a step 225 where the interface will let you select a particular spot on object of interest 204 that can then be tagged as being an element number of the relevant patent selected.
- the step 225 will follow with a step 226 that will integrate the marked spot of step 225 into the interpretation tree.
- the step 226 will be followed by a step 227 that will extract the relevant descriptors from the image and integrate them into the search database of descriptors in database 208.
- step 228 will provide image rectification of the image scene and match it to a sequence of submitted images.
- the step 228 is followed by a three dimensional reconstruction in step 229 which can be a reconstruction up to a projective transformation.
- Step 229 will provide a projective reconstruction from which points in three dimensional spaces can be computed and stored in data base 208.
- the Step 229 is followed by a step 230 that will integrate the marked spot of an object of interest 204 from a submitted picture that will be associated with a patent element number and the projective reconstruction points that will be stored into database 208.
- the patent elements that have been tagged in step 230 are then stored into an interpretation tree in a step 231.
- the marked spot of an object of interest 204 that are mapped into projective reconstruction points are then processed to extract descriptors such as Fourier descriptors, chain codes or other relevant descriptor in a step 232.
- the descriptor information of step 232 will then be stored with all the information of the previous steps in a step 233.
- FIG 3. shows the process of matching the submitted query with the stored images and extracting the structural as well as functional elements and order it into a narrative for the user.
- the process of figure 3 starts with a step 234 that selects the matched elements of step 223 of the interpretation tree.
- a decision step 235 follows where the process checks whether the matched elements belong to an image added to a patent or to one of the patent figures. If the matched drawing is not a patent figure, a step 236 is taken to match the external two dimensional or three dimensional image information to the stored information of the relevant patent figure and element numbers of the patent.
- the step that follows step 235 and step 236 is step 237 where the identified patent elements are matched to the graph elements of the patent.
- the step 237 gives way to a step 238 where the relevant nodes and edges of the graph will point to the structural as well as functional elements of the patent. These nodes and edges from step 238 will also have markers to the original patent sentences from which they were extracted in step 239.
- the matched content belongs to a patent that also contains background information, such as the background of the invention, from which relevant background can be provided to the user; this is extracted in a step 240 that follows step 239. The extracted material from step 240 will then be organized for presentation in a step 241.
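Steps 237 to 241 can be sketched as a lookup that gathers the source sentences for one patent element and groups them for presentation. The edge dictionary layout, the "function" and "form" labels, and the function name are assumptions made for illustration.

```python
def collect_narrative(element_number, graph, sentences, background):
    """Gather background, structural and functional text for one element.

    graph: list of edge dicts with keys "source", "target", "kind"
    ("function" or "form") and "sentence_id", the marker back to the
    original patent sentence (step 239).
    sentences: list of patent sentences indexed by sentence_id.
    """
    structural, functional = [], []
    for edge in graph:                                   # steps 237-238
        if element_number in (edge["source"], edge["target"]):
            text = sentences[edge["sentence_id"]]
            if edge["kind"] == "function":
                functional.append(text)
            else:
                structural.append(text)
    # Step 241: order the material as background, structure, then function.
    return {"background": background,
            "structure": structural,
            "function": functional}
```

The returned ordering mirrors the presentation sequence of background first, then structural information, then functional information.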
- FIG 4. shows the process of presenting the ordered information of step 241.
- the process of presenting the ordered information is through a step 244 that checks if the information is to be presented textually or in audio format.
- the affirmative outcome of step 244, textual information, is followed by a step 245 that first presents the background of the identified invention in the patent document.
- the step 245 is followed by a step 246 which displays the structural information based on the patent graph.
- the step 246 is followed by a step 247 that displays the functional information of the identified patent graph.
- a negative outcome of step 244 gives way to the audio format by moving to a step 248 that narrates the background of the identified invention in the patent document.
- the step 248 is followed by a step 249 which narrates the structural information based on the patent graph.
- the step 249 is followed by a step 250 that narrates the functional information of the identified patent graph.
- the presentation of the identified patent in step 247 or step 250 gives way to a feedback step 251 where the user is presented with a feedback queue to determine if the result was helpful. If the answer to the feedback queue of step 251 is negative, a step 252 will then move to display less relevant matches.
- step 252 gives way to a step 253 where the region searched is modified or different weights in the algorithms or descriptors are modified to get different results and if necessary further displays of the narrative are made in a step 254.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201414665883A | 2014-03-23 | 2014-03-23 | |
US14/665,883 | 2014-03-23 | ||
US14/577,462 US9984066B2 (en) | 2013-12-19 | 2014-12-19 | Method and system of extracting patent features for comparison and to determine similarities, novelty and obviousness |
US14/577,462 | 2014-12-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015148410A1 (en) | 2015-10-01 |
Family
ID=54196275
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2015/022084 WO2015148410A1 (en) | 2014-03-23 | 2015-03-23 | Image interface for extracting patent features |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2015148410A1 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050220351A1 (en) * | 2004-03-02 | 2005-10-06 | Microsoft Corporation | Method and system for ranking words and concepts in a text using graph-based ranking |
US20090138466A1 (en) * | 2007-08-17 | 2009-05-28 | Accupatent, Inc. | System and Method for Search |
US20110072342A1 (en) * | 2000-02-29 | 2011-03-24 | Tran Bao Q | Patent Analyzer |
US20120102427A1 (en) * | 2010-10-21 | 2012-04-26 | Marc Aaron Fenster | Systems and methods for automated claim chart generation |
US20130132442A1 (en) * | 2011-11-21 | 2013-05-23 | Motorola Mobility, Inc. | Ontology construction |
- 2015-03-23 WO PCT/US2015/022084 patent/WO2015148410A1/en active Application Filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9430720B1 (en) | 2011-09-21 | 2016-08-30 | Roman Tsibulevskiy | Data processing systems, devices, and methods for content analysis |
US9508027B2 (en) | 2011-09-21 | 2016-11-29 | Roman Tsibulevskiy | Data processing systems, devices, and methods for content analysis |
US9558402B2 (en) | 2011-09-21 | 2017-01-31 | Roman Tsibulevskiy | Data processing systems, devices, and methods for content analysis |
US9953013B2 (en) | 2011-09-21 | 2018-04-24 | Roman Tsibulevskiy | Data processing systems, devices, and methods for content analysis |
US10311134B2 (en) | 2011-09-21 | 2019-06-04 | Roman Tsibulevskiy | Data processing systems, devices, and methods for content analysis |
US10325011B2 (en) | 2011-09-21 | 2019-06-18 | Roman Tsibulevskiy | Data processing systems, devices, and methods for content analysis |
US11232251B2 (en) | 2011-09-21 | 2022-01-25 | Roman Tsibulevskiy | Data processing systems, devices, and methods for content analysis |
US11830266B2 (en) | 2011-09-21 | 2023-11-28 | Roman Tsibulevskiy | Data processing systems, devices, and methods for content analysis |
FR3109650A1 (en) * | 2020-04-22 | 2021-10-29 | Emmanuel Beaudouin-Lafon | Process for establishing the nomenclature of a patent document |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15769435 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WPC | Withdrawal of priority claims after completion of the technical preparations for international publication |
Ref document number: 14/665,883 Country of ref document: US Date of ref document: 20160902 Free format text: WITHDRAWN AFTER TECHNICAL PREPARATION FINISHED |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15769435 Country of ref document: EP Kind code of ref document: A1 |