WO2015148410A1

WO2015148410A1 - Image interface for extracting patent features

Info

Publication number: WO2015148410A1
Application number: PCT/US2015/022084
Authority: WO
Inventors: Geigel ARTURO
Original assignee: Arturo Geigel
Priority date: 2014-03-23
Filing date: 2015-03-23
Publication date: 2015-10-01

Abstract

A system for submitting an image segment which is relevant, wherein said system then converts the image into an appropriate encoding that can be submitted as a query. The query is then used to match the descriptors of the image segment with those figure or images related to those of a submitted patent. The matched figure of the patent has an associated number which will be matched to a patent element which will then be used to extract the nodes of the patent. Those nodes will be characterized by functional as well as structural descriptions that are located in the narrative of the patent. The system will then use the relevant segments extracted from the patent to construct a narrative that describes the functionality and structure of the element of the image that is described by a patent.

Description

TITLE OF THE INVENTION

IMAGE INTERFACE FOR EXTRACTING PATENT FEATURES

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

N/A

BACKGROUND OF THE INVENTION

Field of the Invention

The invention relates to the field of semantic networks specifically relating to extracting syntactic and semantic content to derive a semantic network from a patent and carrying out a comparison between established patent documents and one or more patent submissions that require validation of claims; and the method of relating additional information to functional as well as structural graph. Further the additional information is in the form of additional pictures and images that can be associated with the structural representation of the patent elements.

The process and method described herein establishes the process from which to extract relevant syntactic and semantic relationships to establish a difference in graph nodes between patents and patent applications. Further the invention relates to methods for searching structural as well as functional relations in graphs stored in databases or memory using pictures through a user interface.

Discussion of the Background

The present invention relates to the field of analytics, in particular to patent overlap identification and analysis or more precisely the obviousness in comparing a new submission with prior art.

Modern evaluation methods in this area perform analysis based on Boolean models, vector space models, probabilistic models, latent semantic models, etc. These metrics abstract away much of the relationships inherent in natural language narrative and leave the resulting score devoid of the elements and relationships that take advantage of the doctrine "function follows form". This methodology can be applied in idea conceptualization, patent analysis and infringement analysis. While the prior art has tried to exploit this principle to some extent, it applies the principle at only one level of analysis and leaves multiple levels unexplored. Function follows form, in the context of patent narrative, is the process of going from an abstract concept to a concrete invention description, where the inventor's role is to organize disparate ideas into a coherent functional or descriptive concept by providing "bindings" of unrelated concepts through the union of coherent relationships at different levels of abstraction. This concept is integrated into the "restricted" narrative order and format of a patent, which has an additional form that provides the functionality of a patent document and which the examiner uses to evaluate the proposed invention. This second level of function follows form materializes through section restriction, order of presentation, and restriction of syntax. By analyzing these two levels of "function follows form" in patent documents, one can arrive at a useful method of analysis that can in turn be reduced to a method and process of analysis implemented in a computerized system. This method and process become analogous to the principles by which examiners analyze the obviousness of one patent in relation to another. The resulting method and process in a computerized system can in turn help attorneys, patent examiners, agents and interested parties in evaluating obviousness in a patent as well as the possibility of determining infringement of a patent.

The prior art can be divided into several categories. The first category uses statistical processes (frequency of words) to discriminate the relevance of prior art and establish whether a submission is similar in content. This category may use basic statistical mechanisms, weighted scores, statistical co-occurrences and latent semantic analysis, among other techniques, to establish relevance (US 8,060,505 B2; US 2008/0195568 A1; US 2008/0235220 A1; US 2013/0132154 A1; US 2013/0124515 A1; US 2008/0288489 A1; US 2010/0114587 A1).

The second category pertains to clustering and network analysis on established criteria to try to differentiate previous work from new work (US 8,412,659 B2). A node structure of elements is shown in US 8,423,489 B2. A related approach analyzes patent blocks based on queries to show relationships between patent portfolios in graph mode (US 2011/0246473 A1). A combination of the first category with the second category is given in US 8,504,560 B2.

The third category uses search criteria based on regular expressions and query languages such as Boolean expressions to search for relevant matches (US 2013/0198182 A1) and to compare a target sequence with a sequence stored in a database. A conjunction method of comparison between claims in different patents matched against a database is described in US 2011/0179022 A1.

The fourth category uses an ontology to categorize patents (US 2013/0086070 A1; US 2010/0131513 A1; US 2013/0086045 A1; US 2013/0086047 A1). A fifth category creates ontologies automatically from data. The ontology-based method starts by first creating a lexical graph, then prominent terms are targeted, and finally clustering is performed on the lexical graph (8,620,964).

The shortcoming of the prior art is that it is either too restrictive, such as using pre-established generic fields (for example, quantifying company name occurrences or inventor name frequency) to gather patents into classes, or, at the other extreme, too permissive (using Boolean operators), so that the person looking for matches has to learn how to build a good query to get successful matches. Other approaches, such as statistical methods, account for word occurrences, co-occurrences and mathematical formalism to carry out the search. These methods fall short because they do not exploit the semantic relationships and structure of the patent document. No previous method explores the possibility of narrative-to-narrative comparison using graph theory.

With regard to the field of analytics, in particular associating an image with structural and functional components that are described by means of a narrative such as the one in a patent submission, the prior art currently relies on automated algorithms to determine the relevant features of the submitted images. The prior art does not relate the success of these algorithms to the relevance given by a user of the system. Further, such structural and functional narratives of patents do not necessarily include all necessary elements.

The prior art can be divided into three main areas. The first relates to the methods and processes of devising a user interface to select portions of an image (8,559,732 B2; US 8,571,326 B2).

The second group relates to searches to retrieve images based on particular characteristics such as shape properties, etc. (6,801,661 B1; 8,027,549 B2; US 6,834,288 B2).

The third category relates to specific algorithms designed to target image similarities (US 7,706,612 B2).

The prior art in the second and third categories relies on automated algorithms to determine the relevant features of the submitted images. The prior art does not relate the success of these algorithms to the relevance given by a user of the system. The methods in the first category provide a useful interface that lets the user indicate the relevance of a picture element, but they do not go into the details of what to do with the picture after the image is submitted for processing.

The current submission aims to provide a complete methodology where the user submits the image segment which is relevant to the query. The image is then segmented to extract the element and match it to a stored image that is related to, or is a figure of, a submitted patent. The matched figure of the patent has an associated number which will be matched to a patent element, which will then be used to extract the nodes of the patent. Those nodes will be characterized by functional as well as structural descriptions that are located in the narrative of the patent. This process is not present in any of the previous art. The aim is to extract relevant structural as well as functional descriptions of the submitted image query and provide a relevance feedback system that the user of the system can interact with to refine the performance of the system.

SUMMARY OF THE INVENTION

An object of the present disclosure is to provide a method for creating computer representations of structural and functional elements in patent documents. The structural and functional representations are used to determine relative closeness of a patent, patent submission or existing product against the previous art in the form of structural and functional elements of other existing patent narratives that conform to a given structure.

Further, another object of the present invention is to provide a method for determining the novelty, obviousness and correctness of the narrative of a patent submission with regard to the existing previous art.

Yet another object of the present invention is to provide a method for creating computer representations of structural and functional elements in patent documents further comprising: semantic relationships by exploiting the semi structured elements of the patent document to determine structural elements by parsing the body of the patent by searching descriptive elements within the narrative to form the nodes of the graph that are stored in an adjacency matrix or adjacency lists.

Yet another object of the present invention is to provide a method for creating computer representations of structural and functional elements in patent documents further comprising: semantic relationships by exploiting the semi structured elements of the patent document to determine structural elements by parsing the body of the patent by searching descriptive elements within the narrative to form the edges of the graph that are stored in an adjacency matrix or adjacency lists.

Another object of the present invention is to provide a method for creating computer representations of structural and functional elements in patent documents further comprising: parsing the document using parsing techniques to determine standard sections of the patent (header information, background of the invention, brief description of drawings, detailed description, and a claims section) as well as non-standard sections of the document using previous patent patterns encoded into the parsing of the patent document.

Another object of the present invention is to provide a complete methodology where the user submits the image segment which is relevant to the query. In one exemplary embodiment of the present invention, the user submits the image segment which is relevant to the query, wherein said image is then segmented to extract the element and match it to a stored image that is related to, or is a figure of, a submitted patent. The matched figure of the patent has an associated number which will be matched to a patent element, which will then be used to extract the nodes of the patent. Those nodes will be characterized by functional as well as structural descriptions that are located in the narrative of the patent.

The invention itself, both as to its configuration and its mode of operation, will be best understood, and additional objects and advantages thereof will become apparent, by the following detailed description of a preferred embodiment taken in conjunction with the accompanying drawings.

When the word "invention" is used in this specification, the word "invention" includes "inventions", that is, the plural of "invention". By stating "invention", the Applicant does not in any way admit that the present application does not include more than one patentably and non-obviously distinct invention, and Applicant maintains that the present application may include more than one patentably and non-obviously distinct invention. The Applicant hereby asserts that the disclosure of the present application may include more than one invention and, in the event that there is more than one invention, that these inventions may be patentable and non-obvious one with respect to the other.

Further, the purpose of the accompanying abstract is to enable the U.S. Patent and Trademark Office and the public generally, and especially the scientists, engineers, and practitioners in the art who are not familiar with patent or legal terms or phraseology, to determine quickly from a cursory inspection the nature and essence of the technical disclosure of the application. The abstract is neither intended to define the invention of the application, which is measured by the claims, nor is it intended to be limiting as to the scope of the invention in any way.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings should be read with reference to the detailed description. Like numbers refer to like elements. The drawings, which are not necessarily to scale, illustratively depict embodiments of the present invention and are not intended to limit the scope of the invention.

FIG. 1 shows a conceptual representation of ideas in a graph format on a conceptual plane that serves to illustrate the underlying concepts that will be reduced to a system and method of patent analysis.

FIG. 2 shows a graphical representation of what an inventor has to do in the narrative of a patent document, which is to provide known relationships of elements within the document.

FIG. 3 shows the actual system that implements the analysis of the concepts presented in FIG. 1 and FIG. 2.

FIG. 4 shows the process of assigning preliminary category labels to a patent document.

FIG. 5 shows the steps to get the node elements from the patent submission or existing patent document.

FIG. 6 shows the storage format of the processed nodes and edges.

FIG. 7 shows the process of extracting edges from the patent submission or patent document.

FIG. 8 shows the process of normalizing word elements from nodes and edges.

FIG. 9 shows additional discrimination in remaining text of the patent submission or patent document.

FIG. 10 shows the process of claim analysis against the detailed description section.

FIG. 11 shows processing to determine disconnected nodes, disconnected claims, novelty and non-obviousness of a document against existing prior art.

FIG. 12 shows a typical graphical interface of the proposed embodiments.

FIG. 13 shows a brief overview of the system and process of image determination and processing.

FIG. 14 shows the steps where the user selects the relevant image segments and the system encodes the information to make the query.

FIG. 15 shows the process of matching the submitted query with the stored images and extracting the structural as well as functional elements and order it into a narrative for the user.

FIG. 16 shows the process of displaying the information to the user and providing feedback to the system.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The embodiments of the invention disclosed herein may be implemented through the use of general-purpose programming languages (such as C or C++). The program code can be disposed in any known computer-readable medium including semiconductor, magnetic disk, or optical disk (such as CD-ROM, DVD-ROM). As such, the code can be transmitted over communication networks including the Internet.

In the present disclosure, the terms "computer program", "computer program medium" and "computer-usable medium" are used to generally refer to media such as a removable storage unit or a hard disk drive. Computer program medium and computer-usable medium can also refer to memories, such as system memory and graphics memory which can be memory semiconductors (e.g., DRAMs, etc.). These products are examples of how to provide software to a computer system.

The embodiments are also directed to computer products comprising software stored on any computer-usable medium. Such software, when executed in one or more data processing devices, causes a data processing device(s) to operate as described herein or allows for the synthesis and/or manufacture of computing devices (e.g., ASICs, or processors) to perform embodiments described herein. Embodiments employ any computer-usable or -readable medium, and any computer-usable or -readable storage medium known now or in the future. Examples of computer-usable or computer-readable mediums may include, but are not limited to, primary storage devices (e.g., any type of random access memory or read-only memory), secondary storage devices (e.g., hard drives, floppy disks, CD ROMS, ZIP disks, tapes, magnetic storage devices, optical storage devices, MEMS, nanotechnological storage devices, etc.), and communication mediums (e.g., wired and wireless communications networks, local area networks, wide area networks, intranets, etc.).

The present disclosure provides an example embodiment of a method and system for extracting patent features for comparison, in order to determine similarities, the novelty of the invention and the non-obviousness relation between the relevant art and the invention.

Primarily, an invention is physically described as an invention document in a tangible medium, such as a searchable data/document medium. The invention comprises several distinctive elements such as structure, compounds, steps, material and other significant features. Further, a search for documents related to the subject matter of the invention is performed using computer programs or personnel dedicated to completing the search based on the invention description. Different search methods may apply during the search process. After the search is completed, the relevant art is selected. The relevant art preferably comprises searchable data, such as the body, claims, description and background of the relevant art.

Once the selected relevant art (more particularly its searchable data) and the invention description (more particularly the invention searchable data) are stored in a computer program medium, the method and system for extracting patent features is performed to determine similarities with the relevant art, the novelty of the invention, and the non-obviousness relation between the relevant art and the invention.

For example, FIG. 1 shows conceptual representations 3 in which the similarities between the invention and the relevant art, the novelty of the invention and obviousness may be represented by using squares. A first set of elements 2 from the invention is grouped in a first conceptual representation 1 represented by a first square. The first square represents similitudes between the relevant art and the invention, wherein said invention includes but is not limited to a device, method, process, composition and/or structure detailed or described in a digital manner or by means of searchable data. Further, pluralities of similar elements between said relevant art and said invention are extracted from said searchable data, wherein said process of extraction is defined through a set of instructions. The elements include but are not limited to a compound, structure, temperature, particular step, material and other significant elements, wherein said significant element can be represented as a node element 2 of a graph in the conceptual representation plane. These node elements 2 are usually introduced into the conceptual representations 3 as elements that have a specific numbering that is usually shown in the relevant art drawings and in the narrative of the relevant art body description. During the parsing process, at least some different elements 4 result from the comparison between the relevant art and the invention. The different elements 4 are part of an invention 1 but lie in a different conceptual area 5 of knowledge from the one the relevant art is familiar with.

FIG. 2 shows a graphical representation of what an inventor has to do in the narrative of a patent document, which is to provide known relationships of the elements represented as node elements 2. These node elements 2 are joined together into a narrative by the use of words such as verbs, conjunctions, prepositions, etc., that can be represented by a relationship 6, which is shown in the graph structure as an edge. This relationship can be, for example, an "is-a" relationship, which provides a descriptive relationship between node elements 2. An alternate description of a relationship 6 can be a "functions-as" relationship described by a verb, or another relevant relationship described by another part of the searchable data. These descriptive and functional relationships can be analyzed in a combined graph structure or in separate graph structures depending on the number of node elements 2 and relationships 6 in a relevant art document. A new conceptual representation 7 can be drawn from an element 2 to an element 4 which lies in a second conceptual representation area 5, which represents a new idea domain or novelty. The element 4 may also be linked to a second different element 9 through a relationship 8 which lies in the second conceptual representation area 5 but does not have any link to the first conceptual representation 1. The second different element 9 can also be connected through a relationship 10 to a node 11 which would lie on conceptual representations 3 in yet another distinct area, more particularly a third conceptual representation area 12. The jumps from one conceptual representation to another, the edges that link them with specific relations and functions, and the radius of the graph will determine the scope of the invention.

As the relationships 7 move from the first conceptual representation 1, which is the scope of the relevant art, to a different or second conceptual representation area 5, which is outside the scope of the relevant art, this represents a new domain that can be considered new subject matter within that domain. As the distance from the first conceptual representation 1 is increased by a relationship 8, element 9, relationship 10 and a node 11, the content of the second conceptual representation 5 and of a third conceptual representation 12 becomes less obvious to someone skilled in the art. The number of nodes 4, 9 falling outside the previous art would represent the novelty, while the distance made by the relationships would represent the non-obviousness.
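The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the applicant's implementation: the node labels, the function name `novelty_and_distance`, and the choice to measure non-obviousness as the largest hop count from any prior-art node are all hypothetical.

```python
from collections import deque

def novelty_and_distance(edges, prior_art_nodes):
    """Return (novelty, distance) for an undirected graph.

    novelty  = number of nodes that fall outside the prior art
    distance = largest hop count from the prior art to a reachable outside node
    (both definitions are assumptions made for this sketch)
    """
    # Build an undirected adjacency list from the edge pairs.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Multi-source breadth-first search starting from every prior-art node.
    hops = {node: 0 for node in prior_art_nodes if node in adjacency}
    queue = deque(hops)
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)

    outside = [n for n in adjacency if n not in prior_art_nodes]
    novelty = len(outside)
    distance = max((hops[n] for n in outside if n in hops), default=0)
    return novelty, distance

# Hypothetical labels loosely mirroring FIG. 2: "2a" and "2b" stand for
# prior-art node elements 2; "4", "9" and "11" lie outside the prior art.
edges = [("2a", "2b"), ("2b", "4"), ("4", "9"), ("9", "11")]
result = novelty_and_distance(edges, {"2a", "2b"})
```

A multi-source search is used so that an outside node is measured from its nearest prior-art node rather than from one arbitrary starting point.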

The graph structure represented by node elements 2 and relationships 6 on the first conceptual representation 1 has useful analytic properties, such as frequency of occurrence in a document, and can have a high "in-degree" and "out-degree" of occurrence of the edges. This high frequency of occurrence can be a possible measure of centrality in a relevant art document and can help in clarifying classifications of the invention and patent documents. These can then be compared to terms in the claims to determine claim structure appropriateness. Other measures could include providing weights to the node elements 2 and relationships 6 represented by the edges of a graph, combined into a single score for the node elements 2 of the graph. The weights for the node elements and relationships can be computed by different methods, such as probability programs.
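As a rough illustration of the in-degree and out-degree measure, the sketch below counts both for each node of a small directed graph and uses their sum as a simple centrality proxy. The triples, the function name `degree_scores`, and the use of total degree as the score are assumptions made for the example; the text does not fix a particular formula.

```python
from collections import Counter

def degree_scores(directed_edges):
    """Map each node to (in_degree, out_degree, total) for (source, label, target) triples."""
    in_degree = Counter()
    out_degree = Counter()
    for source, _label, target in directed_edges:
        out_degree[source] += 1
        in_degree[target] += 1
    nodes = set(in_degree) | set(out_degree)
    # Total degree serves here as a simple centrality proxy (an assumption).
    return {n: (in_degree[n], out_degree[n], in_degree[n] + out_degree[n]) for n in nodes}

# Hypothetical edges; the labels play the role of relationships 6.
example_edges = [
    ("processing unit", "is connected to", "database"),
    ("terminal", "sends a query to", "processing unit"),
    ("processing unit", "reads", "search index"),
]
scores = degree_scores(example_edges)
most_central = max(scores, key=lambda n: scores[n][2])
```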

FIG. 3 shows the actual system to perform the analysis of FIG. 1 and FIG. 2. The system is accessed through a computer terminal 13 comprising a computer-usable medium that sends a query to a processing unit 15 of the invention document. The processing unit 15 is connected through a communications channel 16 with the database 17. The database 17 contains a client database 18 that stores information about the individuals that submit information to the system. The information pertaining to each individual case is stored in the client cases database 20. A relation 19 links the information from the individuals stored in the client database 18 with their cases stored in the client cases database 20. The client cases database 20 contains previous queries 21, relevant to the case, that were made on a search index 22. The search index provides rapid access to a searchable document, such as a relevant art text repository 24, through an indexing process 23. The database 17 contains an index 25 that provides a reference 26 to the lexicon and semantic repository 27, which is used in the part-of-speech tagging of relevant art text repository 24. A graph storage 29 contains extracted graphs that were processed using lexicon and semantic repository 27 on relevant art text repository 24. The relevant art text 25 contains references 29 to a repository of relevant art images 31. A graph repository 28 stores the graph information and also stores of the processed relevant art. The graph repository 28 also has stored references 30 to the repository of relevant art/patent images 31.

FIG. 4 shows the initial flow of document submission to the processing unit 15 through terminal 13 of the proposed system in a step 32. The process first determines the sections of the invention document in a step 33. Step 33 also checks that the submitted document complies with the standard sections of a patent application document, such as containing a background of invention section, claims section, and so on. The step 33 determines the background of invention section. The background of invention section is then passed to a step 34 that determines the paragraph boundaries in the background of invention section. Each of the paragraphs determined in step 34 is then passed to a step 35 that determines phrase and word boundaries within the paragraphs using the lexicon and semantic repository 27. The words and phrases identified in step 35 are used in a step 36 to match edges and nodes extracted from the previous or relevant art in categories, which are then matched against the term frequencies of the submission. The final step of FIG. 4 is a step 37, which assigns a tentative category to the submitted invention based on step 36.

FIG. 5 shows the steps to get the node elements 2 from the description of preferred embodiment section. The process begins with a step 38, which is the document submission to the processing unit 15. The document to be processed in step 38 may be a new document submission or an existing patent in the relevant art text repository 24. A step 39 is designed to segregate the sections of the patent submitted in step 38, and then a step 40 selects the description of preferred embodiment from the sections identified in step 39. A step 41 removes meta information such as XML, HTML or document processor information, or other format-dependent description tags, from the processed document of step 40. A step 42 removes numbers from the document processed in step 41 that make reference to figures within the description of preferred embodiment section. The step 42 is followed by step 43 that extracts the claims section from the description of preferred embodiment section. A step 44 continues the processing of step 43 by splitting each paragraph of the description of preferred embodiment section and the claims section.
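The clean-up performed in steps 41, 42 and 44 can be approximated with a few regular expressions. The sketch below is an assumption-laden illustration: the function name `preprocess`, the tag pattern, and the figure-reference pattern are hypothetical, and step 43 (claims extraction) is left out.

```python
import re

def preprocess(section_text):
    """Approximate steps 41, 42 and 44 on one section of text."""
    # Step 41 (approximation): drop XML/HTML-style tags.
    text = re.sub(r"<[^>]+>", "", section_text)
    # Step 42 (approximation): drop the numeral in figure references such as
    # "FIG. 3" so it is not later mistaken for an element number.
    text = re.sub(r"\b(FIGS?\.?|Figures?)\s*\d+\b", r"\1", text, flags=re.IGNORECASE)
    # Step 44: split on blank lines into paragraphs.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

sample = "<p>FIG. 3 shows a terminal 13.</p>\n\n<p>The unit 15 is connected.</p>"
paragraphs = preprocess(sample)
```

Element numbers such as 13 and 15 survive because only numerals that directly follow a figure keyword are removed.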

The split paragraphs of step 44 are then processed by a step 45 that splits each paragraph into sentences and identifies the sentences that contain numbers. These numbers represent the patent elements that correspond to elements in figures that are important to the narrative of the patent. The sentences identified with numbers in step 45 are then selected for further processing in step 46, which identifies where the numbers are located within the sentence. The placement of numbers in step 46 then goes into a loop described by a step 47 and a step 48. Step 47 selects the word preceding the number, and step 48 decides whether a tag word or the beginning of the sentence has been reached. If step 48 decides that neither a tag word nor the beginning of the sentence has been reached, it redirects to step 47. In a typical embodiment the tag words can be words such as an, a, at, the, and said, which mark the introduction of a new element. A step 49 pushes into a memory array of processing unit 15 the sentence segments that were identified in steps 47 and 48. The process carried out in steps 45 through 49 is repeated until a step 50 determines that all sentences have been processed. Once all the selected numbered elements are in the array of step 49, a step 51 selects the least common denominator of each element that has the same number. This selected sentence fragment of step 51 will then be the node element 2 and the patent element that is also described in the figures and narrative. The selected least common denominator of step 51 will then be pushed into a memory array of processing unit 15 by a step 52.
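The backward walk of steps 46 to 48 and the selection of step 51 might look roughly like the following. Two points are assumptions rather than statements from the text: "least common denominator" is interpreted as the longest run of trailing words shared by every fragment carrying the same number, and the walk also stops at a preceding number. All function names are hypothetical.

```python
import re
from collections import defaultdict

TAG_WORDS = {"a", "an", "at", "the", "said"}  # tag words named in the text

def extract_fragments(sentence):
    """Steps 46-48: for each number, walk backwards to a tag word or the sentence start."""
    words = re.findall(r"[A-Za-z]+|\d+", sentence)
    fragments = []
    for i, word in enumerate(words):
        if not word.isdigit():
            continue
        collected = []
        j = i - 1
        # Stopping at an earlier number is an extra assumption of this sketch.
        while j >= 0 and words[j].lower() not in TAG_WORDS and not words[j].isdigit():
            collected.append(words[j].lower())
            j -= 1
        if collected:
            fragments.append((word, list(reversed(collected))))
    return fragments

def common_suffix(fragments):
    """Longest run of trailing words shared by every fragment (assumed meaning of step 51)."""
    shared = []
    for column in zip(*(reversed(f) for f in fragments)):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return list(reversed(shared))

def extract_nodes(text):
    by_number = defaultdict(list)
    for sentence in re.split(r"(?<=[.;])\s+", text):      # step 45
        for number, fragment in extract_fragments(sentence):
            by_number[number].append(fragment)             # step 49
    return {n: " ".join(common_suffix(f)) for n, f in by_number.items()}  # steps 51-52

text = ("A query is sent to a processing unit 15. "
        "The unit 15 is connected to the database 17. "
        "Said central processing unit 15 reads the database 17.")
nodes = extract_nodes(text)
```

Under this interpretation the three mentions of element 15 ("processing unit", "unit", "central processing unit") reduce to the single shared label "unit".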

FIG. 6 shows node elements 2 in their structural representation as a storage format 53 that results from step 52. Storage format 53 can remain in the memory of processing unit 15 or be stored in graph repository 28. The storage format 53 contains a node reference entry identifier 54 that identifies the element as a node and a referring number 55. A separating element 56 segregates the node reference entry identifier 54 from a node identifier 57. The node identifier 57 in turn is also separated by separating element 56 from the reference id 58. Reference id 58 links storage format 53 of graph repository 28 with the repository of patent images 31. A storage format 59 has the same elements as storage format 53 and in addition contains a section separating element 60 that separates the elements of format 53 from a different section 61; the section 61 contains the same elements as format 53. The storage format 59 is stored in graph repository 28 as distinct entries 62. The specific entry identifiers 54 are described by a list of entries 67. List 67 describes each entry 54 by an individual list entry 63. List entry 63 has a key identifier 64. Entry 64 has different identifiers, which in a typical embodiment can be "c" for conjunction, "p" for preposition and so on for each part of speech, and "n" for a node element 2.
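One way to picture storage format 53 is as a single delimited record. The sketch below is hypothetical throughout: the pipe character standing in for separating element 56, the field order, the `Entry` class, and the sample reference id are assumptions chosen only to make the layout concrete.

```python
from dataclasses import dataclass

SEPARATOR = "|"  # stands in for separating element 56 (assumed character)

@dataclass(frozen=True)
class Entry:
    key: str           # key identifier 64: "n" node, "c" conjunction, "p" preposition
    number: str        # referring number 55
    identifier: str    # node identifier 57 (the extracted text)
    reference_id: str  # reference id 58 (link into the image repository 31)

    def serialize(self):
        """Join the fields into one record, entry identifier first."""
        return SEPARATOR.join([self.key + self.number, self.identifier, self.reference_id])

    @classmethod
    def parse(cls, line):
        """Inverse of serialize(); assumes the key identifier is one character."""
        head, identifier, reference_id = line.split(SEPARATOR)
        return cls(head[0], head[1:], identifier, reference_id)

entry = Entry("n", "15", "processing unit", "img-0031")
record = entry.serialize()
round_trip = Entry.parse(record)
```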

FIG. 7 shows the process of determining edge elements. A step 68 selects the array of node elements of step 52. The step 68 is followed by a step 69 that matches node elements 2 with the sentences in which they occur. The matching sentences of step 69 are passed to a step 70 that identifies coordinating conjunctions within the sentences. The placement of coordinating conjunctions allows for set operations to be carried out on either nodes or edges. The step 70 is followed by a step 71 that matches correlative conjunctions in the sentences of step 69. The use of correlative conjunctions allows for additional set operations and conditionals to be applied to nodes and edges. A step 72 follows step 71, where the objective of step 72 is to match prepositional phrase elements within the sentences of step 69. The prepositional phrases allow for the identification of spatial and temporal relationships and the placement of structural characteristics of the elements. Step 72 is complemented by step 73, which matches the remaining prepositions of the sentences of step 69. Step 73 gives way to a step 74 that matches the remaining parts of speech such as nouns, adjectives, articles, etc. The remaining words that are not matched are then flagged for manual intervention. The elements identified in steps 70 through 75 are then subjected to a step 76 that generates candidate parse trees of the sentence syntax structure. The parse trees identified in step 76 are then analyzed in a step 77 to identify the parse tree that represents the most probable sentence syntactic structure. The sentence structure of step 77 will provide prepositions, verbs and adjectives, which can then be used in a step 78 to determine the edges that will be used to link the node elements 2. A step 79 will store the edges of step 78 in graph repository 28. Alternate embodiments of edge assignment include scoring elements and special tags to sort important from trivial elements, among others.
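A heavily simplified sketch of the edge extraction follows. It does not build or rank parse trees (steps 76 and 77); it only tags the words found between two numbered elements against a tiny stand-in lexicon and keeps the verbs, prepositions and conjunctions as the edge label. The lexicon contents, tag letters, and function name are assumptions made for illustration.

```python
import re

# Tiny stand-in for the lexicon and semantic repository 27 (hypothetical contents).
LEXICON = {
    "and": "c", "or": "c", "but": "c",                               # conjunctions
    "to": "p", "with": "p", "through": "p", "in": "p", "on": "p",    # prepositions
    "is": "v", "sends": "v", "connected": "v", "stores": "v",        # verbs
    "a": "d", "an": "d", "the": "d", "said": "d",                    # articles
}

def extract_edges(sentence, nodes):
    """nodes maps element number -> node text. Returns (edges, flagged_words)."""
    tokens = re.findall(r"[A-Za-z]+|\d+", sentence)
    positions = [i for i, t in enumerate(tokens) if t in nodes]
    edges, flagged = [], []
    for left, right in zip(positions, positions[1:]):
        between = tokens[left + 1:right]
        # Drop the trailing words that merely spell out the right-hand node's name.
        name = nodes[tokens[right]].lower().split()
        if name and [w.lower() for w in between[-len(name):]] == name:
            between = between[:-len(name)]
        label_words = []
        for word in between:
            tag = LEXICON.get(word.lower())
            if tag is None:
                flagged.append(word)          # unmatched: left for manual review
            elif tag in ("v", "p", "c"):
                label_words.append(word)
        if label_words:
            edges.append((nodes[tokens[left]], " ".join(label_words), nodes[tokens[right]]))
    return edges, flagged

node_table = {"15": "processing unit", "16": "channel", "17": "database"}
sentence = "The processing unit 15 is connected through a channel 16 with the database 17."
found_edges, flagged_words = extract_edges(sentence, node_table)
```

Linking only consecutive elements is a shortcut; a parse-tree approach as described in the text could attach "with the database 17" to the processing unit rather than to the channel.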

FIG. 8 shows the normalization of word elements of nodes and edges for proper comparison between invention documents and a patent that is one of the relevant art documents. The process starts with a step 80 that selects the tagged elements consisting of node elements of step 52 and edge elements of step 78. The step 80 is followed by a step 81 that initializes a variable x that will keep track of the number of nodes and edges and a variable y that will keep track of the number of words in each node and edge. The step 82 checks whether the current element is the end of the array of nodes and edges of step 80. If the total x is less than the total number of nodes and edges, the step 82 will give way to a step 83, which will take the current element and split the node or edge into individual words. Each of the individual words of step 83 is then assigned a value and taken one at a time, controlled by a condition 84 which determines whether all the elements have been analyzed. If all the elements have been analyzed, the condition 84 will move the routine to step 82. While the elements have not all been analyzed, the condition 84 will pass each individual word to a step 85, which will compare the word against the category entry in the lexicon and semantic repository 27 to see if there is a synonym entry for that word in that specific category. If the word is in the category, step 85 will proceed to a step 86, which will change the word to the synonym entry for that category and then redirect the flow to condition 84. If there is no synonym word, the step 85 will give way to a step 87 that keeps the word and then redirects the flow of the process to condition 84. If the total x is greater than the total number of nodes and edges, the step 82 will move to step 88, which returns the array of words.
The array of words of step 88 is then compared in a step 89 with words selected from other patents in the same category, and in a step 90 the words are assigned a corresponding weight, depending on their frequency, in the category of form or function. Step 90 also stores the results in the array. The steps carried out in FIG. 8 also represent a subprocess carried out in step 36 and step 37.
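A minimal sketch of the normalization loop of FIG. 8 follows. The synonym table stands in for the lexicon and semantic repository 27, and the relative-frequency weighting is only one assumed reading of "depending on the frequency"; the specification does not give a formula.

```python
from collections import Counter


def normalize(elements, synonyms):
    """Replace every word that has a category synonym (steps 83 to 87)."""
    normalized = []
    for element in elements:             # outer loop tracked by variable x
        words = element.lower().split()  # step 83: split into individual words
        normalized.append(" ".join(synonyms.get(w, w) for w in words))
    return normalized


def word_weights(words, other_patents):
    """Assumed weighting for step 90: relative frequency among other patents' words."""
    counts = Counter(w for doc in other_patents for w in doc)
    total = sum(counts.values()) or 1
    return {w: counts[w] / total for w in words}


synonyms = {"fastener": "screw", "bolt": "screw"}  # hypothetical category entries
normalized = normalize(["Bolt head", "fastener"], synonyms)
weights = word_weights(["screw", "head"], [["screw", "nut"], ["screw", "head"]])
```

With these inputs both "Bolt" and "fastener" collapse to "screw", so two documents that use different words for the same element become directly comparable.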

FIG. 9 shows additional discrimination in the remaining text. The step 91 selects the array of node elements 2 from step 52 and edges from step 79. The nodes from step 91 are then used in a step 92 to discriminate the sentences that do not include nodes in the sentence body and therefore have not been processed. The sentences of step 92 are then passed to a step 93 which extracts individual words from the nodes for matching against the text of the selected sentences of step 92. The step 93 is followed by a step 94 which will then try synonyms of node elements against the words of the selected sentences of step 92. The step 94 will follow with a step 95 which then takes the words from the sentences of step 92 and compares them against selected edges from step 79 for a match. The step 95 will give way to a step 96 which will try to match edge synonyms against the words of the selected sentences of step 92. The extracted matches of prepositions or other meaningful text identified in the relationships extracted in steps 93 to 96 that are not in the existing arrays will then be added as nodes to the array of node elements 2 or to the edge array of step 79.

FIG. 10 shows the process of claim analysis against the detailed description section. The step 99 selects the array of node elements 2 and edges from step 79. The step 99 is followed by step 100 that selects the claims section from the invention document submission or patent document narrative. The selected claims section from step 100 is then processed in step 101 to match phrases or words in the claims narrative. The matched phrases or words of step 101 are then used in step 102 alongside the array of node elements 2 to determine an exact match or synonymy between them. The step 102 is followed by step 103 that does exact matching of edges with words in the claims section from step 100. The step 103 is followed by a step 104 that does synonym matching of edges with words in the claims section from step 100. If claim elements are not matched by steps 101 through 104, then a step 105 isolates the claim elements for manual intervention. Extracted elements from steps 101 and 102 are then stored into a claim node array in step 106. Extracted elements from steps 103 and 104 are then stored into an edge array in step 107.
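The node-matching half of FIG. 10 (steps 101, 102, 105 and 106) might look like the following sketch. The function name, the phrase-level granularity and the one-to-one synonym table are assumptions made for illustration.

```python
def match_claim_elements(claim_phrases, nodes, synonyms):
    """Try an exact match, then a synonym match; collect the rest for manual review."""
    node_set = set(nodes)
    matched, manual = {}, []
    for phrase in claim_phrases:
        key = phrase.lower()
        if key in node_set:                    # exact match against node array
            matched[phrase] = key
        elif synonyms.get(key) in node_set:    # synonym match
            matched[phrase] = synonyms[key]
        else:                                  # step 105: isolate for manual intervention
            manual.append(phrase)
    return matched, manual


matched, manual = match_claim_elements(
    ["Processing unit", "data store", "antenna"],
    nodes=["processing unit", "database"],
    synonyms={"data store": "database"},  # hypothetical lexicon entry
)
```

The same routine could be run a second time over the edge array to cover steps 103 and 104.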

FIG. 11 shows the processing to determine disconnected nodes, disconnected claims, novelty and non-obviousness of a document against existing prior art. Step 108 selects the array of node elements 2 and edges from step 79 from the processed invention document submission. The step 108 is followed by a step 109 that determines from the edges of step 79 whether any nodes are disconnected and, if so, raises a flag for manual intervention. The disconnected node flag signifies that the invention document narrative is incomplete, since all nodes must be connected to another node through either a form or a functional description. The step 109 is followed by step 110 that extracts the chosen samples of existing patents for comparison and makes an array of nodes 2 and an array of edges from step 79. Step 110 is followed by a step 111 where the corresponding arrays of nodes and edges for both the invention document submission and the existing patent documents are compared to determine the nodes and edges from the document submission that are not part of the existing patent documents. Step 111 is followed by a step 112 that determines the total number of nodes and edges in the invention document submission. Step 113 takes the number of nodes and edges not in existing patents determined in step 111 and divides it by the number of nodes and edges determined in step 112 to determine the novelty of the document submission. Step 113 is followed by a step 114 that calculates the distance of the nodes from the document submission that are not in the existing patents. The nodes and edges identified in step 114 are then assigned a weight in step 115 based on the distance assigned in step 114. Step 115 is followed by a step 116 that takes the nodes and edges identified in step 114 and assigns a weight based on the importance of the words. The importance score assigned in step 116 is based on a stored weight in lexicon and semantic repository 27.
Step 117 derives a composite score from steps 115 and 116. Step 118 takes the composite score from step 117 and divides it by the total number of nodes and edges in the patent category to obtain a novelty measurement score. Step 118 is followed by a step 119 that displays the graphical representation of the analysis and the results of the computation.
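The two ratios of FIG. 11 can be written out as a short sketch. Representing nodes and edges as members of plain sets is an assumption, and so is the composite of step 117: the specification does not state how the distance and importance weights are combined, so a sum of per-element products is used here purely as a placeholder.

```python
def novelty_ratio(submission, prior_art):
    """Steps 111 to 113: share of submission elements absent from the prior art."""
    if not submission:
        return 0.0
    return len(submission - prior_art) / len(submission)


def novelty_measure(new_elements, distance, importance, category_total):
    """Steps 115 to 118 under an assumed composite: sum of distance * importance."""
    composite = sum(distance[e] * importance[e] for e in new_elements)
    return composite / category_total


submission = {"a", "b", "c", "d"}   # nodes and edges of the submission
prior_art = {"a", "b", "x"}         # nodes and edges of the selected patents
ratio = novelty_ratio(submission, prior_art)
measure = novelty_measure(
    submission - prior_art,
    distance={"c": 2.0, "d": 1.0},      # hypothetical step 115 weights
    importance={"c": 0.5, "d": 1.0},    # hypothetical step 116 weights
    category_total=8,
)
```

In this example two of the four submission elements are new, giving a ratio of 0.5, and the placeholder composite of 2.0 over a category total of 8 gives a measure of 0.25.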

FIG. 12 shows the graphical interface of step 119. The interface is composed of a graphical interface 120 that displays the information to the user. Graphical interface 120 presents the user with the option 121 of analysis of function, form or both. The option controls the underlying presentation of the processing so that it shows only the edges and nodes connected by verbs and other elements that constitute function, only those connected by adjectives and other parts of speech that represent form, or both. The graphical interface 120 will display the graphical representation of the nodes and edges in a graph 122. The graphical interface 120 will have a statistics section 123 that will display relevant information about the invention document submission with regard to the relevant art. The graphical interface 120 will have a section 124 that will link the nodes and edges of graph 122 with the invention document submission fragments where the nodes and edges appear.
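The effect of option 121 on what graph 122 displays can be sketched as a filter over tagged edges. The tuple layout and the tag sets are hypothetical; the specification only fixes "c", "p" and "n" as example key identifiers.

```python
# Hypothetical tag sets for the two views.
FUNCTION_TAGS = {"v"}         # verbs describe what an element does
FORM_TAGS = {"adj", "p"}      # adjectives and prepositions describe form and placement


def select_edges(edges, option):
    """Return the edges visible under option 121: 'function', 'form' or 'both'."""
    allowed = {
        "function": FUNCTION_TAGS,
        "form": FORM_TAGS,
        "both": FUNCTION_TAGS | FORM_TAGS,
    }[option]
    return [edge for edge in edges if edge[3] in allowed]


def visible_nodes(edges):
    """Nodes that still have at least one visible edge."""
    return sorted({n for source, _, target, _ in edges for n in (source, target)})


# Each edge is (source node, label, target node, part-of-speech tag).
edges = [
    ("sensor", "transmits to", "controller", "v"),
    ("sensor", "inside", "housing", "p"),
]
function_view = select_edges(edges, "function")
form_nodes = visible_nodes(select_edges(edges, "form"))
```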

FIG. 13 through FIG. 16 are directed to a system where the user submits the image segment which is relevant. The system then converts the image into an appropriate encoding that can be submitted as a query. The query is then used to match the descriptors of the image segment with those figures or images related to those of a submitted patent. The matched figure of the patent has an associated number which will be matched to a patent element which will then be used to extract the nodes of the patent. Those nodes will be characterized by functional as well as structural descriptions that are located in the narrative of the patent. The system will then use the relevant segments extracted from the patent to construct a narrative that describes the functionality and structure of the element of the image that is described by a patent.

For example, FIG. 1 shows an image grabbing device 201 that captures images, photos or other pictorial representations. The image grabbing device 201 is used to capture a physical object 202. The physical object 202 is converted into an image representation 203 by image grabbing device 201. The image representation 203 contains the object of interest 204. The object of interest 204 is selected by the user of the system by an outline 205 that can be drawn on the image grabbing device 201. The image grabbing device 201 makes a transmission to a processing unit 206. The processing unit submits a query 207 of the image descriptor to a database 208. The database 208 stores a graph 209 of patent elements. The graph 209 is extracted from a patent document 210 which also contains a patent figure 211 that has been processed into descriptors by processing unit 206 and stored into database 208. The processing unit 206 receives the response to the query 207 as a result set 212 that matches the descriptor extracted from image representation 203 as outline 205 with descriptors extracted from patent figure 211.
The processing unit 206 returns the results as a narrative in a message 213 to device 201. The processing unit 206 then displays the narrative in a structured format narrative 214.

FIG. 2 shows the step of converting both the object of interest 204 and the representation of patent figure 211 into a description to be submitted to processing unit 206. The process starts with a step 215 which represents the selection of the outline of the object from the picture. The step 215 is then followed by a step 216 that normalizes the image to a standard size resolution. The step 217 takes the normalized image from step 216 and applies a threshold to clean the image of noise. After step 217, step 218 extracts the edges through an algorithm such as Sobel. Step 219 is a condition where a flag is evaluated to determine whether the request is a query or a submission to expand the database. If the decision of step 219 is a query, then the edges of step 218 are passed to step 220 where a descriptor such as compactness, chain codes, Fourier descriptors, or another algorithm is utilized to extract robust features for matching. Step 221 matches the descriptors of step 220 that belong to object of interest 204 against a basic representation of patent figure 211. The results of the matches of step 221 are in a typical embodiment passed through a second refined search 222 that may include cross correlation, dynamic time warping or another algorithm to fit the candidate further with three dimensional representations, cutouts or exploded views in a patent image. An alternate embodiment can consist of a second refined search 222 that matches the color and texture of the object of interest 204 against a picture of the patent element if it is available. After step 222 is completed, the picture elements are matched in a step 223 against an interpretation tree, where the matched features are matched against patent elements within the figure if it contains multiple patent elements.
For example, a patent image may contain ten patent elements referenced by the numbers in the figure. Such numbers are held in an interpretation tree; based on the matched sub elements, the interpretation tree is cross referenced to extract the element numbers that are relevant to the partial or complete match.
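The image preparation of steps 216 through 220 can be illustrated with a small dependency-free sketch. It assumes a grayscale image held as a list of rows, uses nearest-neighbour resizing for the normalization, and approximates the perimeter by counting Sobel edge pixels, which is cruder than a production descriptor would be. All function names are hypothetical.

```python
import math


def resize_nearest(img, size):
    """Step 216: normalize to a size x size grid by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def threshold(img, level):
    """Step 217: binarize to suppress noise."""
    return [[1 if px >= level else 0 for px in row] for row in img]


def sobel_edges(img):
    """Step 218: mark pixels with a non-zero Sobel gradient (image border left at 0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            out[r][c] = 1 if math.hypot(gx, gy) > 0 else 0
    return out


def compactness(mask, edges):
    """Step 220: 4*pi*area / perimeter^2, with perimeter taken as the edge pixel count."""
    area = sum(map(sum, mask))
    perimeter = sum(map(sum, edges))
    return 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0


# A bright 4 x 4 square on a dark 8 x 8 background.
img = [[200 if 2 <= r <= 5 and 2 <= c <= 5 else 10 for c in range(8)] for r in range(8)]
mask = threshold(resize_nearest(img, 8), 128)
edges = sobel_edges(mask)
score = compactness(mask, edges)
```

The resulting scalar is the kind of descriptor that step 221 could compare against values precomputed for patent figures; chain codes or Fourier descriptors would slot into the same place.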

The step 219 condition can also be a direct submission to the database to further expand the images associated with a patent with either two dimensional or three dimensional representations. Step 219 in submission mode goes to a step 224, a decision based on whether the submitted image is a two dimensional or three dimensional representation. If step 224 is answered as a two dimensional representation, the process goes to a step 225 where the interface lets the user select a particular spot on object of interest 204 that can then be tagged as being an element number of the relevant patent selected. The step 225 will follow with a step 226 that will integrate the marked spot of step 225 into the interpretation tree. The step 226 will be followed by a step 227 that will extract the relevant descriptors from the image and integrate them into the search database of descriptors in database 208.
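The tagging of steps 225 and 226 can be pictured with the following sketch. It is a flat tag store standing in for the interpretation tree, not a tree search; the class name, the pixel coordinates and the nearest-spot lookup are assumptions made for illustration.

```python
from collections import defaultdict


class ElementTags:
    """Simplified stand-in for the interpretation tree: tagged spots per image."""

    def __init__(self):
        self._tags = defaultdict(list)  # image id -> [(x, y, element number)]

    def tag(self, image_id, x, y, element_number):
        """Step 225/226: record a user-marked spot as a patent element number."""
        self._tags[image_id].append((x, y, element_number))

    def nearest_element(self, image_id, x, y):
        """Return the element number of the closest tagged spot, or None."""
        spots = self._tags.get(image_id)
        if not spots:
            return None
        return min(spots, key=lambda s: (s[0] - x) ** 2 + (s[1] - y) ** 2)[2]


tags = ElementTags()
tags.tag("photo-1", 10, 10, 201)
tags.tag("photo-1", 50, 40, 206)
```

A later query that matches a region near (12, 9) in the same image would then resolve to element number 201.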

If step 224 is answered as a three dimensional representation submission, then a step 228 will provide image rectification of the image scene and match it to a sequence of submitted images. The step 228 is followed by a three dimensional reconstruction in step 229, which can be a reconstruction up to a projective transformation. Step 229 will provide a projective reconstruction from which points in three dimensional space can be computed and stored in database 208. The step 229 is followed by a step 230 that will integrate the marked spot of an object of interest 204 from a submitted picture, which will be associated with a patent element number, and the projective reconstruction points that will be stored into database 208. The patent elements that have been tagged in step 230 are then stored into an interpretation tree in a step 231. The marked spots of an object of interest 204 that are mapped into projective reconstruction points are then processed in a step 232 to extract descriptors such as Fourier descriptors, chain codes or another relevant descriptor. The descriptor information of step 232 will then be stored with all the information of the previous steps in a step 233.
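As one example of the descriptors named in step 232, a Freeman 8-direction chain code can be computed as in the sketch below. It assumes the boundary is already available as an ordered, closed list of 8-connected integer points, which the reconstruction steps above would have to supply; the start-point normalization is a common convention rather than something the specification requires.

```python
# Freeman codes: 0 = east, increasing counter-clockwise, with y increasing upward.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(boundary):
    """Encode a closed boundary (list of (x, y) points) as direction codes."""
    closed = boundary[1:] + boundary[:1]
    return [DIRECTIONS[(x1 - x0, y1 - y0)] for (x0, y0), (x1, y1) in zip(boundary, closed)]


def normalize_start(codes):
    """Rotate to the smallest sequence so the code does not depend on the start point."""
    return min(codes[i:] + codes[:i] for i in range(len(codes)))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(1, 1), (0, 1), (0, 0), (1, 0)]  # same square, traced from another corner
```

Both traversals of the unit square normalize to the same code, which is what makes the descriptor usable for matching.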

FIG. 3 shows the process of matching the submitted query with the stored images, extracting the structural as well as functional elements and ordering them into a narrative for the user. The process of FIG. 3 starts with a step 234 that selects the matched elements of step 223 of the interpretation tree. After the elements are selected in step 234, a decision step 235 follows where the process checks whether the matched elements belong to an image added to a patent or to one of the patent figures. If the matched drawing is not a patent figure, a step 236 is taken to match the external two dimensional or three dimensional image information to the stored information of the relevant patent figure and element numbers of the patent. The step that follows step 235 and step 236 is step 237, where the identified patent elements are matched to the graph elements of the patent. The step 237 gives way to a step 238 where the relevant nodes and edges of the graph will point to the structural as well as functional elements of the patent. These nodes and edges from step 238 will also have markers to the original patent sentences from which they were extracted in step 239. The matched content belongs to a patent that will also contain background information, such as the background of the invention, from which relevant background can be provided to the user; this is extracted in a step 240 that follows step 239. The extracted material from step 240 will then be organized for presentation in a step 241.
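Steps 237 through 241 amount to a lookup from element numbers to graph content. The sketch below assumes a dictionary-based graph in which every edge carries its kind and its source sentence; these field names are hypothetical and are not taken from the specification.

```python
def collect_material(element_numbers, graph, background):
    """Gather nodes, edge sentences and background for the matched element numbers."""
    wanted = set(element_numbers)
    nodes = {num: name for num, name in graph["nodes"].items() if num in wanted}
    edges = [e for e in graph["edges"] if e["source"] in wanted or e["target"] in wanted]
    return {
        "background": background,
        "nodes": nodes,
        "structural": [e["sentence"] for e in edges if e["kind"] == "structural"],
        "functional": [e["sentence"] for e in edges if e["kind"] == "functional"],
    }


graph = {
    "nodes": {201: "image grabbing device", 206: "processing unit", 208: "database"},
    "edges": [
        {"source": 201, "target": 206, "kind": "functional",
         "sentence": "The image grabbing device 201 makes a transmission to a processing unit 206."},
        {"source": 208, "target": 209, "kind": "structural",
         "sentence": "The database 208 stores a graph 209 of patent elements."},
    ],
}
material = collect_material([201], graph, background="Background of the invention text.")
```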

FIG. 4 shows the process of presenting the ordered information of step 241. The process begins with a step 244 that checks if the information is to be presented textually or in audio format. The affirmative action of step 244, textual presentation, is followed by a step 245 that first presents the background of the identified invention in the patent document. The step 245 is followed by a step 246 which displays the structural information based on the patent graph. The step 246 is followed by a step 247 that displays the functional information of the identified patent graph.

The negative action of step 244 to present textual information gives way to audio format by moving to a step 248 that narrates the background of the identified invention in the patent document. The step 248 is followed by a step 249 which narrates the structural information based on the patent graph. The step 249 is followed by a step 250 that narrates the functional information of the identified patent graph.

The presentation of the identified patent in step 247 or step 250 gives way to a feedback step 251 where the user is presented with a feedback prompt to determine if the result was helpful. If the answer to the feedback prompt of step 251 is negative, a step 252 will then display less relevant matches. The step 252 gives way to a step 253 where the region searched, the weights in the algorithms or the descriptors are modified to get different results, and if necessary further displays of the narrative are made in a step 254.
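The presentation order of FIG. 4 and the feedback loop can be sketched as follows. The `emit` callback stands in for either on-screen display or audio narration, and `was_helpful` stands in for the user's answer in step 251; both are hypothetical hooks, and the re-weighting of step 253 is not modelled.

```python
SECTION_ORDER = ("background", "structural", "functional")


def present_with_feedback(ranked_results, emit, was_helpful):
    """Present results best-first; fall back to less relevant ones on negative feedback."""
    for result in ranked_results:
        for section in SECTION_ORDER:  # steps 245-247 (text) or 248-250 (audio)
            emit(result[section])
        if was_helpful(result):        # step 251
            return result
    return None                        # nothing accepted; caller may adjust the search


shown = []
results = [
    {"id": 1, "background": "bg1", "structural": "st1", "functional": "fn1"},
    {"id": 2, "background": "bg2", "structural": "st2", "functional": "fn2"},
]
accepted = present_with_feedback(results, shown.append, lambda r: r["id"] == 2)
```

Passing a text-to-speech function as `emit` instead of `shown.append` would give the audio branch without changing the ordering logic.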

The invention is not limited to the precise configuration described above. While the invention has been described as having a preferred design, it is understood that many changes, modifications, variations and other uses and applications of the subject invention will, however, become apparent to those skilled in the art without materially departing from the novel teachings and advantages of this invention after considering this specification together with the accompanying drawings. Accordingly, all such changes, modifications, variations and other uses and applications which do not depart from the spirit and scope of the invention are deemed to be covered by this invention as defined in the following claims and their legal equivalents. In the claims, means-plus-function clauses, if any, are intended to cover the structures described herein as performing the recited function and not only structural equivalents but also equivalent structures.

All of the patents, patent applications, and publications recited herein, and in the Declaration attached hereto, if any, are hereby incorporated by reference as if set forth in their entirety herein. All, or substantially all, the components disclosed in such patents may be used in the embodiments of the present invention, as well as equivalents thereof. The details in the patents, patent applications, and publications incorporated by reference herein may be considered to be incorporable, at applicant's option, into the claims during prosecution as further limitations in the claims to patentably distinguish any amended claims from any applied prior art.

Claims

What is claimed is:

1. A computer representation of structural and functional elements in patent documents, represented by a graph composed of nodes and links through a method comprising:

a searchable invention document comprising a body, wherein said body comprises a body narrative;

selecting a searchable document, wherein said document comprises at least a claim, wherein said claim comprises a claim preamble and a claim narrative;

a first set of instructions for parsing of the body, wherein said first set of instructions comprises the extraction of a plurality of elements from said body narrative, wherein each element of said plurality of elements is identified as a node for said body;

a second set of instructions for parsing of the body, wherein said second set of instructions extracts the links between the plurality of elements;

a third set of instructions for parsing the claim narrative to obtain the preamble from the claim narrative using match phrases;

a fourth set of instructions for parsing the claim narrative to obtain claim nodes using match phrases; and

a fifth set of instructions for classifying each of the claim nodes and nodes in a first group and a second group, wherein said first group is structural elements and the second group is functional elements.

2. The method according to claim 1, wherein the structural element is classified by semantic relationships.

3. A computer representation of structural and functional elements in patent documents, represented by a graph composed of nodes and links through a method comprising:

a parsing of the body that extracts numbered elements that are identified as nodes of the narrative of the preferred embodiment;

a parsing of the body that extracts the links between the numbered elements of the narrative of the preferred embodiment;

a parser of the claims to obtain the preamble from the body of the claim narrative using match phrases or words in the claims narrative;

a parser of the claims to obtain the nodes of the narrative of the claims using match phrases or words in the claims narrative;

performing the process of determining whether it is a structural element by the use of edges that correspond to parts of speech that describe placement of the nodes within the described invention;

performing the process of determining whether it is a functional element by the use of edges that correspond to parts of speech that describe the functioning of the nodes within the described invention;

4. The method of generating the structural and functional elements in patent documents according to claim 3, further comprising: semantic relationships by exploiting the semi structured elements of the patent document to determine structural elements by parsing the body of the patent by searching descriptive elements within the narrative to form the nodes of the graph that are stored in an adjacency matrix or adjacency lists.

5. The method of generating the structural and functional elements in patent documents according to claim 3, further comprising: semantic relationships by exploiting the semi structured elements of the patent document to determine structural elements by parsing the body of the patent by searching descriptive elements within the narrative to form the edges of the graph that are stored in an adjacency matrix or adjacency lists.

6. The method of generating the structural and functional elements in patent documents according to claim 3, further comprising: parsing the document using parsing techniques to determine standard sections of the patent (header information, background of the invention, brief description of drawings, detailed description, and a claims section) as well as non standard sections of the document using previous patent patterns encoded into the parsing of the patent document.

7. The method for the generation of the graphs according to claim 4, further comprising: parsing the sections of the patent document to determine the boundaries of the paragraphs, phrases and individual words by exploiting structural elements in the form of keywords and the format of the patent.

8. The method for the generation of the graphs according to claim 4, further comprising: part of speech tagging that facilitates subsequent stages of the process, and the elimination of stop words, which consists of doing syntax and semantic analysis of the content of the sentences.

9. The method for the generation of the graphs according to claim 4, further comprising:

The patent element type described by a node is identified by node ID that uniquely determines that node belongs to a particular part of speech tag using a database entry

10. The method for the generation of the graphs according to claim 5, further comprising:

The relationship type described by edge is identified by link type ID that uniquely determines that edge belongs to a particular part of speech tag using a database entry

11. The method for the generation of the graphs according to claim 5, further comprising: the node and edge description that forms a semantic network description of relationship.

12. The method for the generation of the graphs according to claim 10, further comprising: an edge weight score that can be used to describe the strength of the relationship in a semantic network construction for the patent document.

13. The method for the generation of the graphs according to claim 9, further comprising:

The adjacency matrix that has nodes as rows and column labels and edges as entries into the adjacency matrix which can be mapped to an adjacency list.

14. The method for the generation of the graphs according to claim 13, further comprising: the steps of the process to determine the novelty of a patent submission against a selection of prior art by constructing a graph for the prior art as well as the patent submission

15. The method for the generation of the graphs according to claim 14, further comprising:

The graph of the set of patents closest to the submitted application used as prior art based on a frequency count of common elements that form the columns and rows of the adjacency matrix.

16. The method for the generation of the graphs according to claim 14, further comprising: based solely on the connection type frequency in the adjacency matrix.

17. The method for the generation of the graphs according to claim 14, further comprising: the selection of the highest link weight score on the adjacency matrix

18. The method for the generation of the graphs according to claim 14, further comprising: mixture of the previous embodiments with other relevant measures of commonality such as centrality, path, connectedness, other structural graph description measure.

19. The method for the generation of the graphs according to claim 4, further comprising: a method of constructing a sub graph of claims for the selected set of patents and the patent submission that determines the difference in elements of the selected patents vs the patent application.

20. The method for the generation of the graphs according to claim 19, further comprising: analysis of the differences between the sub graphs of the selected patents and the patent applications to determine structural and functional form differences between them using node matching, node reachability, node reachability score , path length description ,path length weight, path length characteristics, connectedness of the sub graphs or complete graphs

21. A method for querying patent information based on image interface comprising: a data base, wherein said data base comprises at least a first graph comprising of elements and a first group of image descriptors;

a first image grabbing device, wherein said image grabbing device is a computer device that captures a pictorial representation;

wherein said image grabbing device transmits said pictorial representation to a processing unit;

wherein the processing unit generates a first image descriptor of the pictorial representation;

wherein the processing unit submits a query of the first image descriptor to said database; wherein said processing unit matches the first image descriptor with a descriptor extracted from image; and

wherein said processing unit transmits the results as a narrative in a message to the image grabbing device.