WO2013043146A1 - Searchable multi-language electronic patent document collection and techniques for searching the same

Searchable multi-language electronic patent document collection and techniques for searching the same


Publication number
WO2013043146A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
documents
language
target language
vectors
Prior art date
Application number
PCT/US2011/052090
Other languages
English (en)
Inventor
Jason David Resnick
Randy W LACASSE
Original Assignee
Cpa Global Patent Research Limited
Priority date
Filing date
Publication date
Application filed by Cpa Global Patent Research Limited
Priority to PCT/US2011/052090
Publication of WO2013043146A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3337 Translation of the query language, e.g. Chinese to English

Definitions

  • This generally relates to methods, systems, and mediums for creating a searchable electronic patent document collection including documents in multiple languages, and methods, systems, and mediums for searching such a collection.
  • Document vectors can be created for the documents in the collection and translated into various languages. Document vectors also can be created in a predetermined base language.
  • a search query in a single language can be used to search all documents in the collection with document vectors in the search query's language.
  • a search query can be entered in a base language to search all documents in the collection with document vectors in the base language.
  • the search query can also be translated into the base language or one or more target languages to facilitate searching. Additional searches can be performed based on document vectors of relevant documents (identified during the initial search) to thus expand the scope of a search.
  • An example embodiment is directed to a method of creating a searchable electronic patent document collection.
  • the method can involve storing a collection of electronic patent documents on a memory.
  • the collection can include documents in multiple languages, including a plurality of documents in a first language and a plurality of documents in a second language.
  • a static document vector can be created in the first language for each document.
  • a static document vector can be created in the second language and a static document vector can be created in the first language for each document.
  • the static document vectors can be stored in association with their respective documents.
  • the static document vector in the first language for each document in the second language can be created by translating each document's static document vector in the second language into the first language. Such a translation can be performed using a microprocessor. The translation also can be verified. In addition, a static document vector in the second language can be created for each document in the first language.
  • a system for creating a searchable electronic patent document collection can be configured to store a collection of electronic patent documents.
  • the collection of documents can include a first plurality of documents in a first language and a second plurality of documents in a second language.
  • the system can be further configured to create a static document vector in the first language for each document of the first plurality of documents and create a static document vector in the second language and a static document vector in the first language for each document of the second plurality of documents.
  • the system can be configured to store the static document vectors in association with their respective documents.
  • a computer-readable medium can store instructions to be executed by a computer, causing the computer to create a searchable electronic patent document collection.
  • the stored instructions may include storing a collection of electronic patent documents.
  • the collection can include a first plurality of documents in a first language and a second plurality of documents in a second language.
  • the instructions may further include creating a static document vector in the first language for each document of the first plurality of documents, creating a static document vector in the second language and a static document vector in the first language for each document of the second plurality of documents, and storing the static document vectors in association with their respective document.
  • Another embodiment is directed to a method of searching a collection of electronic patent documents.
  • An input or received search query can contain one or more terms in a first language.
  • a dynamic document vector can be created in the first language using the one or more terms.
  • Another dynamic document vector can be created in a second language by translating the dynamic document vector in the first language.
  • the dynamic document vector in the second language can be compared with static document vectors in the second language of documents from the collection. A relevant document can be identified based on the comparison.
  • the translation of the dynamic document vector in the first language can be performed using a microprocessor.
  • the search query can be input via a user computer and the relevant document can be returned as a search result to the user computer.
  • the one or more terms of the search query can comprise words, groups of words, values, and/or symbols.
  • a system for searching a collection of electronic patent documents can be configured to receive a search query containing one or more terms in a first language.
  • the system can create a dynamic document vector in the first language using the one or more terms and create a dynamic document vector in a second language by translating the dynamic document vector in the first language.
  • the system can then compare the dynamic document vector in the second language with static document vectors in the second language of documents from the collection and can identify a relevant document based on the comparison.
  • a computer-readable medium can store instructions to be executed by a computer.
  • the instructions can enable the computer to receive a search query containing one or more terms in a first language.
  • the instructions can include creating a dynamic document vector in the first language using the one or more terms of the search query and creating a dynamic document vector in a second language by translating the dynamic document vector in the first language.
  • the instructions can further include comparing the dynamic document vector in the second language with static document vectors in the second language of documents from a collection of electronic patent documents and identifying a relevant document based on the comparison.
  • An exemplary embodiment is directed to a method for searching a collection of electronic patent documents stored on a memory.
  • the method can include comparing a dynamic document vector in a first language with static document vectors in the first language of documents from the collection. A first relevant document can be identified.
  • the method can further include accessing a static document vector in a second language associated with the first relevant document and comparing the vector's contents with static document vectors in the second language of documents from the collection. As a result, a second relevant document can be identified that is not associated with a static document vector in the first language.
  • the step of comparing the vector's contents with static document vectors in the second language of documents from the collection can be performed by creating a second dynamic document vector using the contents of the first relevant document's static document vector and comparing the second dynamic document vector with static document vectors in the second language of documents from the collection.
  • a system for searching a collection of electronic patent documents can be configured to compare a dynamic document vector in a first language with static document vectors in the first language of documents from the collection.
  • the system can identify a first relevant document and access a static document vector in a second language associated with the first relevant document.
  • the system can be further configured to compare contents of the first relevant document's static document vector in the second language with static document vectors in the second language of documents from the collection.
  • the system can be enabled to identify a second relevant document that is not associated with a static document vector in the first language.
  • FIG. 1 is a flow chart illustrating a process for creating a searchable electronic document collection.
  • FIGS. 2(a) and 2(b) are flow charts illustrating processes for creating static document vectors for documents in multiple languages.
  • FIG. 3 is a flow chart illustrating a process for searching a document collection.
  • FIG. 4 is a Venn diagram illustrating an exemplary distribution of documents in a collection.
  • FIG. 5 is a flow chart illustrating a process for searching a document collection.
  • FIG. 6 is a flow chart illustrating a process for searching a document collection.
  • FIG. 7 illustrates an interface for searching documents, in accordance with an embodiment.
  • FIG. 8 is a flowchart of a method for searching documents, in accordance with an embodiment.
  • FIG. 9 illustrates search results to be displayed on a result section, in accordance with an exemplary embodiment.
  • FIG. 10 is a flowchart of a method for searching documents, in accordance with another embodiment.
  • FIG. 11 illustrates search results to be displayed on a result section, in accordance with another exemplary embodiment.
  • FIG. 12 is a flowchart of a method for searching documents, in accordance with another embodiment.
  • FIG. 13 is a flowchart of a method for searching documents, in accordance with another embodiment.
  • FIG. 14 illustrates search results to be displayed on a result section, in accordance with another exemplary embodiment.
  • FIG. 15 is a block diagram illustrating a system and/or tools for implementing an embodiment.
  • FIG. 16 is a block diagram illustrating a system and/or tools for implementing another embodiment.
  • relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
  • set or subset implies the inclusion of zero or more entities.
  • Document vectors can be employed for creating and searching an electronic patent document collection.
  • a document vector can be a set of (keyword, weight) pairs.
  • the keyword can be a word, group of words, or value associated with an underlying document.
  • the weight can be a numerical measure of how important the keyword is for the document.
  • a document vector can be thought of as a type of document signature that represents a document's content in a manner that facilitates comparison between documents.
  • a document can have one or more static document vectors, the vectors having varying degrees of scope.
  • a dynamic document vector can be created from a search query submitted for searching the patent document collection.
  • Such a document vector can be referred to as dynamic because it generally can be created at the time of receipt of a search query and discarded, changed, or updated after being used in a search.
  • a static document vector can be created by parsing an input document, choosing keywords that help define the document, computing weights for the various keywords, and storing the (keyword, weight) pairs in the document vector.
  • the choice of keywords and weighting of the words can be performed according to predetermined rules.
  • a dynamic document vector can be created from an input search query in a similar way, except that all of the terms in the search query may be entered into the document vector.
  • a dynamic document vector can be created representing the query.
  • the document vector can be a weighted list of words and phrases, such as:
  • the dynamic document vector can be compared with static document vectors that have been previously created for documents in a collection.
  • the comparison can include, for example, multiplying the weights of any common terms among the dynamic document vector and a given static document vector, and adding the results to obtain a similarity index for that static document vector.
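  • The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the uniform query weight of 1.0, and the sample keywords and weights are assumptions, not details taken from the source.

```python
from typing import Dict, Iterable

# A document vector is modeled as a mapping of keyword -> weight.
DocumentVector = Dict[str, float]


def make_dynamic_vector(query_terms: Iterable[str], weight: float = 1.0) -> DocumentVector:
    """Enter every query term into the vector (the uniform weight is an assumption)."""
    return {term.lower(): weight for term in query_terms}


def similarity_index(dynamic: DocumentVector, static: DocumentVector) -> float:
    """Multiply the weights of the common terms and add the products."""
    common = dynamic.keys() & static.keys()
    return sum(dynamic[k] * static[k] for k in common)


# Hypothetical sample data for illustration only.
query_vector = make_dynamic_vector(["fuel cell", "membrane"])
static_vector = {"fuel cell": 0.8, "membrane": 0.5, "catalyst": 0.3}
score = similarity_index(query_vector, static_vector)
```

  • Terms present in only one of the two vectors (here "catalyst") contribute nothing to the index, so a higher score reflects more, or more heavily weighted, shared keywords.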
  • Document vectors may also be parsed to exclude terms that are frequently employed in patent documents and have minimal value for conducting a search. For example, terms such as 'respectively,' 'means,' and 'preferred' may be excluded from a document vector because such terms appear in a large number of patent documents and thus may have minimal value in locating a relevant document.
  • different values may be placed on terms extracted from different sections of the patent document, depending on the type and scope of a search. These different values can be reflected both in the content of a document vector and in the weights associated with keywords in the document vector.
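  • A minimal sketch of building a static document vector with excluded terms and section-dependent values follows. The source only says that keywords and weights follow predetermined rules, so the specific rule here (a term count scaled by a per-section factor), the factor values, the extra function words in the stop list, and all names are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict

# 'respectively', 'means', 'preferred' come from the text; the function words are an assumption.
STOP_TERMS = {"respectively", "means", "preferred"} | {"a", "an", "the", "is", "of"}

# Hypothetical per-section factors, e.g. for a claims-focused search.
SECTION_WEIGHTS = {"claims": 2.0, "abstract": 1.5, "description": 1.0}


def build_static_vector(sections: Dict[str, str],
                        section_weights: Dict[str, float] = SECTION_WEIGHTS,
                        max_keywords: int = 50) -> Dict[str, float]:
    """Parse each section, skip excluded terms, and accumulate section-scaled weights."""
    scores: Counter = Counter()
    for name, text in sections.items():
        factor = section_weights.get(name, 1.0)
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_TERMS:
                scores[word] += factor
    # Keep only the highest-weighted keywords as (keyword, weight) pairs.
    return dict(scores.most_common(max_keywords))


doc = {
    "claims": "A valve means comprising a seal",
    "description": "The preferred seal is rubber",
}
vector = build_static_vector(doc)
```

  • Passing a different `section_weights` mapping would yield a differently scoped vector for the same document, for instance one that emphasizes the detailed description for a novelty search.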
  • patents and published patent applications are generally divided into multiple sections. Each section of the patent document is generally required for a submission of a completed patent application, and each section of a patent has a purpose.
  • the majority of patent documents include a title, various filing and priority dates, citations, inventors, assignees, an abstract, a background description, a summary, a brief description of the drawing figures (if any), a detailed description, and claims.
  • There are different search categories that can be employed based on the purpose of the search.
  • an infringement and/or product clearance search may be concerned with the words in the claims, and therefore should be directed to the claims present in the document collection.
  • a validity and/or invalidity search may be concerned with any known prior art, and may include identification of the effective filing date of the patent document.
  • the inventor(s) or his agent or representative may commission a novelty search.
  • Such a search may de-emphasize the claims and focus on the detailed description of the invention. Accordingly, each search places emphasis on different sections of patent documents in the collection.
  • multiple static document vectors can be created for a single patent document, with each separate document vector pertaining to a single section or a combination of sections in the patent document.
  • the creation of multiple document vectors, with each vector identifying a specific section, enables a search of the document collection to be refined based upon a defined scope of search.
  • an infringement search would preferably be performed using a static document vector that was directed primarily to the claims section of the patent documents in the collection.
  • emphasis in the document vector on the definition and/or synonyms of claim terms could be useful as well.
  • static document vectors can be created in different languages for a patent document, as discussed in more detail below.
  • document vectors can be employed in a patent document collection to efficiently and effectively locate one or more documents relevant to a search query submitted to the collection.
  • An exemplary software platform for implementing a search system using document vectors according to the embodiments described below is the Microsoft® FAST Enterprise Search Platform.
  • FIG. 1 illustrates an embodiment for creating a searchable electronic patent document collection.
  • a collection of electronic patent documents can be compiled at step 101.
  • the collection may include patent documents in multiple languages.
  • the collection could include patent documents issued by the patent offices of the United States, Europe, Japan and Korea.
  • the collection could include patent documents written in the English, French, German, Japanese, and Korean languages.
  • the collection can be indexed at step 102.
  • the process of indexing the collection may include converting a collection of data representing the document collection into a database suitable for search and retrieval.
  • a record can be created for each document in the collection.
  • the record can contain the data representing the document.
  • one or more static document vectors can be created at step 103 for each document in the collection.
  • the static document vector(s) can be stored at step 104 in a field in the record in association with their respective documents.
  • Other information about the document can be extracted from the document or obtained elsewhere, and can be stored in other fields in the document's record. These fields can be used for various searching functions.
  • family members of an indexed patent document can be listed in the document's record.
  • the family members can include related applications or patents, such as continuing applications or patents issued therefrom, filed in the same country as the patent document.
  • the family members can include foreign counterpart applications or patents. The language of each foreign counterpart can be recorded in the document's record as well. This information can be accessed and used in various embodiments as described below.
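  • One way to picture such a record is sketched below. The field names, the `counterparts_in` helper, and the placeholder publication numbers are all hypothetical; the source specifies only that the record holds the document data, its static document vectors, and family members with their languages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FamilyMember:
    publication_number: str
    language: str
    relation: str  # e.g. "continuation" or "foreign counterpart"


@dataclass
class DocumentRecord:
    publication_number: str
    language: str  # original language of the document
    text: str
    # One static document vector per language: language -> {keyword: weight}
    static_vectors: Dict[str, Dict[str, float]] = field(default_factory=dict)
    family: List[FamilyMember] = field(default_factory=list)

    def counterparts_in(self, language: str) -> List[FamilyMember]:
        """Return the family members recorded in the given language."""
        return [m for m in self.family if m.language == language]


# Placeholder identifiers, not real publications.
record = DocumentRecord(
    publication_number="XX-0001",
    language="ja",
    text="...",
    static_vectors={"ja": {"弁": 0.8}, "en": {"valve": 0.8}},
    family=[FamilyMember("XX-0002", "en", "foreign counterpart"),
            FamilyMember("XX-0003", "ja", "continuation")],
)
english_family = record.counterparts_in("en")
```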
  • the process of indexing and/or creating new static document vectors, or updating existing static document vectors can be performed at a later time to account for any changes that may have been made to the indexed documents.
  • the content of patent documents can change due to a certificate of correction or a reexamination proceeding.
  • documents can be added to the compilation as they become available, such as newly-issued patents. Indexing of each added document would preferably be performed at the time of addition. Automated re-indexing of the collection and addition of newly-available documents to the collection can be scheduled to occur at various intervals.
  • FIG. 2(a) illustrates an embodiment in which the process of creating one or more static document vectors involves creating a static document vector in each of multiple languages for documents in the collection. Multiple languages can be selected at step 201 for creating document vectors. An original-language static document vector can be created at step 202 for each document in the collection. An original-language static document vector is a static document vector containing keywords in the original language of the document.
  • a static document vector in each selected language can be created at step 203 for each document in the collection. This approach is beneficial as the various document vectors can support search queries in multiple languages.
  • Creating a non-original-language static document vector can involve translating the keywords in the original-language static document vector of each document.
  • the keywords in the document vector can be translated by various techniques, such as by human translation, machine translation, or machine translation verified by a human.
  • Machine translation can include a replacement technique in which terms that do not literally translate into another language can be appropriately replaced by the equivalent term in the target language based on a replacement table.
  • Machine translation could also include substituting synonyms for the keyword to be translated or for the translated keyword to achieve an intended meaning.
  • Human translations can be of high quality, but also can be prohibitively expensive and time consuming. Human translations also can be difficult to perform on the fly because a human translator must be accessible to perform the translation. Machine translations can be performed quickly and cheaply, but can be of sub-ideal quality. Machine translations can be verified for accuracy by a human, and a balance can be achieved between time- and cost-effectiveness on one hand and accuracy on the other.
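  • The keyword-level translation with a replacement table can be sketched as follows. The dictionaries stand in for a real machine translation step, and the choice to add weights together when two keywords map to the same target term is an assumption; keywords with no known translation are returned separately so that a human could review them.

```python
from typing import Dict, List, Optional, Tuple


def translate_vector(vector: Dict[str, float],
                     dictionary: Dict[str, str],
                     replacements: Optional[Dict[str, str]] = None
                     ) -> Tuple[Dict[str, float], List[str]]:
    """Translate the keywords of a document vector while carrying the weights over.

    `replacements` is consulted first, for terms that do not translate literally.
    """
    replacements = replacements or {}
    translated: Dict[str, float] = {}
    untranslated: List[str] = []
    for keyword, weight in vector.items():
        target = replacements.get(keyword) or dictionary.get(keyword)
        if target is None:
            untranslated.append(keyword)  # candidate for human verification
            continue
        translated[target] = translated.get(target, 0.0) + weight
    return translated, untranslated


# Hypothetical German-to-English sample data.
german_vector = {"ventil": 0.9, "stand der technik": 0.4, "xyz": 0.1}
dictionary = {"ventil": "valve", "dichtung": "seal"}
replacement_table = {"stand der technik": "prior art"}
english_vector, needs_review = translate_vector(german_vector, dictionary, replacement_table)
```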
  • FIG. 2(b) illustrates an embodiment in which the process of creating one or more static document vectors for a document involves creating an original-language static document vector and a base-language static document vector.
  • An original-language static document vector is a document vector containing keywords in the language of the document.
  • a base-language static document vector is a static document vector containing keywords in a base language.
  • a base language can be selected at step 251.
  • An original-language static document vector can be created at step 252 for each document in the collection. It can be determined at step 253 whether an original-language static document vector of each document is in the base language. For each document whose original-language document vector is not in the base language, a base-language static document vector can be created at step 254.
  • an English document may have only an English static document vector.
  • a Japanese document may have both a Japanese static document vector and an English static document vector.
  • a search query in the base language may be used to conduct a search including documents in multiple languages.
  • a translation burden can be reduced because multiple translations of document vectors can be avoided.
  • translation can be completely avoided.
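  • Steps 253 and 254 of FIG. 2(b) can be sketched as a single pass over the collection. The record layout (plain dictionaries), the `translate` callback, and the sample identifiers are assumptions; the sketch also assumes each record already carries its original-language vector from step 252.

```python
from typing import Callable, Dict, List

Vector = Dict[str, float]


def add_base_language_vectors(records: List[dict],
                              base_language: str,
                              translate: Callable[[Vector, str, str], Vector]) -> List[dict]:
    """Create a base-language vector only for documents not already in the base language."""
    for record in records:
        original = record["vectors"][record["language"]]
        # Step 253: is the original-language vector already in the base language?
        if record["language"] != base_language and base_language not in record["vectors"]:
            # Step 254: translate the original-language vector into the base language.
            record["vectors"][base_language] = translate(
                original, record["language"], base_language)
    return records


def toy_translate(vector: Vector, source: str, target: str) -> Vector:
    """Stand-in for a real translation step (hypothetical lookup table)."""
    table = {("ja", "en"): {"弁": "valve"}}[(source, target)]
    return {table[k]: w for k, w in vector.items() if k in table}


collection = [
    {"id": "US-1", "language": "en", "vectors": {"en": {"valve": 1.0}}},
    {"id": "JP-1", "language": "ja", "vectors": {"ja": {"弁": 1.0}}},
]
add_base_language_vectors(collection, "en", toy_translate)
```

  • After the pass, the English document still has a single vector while the Japanese document has two, which is the reduced translation burden the embodiment aims for.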
  • It may be preferable to create only an original-language static document vector for a number of documents in the collection. For instance, it may be preferable to create only an original-language static document vector for patent documents from an obscure country. It may also be preferable to create static document vectors in one or more non-original languages for only a certain number or proportion of documents in a certain language. For example, perhaps only half of all Korean patent documents in the collection will have English static document vectors. A technique for searching such a collection will be discussed later with reference to FIG. 4.
  • The decision about which documents receive document vectors in multiple languages could be based on a number of criteria. For instance, an underlying technical field of the patent document or the frequency of citation of the patent document may be considered. As another option, the same criteria could be used to decide what type of translation technique to use when translating a document vector into another language.
  • a vector creation platform may exist for each language of documents in the collection. Because words, grammar, syntax, style, conventions, etc., vary greatly from language to language and country to country, the various platforms may implement different search algorithms for identifying relevant terms for the document vectors.
  • a vector creation platform may not exist for all languages.
  • the documents can be translated into a language for which a vector creation platform exists.
  • the translated document can then be processed via the platform corresponding to the translation language.
  • the translation-language static document vector can then be stored with the document and, if need be, an original-language static document vector also can be created from the translation-language static document vector.
  • a counterpart document in a language for which a vector creation platform exists may be identified.
  • a static document vector can then be created in the counterpart language and stored with the original document.
  • an original-language static document vector can be created from the counterpart-language static document vector if desired, for example, by translating the counterpart-language static document vector.
  • FIG. 3 illustrates an embodiment for searching a collection, whether the collection is created according to the above techniques or otherwise.
  • a search query can be received at step 301.
  • the search query can contain one or more terms, the terms comprising words, groups of words, or values.
  • the query can be input from a user computer that can be directly connected to or remote from the database or system.
  • a user computer can be connected to the database via the Internet. In either case, the user computer could submit the search query via a web browser or via another application.
  • a dynamic document vector can be created at step 302 using the one or more terms of the search query.
  • the query in the form of the dynamic document vector can be submitted to the document collection, where the dynamic document vector can be compared at step 303 to the appropriate static document vector of each document in the collection, or alternatively, of each document that has been specified for searching.
  • the static document vector of each document in the collection can be in a base language.
  • the base language can thus be used to search the entire collection. If the entered search query is in the base language as well, then a dynamic document vector can be created directly from the search query and can be used to perform the search. If the search query is not in the base language, a dynamic document vector can be created in the language of the search query.
  • a dynamic document vector in the base language can then be created by translating the query-language dynamic document vector. Translation can be accomplished by various translation methods, as discussed previously. The base-language dynamic document vector can then be compared to base-language static document vectors of documents in the collection.
  • a user can specify which documents to search in the collection.
  • a dynamic document vector representing the search query can be created in the language of the search query. If documents are specified for searching which do not have static document vectors in the search query language, the query-language dynamic document vector can be translated to create multiple dynamic document vectors in appropriate languages for searching the specified documents. Alternatively, the search query can be translated before creating a first dynamic document vector.
  • the dynamic document vectors in the various languages can then be compared to static document vectors in the various languages of documents in the collection.
  • a static document vector in a single language is selected for comparison.
  • a static document vector in a predetermined language, such as the language of the document, may be compared.
  • One or more relevant documents can be identified at step 304.
  • a result set containing the relevant documents can be returned at step 305, for example, to the user computer.
  • the result set can be empty. In such a case, a message could be delivered to the user computer suggesting ways to change or broaden the search query to achieve better results.
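  • The flow of FIG. 3 for a base-language collection can be sketched end to end as follows. The record layout, the `translate` callback, the uniform query weights, the ranking by score, and the wording of the empty-result message are assumptions added for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, float]


def search(query_terms: List[str], query_language: str, records: List[dict],
           base_language: str, translate: Callable[[Vector, str, str], Vector],
           top_n: int = 10) -> Tuple[List[str], Optional[str]]:
    # Step 302: create a dynamic document vector from the query terms.
    dynamic = {term.lower(): 1.0 for term in query_terms}
    if query_language != base_language:
        dynamic = translate(dynamic, query_language, base_language)
    # Step 303: compare with the base-language static vector of each document.
    scored = []
    for record in records:
        static = record["vectors"].get(base_language, {})
        score = sum(w * static[k] for k, w in dynamic.items() if k in static)
        if score > 0:
            scored.append((score, record["id"]))
    # Step 304: identify relevant documents, best match first.
    scored.sort(reverse=True)
    results = [doc_id for _, doc_id in scored[:top_n]]
    # Step 305: return the result set, with a hint when it is empty.
    message = None if results else "No documents matched; try broader or fewer terms."
    return results, message


def toy_translate(vector: Vector, source: str, target: str) -> Vector:
    """Stand-in for a real translation step (hypothetical lookup table)."""
    table = {("fr", "en"): {"vanne": "valve"}}[(source, target)]
    return {table[k]: w for k, w in vector.items() if k in table}


records = [
    {"id": "doc-en", "vectors": {"en": {"valve": 0.9, "seal": 0.4}}},
    {"id": "doc-ja", "vectors": {"ja": {"弁": 0.8}, "en": {"valve": 0.7}}},
]
hits, note = search(["vanne"], "fr", records, "en", toy_translate)
no_hits, hint = search(["gear"], "en", records, "en", toy_translate)
```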
  • an abstract of the document in a language preferred by the user may be available, and such can be identified and/or provided to the user.
  • available counterparts can be identified and/or provided to the user. Such counterparts include patent publications and patents based on applications for the same invention, continuing applications, and the like.
  • further search capabilities can be provided.
  • the contents of a static document vector of one of those documents may be used to search the collection.
  • This technique can be useful to further aid in searching documents in multiple languages.
  • a collection of documents 400 includes a set of documents in multiple languages.
  • the language of an input search query will be deemed a search language for purposes of this embodiment.
  • a subset 401 of documents in the collection contains documents in multiple languages each having search-language static document vectors.
  • Another subset 402 contains documents of a particular language, which is not the search language, each having an original-language static document vector.
  • a subset 403, which overlaps subsets 401 and 402, contains documents of the particular language having both original-language static document vectors and search-language static document vectors.
  • a search can be performed where a received search query is in the search language at step 501.
  • a search-language dynamic document vector can be created at step 502 from the search query and can be compared at step 503 with the static document vectors in subset 401, that is, all documents with search-language static document vectors.
  • a document 404 which is a document whose original language is not the search language, can be identified at step 504 as a relevant document.
  • a search of subset 402 can now be performed using the original-language static document vector of document 404.
  • a document 405, which does not have a search-language static document vector, can be identified at step 506 as relevant.
  • documents that do not have a search-language static document vector are able to be searched even though they are not directly searchable via the search-language dynamic document vector.
  • Such search capabilities can be especially useful when a specific country having documents without search-language static document vectors is well-known for innovation in a particular field.
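  • The two-stage search over the subsets of FIG. 4 can be sketched as follows. The record layout, the zero threshold, and the sample Korean and English keywords are assumptions; the identifiers "404" and "405" simply mirror the reference numerals used in the text.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def similarity(a: Vector, b: Vector) -> float:
    """Sum of products of the weights of common keywords."""
    return sum(w * b[k] for k, w in a.items() if k in b)


def extended_search(dynamic: Vector, records: List[dict], search_lang: str,
                    other_lang: str, threshold: float = 0.0
                    ) -> Tuple[List[dict], List[dict]]:
    # First pass: documents that have a search-language static vector (subset 401).
    first_hits = [r for r in records
                  if search_lang in r["vectors"]
                  and similarity(dynamic, r["vectors"][search_lang]) > threshold]
    seen = {r["id"] for r in first_hits}
    second_hits: List[dict] = []
    for hit in first_hits:
        seed = hit["vectors"].get(other_lang)  # e.g. document 404's original-language vector
        if seed is None:
            continue
        # Second pass: documents reachable only through the other language (subset 402).
        for r in records:
            if r["id"] in seen or search_lang in r["vectors"]:
                continue
            if other_lang in r["vectors"] and similarity(seed, r["vectors"][other_lang]) > threshold:
                second_hits.append(r)
                seen.add(r["id"])
    return first_hits, second_hits


records = [
    {"id": "404", "vectors": {"en": {"battery": 0.9}, "ko": {"배터리": 0.9, "전극": 0.5}}},
    {"id": "405", "vectors": {"ko": {"전극": 0.7}}},
    {"id": "other", "vectors": {"en": {"gear": 1.0}}},
]
first, second = extended_search({"battery": 1.0}, records, "en", "ko")
```

  • Document "405" has no English vector and shares no keyword with the query, yet it is found because it shares a Korean keyword with document "404".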
  • the search-language dynamic document vector does not necessarily have to have been created directly from a search query entered in the same language.
  • the search-language dynamic document vector could have been translated from another dynamic document vector.
  • an input search query could have been translated into another language, and the search-language dynamic document vector could have been created based on the translated search query.
  • Another variation of an extended search can involve extracting the keywords from a static document vector of an identified relevant document and appending them to the original one or more terms of the search query.
  • a new dynamic document vector can be created and a new search can be performed based on the new dynamic document vector.
  • the extracted keywords would preferably be in the same language as the one or more search terms, though translation could also be performed.
  • the static document vector could be copied to make a new dynamic document vector and extended to include the one or more search terms.
  • the weights associated with the keywords in the static document vector also can be used to appropriately weight the dynamic document vector in each embodiment.
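  • The keyword-extension variation above can be sketched as follows, assuming a static document vector is a keyword-to-weight dictionary. The function name, the `top_k` cut-off, and the default weight of 1.0 for query terms are illustrative assumptions rather than details given in the description.

```python
def expand_query(query_terms, static_vec, top_k=3, query_weight=1.0):
    """Build a new dynamic vector from the query terms plus the
    highest-weighted keywords of a relevant document's static vector."""
    dynamic = {term: query_weight for term in query_terms}
    # Take the top_k keywords by weight from the static vector.
    top = sorted(static_vec.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    for keyword, weight in top:
        # If a keyword is already a query term, keep the larger weight.
        dynamic[keyword] = max(dynamic.get(keyword, 0.0), weight)
    return dynamic
```

  • The returned dictionary can then be used as the new dynamic document vector for a follow-up search.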
  • family members of a patent document can be leveraged to provide search results including documents in a target language, as depicted in FIG. 6 for example.
  • a dynamic document vector in a search language can be compared at step 601 with search-language static document vectors of documents in a collection.
  • a relevant document can be identified at step 602 based on the comparison.
  • the relevant document can be displayed on a graphical user interface.
  • a user can select the relevant document for further searching.
  • the user can request a search for documents similar to the relevant document. This can be called a "Find Similar" search, as opposed to a "Query Search," for example.
  • a target language can be selected at step 603.
  • the user can select the target language using the graphical user interface.
  • a drop-down menu in the graphical user interface can list a number of potential target languages and the user can specify the desired language.
  • a search-language static document vector of the relevant document can be compared at step 604 with search-language static document vectors of documents in the collection.
  • a second relevant document can be identified at step 605 based on the comparison. It can be determined at step 606 whether the second relevant document is associated with a counterpart document in the target language. In an embodiment, this can be determined by consulting the record of the second relevant document, as discussed above, which can list family members of the document. If determined that the second relevant document is associated with a counterpart document in the target language, the counterpart document can be identified at step 607. The identified counterpart document can be displayed to the user via the graphical user interface.
  • a plurality of relevant documents can be identified at step 605.
  • the identified relevant documents can be ranked in order of relevancy based on their similarity ranking.
  • it can be determined whether each identified relevant document is associated with a respective counterpart document in the target language. Where there is such an association, the counterpart document can be identified.
  • the identified counterpart documents can be displayed via the graphical user interface in order of relevancy of their corresponding identified relevant document.
  • the counterpart document for the second ranked relevant document would be ranked first and the counterpart document for the fourth ranked relevant document would be ranked second.
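  • The ordering of counterpart documents can be illustrated with a short sketch. It assumes that each document record lists its family members by language in a nested dictionary; that layout and the document identifiers are hypothetical.

```python
def counterparts_in_order(ranked_docs, family, target_lang):
    """Return target-language counterparts in the order of the relevant
    documents they belong to, skipping documents without a counterpart."""
    results = []
    for doc in ranked_docs:
        counterpart = family.get(doc, {}).get(target_lang)
        if counterpart is not None:
            results.append(counterpart)
    return results

# Only the second- and fourth-ranked documents have French counterparts.
ranked = ["D1", "D2", "D3", "D4"]
family = {"D2": {"fr": "F2"}, "D4": {"fr": "F4", "ja": "J4"}}
print(counterparts_in_order(ranked, family, "fr"))  # ['F2', 'F4']
```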
  • a further search can be performed based on an identified counterpart document of the second relevant document.
  • a target-language static document vector of the counterpart document can be compared with target-language static document vectors of documents in the collection.
  • One or more additional relevant documents can be identified based on the comparison. It can be determined whether each of these documents is in the target language, and the ones that are can be displayed to the user via the graphical user interface along with or instead of the second counterpart document.
  • the ranking of the documents can be similar to that described above. In an embodiment, however, the similarity ranking of the identified counterpart document can be increased by multiplying it by a factor or adding a value to it. Alternatively, the identified counterpart document can by default be ranked first.
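  • The three ranking options for the counterpart document (multiplying by a factor, adding a value, or ranking it first by default) can be sketched as below. The function and parameter names are assumptions made for illustration only.

```python
def rank_with_boost(scores, counterpart_id, factor=1.0, bonus=0.0, force_first=False):
    """Rank documents by similarity, optionally favouring the counterpart."""
    adjusted = dict(scores)
    if counterpart_id in adjusted:
        # Multiply by a factor and/or add a value to the counterpart's score.
        adjusted[counterpart_id] = adjusted[counterpart_id] * factor + bonus
    ranked = sorted(adjusted, key=adjusted.get, reverse=True)
    if force_first and counterpart_id in ranked:
        # Alternative: the counterpart is ranked first regardless of score.
        ranked.remove(counterpart_id)
        ranked.insert(0, counterpart_id)
    return ranked
```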
  • the search-language static document vector of the second relevant document can be translated into a target-language static document vector.
  • Various translation methods may be employed, as discussed previously.
  • the translated target-language static document vector can be compared to target-language static document vectors of documents in the collection. Additional relevant documents can be identified based on the comparison and displayed on the graphical user interface.
  • the search-language static document vector of the second relevant document can be immediately translated into a target-language static document vector and a search performed.
  • the results of the additional searches can become excessive if the second relevant document is actually a plurality of relevant documents.
  • a limit can be set as to how many result documents will be included in the combined result set. For example, if the plurality of documents includes 100 documents and the limit is set to 10 documents, the results of the further searches will yield no more than 1000 documents, which of course can be ranked according to relevancy.
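  • The effect of the limit can be sketched as follows, reading it as a cap on the results kept per relevant document, which is the reading consistent with the 100 x 10 = 1000 bound in the example. The `search` callable is a hypothetical stand-in that is assumed to return `(doc_id, score)` pairs sorted by descending score.

```python
def combined_results(seed_docs, search, per_seed_limit):
    """Run a further search for each seed document, keep at most
    per_seed_limit hits per seed, and rank the combined set by score."""
    combined = {}
    for seed in seed_docs:
        for doc_id, score in search(seed)[:per_seed_limit]:
            # A document found from several seeds keeps its best score.
            combined[doc_id] = max(score, combined.get(doc_id, 0.0))
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

  • With 100 seed documents and a limit of 10, the combined list can never exceed 1000 entries, and it is shorter whenever different seeds return the same document.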
  • FIG. 7 illustrates an interface 700 for searching documents, in accordance with an embodiment.
  • Interface 700 is used to search similar documents for an input document in multiple languages.
  • the input document may be a legal or a non-legal document.
  • the legal document may be a patent document and a non-legal document may be a technology-oriented document, for example, a white paper or a technical journal.
  • An input element 702 of interface 700 is used to enter data associated with the input document.
  • the data associated with the input document may include, but is not limited to, a patent number, a publication number, a link for the input document, and some text from the input document.
  • a user may enter 6,000,000 as a patent number in input element 702.
  • a user may paste a web link for a document in input element 702.
  • a user may copy some text (title, abstract, or description) from the input document and paste it in input element 702.
  • input element 702 may enable uploading of an electronic copy of the input document.
  • Input element 702 may receive the electronic copy in a plurality of formats.
  • Examples of the formats may include, but are not limited to, PDF™, MS Word™, and MS Excel™.
  • a user may upload a PDF of a patent through input element 702.
  • a user may upload an MS Excel sheet that includes multiple patent numbers for which similar documents have to be searched.
  • a target language element 704 may be used to select one or more target languages in which similar documents for the input document are to be located.
  • the input document is in an originating language.
  • the originating language, for example, may be English and the one or more target languages, for example, may include, but are not limited to, Japanese, German, French, Chinese, and Korean.
  • Target language element 704 includes a drop-down menu 706 in which a plurality of target languages are listed. In drop-down menu 706, a check box is provided in front of each of the plurality of target languages. Therefore, to select a particular language, a user can click on the check box associated with that target language.
  • target language element 704 is not limited to selection using a drop-down menu nor is the user selection limited to certain languages.
  • a find similar element 708 is activated to locate one or more first documents similar to the input document and one or more second documents similar to the one or more first documents using document vectors.
  • the one or more second documents are in the one or more target languages selected by the user. This is explained in detail in conjunction with FIG. 8.
  • the one or more first documents and the one or more second documents are then displayed in a result section 710.
  • the one or more first documents displayed in result section 710 may be ranked based on their similarity with the input document, such that the most similar document is given the first rank and the least similar document is given the last rank. Further, the one or more second documents may be ranked based on the ranking of the one or more first documents. Display of documents in result section 710 in various embodiments is explained in detail in conjunction with FIGS. 9, 11, and 14.
  • FIG. 8 is a flowchart of a method for searching documents, in accordance with an embodiment.
  • a user enters data associated with an input document. This has been explained in detail in conjunction with FIG. 7. Thereafter, the user selects one or more target languages at step 804. For example, the user may choose French, Chinese, and Japanese as target languages.
  • a processor locates one or more first documents similar to the input document using document vectors. Each of the one or more first documents may be a legal document or a non-legal document.
  • the legal document may be a patent document and a non-legal document may be a technology-oriented document, for example, a white paper or a technical journal.
  • each of the input document and the one or more first documents are in the same language, which is the originating language.
  • the language in which the input document is written may decide the originating language. For example, if the input document is in English, then English will be the originating language. Also, in this example, the one or more first documents that are located will be in English.
  • the input document vector created for the input document is compared with document vectors associated with documents stored on a document database in the originating language.
  • the document database may be a patent database or a technical journal database.
  • the input document vector is also compared with one or more document vectors created for the one or more first documents.
  • Each of the one or more document vectors and the input document vector may be one of a static document vector and a dynamic document vector.
  • a user may increase the predefined threshold so as to decrease the number of documents that may qualify as similar documents.
  • a user may decrease the predefined threshold to increase the number of documents that may qualify as similar documents.
  • the processor uses similarity indices to rank the one or more first documents, at step 808.
  • a first document that has the highest similarity index is the most similar document and is thus given the first rank.
  • a first document that has the lowest similarity index is the least similar document and is thus given the last rank.
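  • The locate-and-rank steps above can be sketched as follows. The description does not define the similarity index at this point, so cosine similarity over keyword-to-weight dictionaries is used here purely as an assumption; the function names are likewise hypothetical.

```python
from math import sqrt

def similarity_index(a, b):
    """Assumed similarity index: cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def locate_and_rank(input_vec, database, threshold=0.5):
    """Keep documents whose similarity index exceeds the threshold and
    rank them so the most similar document comes first."""
    scored = ((doc_id, similarity_index(input_vec, vec))
              for doc_id, vec in database.items())
    hits = [(doc_id, score) for doc_id, score in scored if score > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

  • Raising `threshold` shrinks the list of qualifying documents and lowering it lengthens the list, matching the behaviour of the predefined threshold described above.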
  • the processor locates one or more second documents that are similar to the one or more first documents at step 810.
  • the one or more second documents are in the one or more target languages selected by the user at step 804.
  • the one or more second documents are in French, Chinese, and Japanese.
  • Each of the one or more second documents may be a legal document or a non-legal document.
  • the legal document may be a patent document and a non-legal document may be a technology oriented document.
  • each second document of the one or more second documents is a family member of a corresponding first document.
  • each of the one or more second documents is a foreign family patent of one of the one or more first documents.
  • a first patent document may be a US patent and a second patent document that is a foreign family patent may be a Japanese patent.
  • After locating the one or more second documents, the processor ranks the one or more second documents based on ranking of the one or more first documents at step 812.
  • the ranking assigned by the processor to a second document of the one or more second documents may be equivalent to a similar first document in the one or more first documents.
  • the processor may enforce this ranking on the second document, even if the second document has a lower ranking amongst the one or more second documents. For example, the processor locates three US patents, i.e., USP1, USP2, and USP3.
  • the processor then locates three patents in Chinese (the target language) that are similar to the three US patents.
  • the three patents in Chinese are CP1, CP2, and CP3: CP1 is a Chinese family patent for USP1, CP2 is a Chinese family patent for USP2, and CP3 is a Chinese family patent for USP3.
  • the processor enforces the ranking of the three US patents on the three patents in Chinese, so that CP2 is assigned the first rank, CP3 is assigned the second rank, and CP1 is assigned the third rank.
  • This ranking is assigned irrespective of the fact that, amongst CP1, CP2, and CP3, CP3 is the most relevant document and CP2 is the least relevant when compared to the US patent.
  • the processor may create a table matrix that includes all foreign family members of the one or more first documents.
  • the table matrix can be used to locate the one or more second documents and to enforce rankings.
  • the one or more first documents and the one or more second documents are displayed on a display at step 814.
  • the one or more first documents may be displayed as a first list and the one or more second documents may be displayed as one or more second lists based on the rankings assigned by the processor.
  • the first list and the second list are displayed as adjacent columns depicting the relationship between a first document and a second document.
  • the display includes a first list that displays the one or more first documents, a second list that includes documents in Chinese and a third list that includes documents in Japanese. This is explained in detail in conjunction with FIG. 9.
  • the one or more first documents and the one or more second documents may be displayed as a tree structure, such that the one or more second documents may branch out from the one or more first documents. This is explained in detail in conjunction with FIG. 11 and FIG. 14. It will be apparent to a person skilled in the art that display of the one or more first documents and the one or more second documents may not be limited to the above listed methods or languages.
  • a second document may be displayed to the user in the target language.
  • the second document may be translated into the originating language and then displayed to user.
  • a second document may be a Japanese patent.
  • the processor may automatically translate the Japanese patent into English before displaying it, thus making it easier for the user to understand the Japanese patent.
  • the processor may automatically remove the first document from the display at step 816.
  • the first document that does not have any similar second document may be displayed to the user; however, the user may be provided with an option on the display to remove the first document. This is further explained in detail in conjunction with FIG. 9.
  • the user is interested only in searching documents that are in one or more target languages. Because the user wants to locate documents in the one or more target languages, displaying only those first documents that have foreign family members makes it easier and more efficient for the user to analyze the results of the search.
  • FIG. 9 illustrates search results to be displayed on result section 710, in accordance with an exemplary embodiment.
  • a user accesses input element 702 to enter a patent number for a US patent, which is in English.
  • the language of the input document, i.e., English, is the originating language.
  • the user selects French, Japanese, and Chinese as one or more target languages and activates find similar element 708.
  • a search is performed on a patent database that includes patent documents in English, to locate patent documents similar to the US patent.
  • the search is performed by comparing the document vector of the US patent with document vectors associated with the patent documents in the patent database. Based on the comparison, similarity indices are computed for the patent documents. Using these similarity indices and a predefined threshold, which is dictated by the user as 0.5, the processor locates five patent documents that are similar to the US patent and have similarity index greater than 0.5.
  • the five patent documents, which are in the originating language, i.e., English, include E1 that has a similarity index of 0.89, E2 that has a similarity index of 0.81, E3 that has a similarity index of 0.77, E4 that has a similarity index of 0.65, and E5 that has a similarity index of 0.53.
  • these five patent documents are ranked and displayed as a list 902 in a result box 904.
  • E1 which has the highest similarity index, is given the first rank and E5, which has the lowest similarity index, is given the last rank.
  • a list 908 in result box 904 displays Japanese family patents.
  • E3 and E4 do not have Japanese family patents.
  • Rankings of E1, E2, and E5 are enforced on their respective Japanese family patents, i.e., J1, J2, and J5.
  • a list 910 in result box 904 displays Chinese family patents.
  • E1, E3, and E4 do not have Chinese family patents.
  • Rankings of E2 and E5 are enforced on their respective Chinese family patents, i.e., C2, and C5.
  • Result box 904 is displayed in result section 710 when the user chooses multiple target languages, i.e., French, Japanese, and Chinese. However, if the user chooses only French as the target language, then a result box 912 is displayed in result section 710. As the user is only interested in patent documents that are in French, patent documents in English that do not have a French family patent are not displayed to the user. Therefore, result box 912, which includes only E1, E2, E3, and E5 and their respective French family patents F1, F2, F3, and F5, is displayed to the user. Similarly, a result box 914 is used to display Japanese family patents and a result box 916 is used to display Chinese family patents.
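  • The result boxes of FIG. 9 can be reproduced with a small sketch. The family relationships are taken from the example above; the nested-dictionary layout, the `result_box` function, and the `drop_unmatched` flag are assumptions introduced only for illustration.

```python
# Family members per English document, as given in the FIG. 9 example.
FAMILY = {
    "E1": {"French": "F1", "Japanese": "J1"},
    "E2": {"French": "F2", "Japanese": "J2", "Chinese": "C2"},
    "E3": {"French": "F3"},
    "E4": {},
    "E5": {"French": "F5", "Japanese": "J5", "Chinese": "C5"},
}
RANKED = ["E1", "E2", "E3", "E4", "E5"]  # ordered by similarity index

def result_box(ranked, family, languages, drop_unmatched=False):
    """One row per first document: the document followed by its family
    member in each target language (None where there is none)."""
    rows = []
    for doc in ranked:
        row = [doc] + [family.get(doc, {}).get(lang) for lang in languages]
        if drop_unmatched and all(cell is None for cell in row[1:]):
            continue  # hide first documents with no target-language family
        rows.append(row)
    return rows

# Single target language, as in result box 912: E4 is dropped.
print(result_box(RANKED, FAMILY, ["French"], drop_unmatched=True))
```

  • Because each row follows the order of `RANKED`, the family patents inherit the ranking of their English counterparts without being scored themselves.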
  • FIG. 10 is a flowchart of a method for searching documents, in accordance with another embodiment.
  • a user enters data associated with an input document. This has been explained in detail in conjunction with FIG. 7.
  • a processor locates one or more first documents similar to the input document.
  • the input document and the one or more first documents are in an originating language.
  • the processor then ranks the one or more first documents at step 1004. This has been explained in detail in conjunction with FIG. 8.
  • After locating and ranking the one or more first documents, the processor locates the one or more second documents in one or more target languages at step 1006 and enforces ranking of the one or more first documents on the one or more second documents, at step 1008. Thereafter, at step 1010, a display displays the one or more first documents and the one or more second documents. This has been explained in detail in conjunction with FIG. 8.
  • a set of second documents is selected, at step 1012.
  • the set of second documents may be selected based on discretion of the user.
  • the set of second documents may be selected automatically by the processor based on a predefined criterion.
  • the predefined criterion may include, but is not limited to selecting a predetermined number of top ranked second documents from the one or more second documents.
  • the processor may define in a system that the top 50 percent of the one or more second documents will be selected.
  • the predefined criterion may include, but is not limited to the set of second documents having similarity index greater than a predetermined limit.
  • the processor may define in the system that second documents having a similarity index greater than 0.5 will be selected as the set of second documents.
  • the processor creates a set of target language document vectors at step 1014.
  • the set of target vectors are also created in Chinese.
  • the method of creating document vectors is explained in detail in conjunction with Paragraph [0043] to Paragraph [0049].
  • the processor locates one or more target language documents similar to the set of second documents at step 1016.
  • the set of target language document vectors are compared with one or more document vectors associated with target language documents stored in one or more target language databases.
  • the one or more target language databases also include the one or more target language documents located by the processor.
  • a similarity index is computed for each target language document in the one or more target language databases. For each of the one or more target language documents, the similarity index is greater than a predefined threshold and thus they are similar to the set of second documents. This is further explained in detail in conjunction with FIG. 11.
  • the one or more target language documents are then ranked by the processor at step 1018.
  • the processor may rank the one or more target language documents based on similarity with the set of second documents.
  • the similarity may be determined using the similarity index computed for the one or more target language documents. Thus, the most similar target language document, which has the highest similarity index, is given the first rank and the least similar target language document, which has the lowest similarity index, is given the last rank.
  • the one or more target language documents may be divided into one or more sets based on their similarity with the set of second documents.
  • a first set within the one or more target language documents may be similar to a second document in the set of second documents and a second set within the one or more target language documents may be similar to another second document within the set of second documents.
  • the second document may be ranked higher than the other second document amongst the set of second documents. In this case, the first set will be ranked higher than the second set. This is further explained in detail in conjunction with FIG. 11.
  • any duplication between the first set and the second set is removed.
  • a duplicate document may be removed from the second set, which is a lower ranked set. This ensures that a user may not have to go through the same documents more than once, thereby increasing efficiency of the user in reviewing results of the search.
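  • Removal of duplicates from lower-ranked sets can be sketched as follows. The input is assumed to be a list of result sets already ordered from the highest-ranked set to the lowest; the function name and the document identifiers in the example are hypothetical.

```python
def remove_duplicates(ranked_sets):
    """Keep each document only in the highest-ranked set where it occurs."""
    seen = set()
    cleaned = []
    for docs in ranked_sets:  # highest-ranked set first
        kept = []
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                kept.append(doc)
        cleaned.append(kept)
    return cleaned

# "T2" appears in both sets and is removed from the lower-ranked one.
print(remove_duplicates([["T1", "T2", "T3"], ["T2", "T4"]]))
# [['T1', 'T2', 'T3'], ['T4']]
```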
  • the one or more target language documents are displayed on the display.
  • the user may further use document vectors for a foreign family patent to search more foreign language patents that are similar to the foreign family patent. Based on this similarity, the foreign language patents located using document vectors of the foreign family patent may have a correlation with the input document, thereby providing the user with a more comprehensive search result.
  • FIG. 11 illustrates search results to be displayed on result section 710, in accordance with another exemplary embodiment.
  • a user accesses input element 702 to enter a patent number for a US patent that is in English. Thereafter, the user selects
  • a search is performed on a patent database, which includes patent documents in English, to locate patent documents similar to the US patent.
  • the search is performed by comparing the document vector of the US patent with document vectors associated with the patent documents in the patent database. Based on the comparison, similarity indices are computed for the patent documents. Using these similarity indices and the predefined threshold, which is dictated by the user as 0.5, the processor locates four patent documents that are similar to the US patent and have similarity index greater than 0.5.
  • the four patent documents, which are in the originating language, i.e., English, include E1 that has a similarity index of 0.89, E2 that has a similarity index of 0.81, E3 that has a similarity index of 0.77, and E4 that has a similarity index of 0.53. Based on their respective similarity indices, these four patent documents are ranked and displayed as a list 1102.
  • Japanese family patents are searched. Adjacent to list 1102, Japanese family patents for the four patent documents are displayed in a list 1104.
  • the Japanese family patents include J1, J2, J3, and J4. Rankings of the four patent documents in English are enforced on their respective Japanese family patents; as a result, J1 is assigned the same ranking as E1, and similarly for J2, J3, and J4.
  • J1 and J3 are selected as the most relevant Japanese patents.
  • a Japanese document vector for J1 is compared with document vectors associated with patent documents in a Japanese patent database. Based on the comparison, similarity indices are computed for the patent documents with reference to J1. Using these similarity indices, J5 with a similarity index of 0.91, J6 with a similarity index of 0.83, and J7 with a similarity index of 0.69 are selected as Japanese patents similar to J1.
  • J5, J6, and J7 are ranked based on their similarity indices; J5 being assigned the highest rank and J7 being assigned the lowest rank amongst J5, J6, and J7. Thereafter, J5, J6, and J7 are displayed in a list 1110.
  • J8 with a similarity index of 0.86, J9 with a similarity index of 0.77, and J10 with a similarity index of 0.64 are selected as Japanese patent documents similar to J3.
  • J8, J9, and J10 are ranked based on their similarity indices; J8 being assigned the highest rank and J10 being assigned the lowest rank amongst J8, J9, and J10. Thereafter, J8, J9, and J10 are also displayed in list 1110. Additionally, each of J5, J6, and J7 is ranked higher than J8, J9, and J10 in list 1110.
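  • The ordering of list 1110 can be reproduced with the similarity indices given in the example. This is a sketch only: the `build_list` function and the dictionary layout are assumptions, while the document names and scores come from the text.

```python
def build_list(seed_order, hits_by_seed):
    """Concatenate per-seed results: seeds in their own rank order,
    and within each seed, hits ordered by descending similarity index."""
    ordered = []
    for seed in seed_order:
        hits = sorted(hits_by_seed[seed].items(),
                      key=lambda kv: kv[1], reverse=True)
        ordered.extend(doc for doc, _ in hits)
    return ordered

hits_by_seed = {
    "J1": {"J5": 0.91, "J6": 0.83, "J7": 0.69},
    "J3": {"J8": 0.86, "J9": 0.77, "J10": 0.64},
}
print(build_list(["J1", "J3"], hits_by_seed))
# ['J5', 'J6', 'J7', 'J8', 'J9', 'J10']
```

  • Note that J8 (0.86) is placed below J6 (0.83) and J7 (0.69): the rank of the seed document takes precedence over the raw similarity index.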
  • FIG. 12 is a flowchart of a method for searching documents, in accordance with another embodiment.
  • a user enters data associated with an input document.
  • the user may upload the input document.
  • the input document is in an originating language. This has been explained in conjunction with FIG. 7.
  • the input document has an originating language document vector associated with it. Otherwise, the originating language document vector may be created for the input document when the input document is uploaded.
  • a processor translates the originating language document vector into one or more target language vectors.
  • Each word in the originating language document vector is translated into the one or more target languages.
  • the originating document vector is in English and is represented as: {[table, 1] [chair, 0.5] [plate, 0.2]}.
  • the user chooses French as the target language.
  • each word is translated into French to create a translated document vector in French.
  • the document vector in French may be represented as: {[tableau, 1]
  • the number of target language document vectors that are created depends on the number of target languages chosen by the user. For example, if the user chooses Chinese, Japanese, and French as the target languages, then the processor will create three target language document vectors, one each in Chinese, Japanese, and French.
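  • Word-by-word translation of a document vector can be sketched as below. The English vector and the rendering of "table" as "tableau" follow the example above; the other dictionary entries ("chaise", "assiette"), the function name, and the choice to drop untranslatable words are assumptions made for illustration, not details from the description.

```python
def translate_vector(vec, dictionary):
    """Translate each keyword while carrying its weight across."""
    translated = {}
    for word, weight in vec.items():
        target = dictionary.get(word)
        if target is None:
            continue  # assumption: words without a translation are dropped
        # If two words map to the same target word, keep the larger weight.
        translated[target] = max(weight, translated.get(target, 0.0))
    return translated

english_vec = {"table": 1.0, "chair": 0.5, "plate": 0.2}
# Hypothetical bilingual dictionary; only "tableau" appears in the source.
EN_FR = {"table": "tableau", "chair": "chaise", "plate": "assiette"}

# One target-language vector per chosen target language.
dictionaries = {"French": EN_FR}
target_vectors = {lang: translate_vector(english_vec, d)
                  for lang, d in dictionaries.items()}
```

  • Adding further entries to `dictionaries`, for example for Chinese and Japanese, yields one translated vector per selected language, as described above.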
  • the processor locates one or more documents in one or more target languages at step 1208. For this, the processor accesses target language databases that include documents in the one or more target languages. The documents in the target language databases have document vectors associated with them in the one or more target languages. Therefore, the processor compares the one or more target language document vectors with the document vectors associated with the documents in target language databases to locate the one or more target language documents.
  • the similarity index for each of the one or more target language documents is greater than a predefined threshold. The similarity index is used to determine similarity of the one or more target language documents with the one or more target language document vectors created for the input document.
  • the processor uses similarity indices to rank the one or more target language documents at step 1210. The most similar target language document is given the highest rank and the least similar target language document is given the lowest rank. Thereafter, at step 1212, a display displays the one or more target language documents.
  • FIG. 13 is a flowchart of a method for searching documents, in accordance with another embodiment.
  • a plurality of first documents in an originating language and one or more second documents in one or more target languages are displayed on a display.
  • the one or more second documents are similar to the plurality of first documents. This has been explained in detail in conjunction with FIGS. 8 and 9.
  • a processor performs a check to determine if one or more of the plurality of first documents is dissimilar to each of the one or more second documents. If the result of the check is positive, then at step 1306, the processor identifies and separates out such a subset of the plurality of first documents. Each first document in the subset is dissimilar to each document in the one or more second documents. For example, five patent documents are displayed in English as the plurality of first documents and three patent documents are displayed in Chinese as the one or more second documents. These three patent documents in Chinese are family patents of three of the five patent documents in English. For the remaining two patent documents in English, there is no Chinese family patent. Thus, these two patent documents in English are identified as the subset. This is explained in detail in conjunction with FIG. 14.
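  • The check and the separation of the subset can be sketched as follows, treating "dissimilar to each second document" as "has no family member in the target language", as in the example. The function name, the dictionary layout, and the document identifiers are hypothetical.

```python
def split_by_family(first_docs, family, target_lang):
    """Separate first documents that have a target-language family member
    from those that do not (the subset to be searched via translation)."""
    matched = {doc: family[doc][target_lang]
               for doc in first_docs
               if target_lang in family.get(doc, {})}
    subset = [doc for doc in first_docs if doc not in matched]
    return matched, subset

# Five English documents, three of which have a Chinese family patent.
family = {"EN1": {"zh": "CN1"}, "EN2": {"zh": "CN2"}, "EN4": {"zh": "CN4"}}
matched, subset = split_by_family(["EN1", "EN2", "EN3", "EN4", "EN5"], family, "zh")
print(subset)  # ['EN3', 'EN5']
```

  • The documents in `subset` keep their original rank order, so that order remains available when ranking the target language documents found for them.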
  • a set of originating language document vectors created for the subset are translated into a set of target language document vectors.
  • the method of translation has been explained in detail in conjunction with FIG. 12.
  • the processor locates one or more target language documents that are similar to documents in the subset. This is achieved by using a similarity index that is computed by comparing the set of target language document vectors with one or more target language document vectors associated with target language documents in one or more target language databases. For each of the one or more target language documents located by the processor, the similarity index is greater than a predefined threshold.
  • the processor ranks the one or more target language documents based on the ranking of the first documents in the subset.
  • the processor enforces the ranking of a first document in the subset on a similar target language document. For example, if a first document in the subset is ranked highest, a similar target language document will also be given the highest rank amongst the one or more target language documents. This is further explained in detail in conjunction with FIG. 14.
  • the one or more target language documents are displayed on the display.
  • FIG. 14 illustrates search results to be displayed on result section 710, in accordance with another exemplary embodiment.
  • a user accesses input element 702 to enter a patent number for a US patent that is in English. Thereafter, the user selects Japanese as the target language and activates find similar element 708. In response to this, using the document vector for the US patent, a search is performed on an English patent database, to locate patent documents similar to the US patent.
  • the search is performed by comparing the document vector of the US patent with document vectors associated with the patent documents in the English patent database. Based on the comparison, similarity indices are computed for the patent documents. Using these similarity indices and a predefined threshold, which the user sets to 0.5, the processor locates five patent documents that are similar to the input document and have a similarity index greater than 0.5.
  • the five patent documents, which are in the originating language, i.e., English, include E1, which has a similarity index of 0.89, E2, which has a similarity index of 0.81, E3, which has a similarity index of 0.77, E4, which has a similarity index of 0.65, and E5, which has a similarity index of 0.53. Based on their respective similarity indices, these five patent documents are ranked and displayed as a list 1402.
  • Japanese family patents are searched. Adjacent to list 1402, Japanese family patents for the five patent documents are displayed in a list 1404.
  • the Japanese family patents include J1, J2, and J5. Out of the five patent documents in English, E3 and E4 do not have a Japanese family patent. Rankings of three of the five patent documents in English are enforced on their respective Japanese family patents, such that J1 is assigned the same ranking as E1, and similarly for J2 and J5.
  • E3 and E4 do not have any Japanese family patents; however, they are very similar to the patent provided by the user as the input document, and thus E3 and E4 should not be ignored. Assume, for example, that the user only wants similar patent documents in Japanese. Therefore, an English document vector for E3 and an English document vector for E4 are translated into Japanese document vectors. The translation of document vectors has been explained in detail in conjunction with FIG. 12.
  • a Japanese document vector created for E3 is compared with document vectors associated with patent documents in a Japanese patent database. Based on the comparison, similarity indices are computed for the patent documents with reference to the Japanese document vector created for E3. Using these similarity indices, J6 with a similarity index of 0.91, J7 with a similarity index of 0.83, and J8 with a similarity index of 0.69 are selected as Japanese patent documents similar to E3. Further, J6, J7, and J8 are ranked based on their similarity indices; J6 being assigned the highest rank and J8 being assigned the lowest rank amongst J6, J7, and J8. Thereafter, J6, J7, and J8 are displayed in a list 1406.
  • a Japanese document vector created for E4 is compared with document vectors associated with patent documents in a Japanese patent database. Based on the comparison, similarity indices are computed for the patent documents with reference to the Japanese document vector created for E4. Using these similarity indices, J9 with a similarity index of 0.86, J10 with a similarity index of 0.77, and J11 with a similarity index of 0.64 are selected as Japanese patent documents similar to E4. Further, J9, J10, and J11 are ranked based on their similarity indices; J9 being assigned the highest rank and J11 being assigned the lowest rank amongst J9, J10, and J11. Thereafter, J9, J10, and J11 are displayed in list 1406. Each of list 1402, list 1404, and list 1406 is displayed in result section 710.
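
The FIG. 14 example can be reproduced with the similarity indices given above. The sketch below is illustrative only: the variable names are hypothetical, the family mapping assumes J1, J2, and J5 are the family patents of E1, E2, and E5, and the order in which the two groups appear inside list 1406 (results for E3 before results for E4) is an assumption, since the description does not state it.

```python
THRESHOLD = 0.5  # predefined threshold set by the user in the example

# Similarity indices of the English documents against the input US patent.
english_scores = {"E1": 0.89, "E2": 0.81, "E3": 0.77, "E4": 0.65, "E5": 0.53}

# Japanese family patents; E3 and E4 have none.
japanese_family = {"E1": "J1", "E2": "J2", "E5": "J5"}

# Similarity indices obtained with the translated vectors of E3 and E4.
fallback_scores = {
    "E3": {"J6": 0.91, "J7": 0.83, "J8": 0.69},
    "E4": {"J9": 0.86, "J10": 0.77, "J11": 0.64},
}

# List 1402: English documents above the threshold, ranked by similarity index.
list_1402 = sorted(
    (doc for doc, score in english_scores.items() if score > THRESHOLD),
    key=english_scores.get,
    reverse=True,
)

# List 1404: family patents inherit the ranking of their English counterparts.
list_1404 = [japanese_family[doc] for doc in list_1402 if doc in japanese_family]

# List 1406: for documents without a family patent, rank the Japanese documents
# found via the translated vector by their own similarity indices.
list_1406 = []
for doc in list_1402:
    if doc not in japanese_family:
        scores = fallback_scores[doc]
        list_1406.extend(sorted(scores, key=scores.get, reverse=True))

print(list_1402)
print(list_1404)
print(list_1406)
```
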
  • FIG. 15 is a block diagram 1500 illustrating a system for creating and searching such a collection.
  • a computer system 1502 can be provided with a processor unit 1504 coupled to memory 1506 by a bus structure 1508. Although only one processor unit 1504 is shown, more processor units may be provided in an expanded design.
  • the system 1502 is shown in communication with storage media 1540 configured to store a document collection 1542.
  • the electronic document collection may include a compilation of patent documents, including issued patents and published patent applications, and including documents in multiple languages.
  • the storage media 1540 can be in communication with the processor unit 1504.
  • the system is shown in communication with a visual display 1550 for presentation of visual data. Each of the elements shown and described herein supports query submission to the document collection 1542.
  • a document manager 1560 can be provided local to the computer system 1502 and in communication with memory 1506.
  • the document manager 1560 can be responsible for creating a document vector for each patent document in the collection 1542 at the time of indexing. More specifically, the document manager 1560 creates at least one static document vector 1544 for each patent document in the collection 1542.
  • the document vectors 1544 created by the document manager 1560 can be housed in the storage media 1540.
  • the document manager 1601 of computer system 1600 can be employed to create multiple static document vectors for each patent document, each static document vector being in a different language.
  • the document manager 1601 can have machine-translation capabilities itself, can request translation of the static document vector by another component of the system, such as a translation manager 1603, and/or could request translation of the static document vector by a remote translation manager 1620 via a network 1610.
  • the document manager 1601 sends a translation request, including the static document vector to be translated, via the Internet. Such a request could also be sent by the system's translation manager 1603.
  • Various configurations can be employed to reduce the load on the document manager so that it can continue indexing documents while translations are being performed.
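
One such configuration, sketched here purely as an assumption, hands translation requests to a pool of worker threads so that indexing proceeds without waiting for each translation. The function names are hypothetical, and `translate_static_vector` is a placeholder that merely tags terms with the target language rather than calling a real translation manager.

```python
from concurrent.futures import ThreadPoolExecutor


def translate_static_vector(vector, target_language):
    """Hypothetical placeholder for a translation manager call; it only tags terms."""
    return {f"{target_language}:{term}": weight for term, weight in vector.items()}


def index_collection(documents, source_language, target_language, max_workers=2):
    """Create a static vector per document and translate it in the background."""
    static_vectors = {}
    pending = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for doc_id, text in documents.items():
            vector = {}
            for term in text.lower().split():
                vector[term] = vector.get(term, 0) + 1
            static_vectors[(doc_id, source_language)] = vector
            # Submit the translation and move on to the next document immediately.
            pending[doc_id] = pool.submit(translate_static_vector, vector, target_language)
        # Collect the translated vectors once indexing has finished.
        for doc_id, future in pending.items():
            static_vectors[(doc_id, target_language)] = future.result()
    return static_vectors


vectors = index_collection({"D1": "valve engine valve"}, "en", "ja")
```

The same pattern would apply if the request were sent to a remote translation manager over a network; only the body of the placeholder function would change.
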
  • the document manager 1560 can be employed to create multiple static document vectors 1544 for each patent document in the same language.
  • the various static document vectors can be directed to different sections or combinations of sections of the patent document, as discussed previously.
  • An input manager 1562 can be also provided local to the computer system 1502 and in communication with memory 1506.
  • the input manager 1562 can be responsible for creating a dynamic document vector at query time based on one or more terms of an input search query.
  • the input manager 1602 of computer system 1600 can be employed to translate the dynamic document vector into another language.
  • the input manager 1602 can have machine-translation capabilities itself, can request translation of the dynamic document vector by another component of the system, such as the translation manager 1603, and/or could request translation of the dynamic document vector by a remote translation manager 1620 via a network 1610.
  • the input manager 1602 sends a translation request, including the dynamic document vector to be translated, via the Internet. Such a request could also be sent by the system's translation manager 1603.
  • the input manager 1562 can be in communication with a query manager 1564, also provided local to the computer system 1502 and in communication with memory 1506.
  • the query manager 1564 can be responsible for the comparison of the dynamic document vector, created by the input manager 1562, with static document vectors 1544 in response to submission of a query input to the document collection 1542.
  • a field in each document vector can specify the language of the document vector, so as to facilitate comparison of only identical languages.
  • the comparison yields a compilation of relevant patent documents 1546.
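
The comparison performed by the query manager, including the language field that restricts comparison to identical languages, might look like the sketch below. The `DocumentVector` class, the `run_query` function, and the use of cosine similarity are assumptions made for illustration; the description does not prescribe a data layout or similarity measure.

```python
import math
from dataclasses import dataclass


@dataclass
class DocumentVector:
    doc_id: str
    language: str   # language field used to compare only identical languages
    weights: dict   # term -> weight


def similarity(a, b):
    """Cosine similarity between two term-weight dictionaries (assumed measure)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def run_query(dynamic, static_vectors, threshold=0.0):
    """Compare the dynamic vector with same-language static vectors and rank the hits."""
    hits = []
    for static in static_vectors:
        if static.language != dynamic.language:
            continue  # skip vectors in a different language
        score = similarity(dynamic.weights, static.weights)
        if score > threshold:
            hits.append((static.doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


static_vectors = [
    DocumentVector("US1", "en", {"valve": 1.0, "engine": 1.0}),
    DocumentVector("US1", "ja", {"ben": 1.0, "enjin": 1.0}),
    DocumentVector("US2", "en", {"engine": 1.0}),
]
query = DocumentVector("query", "en", {"engine": 1.0})
hits = run_query(query, static_vectors)
```

Here the Japanese vector for US1 is never compared with the English query, and US2 ranks ahead of US1 because its vector matches the query more closely.
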
  • the compilation can be presented on the visual display 1550. Similarly, in an embodiment, the compilation may be retained on storage, either volatile or persistent.
  • a compilation of non-relevant terms 1548 may be employed to parse non-relevant terms from the static document vectors 1544.
  • the compilation of non-relevant terms 1548 can be retained on storage media 1540 and periodically updated by the document manager 1560. Either employing or disregarding the non-relevant terms, the document manager 1560 may be directed to create multiple static document vectors for each patent document in the document collection 1542.
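
As a rough illustration of how a compilation of non-relevant terms could be applied when a static document vector is built, the sketch below counts term frequencies and optionally drops listed terms. The term list, the tokenization, and the function name are hypothetical; the description does not specify how vectors are weighted.

```python
from collections import Counter

# Hypothetical compilation of non-relevant terms.
NON_RELEVANT_TERMS = frozenset({"the", "a", "of", "said", "wherein", "comprising"})


def build_static_vector(text, non_relevant=frozenset()):
    """Build a term-frequency vector, skipping any terms listed as non-relevant."""
    terms = (token.strip(".,;:()").lower() for token in text.split())
    return Counter(term for term in terms if term and term not in non_relevant)


text = "The valve of said engine, wherein the valve opens."

# One vector employing the non-relevant terms list, one disregarding it.
filtered = build_static_vector(text, NON_RELEVANT_TERMS)
unfiltered = build_static_vector(text)
```
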
  • a selection manager 1566 can be provided local to the computer system 1502 and in communication with memory 1506. More specifically, the selection manager 1566 can be in communication with the query manager 1564 to select a search scope for the document collection. The selected search scope determines a selection of static document vectors to be applied by the query manager 1564 to process the query.
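
A search scope that picks which static document vectors take part in a query could be modeled as below. This is a sketch under assumptions: the vectors are keyed by document and section, and the section names "claims" and "abstract" are illustrative choices, not terms fixed by the description.

```python
# Hypothetical static document vectors, keyed by (document id, section).
static_vectors = {
    ("US1", "claims"): {"valve": 3, "engine": 1},
    ("US1", "abstract"): {"valve": 1},
    ("US2", "claims"): {"engine": 2},
    ("US2", "abstract"): {"engine": 1, "piston": 1},
}


def select_static_vectors(vectors, search_scope):
    """Return, per document, only the static vector built for the chosen scope."""
    return {
        doc_id: vector
        for (doc_id, section), vector in vectors.items()
        if section == search_scope
    }


# Only these vectors would be handed to the query manager for a claims-only search.
claims_only = select_static_vectors(static_vectors, "claims")
```
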
  • Various combinations of the managers and components described with respect to FIGS. 15 and 16 may reside in memory 1506 local to the computer system. However, other configurations are possible. For example, in an embodiment, the various managers may each reside as hardware tools external to local memory 1506, or they may be implemented as a combination of hardware and software. Similarly, in an embodiment, the various managers may reside on a remote system in communication with the storage media 1540. Accordingly, a manager may be implemented as a software tool or a hardware tool to support submission of one or more queries to an electronic patent document collection to yield a compilation of relevant patent documents.
  • embodiments can be implemented in software, which may include but is not limited to firmware, resident software, microcode, etc.
  • Embodiments can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
  • a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • program storage means can be any available media which can be accessed by a general purpose or special purpose computer.
  • program storage means can include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM), or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired program code means and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included in the scope of the program storage means.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, RAM, ROM, a rigid magnetic disk, and an optical disk.
  • Current examples of optical disks include CD-ROM, compact disk read/write (CD-R/W), and digital versatile disc (DVD).
  • a data processing system suitable for storing and/or executing program code can include at least one processor coupled directly or indirectly to memory elements through a system bus.
  • the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code is retrieved from bulk storage during execution.
  • I/O devices can be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
  • the software implementation can take the form of a computer program product accessible from a computer-useable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a document search system comprising a display that displays a second document in at least one target language, and to an associated method and computer program product. The target language document or group of documents is related to the originating language document, being either translated from an original document or related to it, for example legally or by subject matter. One aspect may also include automatically selecting a set of second target language documents from the originating language document based on a predefined criterion, creating a set of target language document vectors for the set of target language documents, and then ranking the results. A search system then locates at least one target language document that is similar to the set of second documents using the set of target language document vectors. The invention allows a user to easily search documents from various countries despite negligible knowledge of the target languages. In one example, the invention offers the advantage of automatically locating a list of foreign patents ranked by similarity for a single input patent document without translating the original searched patent.
PCT/US2011/052090 2011-09-19 2011-09-19 Searchable multi-language electronic patent document collection and techniques for searching the same WO2013043146A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2011/052090 WO2013043146A1 (fr) 2011-09-19 2011-09-19 Searchable multi-language electronic patent document collection and techniques for searching the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2011/052090 WO2013043146A1 (fr) 2011-09-19 2011-09-19 Searchable multi-language electronic patent document collection and techniques for searching the same

Publications (1)

Publication Number Publication Date
WO2013043146A1 (fr) 2013-03-28

Family

ID=47914693

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/052090 WO2013043146A1 (fr) 2011-09-19 2011-09-19 Searchable multi-language electronic patent document collection and techniques for searching the same

Country Status (1)

Country Link
WO (1) WO2013043146A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001043236A (ja) * 1999-07-30 2001-02-16 Matsushita Electric Ind Co Ltd Similar word extraction method, document retrieval method, and devices used therefor
US20050119995A1 (en) * 2001-03-21 2005-06-02 Knowledge Management Objects, Llc Apparatus for and method of searching and organizing intellectual property information utilizing an IP thesaurus
US20100287148A1 (en) * 2009-05-08 2010-11-11 Cpa Global Patent Research Limited Method, System, and Apparatus for Targeted Searching of Multi-Sectional Documents within an Electronic Document Collection
US20110066612A1 (en) * 2009-09-17 2011-03-17 Foundationip, Llc Method, System, and Apparatus for Delivering Query Results from an Electronic Document Collection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11872731

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11872731

Country of ref document: EP

Kind code of ref document: A1