CN106164889A - System and method for in-memory database searching - Google Patents

Info

Publication number
CN106164889A
Authority
CN
China
Prior art keywords
entity
search
computer
extraction
occurrence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201480072953.7A
Other languages
Chinese (zh)
Inventor
斯科特·莱特纳
弗兰兹·威克斯尔
拉凯什·戴维
桑贾伊·博德胡
约瑟夫·贝克内尔
毕拉里·哈基祖姆瓦米
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chubais LLC
Qbase LLC
Original Assignee
Chubais LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chubais LLC
Publication of CN106164889A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/02 Computing arrangements based on specific mathematical models using fuzzy logic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Fuzzy Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Algebra (AREA)
  • Automation & Control Theory (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)

Abstract

Systems and methods are disclosed for identifying related entities using an entity co-occurrence knowledge base. Embodiments use an entity co-occurrence knowledge base of entities extracted from an entity-indexed corpus to extract the entities identified in a search query, so that the search results are presented as related entities. Also disclosed are embodiments for generating search suggestions using fuzzy-score matching against the entity co-occurrence knowledge base. These embodiments perform partial entity extraction from the search query, execute a matching algorithm based on the type of the extracted entity, and search against the entity co-occurrence knowledge base. Also disclosed are embodiments for generating search suggestions of related entities based on co-occurrence and/or fuzzy-score matching. These embodiments process a partial search query and present suggestions for completing it, and the completed query is then used as a new search query. Also disclosed are embodiments for generating search suggestions by extracting entities from the search query using entity and trend co-occurrence knowledge bases. Also disclosed are embodiments for enabling geographic and named-entity based search in a content management system.

Description

System and method for in-memory database searching
Technical field
The present disclosure relates generally to methods and systems for information retrieval and, more specifically, to methods for searching for related entities using entity co-occurrence. The present disclosure relates generally to query enhancement and, more specifically, to search suggestions using entity co-occurrence and fuzzy-score matching in a knowledge base. The present disclosure relates generally to computer query processing and, more specifically, to electronic search suggestions of related entities based on co-occurrence and/or fuzzy-score matching. The present disclosure relates generally to methods and systems for information retrieval and, more specifically, to methods for obtaining search suggestions. The present disclosure relates generally to search engines and content management and, more specifically, to extending the search engine technology of a content management system to enable geotagging and named-entity enrichment of digital content.
Background
In commercial settings, known search engines parse a set of search terms and return a list of items (web pages, in a traditional search) ranked in some way. The best-known search methods typically build a search query database from the historical references of other users, and that database can ultimately be used to generate a keyword-based index. A user search query may include one or more entities identified by a name or by attributes that can be associated with the entity. Entities may also include organizations, people, locations, dates and/or times. In a typical search, if a user searches for information related to two particular organizations, the search engine may return a sorted set of results that relate to a mix of different entities with the same or similar names. This approach may lead the user to a very large number of documents that are not relevant to what the user is actually interested in.
Therefore, there is a need for a method of searching for related entities that gives users the ability to find related entities of interest.
Users frequently use search engines to locate information of interest, either on the Internet or in any database system. A search engine generally operates by receiving a search query from a user and returning search results to the user. The search engine typically ranks the search results based on the relevance of each returned result to the search query. The quality of the search query can therefore be very important to the quality of the search results. In most cases, however, search queries from users may be incomplete or only partially written (for example, the query may not include enough words to produce a focused set of relevant results and instead produces a large number of irrelevant results), and are sometimes misspelled (for example, Bill Smith may be mistakenly spelled Bill Smitth).
One common way to improve the quality of search results is to enhance the search query. One way to enhance a search query is to generate possible suggestions based on the user's input. To this end, some approaches identify candidate query refinements for a given query from previous queries submitted by one or more users. However, these approaches rely on query logs, which may sometimes direct the user to results that are not of interest. Other approaches use different techniques that may not be sufficiently accurate. Therefore, there remains a need for a method of improving or enhancing a user's search query to obtain more accurate results.
Users frequently use search engines to locate information of interest, either on the Internet or in any database system. A search engine generally operates by receiving a search query from a user and returning search results to the user. The search results are typically ranked based on the relevance of each returned result to the search query. The quality of the search query can therefore be very important to the quality of the search results. In most cases, however, search queries from users may be incomplete or only partially written (for example, the query may not include enough words to produce a focused set of relevant results and instead produces a large number of irrelevant results), and are sometimes misspelled (for example, Bill Smith may be mistakenly spelled Bill Smitth).
One common way to improve the quality of search results is to enhance the search query. One way to enhance a search query is to generate possible suggestions based on the user's input. To this end, some approaches identify candidate query refinements for a given query from previous queries submitted by one or more users. However, these approaches rely on query logs, which may sometimes direct the user to results that are not of interest. Other approaches use different techniques that may not be sufficiently accurate. Therefore, there remains a need for a method that improves or enhances a user's search query to obtain more accurate results, and that also presents related entities of interest to the user as the user types the search query.
Search engines include several features that provide predictions for user queries. Such predictions may include query auto-completion and search suggestions. At present, these prediction methods are based on historical keyword references. Such historical references may not be accurate, because a single keyword may relate to multiple topics within a single text.
In addition, a user search query may include one or more entities identified by a name or by attributes that can be associated with the entity. Entities may also include organizations, people, locations, events, dates and/or times. In a typical search, if a user searches for information related to two particular organizations, the search engine may return a sorted set of results that relate to a mix of different entities with the same or similar names. This approach may lead the user to a very large number of documents that are not relevant to what the user is actually interested in.
Therefore, there is a need for a method of obtaining search suggestions faster and more accurately.
Content management and document management systems for document version control and collaborative project management are known. One non-limiting example is Microsoft SharePoint, a suite of software applications and tools. Microsoft SharePoint is a family of software products developed by Microsoft for collaboration, file sharing and web publishing. SharePoint can provide users with a large amount of content or information, and it may become difficult for a user to find the information most relevant to a particular situation. To alleviate these problems, SharePoint provides a search engine that helps users find the content they need. A user can enter a keyword-based search query and, once the content has been indexed, the SharePoint search engine returns to the user a list of the most relevant results found within the SharePoint platform environment.
Sometimes a user may want to find content in SharePoint related to geographic entities or other types of entities (such as organizations or people mentioned in documents). SharePoint does not provide out-of-the-box functionality for automatically extracting entities from documents. Specifically, it does not support geotagging of content to extract geographic entities and resolve them to geographic locations. Nor does SharePoint support entity tagging to identify, disambiguate and extract named entities, such as organizations or people in documents. The SharePoint search can, however, be extended to enable effective geographic search and other entity-related searches, including entity-based search facets. Previous versions of SharePoint included "FAST Search" for SharePoint, whose content processing pipeline could be extended through a sandboxed application, but this was very slow and also limited the information the application could access.
SharePoint introduces a considerably more open API, which makes it possible to add specialized linguistics such as concept extraction, relationship extraction, geotagging, summarization and sophisticated text analysis. Therefore, there is an opportunity to extend the capabilities of the SharePoint search engine to enable geographic search and search based on other entities.
Summary of the invention
A method for searching for related entities using entity co-occurrence is disclosed. In one aspect of the present disclosure, the method may be used in a search system that may include a client/server type architecture. In one embodiment, the search system may include a user interface for a search engine that communicates with one or more server devices over a network connection. The server devices may include an entity-indexed electronic data corpus, an entity co-occurrence knowledge base database, and an entity extraction computer module. The knowledge base may be built as an in-memory database and may also include other components, such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation module. A search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy keyword search across a collection of compressed data and return a set of scored results to its associated search controller.
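The controller and node arrangement described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class names `SearchNode` and `SearchController`, the use of `zlib` with JSON for the compressed collections, the `difflib` similarity ratio as the fuzzy score, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import difflib
import json
import zlib
from concurrent.futures import ThreadPoolExecutor


class SearchNode:
    """Holds one compressed partition of the knowledge base (hypothetical layout)."""

    def __init__(self, records):
        # Assumption: records are kept compressed in memory and decompressed on demand.
        self._blob = zlib.compress(json.dumps(records).encode("utf-8"))

    def fuzzy_search(self, keyword, threshold=0.8):
        """Return (score, record) pairs whose similarity to the keyword meets the threshold."""
        records = json.loads(zlib.decompress(self._blob).decode("utf-8"))
        scored = []
        for record in records:
            score = difflib.SequenceMatcher(None, keyword.lower(), record.lower()).ratio()
            if score >= threshold:
                scored.append((score, record))
        return scored


class SearchController:
    """Fans a query out to its associated nodes and merges their scored results."""

    def __init__(self, nodes):
        self.nodes = nodes

    def search(self, keyword):
        # Each node searches its own partition independently.
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(lambda node: node.fuzzy_search(keyword), self.nodes))
        merged = [hit for partial in partials for hit in partial]
        return sorted(merged, key=lambda hit: hit[0], reverse=True)


controller = SearchController([
    SearchNode(["Bill Smith", "Acme Corporation"]),
    SearchNode(["Bill Smyth", "Paris"]),
])
results = controller.search("Bill Smitth")
for score, record in results:
    print(f"{score:.3f}  {record}")
```

Here the misspelled query from the background section still reaches both near matches, because every node scores its partition on its own and the controller only has to merge and rank the returned sets.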
In one embodiment, a computer-implemented method includes: receiving, by an entity extraction computer, a search query including one or more entities from a client computer; comparing, by the entity extraction computer, each respective entity with one or more co-occurrences of the respective entity in a co-occurrence database; extracting, by the entity extraction computer, a subset of the one or more entities from the search query in response to determining, according to the co-occurrence database, that each respective entity of the subset exceeds a confidence score of the co-occurrence database, wherein the confidence score is based on a degree of certainty of co-occurrence of each respective entity with one or more related entities in an electronic data corpus; assigning, by the entity extraction computer, an index identifier (index ID) to each of the extracted entities; saving, by the entity extraction computer, the index ID of each of the extracted entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each of the one or more related entities; searching, by a search server computer, the entity-indexed electronic data corpus to locate the extracted entities and identify index IDs of data records in which at least two of the extracted entities co-occur; and building, by the search server computer, a search results list having the data records corresponding to the identified index IDs.
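The steps of this method can be traced in a short Python sketch. Everything concrete in it is an assumption made for illustration: the `COOCCURRENCE_CONFIDENCE` and `CORPUS` dictionaries stand in for the co-occurrence database and the entity-indexed corpus, substring matching stands in for entity extraction, and the 0.8 confidence threshold is arbitrary.

```python
from itertools import count

# Hypothetical co-occurrence database: entity -> confidence score.
COOCCURRENCE_CONFIDENCE = {
    "acme corp": 0.92,
    "globex": 0.88,
    "merger": 0.40,
}

# Hypothetical entity-indexed corpus: data record ID -> entities found in that record.
CORPUS = {
    "doc-1": {"acme corp", "globex"},
    "doc-2": {"acme corp"},
    "doc-3": {"globex", "acme corp", "merger"},
}


def extract_entities(query, threshold=0.8):
    """Keep only the entities mentioned in the query whose confidence exceeds the threshold."""
    text = query.lower()
    return [
        entity
        for entity, confidence in COOCCURRENCE_CONFIDENCE.items()
        if entity in text and confidence > threshold
    ]


def assign_index_ids(entities):
    """Give each extracted entity a sequential index ID."""
    ids = count(1)
    return {entity: next(ids) for entity in entities}


def search_cooccurring(entities, min_entities=2):
    """Return IDs of records in which at least `min_entities` extracted entities co-occur."""
    wanted = set(entities)
    return sorted(
        record_id
        for record_id, record_entities in CORPUS.items()
        if len(wanted & record_entities) >= min_entities
    )


query = "Acme Corp and Globex merger"
entities = extract_entities(query)
index_ids = assign_index_ids(entities)
hits = search_cooccurring(entities)
print(entities, index_ids, hits)
```

In this toy data, "merger" is mentioned in the query but falls below the confidence threshold, so it is not extracted, and only records containing both remaining entities are returned.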
In one embodiment, a system includes one or more server computers having one or more processors that execute computer-readable instructions for a plurality of computer modules, including: an entity extraction module configured to receive user input of search query parameters, the entity extraction module being further configured to extract a plurality of entities from the search query parameters by comparing each of the plurality of entities with an entity co-occurrence database, wherein the entity co-occurrence database includes a confidence score indicating a degree of certainty of co-occurrence of an extracted entity with one or more related entities in an electronic data corpus, to assign an index identifier (index ID) to each of the extracted entities, and to save the index ID of each of the extracted entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each of the one or more related entities; and a search server module configured to search the entity-indexed electronic data corpus to locate the extracted entities and identify index IDs of data records in which at least two of the extracted entities co-occur, the search server module being further configured to build a search results list having the data records corresponding to the identified index IDs.
In another embodiment, a non-transitory computer-readable medium has computer-executable instructions stored thereon, the instructions including: receiving, by an entity extraction computer, user input of search query parameters; extracting, by the entity extraction computer, a plurality of entities from the search query parameters by comparing each of the plurality of entities with an entity co-occurrence database, wherein the entity co-occurrence database includes a confidence score indicating a degree of certainty of co-occurrence of an extracted entity with one or more related entities in an electronic data corpus; assigning, by the entity extraction computer, an index identifier (index ID) to each of the extracted entities; saving, by the entity extraction computer, the index ID of each of the extracted entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each of the one or more related entities; searching, by a search server computer, the entity-indexed electronic data corpus to locate the extracted entities and identify index IDs of data records in which at least two of the extracted entities co-occur; and building, by the search server computer, a search results list having the data records corresponding to the identified index IDs.
A method for generating search suggestions using entity co-occurrence and fuzzy-score matching in a knowledge base is disclosed. In one aspect of the present disclosure, the method may be used in a search system that may include a client/server type architecture. In one embodiment, the search system may include a user interface for a search engine that communicates with one or more server devices over a network connection. The server devices may include an entity extraction computer module, a fuzzy-score matching computer module and an entity co-occurrence knowledge base database. The knowledge base may be built as an in-memory database and may also include other hardware and/or software components, such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation computer module. A search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy keyword search across a collection of compressed data and return a set of scored results to its associated search controller.
In another aspect of the present disclosure, the method may include an entity extraction module that performs partial entity extraction on the provided search query to identify whether the search query refers to an entity and, if so, which type of entity it refers to. The method may further include a fuzzy-score matching module that generates an algorithm based on the type of the extracted entity and performs a search against the entity co-occurrence knowledge base. In addition, portions of the query text detected as not corresponding to an entity are treated as conceptual features, such as topics, facts and key phrases, that can be used to search the entity co-occurrence knowledge base. In one embodiment, the entity co-occurrence knowledge base includes a repository in which entities may be indexed as entity to entity, entity to topic, or entity to fact, among others, which facilitates returning fast and accurate suggestions to the user for completing the search query.
In one embodiment, a method is disclosed. The method includes: receiving, by an entity extraction computer, user input of search query parameters from a user interface; extracting, by the entity extraction computer, one or more entities from the search query parameters by comparing the search query parameters with an entity co-occurrence database and identifying at least one entity type corresponding to the one or more entities in the search query parameters, wherein the entity co-occurrence database has instances of co-occurrence of the one or more entities in an electronic data corpus; and selecting, by a fuzzy-score matching computer, a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the search query parameters, wherein the fuzzy matching algorithm corresponds to the at least one identified entity type. The method further includes: searching, by the fuzzy-score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm, and forming one or more suggested search query parameters from the one or more records based on the search; and presenting, by the fuzzy-score matching computer, the one or more suggested search query parameters via the user interface.
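Selecting a fuzzy matching algorithm by entity type can be sketched in Python as a dispatch table. The two matchers below are assumptions chosen to make the idea concrete (character-level similarity for person names, token-prefix overlap for organization names); the source does not say which algorithms are paired with which entity types. The `KNOWLEDGE_BASE` contents and the 0.6 threshold are likewise hypothetical, and the entity type is passed in directly rather than detected.

```python
import difflib

# Hypothetical knowledge base: entity name -> entity type.
KNOWLEDGE_BASE = {
    "Bill Smith": "person",
    "Bill Smyth": "person",
    "Acme Corporation": "organization",
    "Acme Corp": "organization",
}


def match_person(query, candidate):
    """Character-level similarity, tolerant of misspellings."""
    return difflib.SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def match_organization(query, candidate):
    """Fraction of query tokens that prefix-match a candidate token (tolerates abbreviations)."""
    query_tokens = query.lower().split()
    candidate_tokens = candidate.lower().split()
    if not query_tokens:
        return 0.0
    hits = sum(
        1
        for q in query_tokens
        if any(c.startswith(q) or q.startswith(c) for c in candidate_tokens)
    )
    return hits / len(query_tokens)


# The matcher is chosen by the identified entity type.
MATCHERS = {"person": match_person, "organization": match_organization}


def suggest(query, entity_type, threshold=0.6):
    """Score every knowledge base entry of the given type and return the best matches first."""
    matcher = MATCHERS[entity_type]
    scored = [
        (matcher(query, name), name)
        for name, kind in KNOWLEDGE_BASE.items()
        if kind == entity_type
    ]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


print(suggest("Bill Smitth", "person"))
print(suggest("acme corp", "organization"))
```

Keeping the matchers in a dictionary keyed by entity type means a new type can be supported by registering one more function, without touching the suggestion logic.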
In another embodiment, a system is provided. The system includes one or more server computers having one or more processors that execute computer-readable instructions for a plurality of computer modules, including: an entity extraction module configured to receive user input of search query parameters from a user interface, the entity extraction module being further configured to extract one or more entities from the search query parameters by comparing the search query parameters with an entity co-occurrence database and identifying at least one entity type corresponding to the one or more entities in the search query parameters, wherein the entity co-occurrence database has instances of co-occurrence of the one or more entities in an electronic data corpus. The system further includes a fuzzy-score matching module configured to select a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the search query parameters, wherein the fuzzy matching algorithm corresponds to the at least one identified entity type. The fuzzy-score matching module is further configured to: search the entity co-occurrence database using the selected fuzzy matching algorithm, and form one or more suggested search query parameters from the one or more records based on the search; and present the one or more suggested search query parameters via the user interface.
A method for generating search suggestions of related entities based on co-occurrence and/or fuzzy-score matching is disclosed. In one aspect of the present disclosure, the method may be used in a computer search system that may include a client/server type architecture. In one embodiment, the search system may include a user interface for a search engine that communicates with one or more server devices over a network connection. The server devices may include an entity co-occurrence knowledge base database and one or more processors that execute instructions for a plurality of special-purpose computer modules, including an entity extraction module and a fuzzy-score matching module. The knowledge base may be built as an in-memory database and may also include other components, such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation module. A search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy keyword search across a collection of compressed data and return a set of scored results to its associated search controller.
In another aspect of the present disclosure, the method may include performing, by the entity extraction module, partial entity extraction on the provided search query to identify whether the search query refers to an entity and, if so, to determine the entity type. The method may further include generating, by the fuzzy-score matching module, an algorithm corresponding to the type of the extracted entity and performing a search against the entity co-occurrence knowledge base. In addition, portions of the query text detected as not being entities are treated as conceptual features, such as topics, facts and key phrases, that can be used to search the entity co-occurrence knowledge base. The entity co-occurrence knowledge base may include a repository in which entities are indexed as entity to entity, entity to topic, or entity to fact, among others, and can return fast and accurate suggestions to the user for completing the search query.
In a further aspect of the present disclosure, the completed search query may be used as a new search query. The search system may process the new search query, run entity extraction, find the related entities with the highest scores in the entity co-occurrence knowledge base, and present those related entities in a drop-down list that may be useful to the user.
In one embodiment, a method is disclosed. The method includes: receiving, by an entity extraction computer, user input of partial search query parameters from a user interface, the partial search query parameters having at least one incomplete search query parameter; extracting, by the entity extraction computer, one or more first entities from the partial search query parameters by comparing the partial search query parameters with an entity co-occurrence database having instances of co-occurrence of the one or more first entities in an electronic data corpus, and identifying at least one entity type corresponding to the one or more first entities in the partial search query parameters; and selecting, by a fuzzy-score matching computer, a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the partial search query parameters, wherein the fuzzy matching algorithm corresponds to the at least one identified entity type. The method further includes: searching, by the fuzzy-score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm, and forming one or more first suggested search query parameters from the one or more records based on the search; presenting, by the fuzzy-score matching computer, the one or more first suggested search query parameters via the user interface; receiving, by the entity extraction computer, a user selection of the one or more first suggested search query parameters to form completed search query parameters; and extracting, by the entity extraction computer, one or more second entities from the completed search query parameters. The method further includes: searching, by the entity extraction computer, the entity co-occurrence database to identify one or more entities related to the one or more second entities, to form one or more second suggested search query parameters; and presenting, by the entity extraction computer, the one or more second suggested search query parameters via the user interface.
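The two suggestion stages in this method (completing a partial query, then proposing entities related to the completed query) can be sketched in Python as follows. The `COOCCURRENCE` table, the prefix-based `difflib` scoring used for completion, and the 0.8 cutoff are assumptions made for illustration, and the user's selection is simulated by taking the top completion.

```python
import difflib

# Hypothetical co-occurrence knowledge base: entity -> {related entity: co-occurrence score}.
COOCCURRENCE = {
    "Bill Smith": {"Acme Corporation": 0.9, "Springfield": 0.6},
    "Bill Smyth": {"Globex": 0.7},
    "Acme Corporation": {"Bill Smith": 0.9},
}


def first_suggestions(partial_query, cutoff=0.8):
    """Stage 1: fuzzy-match the typed text against the start of each known entity name."""
    typed = partial_query.lower()
    scored = []
    for name in COOCCURRENCE:
        prefix = name.lower()[: len(typed)]
        score = difflib.SequenceMatcher(None, typed, prefix).ratio()
        if score >= cutoff:
            scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)]


def second_suggestions(complete_query):
    """Stage 2: extract entities from the completed query and rank their related entities."""
    text = complete_query.lower()
    related = {}
    for entity, neighbours in COOCCURRENCE.items():
        if entity.lower() in text:
            for other, score in neighbours.items():
                related[other] = max(score, related.get(other, 0.0))
    ranked = sorted(related.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]


completions = first_suggestions("Bill Smi")      # shown while the user is typing
selected = completions[0]                        # simulated user selection
related = second_suggestions(selected)           # shown once the query is complete
print(completions, related)
```

The second stage deliberately ignores string similarity and relies only on co-occurrence scores, since at that point the query is assumed to name its entities correctly.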
In another embodiment, a system is disclosed. The system includes one or more server computers having one or more processors that execute computer-readable instructions for a plurality of computer modules, including: an entity extraction module configured to receive user input of partial search query parameters from a user interface, the partial search query parameters having at least one incomplete search query parameter, the entity extraction module being further configured to extract one or more first entities from the partial search query parameters by comparing the partial search query parameters with an entity co-occurrence database having instances of co-occurrence of the one or more first entities in an electronic data corpus, and identifying at least one entity type corresponding to the one or more first entities in the partial search query parameters. The system further includes a fuzzy-score matching module configured to select a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the partial search query parameters, wherein the fuzzy matching algorithm corresponds to the at least one identified entity type. The fuzzy-score matching module is further configured to: search the entity co-occurrence database using the selected fuzzy matching algorithm, and form one or more first suggested search query parameters from the one or more records based on the search; and present the one or more first suggested search query parameters via the user interface. In addition, the entity extraction module is further configured to: receive a user selection of the one or more first suggested search query parameters to form completed search query parameters; extract one or more second entities from the completed search query parameters; search the entity co-occurrence database to identify one or more entities related to the one or more second entities, thereby forming one or more second suggested search query parameters; and present the one or more second suggested search query parameters via the user interface.
A method is disclosed for obtaining entity-related search suggestions using entity and feature co-occurrence. In one aspect of the disclosure, the method may be used in a search system that includes a client/server type architecture.
A search system may use a method in which entities are stored on one or more servers that host an entity database and a trend database. The entities in these databases may carry scores, so that they can be indexed with higher scores first. The information stored in the two databases may be combined to produce a single list of search suggestions. The trend database may provide previous search queries from one or more users on a local area network and/or the Internet. The entity database may provide search suggestions based on entities extracted from a plurality of data available on a local area network and/or the Internet. This list may give the user a more accurate and faster set of suggestions.
In one embodiment, a computer-implemented method includes: receiving, by a computer, from a search engine, a search query comprising one or more data strings, wherein each respective entity corresponds to a subset of the one or more data strings; identifying, by the computer, one or more entities in the one or more data strings based on comparing the one or more entities against an entity database and a trend database; identifying, by the computer, one or more features in the one or more data strings that are identified as not corresponding to at least one entity; assigning, by the computer, each of the one or more features to at least one of the one or more entities based on a matching algorithm; assigning, by the computer, an extraction score to each respective entity based on the score assigned to each respective feature assigned to that entity; receiving, by the computer, from the entity database, a first search list containing one or more entities having a score within a threshold distance of the extraction score of each respective entity; receiving, by the computer, from the trend database, a second search list containing one or more entities having a score within a threshold distance of the extraction score of each respective entity; generating, by the computer, an aggregated list comprising the first search list and the second search list, wherein the entities of the aggregated list are ranked according to their respective scores; and providing, by the computer, suggested searches according to the aggregated list.
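The filtering and aggregation steps of this embodiment can be sketched in Python as follows. This is only an illustration under stated assumptions: each database is modeled as a mapping from suggestion to score, a suggestion found in both lists keeps the higher of its two scores, and all names and sample values are hypothetical rather than taken from the disclosure.

```python
from typing import Dict, List, Tuple


def within_threshold(candidates: Dict[str, float],
                     extraction_score: float,
                     threshold: float) -> Dict[str, float]:
    """Keep candidates whose score lies within `threshold` of the extraction score."""
    return {name: score for name, score in candidates.items()
            if abs(score - extraction_score) <= threshold}


def aggregate_suggestions(entity_db: Dict[str, float],
                          trend_db: Dict[str, float],
                          extraction_score: float,
                          threshold: float) -> List[Tuple[str, float]]:
    """Merge the two filtered lists and rank the result by score, highest first."""
    first_list = within_threshold(entity_db, extraction_score, threshold)
    second_list = within_threshold(trend_db, extraction_score, threshold)
    merged = dict(first_list)
    for name, score in second_list.items():
        # Assumption: a suggestion present in both lists keeps the higher score.
        merged[name] = max(score, merged.get(name, score))
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data for illustration only.
entity_db = {"Apple Inc.": 0.9, "Apple Records": 0.4}
trend_db = {"Apple Inc.": 0.95, "apple pie recipe": 0.8}
ranked = aggregate_suggestions(entity_db, trend_db, extraction_score=0.9, threshold=0.15)
```

Other merge rules, such as summing the two scores, would fit the same structure; the disclosure does not fix one here.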
Disclosed herein are systems and methods for performing geographic entity-based search in a content management system such as Microsoft SharePoint. The method described in the embodiments involves extending the SharePoint search architecture by adding a geotagging web service. The system includes a computer processor operatively associated with computer memory and one or more I/O devices, wherein the processor and memory are configured to operate one or more SharePoint processes. The system also includes another computer processor operatively associated with computer memory and one or more I/O devices, wherein the processor and memory are configured to host and provide processes for the geotagging web service. The SharePoint system may include a crawl component, a content processing component, and a search index component so that content can be searched. The content processing component in SharePoint search may extend its functionality by using the Content Enrichment Web Service (CEWS) feature.
The method involves crawling content from different sources and sending the resulting batch of crawled properties for content processing. During content processing, a trigger condition may determine whether the crawled properties would benefit from additional processing that enriches the original content with additional geographic metadata properties. If the crawled properties would not benefit from additional processing, they may be mapped to managed properties and sent to the search index. If the crawled properties would benefit from external web service processing, CEWS may make a Simple Object Access Protocol (SOAP) request to a configurable endpoint using Hypertext Transfer Protocol (HTTP) or any other web service invocation method. The entity enrichment service may determine the type of the content. If the content is in an image format, its metadata, such as the file location, may be sent to an optical character recognition (OCR) engine so that the original document can be retrieved and processed asynchronously, converted to text, and sent back to the crawl component to be crawled again in text format. If the content is in text format, the geotagging web service may identify geographic metadata and associate it with the content as managed properties. After the content has been geotagged, it may be sent to the index component.
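The branching described in this paragraph can be summarized with a small Python sketch. It models only the routing decision, not the SOAP call or the SharePoint components themselves; the function name, the `content_type` key, the returned labels, and the example trigger are all hypothetical.

```python
from typing import Callable, Dict

CrawledProperties = Dict[str, str]


def route_crawled_item(props: CrawledProperties,
                       trigger: Callable[[CrawledProperties], bool]) -> str:
    """Return the next processing stage for one batch of crawled properties."""
    if not trigger(props):
        # No benefit from enrichment: map to managed properties and index.
        return "map_to_managed_properties_and_index"
    if props.get("content_type", "").startswith("image/"):
        # Image content: hand the file location to OCR, then crawl the text again.
        return "send_location_to_ocr_then_recrawl"
    # Text content: geotag it, attach managed properties, then index.
    return "geotag_then_index"


def needs_geo_enrichment(props: CrawledProperties) -> bool:
    """Hypothetical trigger condition: enrich items that carry no geographic metadata yet."""
    return "geo" not in props


stage_text = route_crawled_item({"content_type": "text/plain"}, needs_geo_enrichment)
stage_image = route_crawled_item({"content_type": "image/tiff"}, needs_geo_enrichment)
stage_done = route_crawled_item({"content_type": "text/plain", "geo": "48.85,2.35"},
                                needs_geo_enrichment)
```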
Additional search user interfaces (UIs) may be added by using SharePoint web parts, or by modifying the standard layout of SharePoint search with standard web development tools such as HTML, HTML5, JavaScript, and CSS. The search UI may help the user perform geographic search queries, or display geographic search results using digital geographic features such as, but not limited to, digital maps. The search UI may also be enhanced to perform faceted search using other enriched entities or the metadata associated with them.
Numerous other aspects, features, and benefits of the present disclosure will become apparent from the following detailed description.
Brief Description of the Drawings
The present disclosure can be better understood by referring to the following figures. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the disclosure. In the figures, reference numerals designate corresponding parts throughout the different views.
Fig. 1 is a block diagram illustrating an exemplary environment of a computer system in which an embodiment of the present disclosure may operate;
Fig. 2 is a flowchart illustrating a method for searching using entity co-occurrence, according to an embodiment; and
Fig. 3 is a flowchart illustrating an embodiment of a simple search in which the search results returned by the system may include related entities of interest.
Fig. 4 is a block diagram illustrating an exemplary system environment in which an embodiment of the present disclosure may operate;
Fig. 5 is a flowchart illustrating a method for generating search suggestions using entity co-occurrence in a knowledge base and fuzzy-score matching, according to an embodiment; and
Fig. 6 is a diagram illustrating an example of a user interface through which search suggestions may be produced using the entity co-occurrence in a knowledge base and fuzzy-score matching of Figs. 4-5.
Fig. 7 is a block diagram illustrating an exemplary system environment in which an embodiment of the present disclosure may operate.
Fig. 8 is a flowchart illustrating a method for generating search suggestions of related entities based on co-occurrence and/or fuzzy-score matching, according to an embodiment.
Fig. 9 is an exemplary embodiment of a user interface associated with the method described in Fig. 8.
Fig. 10 is a block diagram illustrating a method for obtaining search suggestions based on an entity database and a trend database.
Fig. 11 is a block diagram illustrating a method for obtaining search suggestions based on an entity database and a trend database by generating a suggestion list from the individual scores of the search suggestions in each database.
Fig. 12 is a block diagram illustrating a method for obtaining search suggestions based on an entity database and a trend database by generating a suggestion list from the overall score of the search suggestions in the two databases.
Fig. 13 is a system architecture for tagging and entity enrichment of content in a content management system.
Fig. 14 is a process by which content is tagged and indexed for named-entity and geographic-entity search.
Definitions
As used herein, the following terms have the following definitions:
"Entity extraction" refers to information processing methods for extracting information such as names, places, and organizations.
"Corpus" refers to a collection of one or more documents.
"Feature" is any information that is at least partially derived from a document.
"Event concept store" refers to a database of event template models.
"Event" refers to one or more features characterized at least by the real-time occurrence of those features.
"Event model" refers to a collection of data that may be used to compare against another collection of data and identify a specific type of event.
"Module" refers to a computer or software component suitable for carrying out at least one or more tasks.
"Feature attribute" refers to metadata associated with a feature, such as the feature's location in a document, its confidence score, and the like.
"Fact" refers to an objective relationship between features.
"Entity knowledge base" refers to a computer database containing features/entities.
"Query" refers to a computer-generated request to retrieve information from one or more suitable databases.
"Topic" refers to a set of thematic information that is at least partially derived from a corpus.
"Geotagging" refers to the process of extracting geographic entities from unstructured text documents. Geotagging may include disambiguating an entity to a specific geographic location and attaching geographic metadata (such as geographic coordinates, geographic feature type, and other metadata).
"Entity tagging" refers to the process of extracting named entities from unstructured text. Entity tagging may include entity disambiguation, entity name normalization, and attaching entity metadata.
"Named entity" refers to a person, organization, or topic.
"Geographic entity" refers to a geographic place or geographic location.
"Crawled property" refers to content management system metadata obtained from a document during a crawl.
Detailed Description
Reference will now be made in detail to the preferred embodiments, examples of which are illustrated in the accompanying drawings. The embodiments described above are intended to be exemplary. Those skilled in the art will recognize that numerous alternative components and embodiments may be substituted for the particular examples described herein and still fall within the scope of the disclosure. Other embodiments may be used and/or other changes may be made without departing from the spirit or scope of the present disclosure. The illustrative embodiments described in the detailed description are not meant to limit the subject matter presented herein.
It will be understood, however, that no limitation of the scope of the invention is thereby intended. Alterations and further modifications of the inventive features described herein, and additional applications of the principles of the invention as illustrated herein, which would occur to one skilled in the relevant art and having possession of this disclosure, are to be considered within the scope of the invention.
The present disclosure describes a system and method for detecting, extracting, and validating events from multiple sources. Sources may include news sources, social media websites, and/or any source that may include data about events.
Various embodiments of the systems and methods disclosed herein gather data from different sources to identify independent events.
Fig. 1 is a block diagram of a search system 100 according to the present disclosure. The search system 100 may include one or more client computing devices having processors that execute software modules associated with the search system 100. The search system 100 may include a graphical user interface 102 that accesses a search engine 104, which exchanges search queries in the form of binary data with a server device 106 over a network 108. In the exemplary embodiment, the search system 100 is implemented in a client-server computing architecture. It should be appreciated, however, that the search system 100 may be implemented using other computer architectures (e.g., a standalone computer, a mainframe system with terminals, an application service provider (ASP) model, a peer-to-peer model, and the like). The network 108 may include any suitable hardware and software modules capable of transmitting digital data between computing devices, such as a local area network, a wide area network, the Internet, a wireless network, a mobile telephone network, and the like. As such, it should further be appreciated that the system 100 may be implemented on a single network 108 or using multiple networks 108.
The user's computing device 102 may access the search engine 104, which may include software modules capable of transmitting search queries. A search query consists of parameters, provided to the search engine 104, that indicate the desired information to retrieve. The search query may be provided by a user or by another software application in any suitable data format (e.g., integers, strings, complex objects) compatible with the parsing and processing routines of the search engine 104. In some embodiments, the search engine 104 may be a web-based tool that is accessible through a browser or other software application of the user's computing device 102 and that enables users or software applications to locate information on the World Wide Web. In some embodiments, the search engine 104 may be an application software module native to the system 100, enabling users or applications to locate information within the databases of the system 100.
The server device 106, which may be implemented as a single server device 106 or in a distributed architecture across multiple server computers, may include an entity extraction module 110, an entity co-occurrence knowledge base 112, and an entity index corpus 114. The entity extraction module 110 may be a computer software and/or hardware module capable of extracting and disambiguating independent entities from a given set of queries (e.g., query strings, structured data, and the like). Examples of entities may include people, organizations, geographic locations, dates, and/or times. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
According to various embodiments, the entity co-occurrence knowledge base 112 may be built as, but is not limited to, an in-memory computer database (not shown), and may include other components (not shown), such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation computer module. One search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy key search through a collection of compressed data and return a set of scored results to its associated search controller.
The entity co-occurrence knowledge base 112 may include related entities based on features and ranked by confidence score. Various methods for ranking features may be used; these methods may fundamentally use a weighting model to determine which entity types are most important and which carry more weight, and, based on confidence scores, determine how confidently the extraction of the correct features has been performed. The entity index corpus 114 may include data from a plurality of sources, such as the Internet, having a massive corpus or a live corpus.
Fig. 2 is a flowchart illustrating a method 200 for searching for related entities using entity co-occurrence, which may be implemented in a search system 100 such as the one described in Fig. 1. According to various embodiments, before the method 200 begins, the entity index corpus 114 (similar to that described in Fig. 1) may be fed with data from a plurality of sources, such as a massive corpus or live corpus of electronic data (e.g., the Internet, websites, blogs, word-processing files, plain-text files). The entity index corpus 114 may include a plurality of indexed entities that may be continuously updated as new data is discovered.
In one embodiment, the method 200 may start at step 202, when a user or software application of the computing device 102 provides the search engine 104 with one or more search queries containing one or more entities. The search queries provided at step 202 may be processed by the search system 100 one at a time, from 1 to n. An example of a search query at step 202 may be a combination of keywords in the form of a string, structured data, or another suitable data format. In this exemplary embodiment of Fig. 2, the keywords of the search query may be entities representing people, organizations, geographic locations, dates, and/or times.
The search query from step 202 may then be processed for entity extraction at step 204. At this step, the entity extraction module 110 may process the search query from step 202 into entities and compare all of them against the entity co-occurrence knowledge base 112 to extract and disambiguate as many entities as possible. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
In addition, various methods for linking features may be used; these methods may fundamentally use a weighting model to determine which entity types are most important and which carry more weight, and, based on confidence scores, determine how confidently the extraction of the correct features has been performed. Once the entities have been extracted and ranked based on confidence scores, an index ID (which in some cases may be a number) may be assigned to the extracted entities at step 206.
Next, at step 208, a search based on the entity index IDs assigned at step 206 may be performed. At search step 208, the extracted entities may be located within the data of the entity index corpus 114 using standard indexing methods. Once the extracted entities are located, an entity association step 210 may follow. At entity association step 210, all data (such as documents, videos, pictures, files, and the like) in which at least two of the extracted entities overlap may be pulled from the entity index corpus 114. Finally, a list of potential results is built, sorted by relevance, and presented to the user as search results at step 212. The results list may then show only links to data in which the user may find related entities of interest.
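The entity association of step 210 can be illustrated with a short Python sketch. It assumes the entity index corpus is available as a mapping from entity index ID to the set of items mentioning that entity, and it uses the number of overlapping entities as a stand-in for relevance; both choices, like the names and sample data, are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def associate_entities(index: Dict[int, Set[str]],
                       extracted_ids: Iterable[int],
                       min_overlap: int = 2) -> List[Tuple[str, int]]:
    """Return items in which at least `min_overlap` extracted entities co-occur."""
    hits: Counter = Counter()
    for entity_id in extracted_ids:
        # Count, per item, how many of the extracted entities it mentions.
        for item in index.get(entity_id, set()):
            hits[item] += 1
    results = [(item, count) for item, count in hits.items() if count >= min_overlap]
    # Assumption: more overlapping entities means higher relevance.
    return sorted(results, key=lambda result: (-result[1], result[0]))


# Hypothetical index: entity index ID -> items that mention the entity.
toy_index = {1: {"doc1", "doc2"}, 2: {"doc1", "doc3"}, 3: {"doc1", "doc3"}}
linked = associate_entities(toy_index, [1, 2, 3])
```

In this toy data, `doc2` mentions only one extracted entity, so it is left out of the results.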
Fig. 3 is a specific example of a method 300 for searching for related entities using entity co-occurrence, as discussed above in connection with Fig. 2. As described for Fig. 2, according to various embodiments, before the method 300 begins, the entity index corpus 114 (similar to that described in Fig. 1) may be fed with data from a plurality of sources, such as a massive corpus or a live corpus (e.g., the Internet). The entity index corpus 114 may include a plurality of indexed entities that may be continuously updated as new data is discovered.
In this exemplary embodiment, a user may be looking for information about "Jobs" at the company "Apple". To this end, the user may enter one or more entities (e.g., the search query at step 302) through a user interface 102, which may be, but is not limited to, an interface with a search engine 104 (such as that described in Fig. 1). By way of illustration and not by way of limitation, the user may enter a combination of entities such as "Apple + Jobs". Next, the search engine 104 may generate the search queries (step 302) and send them to the server device 106 for processing. At the server device 106, the entity extraction module 110 may perform an entity extraction step 304 on the search query entered at step 302.
The entity extraction module 110 may then process the search query entered at step 302, such as "Apple" and "Jobs", into entities and compare all of them against the entity co-occurrence knowledge base 112 to extract and disambiguate as many entities as possible. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
In addition, various methods for linking features may be used; these methods may fundamentally use a weighting model to determine which entity types are most important and which carry more weight, and, based on confidence scores, determine how confidently the extraction of the correct features has been performed. As a result, a table 306 including entities and co-occurrences may be created. The table 306 may then show the entity "Apple" and its co-occurrences, which in this case may be Apple and Jobs, and Apple and Steve Jobs. The table 306 may also include Apple and organization A, which may be found to be related to Apple because organization A does business with Apple and generates "jobs" within organization A. Other co-occurrences of lower importance may be found. As such, Apple and Jobs may have the highest score (1) and therefore be listed at the top; Apple and Steve Jobs may have the second-highest score (0.8); and finally Apple and organization A, having the lowest score (0.3), may be listed at the bottom.
Once the entities have been extracted and ranked based on confidence scores, an index ID (which in some cases may be a number) may be assigned to the extracted entities at step 308. Table 310 shows the index IDs assigned to the extracted entities: "Apple" with index ID 1, "Jobs" with index ID 2, "Steve Jobs" with index ID 3, and "organization A" with index ID 4.
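The ranking in table 306 and the index IDs in table 310 can be reproduced with a small Python sketch. The scores and entity names come from the example above; the data layout and the rule of assigning IDs in order of first appearance in the ranked table are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

CoOccurrence = Tuple[str, str, float]  # (entity, co-occurring entity, confidence score)


def rank_and_index(cooccurrences: List[CoOccurrence]) -> Tuple[List[CoOccurrence], Dict[str, int]]:
    """Sort co-occurrences by score (highest first) and assign numeric index IDs."""
    ranked = sorted(cooccurrences, key=lambda row: row[2], reverse=True)
    index_ids: Dict[str, int] = {}
    for entity, other, _score in ranked:
        for name in (entity, other):
            # Assumption: IDs follow the order in which entities first appear.
            if name not in index_ids:
                index_ids[name] = len(index_ids) + 1
    return ranked, index_ids


# Co-occurrences and scores from the example of Fig. 3, deliberately unsorted.
table_306, table_310 = rank_and_index([
    ("Apple", "organization A", 0.3),
    ("Apple", "Jobs", 1.0),
    ("Apple", "Steve Jobs", 0.8),
])
```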
Next, a search step 312 based on the entity index IDs assigned at step 308 may be performed. At search step 312, the extracted entities, such as "Apple", "Jobs", "Steve Jobs", and "organization A", may be located within the data of the entity index corpus 114 using standard indexing methods.
After the extracted entities are located within the entity index corpus 114, an entity association step 314 may follow. At entity association step 314, all data (such as documents, videos, pictures, files, and the like) in which at least two of the extracted entities overlap may be pulled from the entity index corpus 114 to build a list of links as search results (step 318). By way of illustration and not by way of limitation, table 316 shows how the extracted entities may be associated with data in the entity index corpus 114. In table 316, documents 1, 4, 5, 7, 8, and 10 show an overlap of two extracted entities, so at step 318 the links to these documents may be shown as search results.
Fig. 4 is a block diagram of a search computer system 400 according to the present disclosure. The search system 400 may include one or more user interfaces 402 to a search engine 404, which communicates with a server device 406 over a network 408. In this embodiment, the search system 400 is implemented in one or more special-purpose computers and the computer modules referenced below, including by way of a client/server type architecture. However, the search system 400 may be implemented using other computer architectures (e.g., a standalone computer, a mainframe system with terminals, an ASP model, a peer-to-peer model, and the like). In an embodiment, the search computer system 400 includes multiple networks, such as a local area network, a wide area network, the Internet, a wireless network, a mobile telephone network, and the like.
The search engine 404 may include a user interface, such as a web-based tool, that enables users to locate information on the World Wide Web. The search engine 404 may also include user interface tools that enable users to locate information within internal database systems. The server device 406, which may be implemented in a single server device 406 or in a distributed architecture across multiple server computers, may include an entity extraction module 410, a fuzzy-score matching module 412, and an entity co-occurrence knowledge base database 414.
The entity extraction module 410 may be a hardware and/or software module configured to extract and disambiguate independent entities on the fly from a given set of queries (e.g., query strings, partial queries, structured data, and the like). Examples of entities may include people, organizations, geographic locations, dates, and/or times. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
The fuzzy-score matching module 412 may include a plurality of algorithms that may be selected according to the type of entity extracted from a given search query. The function of the algorithms may be to determine whether a given search query received via user input is similar to other strings found by the algorithm, or whether it approximately matches a given pattern string. Fuzzy matching may also be known as fuzzy string matching, inexact matching, and approximate matching. The entity extraction module 410 and the fuzzy-score matching module 412 may work in conjunction with the entity co-occurrence knowledge base 414 to generate search suggestions for the user.
According to various embodiments, the entity co-occurrence knowledge base 414 may be built as, but is not limited to, an in-memory database, and may include multiple components, such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation module. One search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy key search through a collection of compressed data and return a set of scored results to its associated search controller.
The entity co-occurrence knowledge base 414 may include related entities based on features and ranked by confidence score. Various methods for linking features may be used; these methods may fundamentally use a weighting model to determine which entity types are most important and which carry more weight, and, based on confidence scores, determine how confidently the extraction of the correct features has been performed.
Fig. 5 is a flowchart illustrating a method 500 for generating search suggestions using fuzzy-score matching and entity co-occurrence in a knowledge base. The method 500 may be implemented in a search system 400 (similar to that described in Fig. 4).
In one embodiment, the method 500 may start at step 502, when a user begins typing a search query into a search engine interface 402 as described in Fig. 4. As the search query is typed at step 502, the search system 400 may perform on-the-fly processing. According to various embodiments, the search query input at step 502 may be complete or partial, and may be correctly spelled or misspelled. Then, in the search system 400, a partial entity extraction step 504 may be performed on the search query input of step 502. The partial entity extraction step 504 may run a quick search against the entity co-occurrence knowledge base 414 to identify whether the search query input at step 502 is an entity and, if so, what type of entity it is. According to various embodiments, the search query input of step 502 may refer to a person, an organization, the location of a place, a date, and the like. Once the entity type of the search query input is identified, the fuzzy-score matching module 412 may select a corresponding fuzzy matching algorithm at step 506. For example, if the search query is identified as an entity referring to a person, the fuzzy-score matching module 412 may select a string matching algorithm for persons, for example one that extracts the different components of the person's name, including first name, middle name, last name, and title. In another embodiment, if the search query is identified as an entity referring to an organization, the fuzzy-score matching module 412 may select a string matching algorithm for organizations, which may include identifying terms such as institute, university, corporation, and incorporated. The fuzzy-score matching module 412 may thus select the string matching algorithm that corresponds to the entity type identified in the search query input in order to configure the search. Once the string matching algorithm is adjusted to the type of entity identified, a fuzzy-score matching step 508 may be performed.
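The type-specific selection at step 506 can be sketched in Python as a lookup from entity type to a normalization routine. This is a simplified illustration, not the disclosed algorithms: the title and organization term lists are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever string metric an implementation would actually use.

```python
import difflib
from typing import Callable, Dict, List

# Hypothetical word lists; a real system would use far richer resources.
PERSON_TITLES = {"mr", "mrs", "ms", "dr", "prof"}
ORG_TERMS = {"institute", "university", "corporation", "inc", "incorporated"}


def _tokens(text: str) -> List[str]:
    """Lowercase the text, drop periods and commas, and split on whitespace."""
    return text.lower().replace(".", "").replace(",", "").split()


def person_key(text: str) -> str:
    """Ignore titles so that only the name components are compared."""
    return " ".join(t for t in _tokens(text) if t not in PERSON_TITLES)


def organization_key(text: str) -> str:
    """Ignore generic identifying terms such as 'university' or 'inc'."""
    return " ".join(t for t in _tokens(text) if t not in ORG_TERMS)


def default_key(text: str) -> str:
    return " ".join(_tokens(text))


NORMALIZERS: Dict[str, Callable[[str], str]] = {
    "person": person_key,
    "organization": organization_key,
}


def similarity(query: str, record: str, entity_type: str) -> float:
    """Score two strings with the normalizer chosen for the identified entity type."""
    key = NORMALIZERS.get(entity_type, default_key)
    return difflib.SequenceMatcher(None, key(query), key(record)).ratio()
```

For instance, `similarity("Dr. Michael Jordan", "Michael Jordan", "person")` treats the two strings as identical because the title is dropped before comparison.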
In fuzzy score matching step 508, the one or more extracted entities, as well as any non-entities, may be searched for and compared against entity co-occurrence knowledge base 414. Extracted entities may include incomplete names (for example, the first characters of a first name and last name), abbreviations of organizations (for example, "UN", which may stand for "United Nations"), short forms, and nicknames, among others. Entity co-occurrence knowledge base 414 may already hold a plurality of records indexed as structured data, such as entity-to-entity, entity-to-topic, and entity-to-fact associations. This indexing may allow the fuzzy score matching in step 508 to run very quickly. The fuzzy score matching in step 508 may use, without limitation, common string metrics such as Levenshtein distance, strcmp95, and ITF scoring. The Levenshtein distance between two words is the minimum number of single-character edits required to change one word into the other.
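The Levenshtein distance named above can be computed with a standard dynamic-programming table. The sketch below is a generic textbook formulation offered for illustration; the function name is an assumption, and the disclosure does not specify how the metric is implemented.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

A small distance indicates a likely match even when the query is misspelled; swapping two adjacent letters, as in "Micheal" for "Michael", counts as two substitutions under this metric.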
Finally, once fuzzy score matching step 508 has finished comparing the search query against all records in entity co-occurrence knowledge base 414, the record that ranks highest, or most closely matches the given pattern string (that is, the search query input of step 502), may be selected as the first candidate for the search suggestions of step 510. Other records that closely match the given pattern string may be placed below the first candidate in descending order. The search suggestions of step 510 may then be presented to the user as a drop-down list of possible matches, which the user may or may not ignore.
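The ranking described for step 510 amounts to scoring every record against the pattern string and sorting in descending order. The sketch below is illustrative only: it swaps in the similarity ratio of Python's standard `difflib.SequenceMatcher` as a stand-in for the fuzzy score, and the function name and `limit` parameter are assumptions.

```python
from difflib import SequenceMatcher


def suggest(query, records, limit=5):
    """Return up to `limit` records, best match first."""
    scored = [
        (SequenceMatcher(None, query.lower(), record.lower()).ratio(), record)
        for record in records
    ]
    # Highest similarity first; the top entry is the first candidate.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:limit]]
```

The returned list corresponds to the drop-down list of possible matches, with the first candidate at the top.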
Fig. 6 is an exemplary user interface 600 for the method, discussed above in Figs. 4-5, of generating search suggestions using entity co-occurrence in a knowledge base and fuzzy score matching. In this example, a user enters partial query 604 in search box 606 through search engine interface 602 (similar to the one described for Fig. 4). By way of illustration and not limitation, partial query 604 may be an incomplete name of a person, such as "Michael J" as shown in Fig. 6. It is considered partial query 604 because the user may not yet have selected search button 608 or otherwise submitted partial query 604 to search system 400 to perform an actual search and obtain results.
Following method 500 (Fig. 5), as the user types "Michael J", entity extraction module 410 performs a quick on-the-fly search of the first word (Michael) against entity co-occurrence knowledge base 414 to identify the type of entity; in this example, the entity may refer to the name of a person. Accordingly, fuzzy score matching module 412 may select a string matching algorithm tailored to names. Names may be found in databases written in different forms, such as using only initials (short forms), or the first name and the first character of the last name, or the first name, middle initial, and last name, or any combination thereof. Fuzzy score matching module 412 may use a common string metric such as Levenshtein distance to determine and assign a score to each entity, topic, or fact in entity co-occurrence knowledge base 414 that may match the entity "Michael". In this example, Michael matches a large number of records that contain this name. However, as the user types the following character, "J", fuzzy score matching module 412 may perform another comparison, based on Levenshtein distance, against all co-occurrences with Michael in entity co-occurrence knowledge base 414. Entity co-occurrence knowledge base 414 may then select all possible matches with the highest scores for "Michael J". For example, fuzzy score matching module 412 may return to the user search suggestions 610 such as "Michael Jackson", "Michael Jordan", "Michael J. Fox", or in some cases even "Michael Dell". The user may then select one of the suggested persons from the drop-down list to complete the search query. As an extension of the foregoing example, a query such as "basketball player Michael" may produce the suggestion "Michael Jordan", based on results returned by searching entity co-occurrence knowledge base 414 for "Michael" among the person entity name variants and for "basketball player" among the co-occurrence features, such as key phrases, facts, and topics. As another example, "actor Alexander" may produce the suggestion "Alexander Polinsky". Those skilled in the art will recognize that currently existing search platforms may not generate suggestions in the foregoing manner.
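The "basketball player Michael" example combines two lookups: the name part of the query is matched against person name variants, and the remaining text is matched against co-occurrence features. A minimal sketch follows, assuming a hypothetical `RECORDS` table in which each person is associated with a small set of key phrases; the table layout and function name are illustrative and not part of the disclosure.

```python
# Hypothetical records: person entity -> co-occurring key phrases.
RECORDS = {
    "Michael Jordan": {"basketball player"},
    "Michael Jackson": {"singer"},
    "Michael J. Fox": {"actor"},
    "Alexander Polinsky": {"actor"},
}


def suggest_by_feature(query):
    """Return persons whose name and co-occurrence features both appear
    in the query text."""
    q = query.lower()
    tokens = q.split()
    hits = []
    for name, features in RECORDS.items():
        name_hit = any(part.lower() in tokens for part in name.split())
        feature_hit = any(feature in q for feature in features)
        if name_hit and feature_hit:
            hits.append(name)
    return hits
```

Here the first name alone would match three records, and it is the co-occurring key phrase that narrows the suggestion to a single person.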
Fig. 7 is a block diagram of search system 700 according to the present disclosure. Search system 700 may include one or more user interfaces 702 to search engine 704, which communicates with server device 706 over network 708. In this embodiment, search system 700 is implemented in a client/server architecture; however, search system 700 may be implemented using other computer architectures (for example, a stand-alone computer, a mainframe system with terminals, an ASP model, or a peer-to-peer model) and a variety of networks (for example, a local area network, a wide area network, the Internet, a wireless network, or a mobile telephone network).
Search engine 704 may include, without limitation, a web-based tool with an interface that enables users to locate information on the World Wide Web. Search engine 704 may also include tools that enable users to locate information within internal database systems. Server device 706, which may be implemented in a single server device 706 or in a distributed architecture across a plurality of server computers, may include entity extraction module 710, fuzzy score matching module 712, and entity co-occurrence knowledge base database 714.
Entity extraction module 710 may be a hardware and/or software computer module able to extract and disambiguate, on the fly, independent entities from a given set of queries, such as query strings, partial queries, and structured data. Examples of entities may include people, organizations, geographic locations, dates, and/or times. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features.
Fuzzy score matching module 712 may include a plurality of algorithms that may be adjusted or selected according to the type of entity extracted from a given search query. The function of the algorithms may be to determine whether the given search query (input) and a suggested search string are similar to each other, or approximately match a given pattern string. Fuzzy matching may also be known as fuzzy string matching, inexact matching, and approximate matching. Entity extraction module 710 and fuzzy score matching module 712 may work in conjunction with entity co-occurrence knowledge base 714 to generate search suggestions for the user.
According to various embodiments, entity co-occurrence knowledge base 714 may be built, without limitation, as an in-memory database and may include components such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation module. A search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy key search through a collection of compressed data and return a set of scored results to its associated search controller.
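The relationship between a search controller and its search nodes can be sketched as a simple fan-out and merge. The sketch below rests on several assumptions that the disclosure does not specify: each node holds one partition serialized as JSON and compressed with `zlib`, the fuzzy score is the similarity ratio of `difflib.SequenceMatcher`, and the class names, method names, and `threshold` parameter are hypothetical.

```python
import json
import zlib
from difflib import SequenceMatcher


class SearchNode:
    """Holds one compressed partition of records (an assumed layout)."""

    def __init__(self, records):
        self._blob = zlib.compress(json.dumps(records).encode("utf-8"))

    def search(self, query, threshold=0.6):
        """Scan this node's partition and return (score, record) pairs."""
        records = json.loads(zlib.decompress(self._blob).decode("utf-8"))
        results = []
        for record in records:
            score = SequenceMatcher(None, query.lower(), record.lower()).ratio()
            if score >= threshold:
                results.append((score, record))
        return results


class SearchController:
    """Fans a query out to its associated nodes and merges their results."""

    def __init__(self, nodes):
        self.nodes = nodes

    def search(self, query):
        merged = []
        for node in self.nodes:
            merged.extend(node.search(query))
        return sorted(merged, key=lambda pair: pair[0], reverse=True)
```

Because each node scores its own partition independently, the nodes in such a design could run in parallel; the sequential loop here is only for brevity.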
Entity co-occurrence knowledge base 714 may include related entities based on features and ranked by a confidence score. Various methods for linking the features may be used; these methods may essentially use a weighted model to determine which entity types are most important and which carry more weight and, based on confidence scores, to determine how confidently the correct features were extracted.
Fig. 8 is a flowchart illustrating an embodiment of method 800 for generating search suggestions of related entities based on co-occurrence and/or fuzzy score matching. Method 800 may be implemented in search system 700 (similar to the one described in Fig. 7).
In one embodiment, method 800 may begin when a user starts typing a search query, in step 802, into search engine 704 as described above in Fig. 7. As the search query is typed, search system 700 may perform on-the-fly processing. According to various embodiments, the search query may be complete and/or partial, and correctly spelled and/or misspelled. Next, partial entity extraction step 804 may be performed on the search query. Partial entity extraction step 804 may run a quick search against entity co-occurrence knowledge base 714 to identify whether the search query includes an entity and, if so, the entity type. According to various embodiments, the entity in the search query may refer to a person, an organization, the location of a place, or a date, among others. Once an entity has been identified, fuzzy score matching module 712 may select a corresponding fuzzy matching algorithm in step 806. For example, if the search query is identified as an entity that refers to a person, fuzzy score matching module 712 may adjust or select the string matching algorithm for persons, which may extract the different components of a person's name, including first name, middle name, last name, and title. In another embodiment, if the search query is identified as an entity that refers to an organization, fuzzy score matching module 712 may adjust or select the string matching algorithm for organizations, which may take into account identifying terms such as institute, university, company, and limited company. Fuzzy score matching module 712 thus adjusts or selects the string matching algorithm for the entity type in order to facilitate the search. Once the string matching algorithm has been adjusted or selected to correspond to the entity type, fuzzy score matching may be performed in step 808.
In fuzzy score matching step 808, the one or more extracted entities and any non-entities may be searched for and compared against entity co-occurrence knowledge base 714. The one or more extracted entities may include incomplete names (for example, the first characters of a first name and last name), abbreviations of organizations (for example, "UN", which may stand for "United Nations"), short forms, and nicknames, among others. Entity co-occurrence knowledge base 714 may already hold a plurality of records indexed as structured data, such as entity-to-entity, entity-to-topic, and entity-to-fact indexes. This may allow the fuzzy score matching in step 808 to run quickly. Fuzzy score matching may use, without limitation, common string metrics such as Levenshtein distance, strcmp95, and ITF scoring. The Levenshtein distance between two words is the minimum number of single-character edits required to change one word into the other.
Once the fuzzy score matching in step 808 has finished comparing the search query against all records in entity co-occurrence knowledge base 714, the record that ranks highest, or most closely matches the given pattern string of the search query input, may be selected as the first candidate for the search suggestions in step 810. Other records that closely match the given pattern string of the search query input may be placed below the first candidate in descending order. The search suggestions of step 810 may then be presented to the user as a drop-down list of possible matches, from which the user may select in order to complete the query.
In another embodiment, after the user selects the match of interest, search system 700 may take that selection as a new search query in step 812. Next, entity extraction step 814 may be performed on the new search query. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features. Entity extraction module 710 may then run a search against entity co-occurrence knowledge base 714 to find related entities based on the co-occurrences with the highest scores, in step 816. Finally, in step 818, a drop-down list of search suggestions that includes the related entities may be presented to the user before an actual search of the data in the electronic document corpus is performed.
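The related-entity lookup of step 816 can be illustrated as selecting the highest-scoring co-occurrences of the chosen entity. The sketch below is only a toy: the `CO_OCCURRENCES` table, its entries, and its scores are invented illustration values, and the function name is an assumption.

```python
# Hypothetical co-occurrence table: entity -> {related entity: score}.
# Entries and scores are invented for illustration only.
CO_OCCURRENCES = {
    "Michael Jordan": {"Chicago Bulls": 0.9, "NBA": 0.8, "Scottie Pippen": 0.7},
}


def related_entities(entity, limit=3):
    """Return the related entities with the highest co-occurrence scores."""
    scored = CO_OCCURRENCES.get(entity, {})
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:limit]]
```

The resulting list is what would populate the drop-down of related entities shown before the actual search is run.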
Fig. 9 is an exemplary embodiment of user interface 900 associated with method 800 for generating search suggestions of related entities based on co-occurrence and/or fuzzy score matching. In this example, a user enters partial query 904 in search box 906 through search engine interface 902 (similar to the one described for Fig. 7). By way of illustration and not limitation, partial query 904 may be an incomplete name of a person, such as "Michael J" as shown in Fig. 9. It may be considered partial query 904 because the user may not yet have selected search button 908 or otherwise submitted partial query 904 to search system 700 to perform an actual search and obtain results.
Following method 800, as the user types "Michael J", entity extraction module 710 performs a quick on-the-fly search of the first word (Michael) against entity co-occurrence knowledge base 714 to identify the type of entity; in this example, the entity may refer to the name of a person. Fuzzy score matching module 712 may then select a string matching algorithm tailored to names. Names may be found in databases written in different forms, such as using only initials (short forms), or the first name and the first character of the last name, or the first name, middle initial, and last name, or any combination thereof. Fuzzy score matching module 712 may use a common string metric such as Levenshtein distance to determine and assign a score to each entity, topic, or fact in entity co-occurrence knowledge base 714 that may match the entity "Michael". In this example, Michael matches a large number of records that contain this name. However, as the user types the following character, "J", fuzzy score matching module 712 may perform another comparison, based on Levenshtein distance, against all co-occurrences with Michael in entity co-occurrence knowledge base 714. Entity co-occurrence knowledge base 714 may then select all possible matches with the highest scores for "Michael J". For example, fuzzy score matching module 712 may return to the user search suggestions 910 that complete "Michael J", such as "Michael Jackson", "Michael Jordan", "Michael J. Fox", or in some cases even "Michael Dell". The user may then select one of the suggested persons from the drop-down list, or ignore the suggestions and continue typing. As an extension of the foregoing example, a query such as "basketball player Michael" may produce the suggestion "Michael Jordan", based on results returned by searching entity co-occurrence knowledge base 714 for "Michael" among the person entity name variants and for "basketball player" among the co-occurrence features, such as key phrases, facts, and topics. As another example, "actor Alexander" may produce the suggestion "Alexander Polinsky". As those skilled in the art will recognize, existing search platforms may not provide suggestions generated in the foregoing manner.
In this embodiment, the user may select "Michael Jordan" from the drop-down list to complete partial query 904, as indicated in Fig. 9. Search system 700 may then process the selection as new search query 912. Next, entity extraction may be performed on new search query 912. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features. Entity extraction module 710 may then run a search for "Michael Jordan" against entity co-occurrence knowledge base 714 to find related entities based on the co-occurrences with the highest scores. Finally, before the actual search is performed by clicking search button 908, drop-down list 914 of search suggestions that includes the related entities may be presented to the user. The systems and methods described in Figs. 7-9 may be fast and convenient for users because they may help users discover useful relationships.
Fig. 10 is a block diagram of search system 1000 according to the present disclosure. Search system 1000 may include search engine 1002, which may include one or more user interfaces that allow data input from a user, such as user queries.
Search system 1000 may include one or more databases, such as entity database 1004 and trend database 1006. The databases may be stored on a local server or on a web-based server. Search system 1000 may thus be implemented in a client/server architecture; however, search system 1000 may be implemented using other computer architectures (for example, a stand-alone computer, a mainframe system with terminals, an ASP model, or a peer-to-peer model) and a variety of networks (for example, a local area network, a wide area network, the Internet, a wireless network, or a mobile telephone network).
Search engine 1002 may include, without limitation, a web-based tool that enables users to locate information on the World Wide Web. Search engine 1002 may also include tools that enable users to locate information within internal database systems.
Entity database 1004 may be implemented as a single server or in a distributed architecture across multiple servers. Entity database 1004 may accommodate a set of entity queries, such as query strings and structured data. Such a set of entity queries may be extracted in advance from a plurality of corpora available on the Internet and/or a local network. The entity queries may be indexed and scored. Examples of entities may include people, organizations, geographic locations, dates, and/or times. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features.
Trend database 1006 may be implemented as a single server or in a distributed architecture across multiple servers. Trend database 1006 may accommodate a set of entity queries, such as query strings and structured data. Such a set of entity queries may be extracted in advance from historical queries performed by one user and/or by a plurality of users on the Internet and/or a local network. The entity queries may be indexed and scored. Examples of entities may include people, organizations, geographic locations, dates, and/or times. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features.
Entity database 1004 and trend database 1006 may include an entity co-occurrence knowledge base, which may be built, without limitation, as an in-memory database (not shown) and may include other components (not shown), such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation module. A search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy key search through a collection of compressed data and return a set of scored results to its associated search controller.
The co-occurrence knowledge base may include related entities based on features and ranked by a confidence score. Various methods for linking the features may be used; these methods may essentially use a weighted model to determine which entity types are most important and which carry more weight and, based on confidence scores, to determine how confidently the correct features were extracted.
Search system 1000 may compare user queries entered at search engine 1002 against entity database 1004 and trend database 1006. An auto-complete mode may be started on search engine 1002 from the two databases, entity database 1004 and trend database 1006. Search system 1000 may present search suggestion list 1008 to the user; such a list may be generated and indexed based on the fuzzy score assigned to each entity suggestion in the databases. The score of each entity suggestion may be assigned automatically by search system 1000 and/or manually by a system administrator. Based on the score obtained by each entity, entity suggestions may be ranked from most relevant to least relevant. In addition, trends and query frequencies from one or more users on a local network and/or the Internet may be used to assign the scores in trend database 1006.
The entity suggestions from each database may be compared with one another and then indexed and ranked according to their scores, so that search suggestion list 1008, which combines the entity suggestions from the two databases (entity database 1004 and trend database 1006), may be displayed to the user. If the user selects a suggestion from the list, or selects another result from the suggestion list, search system 1000 may save that information in trend database 1006. This may provide a self-learning system, which may increase the reliability and accuracy of search system 1000. In short, the trend co-occurrence knowledge base is continually updated with the features extracted from user queries and with the selected suggestions, providing a means of on-the-fly learning that improves the relevance and accuracy of searches. Furthermore, the trend co-occurrence knowledge base may be populated by the different users of the system, and also by automated methods such as a trend detection module.
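The self-learning behavior described above can be pictured as a counter that is updated each time a user picks a suggestion, with trend scores derived from selection frequency. The sketch below is a deliberately simple model under stated assumptions: the `TrendStore` class, its method names, and the frequency-based scoring rule are hypothetical, since the disclosure does not say how trend scores are computed.

```python
from collections import Counter


class TrendStore:
    """Toy stand-in for a trend database that learns from user selections."""

    def __init__(self):
        self._counts = Counter()

    def record_selection(self, suggestion):
        """Save the suggestion a user picked from the suggestion list."""
        self._counts[suggestion] += 1

    def score(self, suggestion):
        """Assumed scoring rule: the share of all selections that went
        to this suggestion."""
        total = sum(self._counts.values())
        return self._counts[suggestion] / total if total else 0.0
```

Each recorded selection shifts the scores, so suggestions that users pick often rise in later rankings without any manual adjustment.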
Fig. 11 is a block diagram of search system 1100 according to the present disclosure. Search system 1100 may include search engine 1102, which may include one or more user interfaces that allow data input from a user, such as user queries.
Search system 1100 may include one or more databases, such as entity database 1104 and trend database 1106. The databases may be stored on a local server or on a web-based server. Search system 1100 may thus be implemented in a client/server architecture; however, search system 1100 may be implemented using other computer architectures (for example, a stand-alone computer, a mainframe system with terminals, an ASP model, or a peer-to-peer model) and a variety of networks (for example, a local area network, a wide area network, the Internet, a wireless network, or a mobile telephone network).
In one embodiment, search system 1100 may begin when a user enters one or more entities (a search query) through a user interface of search engine 1102. An example of a search query may be a combination of keywords in string data form, structured data, and the like. These keywords may be entities representing people, organizations, geographic locations, dates, and/or times. In this embodiment, "Indiana Na" is used as the search query.
"Indiana Na" may then be processed for entity extraction. The entity extraction module may process a search query such as "Indiana Na" as entities and compare all of them against the entity co-occurrence knowledge bases in entity database 1104 and trend database 1106, in order to extract and disambiguate as many entities as possible. In addition, parts of the query text that are not detected as entities (for example, people, organizations, or locations) may be treated as conceptual features (such as topics, facts, or key phrases) that can be used to search the entity co-occurrence knowledge bases (for example, the entity database and the trend database). During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features.
In this embodiment, entity database 1104 may present its search suggestions as entity suggestion list 1108, which may be indexed and ranked. Trend database 1106 may present its search suggestions as trend-based suggestion list 1110, which may likewise be indexed and ranked. Next, search system 1100 may build search suggestion list 1112 from the search suggestions provided by entity database 1104 and trend database 1106. Search suggestion list 1112 may be indexed and ranked based on the individual score of each entity suggestion in each database; the most relevant result may therefore be shown first, followed by less relevant results.
An exemplary use of search system 1100 for obtaining search suggestions is disclosed. Search suggestion list 1112 may show suggestions based on the user query "Indiana Na". As a result, "Indiana Name" may appear first, based solely on the individual score of 0.9 found for that entity; "Indiana Nascar" may be shown next, as a result of its individual score of 0.8; and "Indiana Nashville" may be shown last, based on its individual score of 0.7. The individual scores in entity suggestion list 1108 and trend-based suggestion list 1110 may be compared without taking repeated entities into account.
Fig. 12 is a block diagram of search system 1200 according to the present disclosure. Search system 1200 may include search engine 1202, which may include one or more user interfaces that allow data input from a user, such as user queries.
Search system 1200 may include one or more databases, such as entity database 1204 and trend database 1206. The databases may be stored on a local server or on a web-based server. Search system 1200 may thus be implemented in a client/server architecture; however, search system 1200 may be implemented using other computer architectures (for example, a stand-alone computer, a mainframe system with terminals, an ASP model, or a peer-to-peer model) and a variety of networks (for example, a local area network, a wide area network, the Internet, a wireless network, or a mobile telephone network).
In one embodiment, search system 1200 may begin when a user enters one or more entities (a search query) through a user interface of search engine 1202. An example of a search query may be a combination of keywords, such as strings, structured data, and the like. These keywords may be entities representing people, organizations, geographic locations, dates, and/or times. In this embodiment, "Indiana Na" is used as the search query.
"Indiana Na" may then be processed for entity extraction. The entity extraction module may process a search query such as "Indiana Na" as entities and compare all of them against the entity co-occurrence knowledge bases in entity database 1204 and trend database 1206, in order to extract and disambiguate as many entities as possible. In addition, parts of the query text that are not detected as entities (for example, people, organizations, or locations) may be treated as conceptual features (such as topics, facts, or key phrases) that can be used to search the entity co-occurrence knowledge bases (for example, the entity database and the trend database). During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Based on the corresponding feature attributes, the relative weight and/or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features.
In this embodiment, entity database 1204 may present its search suggestions as entity suggestion list 1208, which may already be indexed and ranked. Similarly, trend database 1206 may present its search suggestions as trend-based suggestion list 1210, which may also already be indexed and ranked. Next, search system 1200 may build search suggestion list 1212 from the search suggestions provided by entity database 1204 and trend database 1206. Search suggestion list 1212 may be indexed and ranked based on the total score of each entity suggestion across the two databases; the most relevant result may therefore be shown first, followed by less relevant results.
An exemplary use of search system 1200 for obtaining search suggestions is disclosed. Search suggestion list 1212 may show suggestions based on the user query "Indiana Na". As a result, "Indiana Nascar" may appear first, based on a total score of 1.4 obtained by adding the score of 0.8 from entity suggestion list 1208 and the score of 0.6 from trend-based suggestion list 1210. Similarly, "Indiana Name" may be shown next, as a result of its total score of 0.9, and "Indiana Nashville" may be shown last, based on its total score of 0.7.
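The two ranking strategies, individual scores in Fig. 11 and total scores in Fig. 12, differ only in how a suggestion that appears in both lists is combined. The sketch below reproduces both orderings from the scores given in the text, under the assumption (not stated in the disclosure) that "Indiana Name" and "Indiana Nashville" appear only in the entity suggestion list. The function name is hypothetical, and using the higher individual score for repeated entities is one possible reading of the Fig. 11 behavior.

```python
def merge_suggestions(entity_list, trend_list, combine):
    """Merge two {suggestion: score} lists and rank best-first.
    `combine` decides the score of a suggestion present in both lists."""
    merged = dict(entity_list)
    for name, score in trend_list.items():
        merged[name] = combine(merged[name], score) if name in merged else score
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


# Scores from the text; list membership of the first and last entries
# is an assumption.
entity_suggestions = {
    "Indiana Name": 0.9,
    "Indiana Nascar": 0.8,
    "Indiana Nashville": 0.7,
}
trend_suggestions = {"Indiana Nascar": 0.6}

# Individual scores: a repeated entity keeps its higher individual score.
by_individual = merge_suggestions(entity_suggestions, trend_suggestions, max)

# Total scores: a repeated entity gets the sum of its scores.
by_total = merge_suggestions(
    entity_suggestions, trend_suggestions, lambda a, b: round(a + b, 2)
)
```

With individual scores "Indiana Name" leads at 0.9, whereas with total scores "Indiana Nascar" moves to the top at 1.4, matching the two examples in the text.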
Fig. 13 is system architecture 1300 for geotagging content in SharePoint. Search index 1324 is one of multiple key components that enable search in SharePoint 1302. Another key component that enables search in SharePoint 1302 may be content capture, which allows content to be indexed. SharePoint 1302 includes a crawler 1304 component to perform content capture.
The crawler 1304 can crawl different content sources 1306 and add a set of metadata attributes to each content item. Examples of content sources include, without limitation, SharePoint content, network file shares, and user or intranet content. The crawler 1304 can be configured to connect securely to the content sources 1306 and to associate source documents with their metadata as crawled properties. The crawler 1304 can be configured to crawl content fully or incrementally. Examples of crawled properties include, without limitation, author, title, and creation date.
SharePoint includes a content processing 1308 component. The content processing 1308 component obtains content from the crawler 1304 and prepares it for indexing. Content processing 1308 can involve a linguistic processing (language detection) stage, a parsing stage, an entity extraction management stage, a content-based file format detection stage, a content processing error reporting stage, a natural language processing stage, a stage that maps crawled properties to managed properties, and so on.
Content processing 1308 can be extended through a content enrichment web service (CEWS 1310). By allowing a web service callout 1312 to invoke an external web service that performs further actions and enriches the crawled data properties, the CEWS 1310 makes it possible to enrich content processing 1308. The web service callout 1312 can be a standard Simple Object Access Protocol (SOAP) request or any other web service invocation method for exchanging structured information about the crawled data with the entity enrichment service 1314. The web service callout 1312 can include a trigger condition configured in the content enrichment configuration object; the trigger condition controls when the external web service is called for enrichment processing. The entity enrichment service 1314 can also determine the document type of the crawled data in order to identify content that arrives in image form (scanned documents, pictures, and so on). Whenever content in image form is found, the entity enrichment service 1314 can send the location of the crawled document to an OCR processing engine 1316, such as, without limitation, an optical character recognition component or another image processing module. The OCR processing engine 1316 can then retrieve and process the image file and convert it to text asynchronously. The OCR-processed file 1318 can next be fed back to the crawler 1304 so that it is crawled as text, sent back to content processing 1308, and continues through the remainder of the workflow.
The system architecture 1300 can include an external geotagger web service 1320 and a named entity tagger service 1322. The geotagger web service 1320 and the named entity tagger service 1322 can be software modules configured to act as web service providers and to respond to the web service callout 1312. The geotagger web service 1320 can use natural language processing entity extraction techniques, machine learning models, and other techniques to identify and disambiguate geographic entities in the crawled content. For example, the geotagger web service 1320 can disambiguate geographic entities by analyzing the statistical co-occurrence of entities found in a gazetteer. The geotagger web service 1320 can include a database of statistically co-occurring entities that can be linked to the content found by the crawler 1304. Following the same techniques, the named entity tagger service 1322 can be used to extract other entities or text features, such as organizations, people, or topics.
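Disambiguation by statistical co-occurrence can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the geotagger's actual method: the function `disambiguate`, the gazetteer entries, and all co-occurrence counts are hypothetical. Each candidate place for an ambiguous name is scored by how often it co-occurs with the other entities found in the same content, and the best-supported candidate is chosen.

```python
def disambiguate(mention, context_entities, gazetteer, cooccurrence):
    """Pick the gazetteer candidate that co-occurs most with the context entities.

    Returns None when the mention has no candidates in the gazetteer.
    """
    candidates = gazetteer.get(mention, [])
    if not candidates:
        return None

    def support(candidate):
        # Total co-occurrence count between this candidate and the context.
        return sum(cooccurrence.get((candidate, entity), 0) for entity in context_entities)

    return max(candidates, key=support)


# Hypothetical gazetteer: one ambiguous place name with two candidate places.
gazetteer = {"Paris": ["Paris, France", "Paris, Texas"]}

# Hypothetical co-occurrence counts between candidate places and other entities.
cooccurrence = {
    ("Paris, France", "Eiffel Tower"): 120,
    ("Paris, France", "Seine"): 80,
    ("Paris, Texas", "Dallas"): 40,
    ("Paris, Texas", "Lamar County"): 25,
}

texas_choice = disambiguate("Paris", ["Dallas", "Lamar County"], gazetteer, cooccurrence)
france_choice = disambiguate("Paris", ["Seine"], gazetteer, cooccurrence)
```

In this sketch a document that also mentions Dallas and Lamar County resolves "Paris" to the Texas candidate, while a document that mentions the Seine resolves it to the French one.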
The geotagger web service 1320 can analyze a batch of managed properties sent by the CEWS 1310 as input properties and identify any geographic entities mentioned in the text. Non-limiting examples of input properties include FileType, IsDocument, OriginalPath, and body. The geotagger web service 1320 can then geotag the text by creating or modifying a managed property for each geographic entity found. The geotagger web service 1320 can send the modified or new managed properties to the entity enrichment service 1314, where they are transformed; the transformation maps the modified managed properties and returns them to the CEWS 1310 as output properties. The same process can be used to interact with the named entity tagger service 1322 for extracting and tagging other entities or text features (for example, organizations, people, or topics).
After the entity enrichment service 1314 returns the enhanced managed properties, they are merged with the managed properties of the crawled file and sent to the search index 1324.
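The exchange of input and output properties can be sketched in Python. This is only an illustration under stated assumptions: the function names, the `GeoEntities` property name, the sample document, and the list of known places are all hypothetical, and plain substring matching stands in for the natural language processing and machine learning techniques that the geotagger is described as using.

```python
def geotag(input_properties, known_places):
    """Return output managed properties holding the places mentioned in the body text."""
    body = input_properties.get("Body", "").lower()
    found = [place for place in known_places if place.lower() in body]
    # "GeoEntities" is a hypothetical managed property name.
    return {"GeoEntities": found} if found else {}


def merge_properties(crawled_properties, enriched_properties):
    """Merge enriched properties into the crawled item's managed properties."""
    merged = dict(crawled_properties)
    merged.update(enriched_properties)
    return merged


# Input properties as they might be sent by the CEWS; all values are made up.
input_properties = {
    "FileType": "docx",
    "IsDocument": True,
    "OriginalPath": "http://intranet.example/docs/report.docx",
    "Body": "The team opened offices in Dayton and Reston last year.",
}

output_properties = geotag(input_properties, ["Dayton", "Reston", "Paris"])
indexed_properties = merge_properties(input_properties, output_properties)
```

Returning only the new or changed properties keeps the service's output small, and the merge step leaves the original crawled properties intact.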
Once the geotags and other entity tags have been indexed together with the content, search queries can also be performed using geographic features and named entity features. The search UI 1326 in SharePoint can include a specialized display that helps users perform geography-based searches and supports an enhanced presentation of faceted search results. The search UI 1326 can be a custom web part, or it can be implemented by modifying the standard layout of SharePoint search with conventional tools (such as HTML, HTML5, JavaScript, and CSS).
Figure 14 is a flowchart 1400 illustrating the process steps for tagging content for SharePoint search. The process can begin when the crawler component in SharePoint crawls content (step 1402). In one embodiment the crawl can be a full crawl, while in another embodiment it can be an incremental crawl. The crawler component can then feed the crawled properties and metadata to content processing (step 1404). A determination is made as to whether the crawled content may include geographic entities or named entities. For example, and without limitation, a trigger condition can be used. The trigger condition can contain a set of logic or rules that determine whether the content could benefit from geotagging or entity tagging. If the trigger condition evaluates to false, the crawled content can be associated with managed properties (step 1406) and passed to the search index component (step 1408). If the trigger condition evaluates to true, the CEWS can send a web service callout (step 1410) to the entity enrichment service. The entity enrichment service can analyze the content it receives to determine whether the content is in an image format (scanned documents, pictures, and so on). Content found to be in an image format can be processed asynchronously by the OCR engine and sent back to be crawled again as text by the crawler component (step 1412). If the content is not in an image format, it can be processed by the geotagging web service or the named entity tagger service (step 1414). The web service can extract the geographic entities or named entities mentioned in the content, disambiguate them, and enrich them with entity metadata. The identified entities and their metadata can be sent back to the content processing component as managed properties and associated with the content (step 1416). The associated metadata can then be sent to the search index component (step 1406).
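The branching in the flowchart can be sketched in Python as follows. This is a simplified, synchronous illustration under stated assumptions, not the SharePoint implementation: the `CrawledItem` class, the rule inside `trigger_condition`, the `Entities` property name, and the list-based queue and index are hypothetical stand-ins for the components described above.

```python
from dataclasses import dataclass, field


@dataclass
class CrawledItem:
    path: str
    body: str = ""
    is_image: bool = False
    managed_properties: dict = field(default_factory=dict)


def trigger_condition(item):
    # Assumed rule: only images and items with body text can benefit from tagging.
    return item.is_image or bool(item.body.strip())


def process(item, tag_entities, ocr_queue, search_index):
    """Route one crawled item through the tagging flow and report which path it took."""
    if not trigger_condition(item):
        # Trigger is false: index the item with its existing managed properties.
        search_index.append(item)
        return "indexed"
    if item.is_image:
        # Image content: hand the location to OCR; it is re-crawled later as text.
        ocr_queue.append(item.path)
        return "queued_for_ocr"
    # Text content: tag entities, attach them as a managed property, then index.
    item.managed_properties["Entities"] = tag_entities(item.body)
    search_index.append(item)
    return "tagged_and_indexed"


def simple_tagger(text):
    # Stand-in for the geotagger or named entity tagger: keep capitalized words.
    return [word.strip(".,") for word in text.split() if word[:1].isupper()]
```

A usage example: `process(CrawledItem("scan.png", is_image=True), simple_tagger, ocr_queue, search_index)` adds the path to `ocr_queue` and returns `"queued_for_ocr"`, which mirrors the OCR branch of the flowchart.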
Although various aspects and embodiments have been disclosed, other aspects and embodiments are contemplated. The aspects and embodiments disclosed are for purposes of illustration and are not intended to be limiting; the true scope and spirit are indicated by the appended claims.
The foregoing method descriptions and process flow diagrams are provided merely as illustrative examples and are not intended to require or imply that the steps of the various embodiments must be performed in the order presented. As will be appreciated by those skilled in the art, the steps in the foregoing embodiments can be performed in any order. Words such as "then" and "next" are not intended to limit the order of the steps; they simply guide the reader through the description of the methods. Although a process flow diagram may describe operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations can be rearranged. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on. When a process corresponds to a function, its termination may correspond to a return of the function to the calling function or the main function.
The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans can implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Embodiments implemented in computer software can be implemented in software, firmware, middleware, microcode, hardware description languages, or any combination thereof. A code segment or machine-executable instructions can represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment can be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, and so on can be passed, forwarded, or transmitted by any suitable means, including memory sharing, message passing, token passing, and network transmission.
The actual software code or specialized control hardware used to implement these systems and methods does not limit the invention. Thus, the operation and behavior of the systems and methods are described without reference to specific software code, it being understood that software and control hardware can be designed to implement the systems and methods based on the description herein.
When implemented in software, the functions can be stored as one or more instructions or code on a non-transitory computer-readable or processor-readable storage medium. The steps of a method or algorithm disclosed herein can be embodied in a processor-executable software module, which can reside on a computer-readable or processor-readable storage medium. Non-transitory computer-readable or processor-readable media include both computer storage media and tangible storage media that facilitate the transfer of a computer program from one place to another. A non-transitory processor-readable storage medium can be any available medium that can be accessed by a computer. By way of example, and not limitation, such non-transitory processor-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible storage medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer or processor. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. Additionally, the operations of a method or algorithm can reside as one or any combination or set of codes and/or instructions on a non-transitory processor-readable medium and/or computer-readable medium, which can be incorporated into a computer program product.
It is to be understood that the various components of the technology can be located at distant portions of a distributed network and/or the Internet, or within a dedicated secure, unsecured, and/or encrypted system. Accordingly, the components of the system can be combined into one or more devices or co-located on a particular node of a distributed network, such as a telecommunications network. As will be appreciated from the description, and for reasons of computational efficiency, the components of the system can be arranged at any location within a distributed network without affecting the operation of the system. Moreover, the components can be embedded in a dedicated machine.
Furthermore, it is to be understood that the various links connecting the elements can be wired or wireless links, or any combination thereof, or any other known or later-developed elements capable of supplying and/or communicating data to and from the connected elements. The term "module" as used herein can refer to any known or later-developed hardware, software, firmware, or combination thereof that is capable of performing the functionality associated with that element. The terms "determine", "calculate", and "compute", and variations thereof, as used herein are used interchangeably and include any type of methodology, process, mathematical operation, or technique.
The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein can be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.
The embodiments described above are intended to be exemplary. Those skilled in the art will appreciate that numerous alternative components and embodiments can be substituted for the specific examples described herein and still fall within the scope of the invention.

Claims (56)

1. A computer-implemented method, comprising:
receiving, by an entity extraction computer, from a client computer, a search query comprising one or more entities;
comparing, by the entity extraction computer, each respective entity with one or more co-occurrences of the respective entity in a co-occurrence database;
extracting, by the entity extraction computer, a subset of the one or more entities from the search query in response to determining, according to the co-occurrence database, that each respective entity in the subset exceeds a confidence score of the co-occurrence database, wherein the confidence score is based on a degree of certainty of co-occurrence of one or more related entities in an electronic data corpus with each respective entity;
assigning, by the entity extraction computer, an index identifier (index ID) to each entity of the plurality of extracted entities;
saving, by the entity extraction computer, the index ID for each entity of the plurality of extracted entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each entity of the one or more related entities;
searching, by a search server computer, the entity-indexed electronic data corpus to locate the plurality of extracted entities and to identify index IDs of data records in which at least two entities of the plurality of extracted entities co-occur; and
building, by the search server computer, a search results list having data records corresponding to the identified index IDs.
2. The method according to claim 1, further comprising: ranking, by the search server computer, the search results list according to a relevance based on the confidence score; and delivering, by the search server computer, the ranked search results list to a user device.
3. The method according to claim 1, wherein the plurality of extracted entities are ranked based on the confidence score.
4. The method according to claim 1, wherein the entity extraction computer associates the extracted entities with one or more co-occurring entities in the entity-indexed electronic data corpus.
5. The method according to claim 4, wherein the associated entities are ranked according to the confidence score.
6. The method according to claim 1, wherein each entity of the plurality of entities is selected from a person, an organization, a geographic location, a date, and a time.
7. A system, comprising:
one or more server computers having one or more processors that execute computer-readable instructions for a plurality of computer modules, the plurality of computer modules comprising:
an entity extraction module configured to receive user input of search query parameters, the entity extraction module being further configured to:
extract a plurality of entities from the search query parameters by comparing each entity of the plurality of extracted entities with an entity co-occurrence database, wherein the entity co-occurrence database includes a confidence score indicating a degree of certainty of co-occurrence of one or more related entities in an electronic data corpus with an extracted entity,
assign an index identifier (index ID) to each entity of the plurality of extracted entities, and
save the index ID for each entity of the plurality of extracted entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each entity of the one or more related entities; and
a search server module configured to search the entity-indexed electronic data corpus to locate the plurality of extracted entities and to identify index IDs of data records in which at least two entities of the plurality of extracted entities co-occur, the search server module being further configured to build a search results list having data records corresponding to the identified index IDs.
8. The system according to claim 7, wherein the search server module is further configured to: rank the search results list according to a relevance based on the confidence score; and deliver the ranked search results list to a user device.
9. The system according to claim 7, wherein the plurality of extracted entities are ranked based on the confidence score.
10. The system according to claim 7, wherein the entity extraction module is configured to associate the extracted entities with one or more co-occurring entities in the entity-indexed electronic data corpus.
11. The system according to claim 10, wherein the associated entities are ranked according to the confidence score.
12. The system according to claim 7, wherein each entity of the plurality of entities is selected from a person, an organization, a geographic location, a date, and a time.
13. A non-transitory computer-readable medium having computer-executable instructions stored thereon, the instructions comprising:
receiving, by an entity extraction computer, user input of search query parameters;
extracting, by the entity extraction computer, a plurality of entities from the search query parameters by comparing each entity of the plurality of extracted entities with an entity co-occurrence database, wherein the entity co-occurrence database includes a confidence score indicating a degree of certainty of co-occurrence of one or more related entities in an electronic data corpus with an extracted entity;
assigning, by the entity extraction computer, an index identifier (index ID) to each entity of the plurality of extracted entities;
saving, by the entity extraction computer, the index ID for each entity of the plurality of extracted entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each entity of the one or more related entities;
searching, by a search server computer, the entity-indexed electronic data corpus to locate the plurality of extracted entities and to identify index IDs of data records in which at least two entities of the plurality of extracted entities co-occur; and
building, by the search server computer, a search results list having data records corresponding to the identified index IDs.
14. The computer-readable medium according to claim 13, wherein the instructions further comprise: ranking, by the search server computer, the search results list according to a relevance based on the confidence score; and delivering, by the search server computer, the ranked search results list to a user device.
15. The computer-readable medium according to claim 13, wherein the plurality of extracted entities are ranked based on the confidence score.
16. The computer-readable medium according to claim 13, wherein the instructions further comprise: associating, by the entity extraction computer, the extracted entities with one or more co-occurring entities in the entity-indexed electronic data corpus.
17. The computer-readable medium according to claim 16, wherein the associated entities are ranked according to the confidence score.
18. The computer-readable medium according to claim 13, wherein each entity of the plurality of entities is selected from a person, an organization, a geographic location, a date, and a time.
19. A method, comprising:
receiving, by an entity extraction computer, user input of search query parameters from a user interface;
extracting, by the entity extraction computer, one or more entities from the search query parameters by comparing the search query parameters with an entity co-occurrence database and identifying at least one entity type corresponding to the one or more entities in the search query parameters, wherein the entity co-occurrence database has instances of the one or more entities co-occurring in an electronic data corpus;
selecting, by a fuzzy score matching computer, a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the search query parameters, wherein the fuzzy matching algorithm corresponds to the identified at least one entity type;
searching, by the fuzzy score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm, and forming one or more suggested search query parameters from the one or more records based on the search; and
presenting, by the fuzzy score matching computer, the one or more suggested search query parameters via the user interface.
20. The method according to claim 19, further comprising: searching, by the fuzzy score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm before the user input is completed.
21. The method according to claim 19, wherein the one or more records associated with the search query parameters include conceptual features.
22. The method according to claim 19, wherein the one or more suggested search query parameters include a plurality of suggested search query parameters, the method further comprising: sorting, by the fuzzy score matching computer, the plurality of suggested search query parameters in descending order based on closeness of match to the search query parameters in the user input.
23. The method according to claim 22, wherein the fuzzy score matching computer presents the sorted plurality of suggested search query parameters in a drop-down list via the user interface.
24. The method according to claim 19, wherein the entity co-occurrence database is indexed.
25. The method according to claim 1, wherein the entity co-occurrence database includes an entity-to-entity index.
26. The method according to claim 19, wherein the entity co-occurrence database includes an entity-to-topic index.
27. The method according to claim 19, wherein the entity co-occurrence database includes an entity-to-fact index.
28. A system, comprising:
one or more server computers having one or more processors that execute computer-readable instructions for a plurality of computer modules, the plurality of computer modules comprising:
an entity extraction module configured to receive user input of search query parameters from a user interface, the entity extraction module being further configured to:
extract one or more entities from the search query parameters by comparing the search query parameters with an entity co-occurrence database and identifying at least one entity type corresponding to the one or more entities in the search query parameters, wherein the entity co-occurrence database has instances of the one or more entities co-occurring in an electronic data corpus; and
a fuzzy score matching module configured to select a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the search query parameters, wherein the fuzzy matching algorithm corresponds to the identified at least one entity type, the fuzzy score matching module being further configured to:
search the entity co-occurrence database using the selected fuzzy matching algorithm, and form one or more suggested search query parameters from the one or more records based on the search; and
present the one or more suggested search query parameters via the user interface.
29. The system according to claim 28, wherein the fuzzy score matching module is further configured to: search the entity co-occurrence database using the selected fuzzy matching algorithm before the user input is completed.
30. The system according to claim 28, wherein the one or more records associated with the search query parameters include conceptual features.
31. The system according to claim 28, wherein the one or more suggested search query parameters include a plurality of suggested search query parameters, and the fuzzy score matching computer is further configured to: sort the plurality of suggested search query parameters in descending order based on closeness of match to the search query parameters in the user input.
32. The system according to claim 32, wherein the fuzzy score matching computer is configured to present the sorted plurality of suggested search query parameters in a drop-down list via the user interface.
33. The system according to claim 28, wherein the entity co-occurrence database is indexed.
34. The system according to claim 28, wherein the entity co-occurrence database includes an entity-to-entity index.
35. The system according to claim 28, wherein the entity co-occurrence database includes an entity-to-topic index.
36. The system according to claim 28, wherein the entity co-occurrence database includes an entity-to-fact index.
37. A method, comprising:
receiving, by an entity extraction computer, user input of partial search query parameters from a user interface, the partial search query parameters having at least one incomplete search query parameter;
extracting, by the entity extraction computer, one or more first entities from the partial search query parameters by comparing the partial search query parameters with an entity co-occurrence database and identifying at least one entity type corresponding to the one or more first entities in the partial search query parameters, wherein the entity co-occurrence database has instances of the one or more first entities co-occurring in an electronic data corpus;
selecting, by a fuzzy score matching computer, a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the partial search query parameters, wherein the fuzzy matching algorithm corresponds to the identified at least one entity type;
searching, by the fuzzy score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm, and forming one or more first suggested search query parameters from the one or more records based on the search;
presenting, by the fuzzy score matching computer, the one or more first suggested search query parameters via the user interface;
receiving, by the entity extraction computer, a user selection of the one or more first suggested search query parameters to form complete search query parameters;
extracting, by the entity extraction computer, one or more second entities from the complete search query parameters;
searching, by the entity extraction computer, the entity co-occurrence database to identify one or more entities related to the one or more second entities, thereby forming one or more second suggested search query parameters; and
presenting, by the entity extraction computer, the one or more second suggested search query parameters via the user interface.
38. The method according to claim 37, further comprising: searching, by the fuzzy-score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm before the user input is complete.
39. The method according to claim 37, wherein the one or more records associated with the partial search query parameters include concept features.
40. The method according to claim 37, wherein the one or more first suggested search query parameters include a plurality of first suggested search query parameters, the method further comprising: sorting, by the fuzzy-score matching computer, the plurality of first suggested search query parameters in descending order based on closeness of match to the partial search query parameters in the user input.
41. The method according to claim 40, wherein the fuzzy-score matching computer presents the sorted plurality of first suggested search query parameters in a drop-down list via the user interface.
42. The method according to claim 37, wherein the entity co-occurrence database is indexed.
43. The method according to claim 37, wherein the entity co-occurrence database includes an entity-to-entity index.
44. The method according to claim 37, wherein the entity co-occurrence database includes an entity-to-topic index.
45. The method according to claim 37, wherein the entity co-occurrence database includes an entity-to-fact index.
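As an illustration of the first-suggestion steps recited in claims 37 and 40, the following is a minimal Python sketch, not the claimed implementation. It assumes a tiny in-memory record list standing in for the entity co-occurrence database, and it assumes two stand-in scoring functions (`ratio_score`, built on the standard-library `difflib.SequenceMatcher`, and `prefix_score`). The entity-type names, the records, and the mapping from entity type to matcher are all hypothetical.

```python
from difflib import SequenceMatcher


def ratio_score(query, candidate):
    """Typo-tolerant similarity in [0, 1] (stand-in fuzzy matcher)."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def prefix_score(query, candidate):
    """Score a candidate only when it starts with the typed text."""
    q, c = query.lower(), candidate.lower()
    if not q or not c.startswith(q):
        return 0.0
    return len(q) / len(c)


# Hypothetical mapping: the matcher is selected by the identified entity type.
MATCHERS = {"person": ratio_score, "organization": prefix_score}

# Hypothetical records standing in for the entity co-occurrence database.
RECORDS = [
    ("person", "Barack Obama"),
    ("person", "Michelle Obama"),
    ("organization", "Microsoft"),
    ("organization", "Micron Technology"),
    ("organization", "Oracle"),
]


def suggest(partial, entity_type, records, limit=5):
    """Return suggestions sorted in descending order of match closeness."""
    matcher = MATCHERS.get(entity_type, ratio_score)
    scored = [
        (matcher(partial, name), name)
        for rec_type, name in records
        if rec_type == entity_type
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:limit]]
```

Calling `suggest("micro", "organization", RECORDS)` would feed a drop-down list of the kind recited in claim 41; because the function takes whatever partial text is available, it can be invoked on every keystroke, before the user input is complete.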
46. A system, comprising:
one or more server computers having one or more processors, the one or more processors executing computer-readable instructions for a plurality of computer modules, the plurality of computer modules including:
an entity extraction module configured to receive user input of partial search query parameters from a user interface, the partial search query parameters having at least one incomplete search query parameter, the entity extraction module being further configured to:
extract the one or more first entities from the partial search query parameters by comparing the partial search query parameters against an entity co-occurrence database and identifying at least one entity type corresponding to the one or more first entities in the partial search query parameters, wherein the entity co-occurrence database has instances of the one or more first entities co-occurring in an electronic data corpus; and
a fuzzy-score matching module configured to select a fuzzy matching algorithm for searching the entity co-occurrence database so as to identify one or more records associated with the partial search query parameters, wherein the fuzzy matching algorithm corresponds to the at least one identified entity type, the fuzzy-score matching module being further configured to:
search the entity co-occurrence database using the selected fuzzy matching algorithm, and form one or more first suggested search query parameters from the one or more records based on the search; and present the one or more first suggested search query parameters via the user interface;
wherein the entity extraction module is further configured to:
receive a user selection of the one or more first suggested search query parameters, so as to form completed search query parameters;
extract one or more second entities from the completed search query parameters;
search the entity co-occurrence database to identify one or more entities related to the one or more second entities, so as to form one or more second suggested search query parameters; and
present the one or more second suggested search query parameters via the user interface.
47. The system according to claim 46, wherein the fuzzy-score matching module is further configured to: search the entity co-occurrence database using the selected fuzzy matching algorithm before the user input is complete.
48. The system according to claim 46, wherein the one or more records associated with the partial search query parameters include concept features.
49. The system according to claim 46, wherein the one or more first suggested search query parameters include a plurality of first suggested search query parameters, the fuzzy-score matching module being further configured to: sort the plurality of first suggested search query parameters in descending order based on closeness of match to the partial search query parameters in the user input.
50. The system according to claim 49, wherein the fuzzy-score matching computer is configured to present the sorted plurality of first suggested search query parameters in a drop-down list via the user interface.
51. The system according to claim 46, wherein the entity co-occurrence database is indexed.
52. The system according to claim 46, wherein the entity co-occurrence database includes an entity-to-entity index.
53. The system according to claim 46, wherein the entity co-occurrence database includes an entity-to-topic index.
54. The system according to claim 46, wherein the entity co-occurrence database includes an entity-to-fact index.
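The second-suggestion step (searching the entity co-occurrence database for entities related to those extracted from the completed query) and the entity-to-entity index of claims 43 and 52 can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the corpus is a hypothetical list of per-document entity lists, relatedness is approximated by a raw co-occurrence count, and the function names `build_cooccurrence_index` and `related_entities` are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence_index(docs):
    """Build an entity-to-entity index: index[a][b] counts documents
    in which entities a and b both appear."""
    index = defaultdict(Counter)
    for entities in docs:
        for a, b in permutations(set(entities), 2):
            index[a][b] += 1
    return index


def related_entities(index, query_entities, limit=3):
    """Rank entities that co-occur with the query entities, most frequent
    first; ties are broken alphabetically for a stable order."""
    totals = Counter()
    for entity in query_entities:
        totals.update(index.get(entity, {}))
    for entity in query_entities:
        totals.pop(entity, None)  # do not suggest what was already typed
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]


# Hypothetical corpus: each inner list holds the entities found in one document.
DOCS = [
    ["Apple", "Tim Cook"],
    ["Apple", "Tim Cook", "Steve Jobs"],
    ["Apple", "Microsoft"],
]
INDEX = build_cooccurrence_index(DOCS)
```

Here `related_entities(INDEX, ["Apple"])` ranks "Tim Cook" first because it co-occurs with "Apple" in two documents. A production system would presumably weight counts by corpus statistics; the raw count is used only to keep the sketch short.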
55. A computer-implemented method, comprising:
receiving, by a computer, from a search engine, a search query including one or more data strings, wherein each respective entity corresponds to a subset of the one or more data strings;
identifying, by the computer, one or more entities in the one or more data strings based on comparing the one or more entities against an entity database and a trend database;
identifying, by the computer, one or more features in the one or more data strings that are identified as not corresponding to at least one entity;
assigning, by the computer, each of the one or more features to at least one of the one or more entities based on a matching algorithm;
assigning, by the computer, an extraction score to each respective entity based on the score assigned to each respective feature assigned to the respective entity;
receiving, by the computer, from the entity database, a first search list containing one or more entities, the one or more entities having a score within a threshold distance of the extraction score of each respective entity;
receiving, by the computer, from the trend database, a second search list containing one or more entities, the one or more entities having a score within a threshold distance of the extraction score of each respective entity;
generating, by the computer, an aggregated list including the first search list and the second search list, wherein the entities of the aggregated list are ranked according to the score of each respective aggregated list; and
providing, by the computer, suggested searches according to the aggregated list.
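The scoring and aggregation steps of claim 55 can be illustrated with a short Python sketch. Several details are assumptions made for the example and are not stated in the claim: scores are integers on a 0 to 100 scale, the extraction score is the sum of the feature scores, an entity present in both lists keeps its higher score, and the two databases are plain dictionaries with hypothetical contents.

```python
def extraction_score(feature_scores):
    """Assumed rule: an entity's extraction score is the sum of the
    scores of the features assigned to it."""
    return sum(feature_scores)


def within_threshold(database, target_score, threshold):
    """Return (entity, score) pairs whose score lies within the
    threshold distance of the extraction score."""
    return [
        (name, score)
        for name, score in database.items()
        if abs(score - target_score) <= threshold
    ]


def aggregate(first_list, second_list):
    """Merge both search lists and rank by score, highest first.
    An entity found in both lists keeps its higher score (assumption)."""
    best = {}
    for name, score in first_list + second_list:
        best[name] = max(score, best.get(name, score))
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical stand-ins for the entity database and the trend database.
ENTITY_DB = {"Paris": 90, "Paris Hilton": 70, "Parma": 20}
TREND_DB = {"Paris Olympics": 85, "Paris": 80}

target = extraction_score([50, 30])
first = within_threshold(ENTITY_DB, target, threshold=10)
second = within_threshold(TREND_DB, target, threshold=10)
suggestions = [name for name, _ in aggregate(first, second)]
```

With these sample values, "Parma" falls outside the threshold distance and is dropped, while the remaining entities from both databases are merged into one ranked list from which suggested searches are taken.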
56. A computer-implemented method, comprising:
receiving, by a computer, a plurality of data streams respectively associated with a plurality of data sources;
generating, by the computer, an array of attributes associated with each respective data stream;
in response to the computer detecting a trigger condition associated with data of a data stream:
generating, by the computer, geographic data associated with the data of the data stream;
in response to the computer not detecting a trigger condition for a data source:
mapping, by the computer, the array of attributes for the data source to a set of managed attributes associated with a search index; and
in response to determining that the type of content of a data source is image data:
performing, by the computer, an optical character recognition routine on metadata, the metadata being associated with data received from the data source; and
retrieving, by the computer, an updated data stream of the data source from a web service identified by the metadata, wherein the data source is associated with the web service identified by the metadata.
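The first two branches of claim 56 (generating geographic data when a trigger condition is detected, and otherwise mapping the attribute array to managed attributes of a search index) can be sketched in Python as below. The claim does not define the trigger condition, so the sketch assumes one: a record triggers when its `place` attribute appears in a small lookup table. The attribute names, the table `GAZETTEER`, and its placeholder coordinates are all hypothetical; the image and optical character recognition branch is omitted because the claim gives no detail from which it could be illustrated.

```python
# Hypothetical mapping from source attribute names to the managed
# attribute names used by the search index.
MANAGED_ATTRIBUTES = {"headline": "title", "text": "body", "src": "source"}

# Hypothetical place lookup with placeholder coordinates.
GAZETTEER = {"Example City": (10.0, 20.0)}


def has_trigger(record):
    """Assumed trigger condition: the record names a known place."""
    return record.get("place") in GAZETTEER


def process(record):
    """Apply the two branches to one record taken from a data stream."""
    if has_trigger(record):
        # Trigger detected: generate geographic data for the record.
        return {"geo": GAZETTEER[record["place"]]}
    # No trigger: map source attributes to managed index attributes,
    # dropping any attribute the index does not manage.
    return {
        MANAGED_ATTRIBUTES[key]: value
        for key, value in record.items()
        if key in MANAGED_ATTRIBUTES
    }
```

For example, `process({"headline": "h", "text": "t", "extra": 1})` keeps only the two managed attributes and renames them to `title` and `body`.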
CN201480072953.7A 2013-12-02 2014-12-02 System and method for internal storage data library searching Pending CN106164889A (en)

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
US201361910907P 2013-12-02 2013-12-02
US201361910900P 2013-12-02 2013-12-02
US201361910894P 2013-12-02 2013-12-02
US201361910905P 2013-12-02 2013-12-02
US61/910,900 2013-12-02
US61/910,907 2013-12-02
US61/910,894 2013-12-02
US61/910,905 2013-12-02
US201461947652P 2014-03-04 2014-03-04
US61/947,652 2014-03-04
PCT/US2014/067997 WO2015084759A1 (en) 2013-12-02 2014-12-02 Systems and methods for in-memory database search

Publications (1)

Publication Number Publication Date
CN106164889A true CN106164889A (en) 2016-11-23

Family

ID=53274014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480072953.7A Pending CN106164889A (en) 2013-12-02 2014-12-02 System and method for internal storage data library searching

Country Status (6)

Country Link
EP (1) EP3077918A4 (en)
JP (1) JP2017504105A (en)
KR (1) KR20160124079A (en)
CN (1) CN106164889A (en)
CA (1) CA2932401A1 (en)
WO (1) WO2015084759A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10296627B2 (en) 2015-08-18 2019-05-21 Fiserv, Inc. Generating integrated data records by correlating source data records from disparate data sources
CN109964224A (en) * 2016-09-22 2019-07-02 恩芙润斯公司 System, method and the computer-readable medium that significant associated time signal is inferred between life science entity are visualized and indicated for semantic information
CN106599547A (en) * 2016-11-23 2017-04-26 中山健康医疗信息技术有限公司 Intelligent medical knowledge base management system based on tags
JP6971104B2 (en) * 2017-09-20 2021-11-24 ヤフー株式会社 Information processing equipment, information processing methods, and programs
WO2019235103A1 (en) * 2018-06-07 2019-12-12 日本電信電話株式会社 Question generation device, question generation method, and program
US11487902B2 (en) 2019-06-21 2022-11-01 nference, inc. Systems and methods for computing with private healthcare data
CN112487214B (en) * 2020-12-23 2024-06-04 中译语通科技股份有限公司 Knowledge graph relation extraction method and system based on entity co-occurrence matrix

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079070A (en) * 2006-05-26 2007-11-28 国际商业机器公司 Computer and method for response of information query
US20080306908A1 (en) * 2007-06-05 2008-12-11 Microsoft Corporation Finding Related Entities For Search Queries
US20090327223A1 (en) * 2008-06-26 2009-12-31 Microsoft Corporation Query-driven web portals
CN103186556A (en) * 2011-12-28 2013-07-03 北京百度网讯科技有限公司 Method for obtaining and searching structural semantic knowledge and corresponding device

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6965900B2 (en) * 2001-12-19 2005-11-15 X-Labs Holdings, Llc Method and apparatus for electronically extracting application specific multidimensional information from documents selected from a set of documents electronically extracted from a library of electronically searchable documents
US8438142B2 (en) * 2005-05-04 2013-05-07 Google Inc. Suggesting and refining user input based on original user input
JP4922692B2 (en) * 2006-07-28 2012-04-25 富士通株式会社 Search query creation device
EP2291778A4 (en) * 2008-06-14 2011-09-21 Corp One Ltd Searching using patterns of usage
US8631004B2 (en) * 2009-12-28 2014-01-14 Yahoo! Inc. Search suggestion clustering and presentation
JP5256273B2 (en) * 2010-11-24 2013-08-07 ヤフー株式会社 Intention extraction apparatus, method and program
US20120143875A1 (en) * 2010-12-01 2012-06-07 Yahoo! Inc. Method and system for discovering dynamic relations among entities
JP5426526B2 (en) * 2010-12-21 2014-02-26 日本電信電話株式会社 Probabilistic information search processing device, probabilistic information search processing method, and probabilistic information search processing program
EP2788907A2 (en) * 2011-12-06 2014-10-15 Perception Partners Inc. Text mining analysis and output system
WO2013170343A1 (en) * 2012-05-15 2013-11-21 Whyz Technologies Limited Method and system relating to salient content extraction for electronic content

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991181A (en) * 2017-04-07 2017-07-28 广州视源电子科技股份有限公司 Method and device for extracting spoken sentences
CN106991181B (en) * 2017-04-07 2020-04-21 广州视源电子科技股份有限公司 Method and device for extracting spoken sentences
CN108932248A (en) * 2017-05-24 2018-12-04 苏宁云商集团股份有限公司 A kind of search realization method and system
CN107643835A (en) * 2017-10-19 2018-01-30 北京京东尚科信息技术有限公司 Drop-down word determines method, apparatus, electronic equipment and storage medium
CN107832459A (en) * 2017-11-27 2018-03-23 公安部交通管理科学研究所 The system and method that knowledge base content based on distributed network environment shares study
CN110471886A (en) * 2018-05-09 2019-11-19 富士施乐株式会社 For based on detection desk around file and people come the system of search file and people
CN112740196A (en) * 2018-09-20 2021-04-30 华为技术有限公司 Recognition model in artificial intelligence system based on knowledge management
CN109753517A (en) * 2018-12-06 2019-05-14 北京明略软件系统有限公司 A kind of method, apparatus, computer storage medium and the terminal of information inquiry
CN110347699A (en) * 2019-06-26 2019-10-18 北京明略软件系统有限公司 Determine the method and device of identity card related entities liveness
CN110245357A (en) * 2019-06-26 2019-09-17 北京百度网讯科技有限公司 Principal recognition methods and device
CN110347699B (en) * 2019-06-26 2022-01-28 北京明略软件系统有限公司 Method and device for determining activity of entity related to identity card
CN110245357B (en) * 2019-06-26 2023-05-02 北京百度网讯科技有限公司 Main entity identification method and device
CN114900422A (en) * 2021-01-26 2022-08-12 瞻博网络公司 Enhanced chat interface for network management
US12040934B1 (en) 2021-12-17 2024-07-16 Juniper Networks, Inc. Conversational assistant for obtaining network information
US12132622B2 (en) 2022-10-28 2024-10-29 Juniper Networks, Inc. Enhanced conversation interface for network management

Also Published As

Publication number Publication date
EP3077918A1 (en) 2016-10-12
JP2017504105A (en) 2017-02-02
WO2015084759A1 (en) 2015-06-11
KR20160124079A (en) 2016-10-26
EP3077918A4 (en) 2017-06-07
CA2932401A1 (en) 2015-06-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161123