CN106164889A - Systems and methods for in-memory database search - Google Patents
- Publication number
- CN106164889A CN201480072953.7A
- Authority
- CN
- China
- Prior art keywords
- entity
- search
- computer
- extraction
- occurrence
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3322—Query formulation using system suggestions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/02—Computing arrangements based on specific mathematical models using fuzzy logic
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Mathematics (AREA)
- Biomedical Technology (AREA)
- Fuzzy Systems (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Algebra (AREA)
- Automation & Control Theory (AREA)
- Evolutionary Computation (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
Abstract
Systems and methods are disclosed for identifying related entities using an entity co-occurrence knowledge base. Embodiments use an entity co-occurrence knowledge base, built from entities extracted from an entity-indexed corpus, to extract the entities identified in a search query so that search results are presented as related entities. Embodiments are also disclosed for generating search suggestions using fuzzy-score matching against an entity co-occurrence knowledge base. These embodiments extract partial entities from a search query, run a matching algorithm chosen according to the type of the extracted entity, and perform a search against the entity co-occurrence knowledge base. Embodiments are also disclosed for generating search suggestions of related entities based on co-occurrence and/or fuzzy-score matching. These embodiments process a partial search query and present suggestions that complete the query, and the completed query is then used as a new search query. Embodiments are also disclosed for generating search suggestions from entity co-occurrence by extracting entities from a search query using entity and trend co-occurrence knowledge bases. Embodiments are also disclosed for enabling geographic and named-entity-based search capabilities in a content management system.
Description
Technical field
The present disclosure relates generally to methods and systems for information retrieval and, more specifically, to a method for searching for related entities using entity co-occurrence. The present disclosure relates generally to query enhancement and, more specifically, to search suggestions that use entity co-occurrence in a knowledge base together with fuzzy-score matching. The present disclosure relates generally to computer query processing and, more specifically, to electronic search suggestions of related entities based on co-occurrence and/or fuzzy-score matching. The present disclosure relates generally to methods and systems for information retrieval and, more specifically, to a method for obtaining search suggestions. The present disclosure relates generally to search engines and content management and, more specifically, to extending the search engine technology of a content management system to enable geotagging and named-entity enrichment of digital content.
Background technology
In commercial settings, known search engines parse a set of search terms and return a list of items (web pages, in a traditional search) ranked in some way. The best-known search methods typically build a search query database from the historical references of other users, and this database is ultimately used to generate a keyword-based index. A user search query may include one or more entities identified by a name or attribute associated with the entity. Entities may include organizations, people, locations, dates, and/or times. In a typical search, if a user is looking for information about two particular organizations, the search engine may return ranked results that relate to a mix of different entities with the same or similar names. This approach can lead the user to a very large number of documents that are not relevant to what the user is actually interested in.
Therefore, there is a need for a method of searching for related entities that gives the user the ability to find the related entities of interest.
Users frequently use search engines to locate information of interest on the Internet or in any database system. A search engine generally operates by receiving a search query from a user and returning search results to the user. The search engine typically ranks the search results according to the relevance of each returned result to the search query. The quality of the search query can therefore be critical to the quality of the search results. In most cases, however, a search query from a user may be written incompletely or partially (for example, the query may not contain enough words to produce a focused set of relevant results and instead produces a large number of irrelevant results), and may sometimes be misspelled (for example, Bill Smith may be mistakenly spelled Bill Smitth).
A common way to improve the quality of search results is to enhance the search query. One way to enhance a search query is to generate possible suggestions based on the user's input. To this end, some approaches identify candidate query refinements for a given query from earlier queries submitted by one or more users. These approaches, however, rely on query logs, which can sometimes steer the user toward results that may not be of interest. Other approaches use different techniques that may not be accurate enough. Therefore, there is still a need for a method that improves or enhances search queries from users in order to obtain more accurate results.
Users frequently use search engines to locate information of interest on the Internet or in any database system. A search engine generally operates by receiving a search query from a user and returning search results to the user. Search results are typically ranked according to the relevance of each returned result to the search query. The quality of the search query can therefore be critical to the quality of the search results. In most cases, however, a search query from a user may be written incompletely or partially (for example, the query may not contain enough words to produce a focused set of relevant results and instead produces a large number of irrelevant results), and may sometimes be misspelled (for example, Bill Smith may be mistakenly spelled Bill Smitth).
A common way to improve the quality of search results is to enhance the search query. One way to enhance a search query is to generate possible suggestions based on the user's input. To this end, some approaches identify candidate query refinements for a given query from earlier queries submitted by one or more users. These approaches, however, rely on query logs, which can sometimes steer the user toward results that may not be of interest. Other approaches use different techniques that may not be accurate enough. Therefore, there is still a need for a method that improves or enhances search queries from users in order to obtain more accurate results, and that also presents related entities of interest to the user as the user types the search query.
Search engines include several features that provide predictions for a user's query. Such predictions may include query auto-completion and search suggestions. At present, these prediction methods rely on historical keyword references. Such historical references may be inaccurate because a single keyword may relate to multiple topics within a single text.
In addition, a user search query may include one or more entities identified by a name or attribute associated with the entity. Entities may include organizations, people, locations, events, dates, and/or times. In a typical search, if a user is looking for information about two particular organizations, the search engine may return ranked results that relate to a mix of different entities with the same or similar names. This approach can lead the user to a very large number of documents that are not relevant to what the user is actually interested in.
Therefore, there is a need for a method of obtaining search suggestions faster and more accurately.
Content management and document management systems for document release control and collaborative project management are known. One non-limiting example is the Microsoft SharePoint suite of software applications and tools. Microsoft SharePoint is a family of software products developed by Microsoft for collaboration, file sharing, and web publishing. SharePoint can provide users with a large amount of content and information, and it can become difficult for users to find the information most relevant to a particular situation. To alleviate these problems, SharePoint provides a search engine to help users find the content they need. A user can enter a keyword-based search query and, once the content has been indexed, the search engine in SharePoint can return to the user a list of the most relevant results found within the SharePoint platform environment.
Sometimes a user may want to find content in SharePoint that relates to geographic entities or other types of entities (such as organizations or people mentioned in documents). SharePoint does not provide out-of-the-box functionality for automatically extracting entities from documents. Specifically, it does not support geotagging of content in order to extract geographic entities and resolve them to geographic locations. Nor does SharePoint support entity tagging in order to identify, disambiguate, and extract named entities, such as organizations or people in documents. SharePoint search can, however, be extended to enable effective geographic search and other entity-related search, including entity-based search facets. Earlier versions of SharePoint included FAST Search for SharePoint, whose content processing pipeline could be extended through sandboxed applications, but this approach was slow and limited the information the pipeline could access.
SharePoint introduces a much more open API, which makes it possible to add specialized linguistics such as concept extraction, relationship extraction, geotagging, summarization, and fine-grained text analytics. There is therefore an opportunity to extend the capabilities of the SharePoint search engine to enable geographic search and other entity-based search.
Summary of the invention
A method is disclosed for searching for related entities using entity co-occurrence. In one aspect of the present disclosure, the method may be used in a search system that may include a client/server type architecture. In one embodiment, the search system may include a user interface for a search engine that communicates with one or more server devices over a network connection. The server devices may include an entity-indexed electronic data corpus, an entity co-occurrence knowledge base database, and an entity extraction computer module. The knowledge base may be built as an in-memory database and may also include other components, such as one or more search controllers, a plurality of search nodes, a collection of compressed data, and a disambiguation module. A search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy keyword search across the collection of compressed data and return a set of scored results to its associated search controller.
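The controller-and-node arrangement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the class names `SearchNode` and `SearchController`, the use of `zlib` for compression, and the use of `difflib.SequenceMatcher` as the fuzzy score are all choices made here for the example.

```python
import zlib
from difflib import SequenceMatcher


class SearchNode:
    """Hypothetical search node that owns one partition of compressed records."""

    def __init__(self, records):
        # Records are kept zlib-compressed to stand in for "compressed data".
        self._blobs = [zlib.compress(r.encode("utf-8")) for r in records]

    def fuzzy_search(self, keyword, threshold=0.8):
        """Scan this node's partition independently; return (score, text) pairs."""
        results = []
        for blob in self._blobs:
            text = zlib.decompress(blob).decode("utf-8")
            score = SequenceMatcher(None, keyword.lower(), text.lower()).ratio()
            if score >= threshold:
                results.append((score, text))
        return results


class SearchController:
    """Hypothetical controller that fans a query out to its associated nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def search(self, keyword, threshold=0.8):
        scored = []
        for node in self.nodes:
            scored.extend(node.fuzzy_search(keyword, threshold))
        # Highest-scoring results first.
        return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In this sketch each node scans only its own partition, so nodes could run in parallel; the controller is responsible only for merging and ordering the scored results.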
In one embodiment, a computer-implemented method includes: receiving, by an entity extraction computer from a client computer, a search query comprising one or more entities; comparing, by the entity extraction computer, each respective entity against one or more co-occurrences of the respective entity in a co-occurrence database; extracting, by the entity extraction computer, a subset of the one or more entities from the search query in response to determining, according to the co-occurrence database, that each respective entity in the subset exceeds a confidence score of the co-occurrence database, wherein the confidence score is based on a degree of certainty of co-occurrence of each respective entity with one or more related entities in an electronic data corpus; assigning, by the entity extraction computer, an index identifier (index ID) to each of the plurality of extracted entities; saving, by the entity extraction computer, the index ID for each of the plurality of extracted entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each of the one or more related entities; searching, by a search server computer, the entity-indexed electronic data corpus to locate the plurality of extracted entities and identify index IDs of data records in which at least two of the plurality of extracted entities co-occur; and building, by the search server computer, a search result list having the data records corresponding to the identified index IDs.
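The steps of this embodiment can be sketched in a few lines. The example below is an illustration only: the `KNOWLEDGE_BASE` and `CORPUS` contents, the substring-based extraction, and the 0.5 confidence threshold are hypothetical stand-ins for the co-occurrence database, the entity-indexed corpus, and the confidence test described in the text.

```python
# Hypothetical knowledge base: surface form -> (index ID, confidence score).
KNOWLEDGE_BASE = {
    "acme": ("E1", 0.9),
    "bill smith": ("E2", 0.8),
    "paris": ("E3", 0.4),
}

# Hypothetical entity-indexed corpus: record ID -> index IDs found in the record.
CORPUS = {
    "doc1": {"E1", "E2"},
    "doc2": {"E1", "E3"},
    "doc3": {"E1", "E2", "E3"},
    "doc4": {"E4"},
}


def extract_entities(query, min_confidence=0.5):
    """Return index IDs of entities mentioned in the query above the threshold."""
    q = query.lower()
    return sorted(
        index_id
        for name, (index_id, confidence) in KNOWLEDGE_BASE.items()
        if name in q and confidence >= min_confidence
    )


def cooccurrence_search(entity_ids):
    """Return record IDs in which at least two of the extracted entities co-occur."""
    wanted = set(entity_ids)
    hits = []
    for record_id, entities in CORPUS.items():
        shared = len(wanted & entities)
        if shared >= 2:
            hits.append((shared, record_id))
    # Records sharing more entities come first; ties are broken by record ID.
    return [record_id for _, record_id in sorted(hits, key=lambda h: (-h[0], h[1]))]
```

Here "paris" is dropped at extraction time because its assumed confidence falls below the threshold, so only records containing both remaining entities appear in the result list.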
In one embodiment, a system includes one or more server computers having one or more processors that execute computer-readable instructions for a plurality of computer modules. The computer modules include an entity extraction module configured to receive user input of search query parameters. The entity extraction module is further configured to: extract a plurality of entities from the search query parameters by comparing each of the plurality of extracted entities against an entity co-occurrence database, wherein the entity co-occurrence database includes a confidence score indicating a degree of certainty of co-occurrence of the extracted entity with one or more related entities in an electronic data corpus; assign an index identifier (index ID) to each of the plurality of extracted entities; and save the index ID for each of the plurality of extracted entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each of the one or more related entities. The computer modules also include a search server module configured to search the entity-indexed electronic data corpus to locate the plurality of extracted entities and identify index IDs of data records in which at least two of the plurality of extracted entities co-occur. The search server module is further configured to build a search result list having the data records corresponding to the identified index IDs.
In another embodiment, a non-transitory computer-readable medium has computer-executable instructions stored thereon, the instructions including: receiving, by an entity extraction computer, user input of search query parameters; extracting, by the entity extraction computer, a plurality of entities from the search query parameters by comparing each of the plurality of extracted entities against an entity co-occurrence database, wherein the entity co-occurrence database includes a confidence score indicating a degree of certainty of co-occurrence of the extracted entity with one or more related entities in an electronic data corpus; assigning, by the entity extraction computer, an index identifier (index ID) to each of the plurality of extracted entities; saving, by the entity extraction computer, the index ID for each of the plurality of extracted entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each of the one or more related entities; searching, by a search server computer, the entity-indexed electronic data corpus to locate the plurality of extracted entities and identify index IDs of data records in which at least two of the plurality of extracted entities co-occur; and building, by the search server computer, a search result list having the data records corresponding to the identified index IDs.
A method is disclosed for generating search suggestions by using entity co-occurrence in a knowledge base together with fuzzy-score matching. In one aspect of the present disclosure, the method may be used in a search system that may include a client/server type architecture. In one embodiment, the search system may include a user interface for a search engine that communicates with one or more server devices over a network connection. The server devices may include an entity extraction computer module, a fuzzy-score matching computer module, and an entity co-occurrence knowledge base database. The knowledge base may be built as an in-memory database and may also include other hardware and/or software components, such as one or more search controllers, a plurality of search nodes, a collection of compressed data, and a disambiguation computer module. A search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy keyword search across the collection of compressed data and return a set of scored results to its associated search controller.
In another aspect of the present disclosure, the method may include an entity extraction module that performs partial entity extraction on a provided search query to identify whether the search query refers to an entity and, if so, which type of entity it refers to. The method may further include a fuzzy-score matching module that generates an algorithm based on the type of the extracted entity and performs a search against the entity co-occurrence knowledge base. In addition, portions of the query text that are detected as not corresponding to an entity are treated as conceptual features, such as topics, facts, and key phrases, that can be used to search the entity co-occurrence knowledge base. In one embodiment, the entity co-occurrence knowledge base includes a repository in which entities may be indexed as entity-to-entity, entity-to-topic, or entity-to-fact, among others, which facilitates returning fast and accurate suggestions to the user for completing the search query.
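Choosing a matching algorithm by entity type can be illustrated as below. This is a sketch under assumptions: the two matchers, the `MATCHERS` table, the 0.5 cutoff, and the `(name, type)` knowledge base layout are hypothetical, and `difflib.SequenceMatcher` is used only as a convenient stand-in for a fuzzy score.

```python
from difflib import SequenceMatcher


def _ratio(a, b):
    """Generic fuzzy score in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_person(query, candidate):
    # Compare sorted tokens so "Smith Bill" can still match "Bill Smith".
    q = " ".join(sorted(query.lower().split()))
    c = " ".join(sorted(candidate.lower().split()))
    return _ratio(q, c)


def match_prefix(query, candidate):
    # Treat a partial query that begins the candidate as a perfect match.
    if candidate.lower().startswith(query.lower()):
        return 1.0
    return _ratio(query, candidate)


# Hypothetical mapping from entity type to matching algorithm.
MATCHERS = {"person": match_person, "organization": match_prefix}


def suggest(query, entity_type, knowledge_base, limit=3, min_score=0.5):
    """Score same-type knowledge base entries and return the best suggestions."""
    matcher = MATCHERS.get(entity_type, _ratio)
    scored = [
        (matcher(query, name), name)
        for name, etype in knowledge_base
        if etype == entity_type
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:limit] if score > min_score]
```

The point of the dispatch table is that a misspelled or reordered person name and a half-typed organization name call for different notions of similarity, which is one way to read "an algorithm based on the type of the extracted entity".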
In one embodiment, a method is disclosed. The method includes: receiving, by an entity extraction computer, user input of search query parameters from a user interface; extracting, by the entity extraction computer, one or more entities from the search query parameters by comparing the search query parameters against an entity co-occurrence database and identifying at least one entity type corresponding to the one or more entities in the search query parameters, wherein the entity co-occurrence database has instances of co-occurrence of the one or more entities in an electronic data corpus; and selecting, by a fuzzy-score matching computer, a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the search query parameters, wherein the fuzzy matching algorithm corresponds to the at least one identified entity type. The method further includes: searching, by the fuzzy-score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm, and forming one or more suggested search query parameters from the one or more records based on the search; and presenting, by the fuzzy-score matching computer, the one or more suggested search query parameters via the user interface.
In another embodiment, a system is provided. The system includes one or more server computers having one or more processors that execute computer-readable instructions for a plurality of computer modules. The computer modules include an entity extraction module configured to receive user input of search query parameters from a user interface. The entity extraction module is further configured to extract one or more entities from the search query parameters by comparing the search query parameters against an entity co-occurrence database and identifying at least one entity type corresponding to the one or more entities in the search query parameters, wherein the entity co-occurrence database has instances of co-occurrence of the one or more entities in an electronic data corpus. The system further includes a fuzzy-score matching module configured to select a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the search query parameters, wherein the fuzzy matching algorithm corresponds to the at least one identified entity type. The fuzzy-score matching module is further configured to: search the entity co-occurrence database using the selected fuzzy matching algorithm and form one or more suggested search query parameters from the one or more records based on the search; and present the one or more suggested search query parameters via the user interface.
A method is disclosed for generating search suggestions of related entities based on co-occurrence and/or fuzzy-score matching. In one aspect of the present disclosure, the method may be used in a computer search system that may include a client/server type architecture. In one embodiment, the search system may include a user interface for a search engine that communicates with one or more server devices over a network connection. The server devices may include an entity co-occurrence knowledge base database and one or more processors that execute instructions for a plurality of special-purpose computer modules, including an entity extraction module and a fuzzy-score matching module. The knowledge base may be built as an in-memory database and may also include other components, such as one or more search controllers, a plurality of search nodes, a collection of compressed data, and a disambiguation module. A search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy keyword search across the collection of compressed data and return a set of scored results to its associated search controller.
In another aspect of the present disclosure, the method may include performing, by the entity extraction module, partial entity extraction on a provided search query to identify whether the search query refers to an entity and, if so, to determine the entity type. The method may further include generating, by the fuzzy-score matching module, an algorithm corresponding to the type of the extracted entity and performing a search against the entity co-occurrence knowledge base. In addition, portions of the query text that are detected as not being entities are treated as conceptual features, such as topics, facts, and key phrases, that can be used to search the entity co-occurrence knowledge base. The entity co-occurrence knowledge base may have a repository in which entities may be indexed as entity-to-entity, entity-to-topic, or entity-to-fact, among others, and may return fast and accurate suggestions to the user for completing the search query.
In a further aspect of the present disclosure, the completed search query may be used as a new search query. The search system may process the new search query, run entity extraction, find the related entities with the highest scores in the entity co-occurrence knowledge base, and present those related entities in a drop-down list that may be useful to the user.
In one embodiment, a method is disclosed. The method includes: receiving, by an entity extraction computer, user input of partial search query parameters from a user interface, the partial search query parameters having at least one incomplete search query parameter; extracting, by the entity extraction computer, one or more first entities from the partial search query parameters by comparing the partial search query parameters against an entity co-occurrence database having instances of co-occurrence of the one or more first entities in an electronic data corpus, and identifying at least one entity type corresponding to the one or more first entities in the partial search query parameters; and selecting, by a fuzzy-score matching computer, a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the partial search query parameters, wherein the fuzzy matching algorithm corresponds to the at least one identified entity type. The method further includes: searching, by the fuzzy-score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm, and forming one or more first suggested search query parameters from the one or more records based on the search; presenting, by the fuzzy-score matching computer, the one or more first suggested search query parameters via the user interface; receiving, by the entity extraction computer, a user selection of the one or more first suggested search query parameters to form completed search query parameters; and extracting, by the entity extraction computer, one or more second entities from the completed search query parameters. The method further includes: searching, by the entity extraction computer, the entity co-occurrence database to identify one or more entities related to the one or more second entities, so as to form one or more second suggested search query parameters; and presenting, by the entity extraction computer, the one or more second suggested search query parameters via the user interface.
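The two-stage flow in this embodiment (complete the partial query first, then suggest related entities for the completed query) can be sketched as follows. The `CO_OCCURRENCE` table and its scores are hypothetical, and plain prefix matching is used in place of the type-specific fuzzy matching the text describes, purely to keep the example short.

```python
# Hypothetical co-occurrence knowledge base: entity -> {related entity: score}.
CO_OCCURRENCE = {
    "Bill Smith": {"Acme Corporation": 0.9, "Seattle": 0.6},
    "Bill Smythe": {"Apex Ltd": 0.7},
}


def complete(partial_query):
    """First suggestions: known entities that the partial query begins to spell."""
    prefix = partial_query.lower()
    return sorted(name for name in CO_OCCURRENCE if name.lower().startswith(prefix))


def related(entity, limit=5):
    """Second suggestions: entities that co-occur with the selected entity."""
    neighbours = CO_OCCURRENCE.get(entity, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]


# Stage 1: the user has typed a partial query and is shown completions.
first_suggestions = complete("bill sm")
# Stage 2: the user picks one completion, which becomes the new query.
selected = first_suggestions[0]
second_suggestions = related(selected)
```

In a user interface, `first_suggestions` would populate the auto-complete list and `second_suggestions` the drop-down of related entities shown after the selection.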
In another embodiment, a system is disclosed. The system includes one or more server computers having one or more processors that execute computer-readable instructions for a plurality of computer modules. The computer modules include an entity extraction module configured to receive user input of partial search query parameters from a user interface, the partial search query parameters having at least one incomplete search query parameter. The entity extraction module is further configured to extract one or more first entities from the partial search query parameters by comparing the partial search query parameters against an entity co-occurrence database having instances of co-occurrence of the one or more first entities in an electronic data corpus, and identifying at least one entity type corresponding to the one or more first entities in the partial search query parameters. The system further includes a fuzzy-score matching module configured to select a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the partial search query parameters, wherein the fuzzy matching algorithm corresponds to the at least one identified entity type. The fuzzy-score matching module is further configured to: search the entity co-occurrence database using the selected fuzzy matching algorithm and form one or more first suggested search query parameters from the one or more records based on the search; and present the one or more first suggested search query parameters via the user interface. In addition, the entity extraction module is further configured to: receive a user selection of the one or more first suggested search query parameters to form completed search query parameters; extract one or more second entities from the completed search query parameters; search the entity co-occurrence database to identify one or more entities related to the one or more second entities, thereby forming one or more second suggested search query parameters; and present the one or more second suggested search query parameters via the user interface.
Disclosed is a method for obtaining entity-related search suggestions using entity and feature co-occurrence. In one aspect of the disclosure, the method may be used in a search system that includes a client/server type architecture.
A search system employs a method that may be used on one or more servers that host an entity database and a trend database in which entities are stored. Entities in these databases may carry scores, so that indexing can be based on the higher scores. The method for obtaining search suggestions may combine the information stored in the two databases to generate a single list of search suggestions. The trend database may supply previous search queries from one or more users on a local area network and/or the Internet. The entity database may supply search suggestions based on entities extracted from multiple sources of data available on the local area network and/or the Internet. The resulting list may give the user a more accurate and faster set of suggestions.
In one embodiment, a computer-implemented method includes: receiving, by a computer, from a search engine, a search query comprising one or more data strings, wherein each respective entity corresponds to a subset of the one or more data strings; identifying, by the computer, one or more entities in the one or more data strings based on comparing the one or more entities against an entity database and a trend database; identifying, by the computer, one or more features in the one or more data strings that are identified as not corresponding to at least one entity; assigning, by the computer, each of the one or more features to at least one of the one or more entities based on a matching algorithm; assigning, by the computer, an extraction score to each respective entity based on the score assigned to each respective feature assigned to that entity; receiving, by the computer, from the entity database, a first search list containing one or more entities having a score within a threshold distance of the extraction score of each respective entity; receiving, by the computer, from the trend database, a second search list containing one or more entities having a score within a threshold distance of the extraction score of each respective entity; generating, by the computer, an aggregated list comprising the first search list and the second search list, wherein the entities of the aggregated list are ranked according to their respective scores in the aggregated list; and providing, by the computer, suggested searches according to the aggregated list.
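The threshold filtering and aggregation steps of this embodiment can be sketched as follows. This is only a minimal illustration under stated assumptions, not the claimed implementation: the function names, the dictionaries standing in for the entity database and the trend database, the threshold value, and the rule of keeping the higher score when both databases suggest the same entity are all hypothetical.

```python
def within_threshold(candidates, extraction_score, threshold):
    """Keep candidates whose score lies within `threshold` of the extraction score."""
    return {name: score for name, score in candidates.items()
            if abs(score - extraction_score) <= threshold}


def aggregate_suggestions(entity_db, trend_db, extraction_score, threshold=0.3):
    """Merge the two search lists and rank the result by score, highest first."""
    first_list = within_threshold(entity_db, extraction_score, threshold)
    second_list = within_threshold(trend_db, extraction_score, threshold)
    merged = dict(first_list)
    for name, score in second_list.items():
        # Assumption: if both databases suggest an entity, keep the higher score.
        merged[name] = max(score, merged.get(name, score))
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: suggestion -> score.
entity_db = {"Steve Jobs": 0.9, "Apple Inc.": 0.8, "Organization A": 0.3}
trend_db = {"Steve Jobs": 0.95, "Apple Watch": 0.75}
suggestions = aggregate_suggestions(entity_db, trend_db, extraction_score=1.0)
```

A different merge rule, such as summing the two scores, would fit the same structure; the source does not say which rule is used.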
Disclosed herein are systems and methods that enable geographic entity-based search in a content management system such as Microsoft SharePoint. The method described in an embodiment involves extending the SharePoint search architecture by adding a geotagging web service. The system includes a computer processor operatively associated with computer memory and one or more I/O devices, wherein the processor and memory are configured to run one or more SharePoint processes. The system also includes another computer processor operatively associated with computer memory and one or more I/O devices, wherein the processor and memory are configured to host and serve the processes for the geotagging web service. The SharePoint system may include a crawl component, a content processing component, and a search index component so that content can be searched. The content processing component in SharePoint search can extend its functionality by using the Content Enrichment Web Service (CEWS) feature.
The method involves crawling content from different sources and sending the resulting batch of crawled properties for content processing. During content processing, a trigger condition may determine whether the crawled properties would benefit from additional processing that enriches the original content with additional geographic metadata properties. If the crawled properties would not benefit from additional processing, they may be mapped to managed properties and sent to the search index. If the crawled properties would benefit from processing by the external web service, CEWS may make a Simple Object Access Protocol (SOAP) request to a configurable endpoint using the Hypertext Transfer Protocol (HTTP) or any other web service invocation method. The entity enrichment service may determine the type of the content. If the content is in an image format, its metadata, such as the file location, may be sent to an optical character recognition (OCR) engine so that the original file can be retrieved and processed asynchronously, converted into text, and sent back to the crawl component to be crawled again in text format. If the content is in text format, the geotagging web service may identify geographic metadata and associate it with the content as managed properties. After the content has been geotagged, it may be sent to the index component.
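The routing decisions described above can be sketched as a small function. This is a simplified illustration, not SharePoint or CEWS code: the gazetteer, the trigger condition, the item fields, and the function names are all assumptions made for the example, and a plain list stands in for the asynchronous OCR engine.

```python
# Hypothetical gazetteer standing in for the geotagging web service.
GAZETTEER = {"Paris": (48.8566, 2.3522), "Nairobi": (-1.2864, 36.8172)}


def simple_geotag(text):
    """Return geographic metadata for every gazetteer place named in the text."""
    return {place: coords for place, coords in GAZETTEER.items() if place in text}


def needs_enrichment(item):
    """Trigger condition (assumed): only text and image content is enriched."""
    return item["format"] in ("text", "image")


def route_crawled_item(item, ocr_queue):
    """Decide the next stage for a crawled item: 'index' or 'recrawl'."""
    if not needs_enrichment(item):
        return "index"  # map crawled properties to managed properties, then index
    if item["format"] == "image":
        ocr_queue.append(item["location"])  # OCR fetches the original file later
        return "recrawl"  # the OCR text is sent back to be crawled again
    item["managed_properties"] = simple_geotag(item["text"])
    return "index"


ocr_queue = []
text_item = {"format": "text", "text": "Office opening in Paris", "location": "/docs/a.txt"}
image_item = {"format": "image", "text": "", "location": "/scans/b.png"}
text_stage = route_crawled_item(text_item, ocr_queue)
image_stage = route_crawled_item(image_item, ocr_queue)
```

In the described system the enrichment call would be a SOAP request to a configurable endpoint; the in-process function call here only mirrors the control flow.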
Additional search user interfaces (UIs) may be added using SharePoint web parts, or by modifying the standard layout of SharePoint search with standard web development tools such as HTML, HTML5, JavaScript, and CSS. A search UI may help the user perform geographic search queries, or display geographic search results using digital geographic features such as, but not limited to, a digital map. A search UI may also be enhanced to perform faceted search using other enriched entities or their associated metadata.
Numerous other aspects, features, and benefits of the present disclosure will become apparent from the following detailed description.
Brief description of the drawings
The present disclosure can be better understood by referring to the following figures. The components in the figures are not necessarily drawn to scale; emphasis is instead placed on illustrating the principles of the disclosure. In the figures, reference numerals designate corresponding parts throughout the different views.
Fig. 1 is a block diagram illustrating an exemplary environment of a computer system in which one embodiment of the present disclosure may operate;
Fig. 2 is a flowchart illustrating a method for searching using entity co-occurrence, according to an embodiment; and
Fig. 3 is a flowchart illustrating an embodiment of a simple search, in which the search results returned by the system may include related entities of interest.
Fig. 4 is a block diagram illustrating an exemplary system environment in which one embodiment of the present disclosure may operate;
Fig. 5 is a flowchart illustrating a method for generating search suggestions using entity co-occurrence in a knowledge base and fuzzy score matching, according to an embodiment; and
Fig. 6 is a diagram illustrating an example of a user interface through which search suggestions may be produced using the entity co-occurrence in a knowledge base and fuzzy matching of Figs. 4-6.
Fig. 7 is a block diagram illustrating an exemplary system environment in which one embodiment of the present disclosure may operate.
Fig. 8 is a flowchart illustrating a method for generating search suggestions of related entities based on co-occurrence and/or fuzzy score matching, according to an embodiment.
Fig. 9 is an exemplary embodiment of a user interface associated with the method described in Fig. 8.
Fig. 10 is a block diagram illustrating a method for obtaining search suggestions based on an entity database and a trend database.
Fig. 11 is a block diagram illustrating a method for obtaining search suggestions based on an entity database and a trend database, in which the suggestion list is generated from the individual scores of the search suggestions in each database.
Fig. 12 is a block diagram illustrating a method for obtaining search suggestions based on an entity database and a trend database, in which the suggestion list is generated from the total scores of the search suggestions in the two databases.
Fig. 13 is a system architecture for tagging and entity enrichment of content in a content management system.
Fig. 14 is a process by which content is tagged and indexed so that it can be searched by named entities and geographic entities.
Definitions
As used herein, the following terms have the following definitions:
"Entity extraction" refers to information processing methods for extracting information such as names, places, and organizations.
"Corpus" refers to a collection of one or more documents.
"Feature" is any information that is at least partially derived from a document.
"Event concept store" refers to a database of event template models.
"Event" refers to one or more features characterized by at least the features' occurrence in real time.
"Event model" refers to a collection of data that may be used to compare against a set of data and identify a specific type of event.
"Module" refers to a computer or software component suitable for carrying out at least one or more tasks.
"Feature attribute" refers to metadata associated with a feature, such as the feature's position in a document, its confidence, and the like.
"Fact" refers to an objective relationship between features.
"Entity knowledge base" refers to a computer database containing features/entities.
"Query" refers to a computer-generated request to retrieve information from one or more suitable databases.
"Topic" refers to a set of thematic information that is at least partially derived from a corpus.
"Geotagging" refers to the process of extracting geographic entities from unstructured text documents. Geotagging may include disambiguating an entity to a specific geographic location and attaching geographic metadata (such as geographic coordinates, geographic feature type, and other metadata).
"Entity tagging" refers to the process of extracting named entities from unstructured text. Entity tagging may include entity disambiguation, entity name normalization, and attaching entity metadata.
"Named entity" refers to a person, organization, or topic.
"Geographic entity" refers to a geographic location or geographic place.
"Crawled property" refers to content management system metadata obtained from an inspected document during crawling.
Detailed description of the invention
Reference will now be made in detail to the preferred embodiments, examples of which are illustrated in the accompanying drawings. The embodiments described above are intended to be exemplary. Those skilled in the art will recognize that numerous alternative components and embodiments may be substituted for the particular examples described herein and still fall within the scope of the invention. Other embodiments may be used and/or other changes may be made without departing from the spirit or scope of the present disclosure. The illustrative embodiments described in the detailed description are not meant to limit the subject matter presented herein.
It will be understood, however, that no limitation of the scope of the invention is thereby intended. Alterations and further modifications of the inventive features described herein, and additional applications of the principles of the invention as illustrated herein, which would occur to one skilled in the relevant art and having possession of this disclosure, are to be considered within the scope of the invention.
The present disclosure describes a system and method for detecting, extracting, and validating events from multiple sources. Sources may include news sources, social media websites, and/or any source that may include data about events.
Embodiments of the systems and methods disclosed herein collect data from different sources in order to identify independent events.
Fig. 1 is a block diagram of a search system 100 according to the present disclosure. The search system 100 may include one or more client computing devices, each including a processor that executes software modules associated with the search system 100. The search system 100 may include a graphical user interface 102 that accesses a search engine 104, which exchanges search queries in the form of binary data with a server device 106 over a network 108. In the exemplary embodiment, the search system 100 is implemented in a client-server computing architecture. It should be recognized, however, that the search system 100 may be implemented using other computer architectures (for example, a stand-alone computer, a mainframe system with terminals, an application service provider (ASP) model, a peer-to-peer model, and the like). The network 108 may include any suitable hardware and software modules capable of communicating digital data between computing devices, such as a local area network, a wide area network, the Internet, a wireless network, a mobile phone network, and the like. As such, it should be further understood that the system 100 may be implemented on a single network 108 or using multiple networks 108.
The user's computing device 102 may access the search engine 104, which may include software modules capable of transmitting search queries. A search query is a parameter, provided to the search engine 104, that indicates the desired information to retrieve. Search queries may be provided by a user or by another software application in any suitable data format (for example, integers, strings, complex objects) compatible with the parsing and processing routines of the search engine 104. In some embodiments, the search engine 104 may be a web-based tool that is accessible through a browser or other software application on the user's computing device 102 and that enables users or software applications to locate information on the World Wide Web. In some embodiments, the search engine 104 may be a software application module native to the system 100 that enables users or applications to locate information within the databases of the system 100.
The server device 106, which may be implemented as a single server device 106 or in a distributed architecture across multiple server computers, may include an entity extraction module 110, an entity co-occurrence knowledge base 112, and an entity-indexed corpus 114. The entity extraction module 110 may be a computer software and/or hardware module capable of extracting and disambiguating independent entities from a given set of queries (for example, query strings, structured data, and the like). Examples of entities may include people, organizations, geographic locations, dates, and/or times. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features.
According to various embodiments, the entity co-occurrence knowledge base 112 may be built as, but is not limited to, an in-memory computer database (not shown) and may include other components (not shown), such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation computer module. One search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy keyword search through a collection of compressed data and return a set of scored results to its associated search controller.
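The controller and node arrangement can be illustrated with a short sketch. It rests on several assumptions: each node's data collection is modeled as a plain, uncompressed list of strings, the similarity measure is the ratio from Python's standard `difflib.SequenceMatcher` rather than whatever the described system uses, and the function names and the 0.6 cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def node_search(partition, keyword, min_ratio=0.6):
    """One search node: fuzzy-match the keyword against its own partition."""
    results = []
    for record in partition:
        ratio = SequenceMatcher(None, keyword.lower(), record.lower()).ratio()
        if ratio >= min_ratio:
            results.append((record, ratio))
    return results


def controller_search(partitions, keyword):
    """Search controller: collect the scored results of every associated node."""
    scored = []
    for partition in partitions:  # a real deployment would query nodes in parallel
        scored.extend(node_search(partition, keyword))
    return sorted(scored, key=lambda result: result[1], reverse=True)


# Hypothetical partitions, one per search node.
partitions = [["Apple", "Steve Jobs"], ["Appel", "Organization A"]]
results = controller_search(partitions, "apple")
```

Because each node scores only its own partition, the nodes need no coordination with one another; the controller alone merges and orders the results.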
The entity co-occurrence knowledge base 112 may include related entities based on features and ranked by confidence. Various methods for ranking features may be used; these methods essentially use a weighted model to determine which entity types are most important and which carry more weight, and, based on confidence, determine how confidently the extraction of the correct features was performed. The entity-indexed corpus 114 may include data from multiple sources, such as the Internet, which has a massive corpus or live corpus.
Fig. 2 is a flowchart illustrating a method 200 for searching related entities using entity co-occurrence, which may be implemented in a search system 100 such as the one described in Fig. 1. According to various embodiments, before method 200 begins, the entity-indexed corpus 114 (similar to the one described in Fig. 1) may be fed with data from multiple sources, such as a massive corpus or live corpus of electronic data (for example, the Internet, websites, blogs, word-processing files, plain-text files). The entity-indexed corpus 114 may include a plurality of indexed entities, which may be updated continuously as new data is discovered.
In one embodiment, method 200 may start at step 202, when a user or a software application of the computing device 102 provides the search engine 104 with one or more search queries containing one or more entities. The search queries provided in step 202 may be processed by the search system 100 one at a time, from 1 to n. An example of a search query in step 202 is a combination of keywords, in the form of strings, structured data, or another suitable data format. In this exemplary embodiment of Fig. 2, the keywords of the search query may be entities representing people, organizations, geographic locations, dates, and/or times.
The search queries from step 202 may then be processed for entity extraction in step 204. In this step, the entity extraction module 110 may process the search queries from step 202 into entities and compare them all against the entity co-occurrence knowledge base 112 in order to extract and disambiguate as many entities as possible. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features.
In addition, various methods for linking features may be used; these methods essentially use a weighted model to determine which entity types are most important and which carry more weight, and, based on confidence, determine how confidently the extraction of the correct features was performed. Once the entities are extracted and ranked by confidence, an index ID (which in some cases may be a number) may be assigned to each extracted entity in step 206.
Next, in step 208, a search may be performed based on the entity index IDs assigned in step 206. In search step 208, the extracted entities may be located within the data of the entity-indexed corpus 114 using standard indexing methods. Once the extracted entities are located, an entity association step 210 may follow. In entity association step 210, all data (such as documents, videos, pictures, files, and the like) in which at least two of the extracted entities overlap may be pulled from the entity-indexed corpus 114. Finally, a list of potential results is built, sorted by relevance, and presented to the user as search results in step 212. The list of results may then show only links to the data in which the user may find related entities of interest.
Fig. 3 is a specific example of a method 300 for searching related entities using entity co-occurrence, as discussed above in connection with Fig. 2. As described for Fig. 2, according to various embodiments, before method 300 begins, the entity-indexed corpus 114 (similar to the one described in Fig. 1) may be fed with data from multiple sources, such as a massive corpus or live corpus (the Internet). The entity-indexed corpus 114 may include a plurality of indexed entities, which may be updated continuously as new data is discovered.
In this exemplary embodiment, a user may be looking for information related to "Jobs" at the company "Apple". To do so, the user may enter one or more entities (that is, the search queries of step 302) through a user interface 102, which may be, but is not limited to, an interface to the search engine 104 (such as the one described in Fig. 1). By way of illustration and not limitation, the user may enter a combination of entities such as "Apple + Jobs". Next, the search engine 104 may generate the search queries of step 302 and send them to the server device 106 for processing. At the server device 106, the entity extraction module 110 may perform entity extraction step 304 on the search queries entered in step 302.
The entity extraction module 110 may then process the search queries entered in step 302, such as "Apple" and "Jobs", into entities and compare them all against the entity co-occurrence knowledge base 112 in order to extract and disambiguate as many entities as possible. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features.
In addition, various methods for linking features may be used; these methods essentially use a weighted model to determine which entity types are most important and which carry more weight, and, based on confidence, determine how confidently the extraction of the correct features was performed. As a result, a table 306 that includes the entities and their co-occurrences may be created. Table 306 may then show the entity "Apple" and its co-occurrences, which in this case may be Apple with Jobs and Apple with Steve Jobs. Table 306 may also include Apple with an organization A that is found to be relevant because organization A does business with Apple and is generating "Jobs" in that organization A. Other co-occurrences of lesser importance may also be found. Accordingly, Apple with Jobs may have the highest score (1) and therefore be listed at the top; Apple with Steve Jobs may have the second-highest score (0.8); and Apple with organization A may be listed at the bottom because it has the lowest score (0.3).
Once the entities are extracted and ranked by confidence, an index ID (which in some cases may be a number) may be assigned to each extracted entity in step 308. Table 310 shows the index IDs assigned to the extracted entities: "Apple" with index ID 1, "Jobs" with index ID 2, "Steve Jobs" with index ID 3, and "organization A" with index ID 4.
Next, a search step 312 may be performed based on the entity index IDs of step 308. In search step 312, the extracted entities, such as "Apple", "Jobs", "Steve Jobs", and "organization A", may be located within the data of the entity-indexed corpus 114 using standard indexing methods.
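The ranking of table 306 and the index ID assignment of table 310 can be reproduced with a short sketch. The scores and the resulting IDs come from the example above; the function name, the tuple layout, and the rule that IDs follow ranking order are assumptions, since the source only says that an index ID may be a number.

```python
def rank_and_index(query_entities, cooccurrences):
    """Rank co-occurrences by score, then give each entity a numeric index ID."""
    ranked = sorted(cooccurrences, key=lambda row: row[2], reverse=True)
    index_ids = {}
    # Assumption: query entities are numbered first, then co-occurring entities
    # in ranking order; an entity seen twice keeps its first ID.
    for entity in query_entities + [other for _, other, _ in ranked]:
        index_ids.setdefault(entity, len(index_ids) + 1)
    return ranked, index_ids


# (entity, co-occurring entity, score), in arbitrary input order.
cooccurrences = [
    ("Apple", "Organization A", 0.3),
    ("Apple", "Jobs", 1.0),
    ("Apple", "Steve Jobs", 0.8),
]
ranked, index_ids = rank_and_index(["Apple"], cooccurrences)
```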
After the extracted entities are located within the entity-indexed corpus 114, an entity association step 314 may follow. In entity association step 314, all data (such as documents, videos, pictures, files, and the like) in which at least two of the extracted entities overlap may be pulled from the entity-indexed corpus 114 in order to build a list of links as search results (step 318). By way of illustration and not limitation, table 316 shows how the extracted entities may be associated with data in the entity-indexed corpus 114. In table 316, documents 1, 4, 5, 7, 8, and 10 show an overlap of two extracted entities, so in step 318 links to these documents may be shown as search results.
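The overlap test of the entity association step can be sketched with an inverted index that maps each entity index ID to the documents that mention it. The posting sets below are invented so that the result matches the documents named for table 316; the actual contents of that table are not given in the text, and the function name is hypothetical.

```python
from collections import Counter


def associate(inverted_index, entity_ids, min_overlap=2):
    """Return the documents that mention at least `min_overlap` extracted entities."""
    hits = Counter()
    for entity_id in entity_ids:
        for doc_id in inverted_index.get(entity_id, ()):
            hits[doc_id] += 1
    return sorted(doc_id for doc_id, count in hits.items() if count >= min_overlap)


# Hypothetical postings: entity index ID -> documents mentioning that entity.
inverted_index = {
    1: {1, 2, 4, 5, 7, 8, 10},  # "Apple"
    2: {1, 3, 5, 8},            # "Jobs"
    3: {4, 7, 9},               # "Steve Jobs"
    4: {6, 10},                 # "organization A"
}
linked_documents = associate(inverted_index, [1, 2, 3, 4])
```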
Fig. 4 is a block diagram of a search computer system 400 according to the present disclosure. The search system 400 may include one or more user interfaces 402 to a search engine 404, which communicates with a server device 406 over a network 408. In this embodiment, the search system 400 is implemented in one or more special-purpose computers and the computer modules referenced below, including through a client/server type architecture. However, the search system 400 may be implemented using other computer architectures (for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer-to-peer model, and the like). In an embodiment, the search computer system 400 includes multiple networks, such as a local area network, a wide area network, the Internet, a wireless network, a mobile phone network, and the like.
The search engine 404 may include a user interface, such as a web-based tool, that enables users to locate information on the World Wide Web. The search engine 404 may also include user interface tools that enable users to locate information within internal database systems. The server device 406, which may be implemented in a single server device 406 or in a distributed architecture across multiple server computers, may include an entity extraction module 410, a fuzzy score matching module 412, and an entity co-occurrence knowledge base database 414.
The entity extraction module 410 may be a hardware and/or software module configured to extract and disambiguate independent entities on the fly from a given set of queries (for example, query strings, partial queries, structured data, and the like). Examples of entities may include people, organizations, geographic locations, dates, and/or times. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Additionally, a weighted scoring model may be used to determine the relevance of the associations between features.
The fuzzy score matching module 412 may include multiple algorithms, which may be selected according to the type of entity extracted from a given search query. The function of an algorithm may be to determine whether a given search query received as user input is similar to other strings found by the algorithm, or whether it approximately matches a given pattern string. Fuzzy matching may also be known as fuzzy string matching, inexact matching, and approximate matching. The entity extraction module 410 and the fuzzy score matching module 412 may work in conjunction with the entity co-occurrence knowledge base 414 to generate search suggestions for the user.
According to various embodiments, the entity co-occurrence knowledge base 414 may be built as, but is not limited to, an in-memory database and may include multiple components, such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation module. One search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy keyword search through a collection of compressed data and return a set of scored results to its associated search controller.
The entity co-occurrence knowledge base 414 may include related entities based on features and ranked by confidence. Various methods for linking features may be used; these methods essentially use a weighted model to determine which entity types are most important and which carry more weight, and, based on confidence, determine how confidently the extraction of the correct features was performed.
Fig. 5 is a flowchart illustrating a method 500 for generating search suggestions using fuzzy score matching and entity co-occurrence in a knowledge base. Method 500 may be implemented in a search system 400 (similar to the one described in Fig. 4).
In one embodiment, method 500 may start at step 502, when a user begins typing a search query into a search engine interface 402 as described for Fig. 4. As the search query is typed in step 502, the search system 400 may perform on-the-fly processing. According to various embodiments, the search query input in step 502 may be complete or partial, and correctly spelled or misspelled. The search system 400 may then perform a partial entity extraction step 504 on the search query input of step 502. Partial entity extraction step 504 may run a quick search against the entity co-occurrence knowledge base 414 to identify whether the search query input in step 502 is an entity and, if so, what type of entity it is. According to various embodiments, the search query input of step 502 may refer to a person, an organization, a place, a date, and the like. Once the entity type of the search query input is identified, the fuzzy score matching module 412 may select a corresponding fuzzy matching algorithm in step 506. For example, if the search query is identified as an entity that refers to a person, the fuzzy score matching module 412 may select a string matching algorithm for people, for instance one that extracts the different components of a person's name (including first name, middle name, last name, and title). In another embodiment, if the search query is identified as an entity that refers to an organization, the fuzzy score matching module 412 may select a string matching algorithm for organizations, which may include identifying terms such as institute, university, company, and corporation. The fuzzy score matching module 412 may thus select the string matching algorithm that corresponds to the entity type identified in the search query input in order to configure the search. Once the string matching algorithm is tuned to the type of the identified entity, a fuzzy score matching step 508 may be performed.
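Selecting a matching routine by entity type, as in step 506, can be sketched as a lookup table of type-specific normalizers. This is a deliberately small illustration under stated assumptions: the title and organization term lists, the function names, the fallback to plain lowercasing, and the prefix comparison used for partial queries are all hypothetical, and a real matcher would score similarity rather than return a boolean.

```python
import re

TITLES = {"mr", "mrs", "ms", "dr", "prof"}  # assumed list of personal titles
ORG_TERMS = {"inc", "corp", "ltd", "company", "university", "institute"}  # assumed


def normalize_person(text):
    """Lowercase a person name and drop titles so name components compare directly."""
    return " ".join(t for t in re.findall(r"[a-z]+", text.lower()) if t not in TITLES)


def normalize_organization(text):
    """Lowercase an organization name and drop identifying terms such as 'Corp'."""
    return " ".join(t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in ORG_TERMS)


NORMALIZERS = {"person": normalize_person, "organization": normalize_organization}


def select_normalizer(entity_type):
    """Pick the routine that corresponds to the identified entity type."""
    return NORMALIZERS.get(entity_type, str.lower)


def partial_match(query, record, entity_type):
    """True when the normalized record starts with the normalized partial query."""
    normalize = select_normalizer(entity_type)
    return normalize(record).startswith(normalize(query))
```

With these definitions, `partial_match("Dr. Michael J", "Michael Jackson", "person")` succeeds because the title is dropped, while the same pair compared with the organization routine does not match.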
In fuzzy score matching step 508, the one or more extracted entities, as well as any non-entities, may be searched for and compared against entity co-occurrence knowledge base 414. The extracted entities may include incomplete names, such as the first letters of a first and last name, abbreviations of organizations (for example, "UN", which may stand for "United Nations"), short forms, nicknames, and so on. Entity co-occurrence knowledge base 414 may already hold multiple records indexed as structured data (for example, entity to entity, entity to topic, and entity to fact). This may allow the fuzzy score matching in step 508 to take place very quickly. The fuzzy score matching in step 508 may use, without limitation, common string metrics such as Levenshtein distance, strcmp95, and ITF scoring. The Levenshtein distance between two words is the minimum number of single-character edits required to change one word into the other.
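The Levenshtein distance defined above can be computed with the standard dynamic-programming recurrence. This is a generic textbook sketch, not the implementation used by fuzzy score matching module 412.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]
```

Only two rows of the table are kept at a time, so memory use grows with the length of the shorter string.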
Finally, once fuzzy score matching step 508 has finished comparing the search query against all records in entity co-occurrence knowledge base 414, the record that most closely matches the given pattern string (that is, the search query input of step 502) may be selected as the first candidate for the search suggestions of step 510. Other records that closely match the given pattern string may be listed below the first candidate in descending order. The search suggestions of step 510 may then be presented to the user as a drop-down list of possible matches, which the user may either use or ignore.
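The ranking in step 510 can be sketched as scoring every record against the typed prefix and sorting in descending order. The sketch below uses the ratio from Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; it is not one of the metrics named in the text, and the function names and record list are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import List

def fuzzy_score(query: str, candidate: str) -> float:
    """Similarity between the typed query and the same-length prefix of a
    candidate record (stand-in metric, 1.0 means an exact prefix match)."""
    q = query.lower()
    c = candidate.lower()[:len(q)]
    return SequenceMatcher(None, q, c).ratio()

def suggest(query: str, records: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` records, best match first; ties sort by name."""
    scored = [(fuzzy_score(query, r), r) for r in records]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record for _, record in scored[:limit]]
```

For the partial query "Michael J", records beginning with exactly that prefix rank ahead of near matches such as "Michael Dell".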
FIG. 6 is an exemplary user interface 600 for the method, discussed above with reference to FIGS. 4-5, of generating search suggestions using entity co-occurrence in a knowledge base and fuzzy score matching. In this example, a user enters partial query 604 in search box 606 through search engine interface 602 (similar to the one described with FIG. 4). By way of illustration and not limitation, partial query 604 may be an incomplete name of a person, such as "Michael J" as shown in FIG. 6. It is considered a partial query 604 because the user may not yet have selected search button 608 or otherwise submitted partial query 604 to search system 400 to perform an actual search and obtain results.
Following method 500 (FIG. 5), as the user types "Michael J", entity extraction module 410 performs a quick on-the-fly search of the first word (Michael) against entity co-occurrence knowledge base 414 to identify the type of entity; in this example, the entity may refer to a person's name. Fuzzy score matching module 412 may therefore select a string matching algorithm dedicated to person names. Person names may be found in databases written in different forms (for example, initials only (short form), or first name and first letter of the last name, or first name, middle initial, and last name, or any combination of these). Fuzzy score matching module 412 may use a common string metric such as Levenshtein distance to determine and assign a score to each entity, topic, or fact in entity co-occurrence knowledge base 414 that may match the entity "Michael". In this example, Michael matches a large number of records that contain this name. However, when the user types the following character "J", fuzzy score matching module 412 may perform another comparison against entity co-occurrence knowledge base 414, based on Levenshtein distance, over all co-occurrences that contain Michael. Entity co-occurrence knowledge base 414 may then select all possible matches with the highest scores for "Michael J". For example, fuzzy score matching module 412 may return to the user search suggestions 610 such as "Michael Jackson", "Michael Jordan", "Michael J. Fox" or, in some cases, even "Michael Dell". The user can then select one of the suggested persons from the drop-down list to complete the search query. As an extension of the foregoing example, a query such as "basketball player Michael" may produce the suggestion "Michael Jordan", based on results returned by searching the entity co-occurrence knowledge base for "Michael" among the person entity name variants and for "basketball player" among the co-occurrence features (such as key phrases, facts, and topics). Another example may be "actor Alexander", which may produce the suggestion "Alexander Polinsky". Those skilled in the art will recognize that currently existing search platforms may not generate suggestions in the foregoing manner.
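The "basketball player Michael" extension combines a name match with a match on co-occurring features. A minimal sketch follows, assuming a toy in-memory knowledge base; the dictionary layout, the feature values, and the function name are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List, Set

# Toy knowledge base: entity name -> co-occurring features (illustrative data).
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "Michael Jordan": {"basketball player"},
    "Michael Jackson": {"singer"},
    "Alexander Polinsky": {"actor"},
}

def suggest_by_cooccurrence(name_part: str, feature_part: str,
                            kb: Dict[str, Set[str]]) -> List[str]:
    """Return entities whose name contains name_part and whose co-occurring
    features contain feature_part (both compared case-insensitively)."""
    name = name_part.lower()
    feature = feature_part.lower()
    hits = []
    for entity, features in kb.items():
        name_match = name in entity.lower().split()
        feature_match = feature in {f.lower() for f in features}
        if name_match and feature_match:
            hits.append(entity)
    return sorted(hits)
```

Here the entity part of the query narrows the candidates by name, and the non-entity part narrows them by feature, which is why "Michael" alone would match several records while the combined query matches one.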
FIG. 7 is a block diagram of a search system 700 according to the present disclosure. Search system 700 may include one or more user interfaces 702 to a search engine 704, which communicates with server device 706 over network 708. In the present embodiment, search system 700 is implemented in a client/server architecture; however, search system 700 may be implemented using other computer architectures (for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer-to-peer model, and so on) and a variety of networks (for example, a local area network, a wide area network, the Internet, a wireless network, a mobile telephone network, and so on).
Search engine 704 may include, without limitation, a web-based tool that is accessed through an interface and that enables users to locate information on the World Wide Web. Search engine 704 may also include tools that enable users to locate information in internal database systems. Server device 706, which may be implemented in a single server device 706 or in a distributed architecture across multiple server computers, may include entity extraction module 710, fuzzy score matching module 712, and entity co-occurrence knowledge base database 714.
Entity extraction module 710 may be a hardware and/or software computer module that can extract and disambiguate individual entities on the fly from a given set of queries (for example, query strings, partial queries, structured data, and so on). Examples of entities may include people, organizations, geographic locations, dates, and/or times. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Furthermore, a weighted scoring model may be used to determine the relevance of the associations between features.
Fuzzy score matching module 712 may include a plurality of algorithms that can be adjusted or selected according to the type of entity extracted from a given search query. The function of the algorithms may be to determine whether the given search query (input) and a suggested search string are similar to each other or approximately match a given pattern string. Fuzzy matching may also be called fuzzy string matching, inexact matching, or approximate matching. Entity extraction module 710 and fuzzy score matching module 712 may work together with entity co-occurrence knowledge base 714 to generate search suggestions for the user.
According to various embodiments, entity co-occurrence knowledge base 714 may be built as, but is not limited to, an in-memory database, and may include multiple components, such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation module. A search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy keyword search across a collection of compressed data and return a set of scored results to its associated search controller.
Entity co-occurrence knowledge base 714 may include related entities that are based on features and ranked by confidence score. Various methods for linking the features may be used. These methods may essentially use a weighted model to determine which entity types are most important and which carry more weight, and may determine, based on the confidence scores, how confidently the extraction of the correct features was performed.
FIG. 8 is a flowchart illustrating an embodiment of a method 800 for generating search suggestions of related entities based on co-occurrence and/or fuzzy score matching. Method 800 may be implemented in search system 700 (similar to the one described in FIG. 7).
In one embodiment, method 800 may start when a user begins typing a search query (step 802) in search engine 704, as described above with reference to FIG. 7. As the search query is typed, search system 700 may perform on-the-fly processing. According to various embodiments, the search query may be complete and/or partial, and correctly spelled and/or misspelled. Next, a partial entity extraction step 804 may be performed on the search query. Partial entity extraction step 804 may run a quick search against entity co-occurrence knowledge base 714 to identify whether the search query includes an entity and, if so, to identify the entity type. According to various embodiments, the entity in the search query may refer to a person, an organization, a location, a date, and so on. Once an entity is identified, fuzzy score matching module 712 may select a corresponding fuzzy matching algorithm (step 806). For example, if the search query is identified as an entity referring to a person, fuzzy score matching module 712 may adjust or select a string matching algorithm for persons, which may extract the different components of the person's name, including first name, middle name, last name, and title. In another embodiment, if the search query is identified as an entity referring to an organization, fuzzy score matching module 712 may adjust or select a string matching algorithm for organizations, which may include identifying terms such as institute, university, company, and limited company. Fuzzy score matching module 712 thus adjusts or selects the string matching algorithm for the entity type in order to support the search. Once the string matching algorithm has been adjusted or selected to correspond to the entity type, fuzzy score matching may be performed in step 808.
In fuzzy score matching step 808, the one or more extracted entities and any non-entities may be searched for and compared against entity co-occurrence knowledge base 714. The one or more extracted entities may include incomplete names, such as the first letters of a first and last name, abbreviations of organizations (for example, "UN", which may stand for "United Nations"), short forms, nicknames, and so on. Entity co-occurrence knowledge base 714 may already hold multiple records indexed as structured data (such as entity to entity, entity to topic, and entity to fact indexes). This may allow the fuzzy score matching in step 808 to take place quickly. Fuzzy score matching may use, without limitation, common string metrics such as Levenshtein distance, strcmp95, and ITF scoring. The Levenshtein distance between two words is the minimum number of single-character edits required to change one word into the other.
Once the fuzzy score matching of step 808 has finished comparing the search query against all records in entity co-occurrence knowledge base 714, the record that most closely matches the given pattern string of the search query input may be selected as the first candidate for the search suggestions (step 810). Other records that closely match the given pattern string of the search query input may be listed below the first candidate in descending order. The search suggestions of step 810 may then be presented to the user as a drop-down list of possible matches, from which the user may make a selection to complete the query.
In another embodiment, after the user selects the match of interest, search system 700 may take that selection as a new search query (step 812). Next, an entity extraction step 814 may be performed on the new search query. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Furthermore, a weighted scoring model may be used to determine the relevance of the associations between features. Entity extraction module 710 may then run a search against entity co-occurrence knowledge base 714 to find related entities based on the co-occurrences with the highest scores (step 816). Finally, in step 818, a drop-down list of search suggestions that includes the related entities may be presented to the user before the actual search of the data in the electronic document corpus is performed.
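Step 816 amounts to looking up the selected entity and returning its highest-scoring co-occurrences. The following sketch assumes the co-occurrence scores are already stored per entity pair; the data structure, the scores, and the related entity names are invented for illustration and do not come from the disclosure.

```python
from typing import Dict, List

# Hypothetical co-occurrence scores: entity -> {related entity: score}.
COOCCURRENCE: Dict[str, Dict[str, float]] = {
    "Michael Jordan": {"Chicago Bulls": 0.9, "Scottie Pippen": 0.8, "Nike": 0.6},
}

def related_entities(entity: str, cooccurrence: Dict[str, Dict[str, float]],
                     limit: int = 3) -> List[str]:
    """Return up to `limit` related entities, highest co-occurrence score first."""
    scores = cooccurrence.get(entity, {})
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]
```

The returned list corresponds to the related-entity suggestions shown in step 818 before the actual search is run.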
FIG. 9 is an exemplary embodiment of a user interface 900 associated with method 800 for generating search suggestions of related entities based on co-occurrence and/or fuzzy score matching. In this example, a user enters partial query 904 in search box 906 through search engine interface 902 (similar to the one described with FIG. 7). By way of illustration and not limitation, partial query 904 may be an incomplete name of a person, such as "Michael J" as shown in FIG. 9. It may be considered a partial query 904 because the user may not yet have selected search button 908 or otherwise submitted partial query 904 to search system 700 to perform an actual search and obtain results.
Following method 800, as the user types "Michael J", entity extraction module 710 performs a quick on-the-fly search of the first word (Michael) against entity co-occurrence knowledge base 714 to identify the type of entity; in this example, the entity may refer to a person's name. Fuzzy score matching module 712 may then select a string matching algorithm dedicated to person names. Person names may be found in databases written in different forms (for example, initials only (short form), or first name and first letter of the last name, or first name, middle initial, and last name, or any combination of these). Fuzzy score matching module 712 may use a common string metric such as Levenshtein distance to determine and assign a score to each entity, topic, or fact in entity co-occurrence knowledge base 714 that may match the entity "Michael". In this example, Michael matches a large number of records that contain this name. However, when the user types the following character "J", fuzzy score matching module 712 may perform another comparison against entity co-occurrence knowledge base 714, based on Levenshtein distance, over all co-occurrences that contain Michael. Entity co-occurrence knowledge base 714 may then select all possible matches with the highest scores for "Michael J". For example, fuzzy score matching module 712 may return to the user search suggestions 910 that complete "Michael J", such as "Michael Jackson", "Michael Jordan", "Michael J. Fox" or, in some cases, even "Michael Dell". The user can then select one of the suggested persons from the drop-down list, or ignore the suggestions and continue typing. As an extension of the foregoing example, a query such as "basketball player Michael" may produce the suggestion "Michael Jordan", based on results returned by searching the entity co-occurrence knowledge base for "Michael" among the person entity name variants and for "basketball player" among the co-occurrence features (such as key phrases, facts, and topics). Another example may be "actor Alexander", which may produce the suggestion "Alexander Polinsky". As those skilled in the art will recognize, existing search platforms may not provide suggestions generated in the foregoing manner.
In the present embodiment, the user may select "Michael Jordan" from the drop-down list to complete partial query 904, as indicated in FIG. 9. Search system 700 may then process the selection as a new search query 912. Next, entity extraction may be performed on the new search query 912. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Furthermore, a weighted scoring model may be used to determine the relevance of the associations between features. Entity extraction module 710 may then run a search for "Michael Jordan" against entity co-occurrence knowledge base 714 to find related entities based on the co-occurrences with the highest scores. Finally, a drop-down list 914 of search suggestions that includes the related entities may be presented to the user before the actual search is performed by clicking search button 908. The system and method described in FIGS. 7-9 may be fast and convenient for the user, because the user can discover useful relationships.
FIG. 10 is a block diagram of a search system 1000 according to the present disclosure. Search system 1000 may include a search engine 1002, which may include one or more user interfaces that allow data input from a user, such as user queries.
Search system 1000 may include one or more databases. Such databases may include entity database 1004 and trend database 1006. The databases may be stored on a local server or on a web-based server. Search system 1000 may therefore be implemented in a client/server architecture; however, search system 1000 may be implemented using other computer architectures (for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer-to-peer model, and so on) and a variety of networks (for example, a local area network, a wide area network, the Internet, a wireless network, a mobile telephone network, and so on).
Search engine 1002 may include, without limitation, a web-based tool that enables users to locate information on the World Wide Web. Search engine 1002 may also include tools that enable users to locate information in internal database systems.
Entity database 1004 may be implemented as a single server or in a distributed architecture across multiple servers. Entity database 1004 may allow a set of entity queries, such as query strings, structured data, and so on. Such a set of entity queries may be extracted in advance from multiple corpora available on the Internet and/or a local network. The entity queries may be indexed and scored. Examples of entities may include people, organizations, geographic locations, dates, and/or times. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Furthermore, a weighted scoring model may be used to determine the relevance of the associations between features.
Trend database 1006 may be implemented as a single server or in a distributed architecture across multiple servers. Trend database 1006 may allow a set of entity queries, such as query strings, structured data, and so on. Such a set of entity queries may be extracted in advance from historical queries performed by one user and/or multiple users on the Internet and/or a local network. The entity queries may be indexed and scored. Examples of entities may include people, organizations, geographic locations, dates, and/or times. During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Furthermore, a weighted scoring model may be used to determine the relevance of the associations between features.
Entity database 1004 and trend database 1006 may include entity co-occurrence knowledge bases, which may be built as, but are not limited to, in-memory databases (not shown) and may include other components (not shown), such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation module. A search controller may be selectively associated with one or more search nodes. Each search node may independently perform a fuzzy keyword search across a collection of compressed data and return a set of scored results to its associated search controller.
The co-occurrence knowledge base may include related entities that are based on features and ranked by confidence score. Various methods for linking the features may be used. These methods may essentially use a weighted model to determine which entity types are most important and which carry more weight, and may determine, based on the confidence scores, how confidently the extraction of the correct features was performed.
Search system 1000 may compare user queries entered at search engine 1002 against entity database 1004 and trend database 1006. An autocomplete mode may be enabled on search engine 1002 from both databases, that is, entity database 1004 and trend database 1006. Search system 1000 may present a search suggestion list 1008 to the user; such a list may be generated and indexed based on the fuzzy score assigned to each entity suggestion in the databases. The score of each entity suggestion may be assigned automatically by search system 1000 and/or manually by a system administrator. Based on the score obtained by each entity, the entity suggestions may be ranked from most relevant to least relevant. In addition, trends and query frequencies from one or more users on a local network and/or the Internet may be used to assign the scores in trend database 1006.
The entity suggestions from each database may be compared with one another and then indexed and ranked according to the scores obtained, so that a search suggestion list 1008 combining the entity suggestions from both databases (entity database 1004 and trend database 1006) may be displayed to the user. If the user selects a suggestion from the list, or selects another result from the suggestion list, search system 1000 may save that information in trend database 1006. This enables a self-learning system, which may increase the reliability and accuracy of search system 1000. In short, the trend co-occurrence knowledge base is continuously updated with the features extracted from user queries and with the selected suggestions, providing a means of on-the-fly learning that improves the relevance and accuracy of the search. Further, the trend co-occurrence knowledge base may be populated by the different users of the system and also by automated methods such as a trend detection module.
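The self-learning behavior can be sketched as a store that counts selected suggestions and derives a trend score from selection frequency. The class and method names are hypothetical, and using relative frequency as the score is an assumption; the disclosure only states that trends and query frequency may be used.

```python
from collections import Counter

class TrendStore:
    """Toy stand-in for trend database 1006: counts user selections."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record_selection(self, suggestion: str) -> None:
        """Save a suggestion the user picked from the list."""
        self._counts[suggestion] += 1

    def score(self, suggestion: str) -> float:
        """Relative selection frequency in [0, 1]; 0.0 if nothing recorded."""
        total = sum(self._counts.values())
        return self._counts[suggestion] / total if total else 0.0
```

Each recorded selection shifts the scores of later suggestions, which is the feedback loop the paragraph above describes.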
FIG. 11 is a block diagram of a search system 1100 according to the present disclosure. Search system 1100 may include a search engine 1102, which may include one or more user interfaces that allow data input from a user, such as user queries.
Search system 1100 may include one or more databases. Such databases may include entity database 1104 and trend database 1106. The databases may be stored on a local server or on a web-based server. Search system 1100 may therefore be implemented in a client/server architecture; however, search system 1100 may be implemented using other computer architectures (for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer-to-peer model, and so on) and a variety of networks (for example, a local area network, a wide area network, the Internet, a wireless network, a mobile telephone network, and so on).
In one embodiment, search system 1100 may start when a user enters one or more entities (a search query) through the user interface of search engine 1102. An example of a search query may be a combination of keywords in string form, structured data, and so on. These keywords may be entities representing people, organizations, geographic locations, dates, and/or times. In the present embodiment, "Indiana Na" is used as the search query.
"Indiana Na" may then be processed for entity extraction. An entity extraction module may process a search query such as "Indiana Na" as entities and compare all of them against the entity co-occurrence knowledge bases in entity database 1104 and trend database 1106, in order to extract and disambiguate as many entities as possible. In addition, the parts of the query text that are not detected as entities (for example, people, organizations, locations) may be used as conceptual features (such as topics, facts, and key phrases) for searching the entity co-occurrence knowledge bases (for example, the entity database and the trend database). During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Taking the feature attributes into account, the relative weight or relevance of each feature may be determined. Furthermore, a weighted scoring model may be used to determine the relevance of the associations between features.
In the present embodiment, entity database 1104 may provide a list of search suggestions, shown as entity suggestion list 1108, which may be indexed and ranked. Trend database 1106 may provide a list of search suggestions, shown as trend-based suggestion list 1110, which may be indexed and ranked. Next, search system 1100 may build search suggestion list 1112 from the search suggestions provided by entity database 1104 and trend database 1106. Search suggestion list 1112 may be indexed and ranked based on the individual score of each entity suggestion in each database; the most relevant result may therefore be shown first, with less relevant results following below it.
An exemplary use for obtaining search suggestions is disclosed in search system 1100. Search suggestion list 1112 may show suggestions based on the user query "Indiana Na". As a result, "Indiana Name" may appear first, based solely on the score of 0.9 found for that entity; "Indiana Nascar" may be shown next, as a result of its individual score of 0.8; and finally "Indiana Nashville" may be shown, based on its individual score of 0.7. The individual scores may be compared using entity suggestion list 1108 and trend-based suggestion list 1110, without taking repeated entities into account.
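The ranking by individual score can be sketched as merging the two lists and sorting by each suggestion's own score. Which list each suggestion comes from is not stated in the text, so the split below is an assumption, as are the function name and the choice to keep the higher score when a suggestion appears in both lists.

```python
from typing import Dict, List, Tuple

# Scores from the example; the assignment to each list is an assumption.
ENTITY_SUGGESTIONS: Dict[str, float] = {"Indiana Name": 0.9, "Indiana Nashville": 0.7}
TREND_SUGGESTIONS: Dict[str, float] = {"Indiana Nascar": 0.8}

def rank_by_individual_score(*lists: Dict[str, float]) -> List[Tuple[str, float]]:
    """Merge suggestion lists and rank by individual score, highest first.
    A suggestion present in several lists keeps its highest single score."""
    best: Dict[str, float] = {}
    for suggestions in lists:
        for name, score in suggestions.items():
            best[name] = max(score, best.get(name, score))
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
```

With the scores above, the merged order is "Indiana Name", "Indiana Nascar", "Indiana Nashville", matching search suggestion list 1112 in the example.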
FIG. 12 is a block diagram of a search system 1200 according to the present disclosure. Search system 1200 may include a search engine 1202, which may include one or more user interfaces that allow data input from a user, such as user queries.
Search system 1200 may include one or more databases. Such databases may include entity database 1204 and trend database 1206. The databases may be stored on a local server or on a web-based server. Search system 1200 may therefore be implemented in a client/server architecture; however, search system 1200 may be implemented using other computer architectures (for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer-to-peer model, and so on) and a variety of networks (for example, a local area network, a wide area network, the Internet, a wireless network, a mobile telephone network, and so on).
In one embodiment, search system 1200 may start when a user enters one or more entities (a search query) through the user interface of search engine 1202. An example of a search query may be a combination of keywords such as strings, structured data, and so on. These keywords may be entities representing people, organizations, geographic locations, dates, and/or times. In the present embodiment, "Indiana Na" is used as the search query.
"Indiana Na" may then be processed for entity extraction. An entity extraction module may process a search query such as "Indiana Na" as entities and compare all of them against the entity co-occurrence knowledge bases in entity database 1204 and trend database 1206, in order to extract and disambiguate as many entities as possible. In addition, the parts of the query text that are not detected as entities (for example, people, organizations, locations) may be used as conceptual features (such as topics, facts, and key phrases) for searching the entity co-occurrence knowledge bases (for example, the entity database and the trend database). During extraction, one or more feature recognition and extraction algorithms may be used. In addition, a score may be assigned to each extracted feature, indicating the level of certainty that the feature was correctly extracted with the correct attributes. Based on the corresponding feature attributes, the relative weight and/or relevance of each feature may be determined. Furthermore, a weighted scoring model may be used to determine the relevance of the associations between features.
In the present embodiment, entity database 1204 may provide a list of search suggestions, namely entity suggestion list 1208, which may already be indexed and ranked. Similarly, trend database 1206 may provide a list of search suggestions, namely trend-based suggestion list 1210, which may already be indexed and ranked. Next, search system 1200 may build search suggestion list 1212 from the search suggestions provided by entity database 1204 and trend database 1206. Search suggestion list 1212 may be indexed and ranked based on the total score of each entity suggestion across the two databases; the most relevant result may therefore be shown first, with less relevant results following below it.
An exemplary use for obtaining search suggestions is disclosed in search system 1200. Search suggestion list 1212 may show suggestions based on the user query "Indiana Na". As a result, "Indiana Nascar" may appear first, based on the total score of 1.4 obtained by summing the score of 0.8 from entity suggestion list 1208 and the score of 0.6 from trend-based suggestion list 1210. Similarly, "Indiana Name" may be shown next, as a result of its total score of 0.9, and finally "Indiana Nashville" may be shown, based on its total score of 0.7.
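The total-score ranking is a per-suggestion sum across the two lists: for "Indiana Nascar", 0.8 + 0.6 = 1.4. A minimal sketch follows; the text gives only the totals for "Indiana Name" and "Indiana Nashville", so placing them in the entity list is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# "Indiana Nascar" scores come from the example; the list membership of the
# other two suggestions is assumed for illustration.
ENTITY_LIST: Dict[str, float] = {
    "Indiana Nascar": 0.8, "Indiana Name": 0.9, "Indiana Nashville": 0.7,
}
TREND_LIST: Dict[str, float] = {"Indiana Nascar": 0.6}

def rank_by_total_score(*lists: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sum each suggestion's scores across all lists, then rank descending."""
    totals: Dict[str, float] = defaultdict(float)
    for suggestions in lists:
        for name, score in suggestions.items():
            totals[name] += score
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

Compared with ranking by individual score, summing rewards suggestions that appear in both databases, which is why "Indiana Nascar" moves ahead of "Indiana Name" here.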
FIG. 13 is a system architecture 1300 for geotagging content in SharePoint. Search index 1324 is one of several key components that enable search in SharePoint 1302. Another key component that enables search in SharePoint 1302 may be content crawling, which allows content to be indexed. SharePoint 1302 includes a crawler 1304 component that performs the content crawling.
Crawler 1304 may crawl through different content sources 1306 to add a list of metadata properties to each content item. Examples of content sources may include, without limitation, SharePoint content, network file shares, or user or intranet content. Crawler 1304 may be configured to connect securely to content sources 1306 and to associate the source documents with their metadata as crawled properties. Crawler 1304 may be configured to crawl content fully or incrementally. Examples of crawled properties may include, without limitation, author, title, creation date, and so on.
SharePoint includes a content processing 1308 component. The content processing 1308 component takes the content from crawler 1304 and prepares it for indexing. Content processing 1308 may involve a linguistic processing (language detection) stage, a parsing stage, an entity extraction management stage, a content-based file format detection stage, a content processing error reporting stage, a natural language processing stage, and a stage for mapping crawled properties to managed properties, among others.
Content processing 1308 may be extended by a content enrichment web service (CEWS 1310). CEWS 1310 enables the enrichment of content processing 1308 by allowing a web service callout 1312 to call an external web service that performs other actions and enriches the crawled data properties. Web service callout 1312 may be a standard Simple Object Access Protocol (SOAP) request or any other web service call method used to exchange structured information about the crawled data with entity enrichment service 1314. Web service callout 1312 may include a trigger condition, configured in the content enrichment configuration object, that controls when the external web service is called for enrichment processing. Entity enrichment service 1314 may also determine the document type of the crawled data, in order to identify content that arrives in image form (scanned documents, pictures, etc.). Whenever content in image form is found, entity enrichment service 1314 may send the location of the crawled document to OCR processing engine 1316, such as, without limitation, an optical character recognition component or other image processing module. OCR processing engine 1316 may then retrieve and process the image file and convert it to text asynchronously. The OCR-processed file 1318 may next be fed back to crawler 1304, so that it is crawled and sent back to content processing 1308 and continues through the remainder of the workflow as text.
System architecture 1300 may include an external geotagger web service 1320 and a named entity tagger service 1322. Geotagger web service 1320 and named entity tagger service 1322 may be software modules configured as web service application providers that respond to web service callout 1312. Geotagger web service 1320 may use natural language processing entity extraction techniques, machine learning models, and other techniques to identify geographic entities in the crawled content and to disambiguate them. For example, geotagger web service 1320 may disambiguate geographic entities by analyzing the statistical co-occurrence of entities found in a gazetteer. Geotagger web service 1320 may include a database of statistically co-occurring entities that can be linked to the content found by crawler 1304. Following the same techniques, named entity tagger service 1322 may be used to extract other entities or text features, such as organizations, people, or topics.
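The co-occurrence-based disambiguation mentioned above can be illustrated with a small sketch. It assumes, purely for illustration, that the gazetteer maps an ambiguous place name to candidate locations and that the co-occurrence database stores a count for each pair of a candidate and another entity. The function name, data layout, and all counts below are hypothetical and are not taken from the disclosure.

```python
def disambiguate(mention, context_entities, gazetteer, cooccurrence):
    """Return the gazetteer candidate that co-occurs most with the context entities."""
    candidates = gazetteer.get(mention, [])
    if not candidates:
        return None  # the mention is not a known geographic entity

    def support(candidate):
        # Total co-occurrence count between this candidate and the document context.
        return sum(cooccurrence.get((candidate, other), 0) for other in context_entities)

    return max(candidates, key=support)


# Hypothetical gazetteer entry with two candidate locations for one name.
gazetteer = {"Paris": ["Paris, France", "Paris, Texas"]}

# Hypothetical co-occurrence counts between candidates and other entities.
cooccurrence = {
    ("Paris, France", "Eiffel Tower"): 120,
    ("Paris, France", "Seine"): 80,
    ("Paris, Texas", "Dallas"): 40,
    ("Paris, Texas", "Lamar County"): 25,
}

texas_result = disambiguate("Paris", ["Dallas", "Lamar County"], gazetteer, cooccurrence)
france_result = disambiguate("Paris", ["Seine"], gazetteer, cooccurrence)
```

A real service would presumably combine such counts with the machine learning models the description mentions; summing raw counts is only the simplest possible scoring choice.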
Geotagger web service 1320 may analyze a batch of managed properties sent by CEWS 1310 as input properties and identify any geographic entities mentioned in the text. Non-limiting examples of input properties include: FileType, IsDocument, OriginalPath, and body. Geotagger web service 1320 may then geotag the text by creating or modifying managed properties that refer to each geographic entity found. Geotagger web service 1320 may send the modified or new managed properties to entity enrichment service 1314, which converts them, maps the converted managed properties, and returns them to CEWS 1310 as output properties. The same process may be used to interact with named entity tagger service 1322 for the extraction and tagging of other entities or text features (e.g., organizations, people, or topics).
After the enhanced managed properties are returned by entity enrichment service 1314, they are merged with the crawled properties and managed properties of the file and sent to search index 1324.
Once the geotags and other entity tags have been associated with the content and indexed, search queries may also be performed using geographic features and named entity features. Search UI 1326 in SharePoint may include a specialized display that assists the user in performing geography-based searches and supports enhanced display of faceted search results. Search UI 1326 may be a custom web part, or may be implemented by modifying the standard layout of SharePoint search with conventional tools (such as HTML, HTML5, JavaScript, and CSS).
Figure 14 is a flowchart 1400 illustrating the process steps for tagging content for SharePoint search. The process may begin when the crawler component in SharePoint crawls content (step 1402). In one embodiment the crawl may be a full crawl, while in another embodiment the crawl may be an incremental crawl. The crawler component may then feed the crawled properties and metadata to content processing (step 1404). A determination is made as to whether the crawled content may include geographic entities or named entities. For example, and without limitation, a trigger condition may be used. The trigger condition may contain a set of logic or rules that determine whether the content can benefit from geotagging or entity tagging. If the trigger condition evaluates to false, the crawled content may be associated with managed properties (step 1406) and passed to the search index component (step 1408). If the trigger condition evaluates to true, CEWS may send a web service callout (step 1410) to the entity enrichment service. The entity enrichment service may analyze the content sent to determine whether it is in an image format (scanned document, picture, etc.). Content found to be in an image format may be processed asynchronously by an OCR engine and sent back to be crawled again as text by the crawler component (step 1412). If the content is not in an image format, it may be processed by the geotagger web service or the named entity tagger service (step 1414). The web service may extract the geographic entities or named entities mentioned in the content, disambiguate them, and enrich them with entity metadata. The identified entities and their metadata may be sent back to the content processing component as managed properties and associated with the content (step 1416). The associated metadata may then be sent to the search index component (step 1406).
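The branching in flowchart 1400 can be summarized in a short sketch. This is an assumption-laden illustration rather than SharePoint or CEWS code: the `CrawledItem` class, the `trigger_condition` rule, the `toy_tagger` function, and the use of plain lists to stand in for the OCR queue and the search index are all hypothetical. Asynchronous OCR is modeled simply as appending the document location to a queue.

```python
from dataclasses import dataclass, field


@dataclass
class CrawledItem:
    path: str
    body: str = ""
    is_image: bool = False
    managed_properties: dict = field(default_factory=dict)


def trigger_condition(item):
    # Hypothetical rule: enrich images and any item that has non-blank text.
    return item.is_image or bool(item.body.strip())


def process_item(item, tagger, ocr_queue, search_index):
    """Route one crawled item through the tagging flow of Figure 14."""
    if not trigger_condition(item):
        search_index.append(item)          # steps 1406 and 1408: index without tagging
        return "indexed"
    if item.is_image:
        ocr_queue.append(item.path)        # step 1412: hand off for OCR, re-crawl later
        return "queued_for_ocr"
    # Steps 1414 and 1416: tag the text and attach entities as managed properties.
    item.managed_properties["entities"] = tagger(item.body)
    search_index.append(item)
    return "tagged_and_indexed"


def toy_tagger(text):
    # Stand-in for the geotagger or named entity tagger service.
    known_places = ("Indiana", "Nashville")
    return [place for place in known_places if place in text]
```

For example, calling `process_item(CrawledItem("report.txt", "Racing in Indiana"), toy_tagger, [], [])` would take the tagging branch, while an item with `is_image=True` would only be queued for OCR.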
While various aspects and embodiments have been disclosed, other aspects and embodiments are contemplated. The various aspects and embodiments disclosed are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the appended claims.
The foregoing method descriptions and process flow diagrams are provided merely as illustrative examples and are not intended to require or imply that the steps of the various embodiments must be performed in the order presented. As will be appreciated by one of skill in the art, the steps in the foregoing embodiments may be performed in any order. Words such as "then," "next," etc. are not intended to limit the order of the steps; these words are simply used to guide the reader through the description of the methods. Although process flow diagrams may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination may correspond to a return of the function to the calling function or the main function.
The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Embodiments implemented in computer software may be implemented in software, firmware, middleware, microcode, hardware description languages, or any combination thereof. A code segment or machine-executable instructions may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means, including memory sharing, message passing, token passing, network transmission, etc.
The actual software code or specialized control hardware used to implement these systems and methods is not limiting of the invention. Thus, the operation and behavior of the systems and methods were described without reference to specific software code, it being understood that software and control hardware can be designed to implement the systems and methods based on the description herein.
When implemented in software, the functions may be stored as one or more instructions or code on a non-transitory computer-readable or processor-readable storage medium. The steps of a method or algorithm disclosed herein may be embodied in a processor-executable software module, which may reside on a computer-readable or processor-readable storage medium. Non-transitory computer-readable or processor-readable media include both computer storage media and tangible storage media that facilitate transfer of a computer program from one place to another. A non-transitory processor-readable storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such non-transitory processor-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible storage medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer or processor. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a non-transitory processor-readable medium and/or computer-readable medium, which may be incorporated into a computer program product.
It is to be appreciated that the various components of the technology can be located at distant portions of a distributed network and/or the Internet, or within a dedicated secure, unsecured, and/or encrypted system. Thus, it should be appreciated that the components of the system can be combined into one or more devices or co-located on a particular node of a distributed network, such as a telecommunications network. As will be appreciated from the description, and for reasons of computational efficiency, the components of the system can be arranged at any location within a distributed network without affecting the operation of the system. Moreover, the components could be embedded in a dedicated machine.
Furthermore, it should be appreciated that the various links connecting the elements can be wired or wireless links, or any combination thereof, or any other known or later developed element that is capable of supplying and/or communicating data to and from the connected elements. The term "module" as used herein can refer to any known or later developed hardware, software, firmware, or combination thereof that is capable of performing the functionality associated with that element. The terms "determine," "calculate," and "compute," and variations thereof, as used herein are used interchangeably and include any type of methodology, process, mathematical operation, or technique.
The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.
The embodiments described above are intended to be exemplary. One skilled in the art will recognize that numerous alternative components and embodiments may be substituted for the particular examples described herein and still fall within the scope of the invention.
Claims (56)
1. A computer-implemented method, comprising:
receiving, by an entity extraction computer, from a client computer a search query comprising one or more entities;
comparing, by the entity extraction computer, each respective entity with one or more co-occurrences of the respective entity in a co-occurrence database;
extracting, by the entity extraction computer, a subset of the one or more entities from the search query in response to determining, according to the co-occurrence database, that each respective entity in the subset of the one or more entities exceeds a confidence score of the co-occurrence database, wherein the confidence score is based on a degree of certainty of co-occurrence of one or more related entities in an electronic data corpus with each respective entity;
assigning, by the entity extraction computer, an index identifier (index ID) to each entity of the extracted plurality of entities;
saving, by the entity extraction computer, the index ID for each entity of the extracted plurality of entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each entity of the one or more related entities;
searching, by a search server computer, the entity-indexed electronic data corpus to locate the extracted plurality of entities and identify index IDs of data records in which at least two entities of the extracted plurality of entities co-occur; and
building, by the search server computer, a search result list having data records corresponding to the identified index IDs.
2. The method according to claim 1, further comprising: sorting, by the search server computer, the search result list according to relevance based on the confidence score; and transmitting, by the search server computer, the sorted search result list to a user device.
3. The method according to claim 1, wherein the extracted plurality of entities are ranked based on the confidence score.
4. The method according to claim 1, wherein the entity extraction computer associates the extracted entities with one or more co-occurring entities in the entity-indexed electronic data corpus.
5. The method according to claim 4, wherein the associated entities are ranked according to the confidence score.
6. The method according to claim 1, wherein each entity of the plurality of entities is selected from a person, an organization, a geographic location, a date, and a time.
7. A system, comprising:
one or more server computers having one or more processors that execute computer-readable instructions for a plurality of computer modules, the plurality of computer modules comprising:
an entity extraction module configured to receive user input of search query parameters, the entity extraction module being further configured to:
extract a plurality of entities from the search query parameters by comparing each entity of the extracted plurality of entities with an entity co-occurrence database, wherein the entity co-occurrence database includes a confidence score indicating a degree of certainty of co-occurrence of one or more related entities in an electronic data corpus with an extracted entity,
assign an index identifier (index ID) to each entity of the extracted plurality of entities, and
save the index ID for each entity of the extracted plurality of entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each entity of the one or more related entities; and
a search server module configured to search the entity-indexed electronic data corpus to locate the extracted plurality of entities and identify index IDs of data records in which at least two entities of the extracted plurality of entities co-occur, the search server module being further configured to build a search result list having data records corresponding to the identified index IDs.
8. The system according to claim 7, wherein the search server module is further configured to: sort the search result list according to relevance based on the confidence score; and transmit the sorted search result list to a user device.
9. The system according to claim 7, wherein the extracted plurality of entities are ranked based on the confidence score.
10. The system according to claim 7, wherein the entity extraction module is configured to associate the extracted entities with one or more co-occurring entities in the entity-indexed electronic data corpus.
11. The system according to claim 10, wherein the associated entities are ranked according to the confidence score.
12. The system according to claim 7, wherein each entity of the plurality of entities is selected from a person, an organization, a geographic location, a date, and a time.
13. A non-transitory computer-readable medium having computer-executable instructions stored thereon, the instructions comprising:
receiving, by an entity extraction computer, user input of search query parameters;
extracting, by the entity extraction computer, a plurality of entities from the search query parameters by comparing each entity of the extracted plurality of entities with an entity co-occurrence database, wherein the entity co-occurrence database includes a confidence score indicating a degree of certainty of co-occurrence of one or more related entities in an electronic data corpus with an extracted entity;
assigning, by the entity extraction computer, an index identifier (index ID) to each entity of the extracted plurality of entities;
saving, by the entity extraction computer, the index ID for each entity of the extracted plurality of entities in the electronic data corpus, the electronic data corpus being indexed by an index ID corresponding to each entity of the one or more related entities;
searching, by the search server computer, the entity-indexed electronic data corpus to locate the extracted plurality of entities and identify index IDs of data records in which at least two entities of the extracted plurality of entities co-occur; and
building, by the search server computer, a search result list having data records corresponding to the identified index IDs.
14. The computer-readable medium according to claim 13, wherein the instructions further comprise: sorting, by the search server computer, the search result list according to relevance based on the confidence score; and transmitting, by the search server computer, the sorted search result list to a user device.
15. The computer-readable medium according to claim 13, wherein the extracted plurality of entities are ranked based on the confidence score.
16. The computer-readable medium according to claim 13, wherein the instructions further comprise: associating, by the entity extraction computer, the extracted entities with one or more co-occurring entities in the entity-indexed electronic data corpus.
17. The computer-readable medium according to claim 16, wherein the associated entities are ranked according to the confidence score.
18. The computer-readable medium according to claim 13, wherein each of the plurality of entities is selected from a person, an organization, a geographic location, a date, and a time.
19. A method, comprising:
receiving, by an entity extraction computer, user input of search query parameters from a user interface;
extracting, by the entity extraction computer, one or more entities from the search query parameters by comparing the search query parameters with an entity co-occurrence database and identifying at least one entity type corresponding to the one or more entities in the search query parameters, wherein the entity co-occurrence database has instances of co-occurrence of the one or more entities in an electronic data corpus;
selecting, by a fuzzy score matching computer, a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the search query parameters, wherein the fuzzy matching algorithm corresponds to the identified at least one entity type;
searching, by the fuzzy score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm, and forming one or more suggested search query parameters from the one or more records based on the search; and
presenting, by the fuzzy score matching computer, the one or more suggested search query parameters via the user interface.
20. The method according to claim 19, further comprising: searching, by the fuzzy score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm before the user input is completed.
21. The method according to claim 19, wherein the one or more records associated with the search query parameters include conceptual features.
22. The method according to claim 19, wherein the one or more suggested search query parameters comprise a plurality of suggested search query parameters, the method further comprising: sorting, by the fuzzy score matching computer, the plurality of suggested search query parameters in descending order based on closeness of match to the search query parameters in the user input.
23. The method according to claim 22, wherein the fuzzy score matching computer presents the sorted plurality of suggested search query parameters in a drop-down list via the user interface.
24. The method according to claim 19, wherein the entity co-occurrence database is indexed.
25. The method according to claim 1, wherein the entity co-occurrence database includes an entity-to-entity index.
26. The method according to claim 19, wherein the entity co-occurrence database includes an entity-to-topic index.
27. The method according to claim 19, wherein the entity co-occurrence database includes an entity-to-fact index.
28. A system, comprising:
one or more server computers having one or more processors that execute computer-readable instructions for a plurality of computer modules, the plurality of computer modules comprising:
an entity extraction module configured to receive user input of search query parameters from a user interface, the entity extraction module being further configured to:
extract one or more entities from the search query parameters by comparing the search query parameters with an entity co-occurrence database and identifying at least one entity type corresponding to the one or more entities in the search query parameters, wherein the entity co-occurrence database has instances of co-occurrence of the one or more entities in an electronic data corpus; and
a fuzzy score matching module configured to select a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the search query parameters, wherein the fuzzy matching algorithm corresponds to the identified at least one entity type, the fuzzy score matching module being further configured to:
search the entity co-occurrence database using the selected fuzzy matching algorithm, and form one or more suggested search query parameters from the one or more records based on the search; and
present the one or more suggested search query parameters via the user interface.
29. The system according to claim 28, wherein the fuzzy score matching module is further configured to: search the entity co-occurrence database using the selected fuzzy matching algorithm before the user input is completed.
30. The system according to claim 28, wherein the one or more records associated with the search query parameters include conceptual features.
31. The system according to claim 28, wherein the one or more suggested search query parameters comprise a plurality of suggested search query parameters, and wherein the fuzzy score matching computer is further configured to: sort the plurality of suggested search query parameters in descending order based on closeness of match to the search query parameters in the user input.
32. The system according to claim 32, wherein the fuzzy score matching computer is configured to: present the sorted plurality of suggested search query parameters in a drop-down list via the user interface.
33. The system according to claim 28, wherein the entity co-occurrence database is indexed.
34. The system according to claim 28, wherein the entity co-occurrence database includes an entity-to-entity index.
35. The system according to claim 28, wherein the entity co-occurrence database includes an entity-to-topic index.
36. The system according to claim 28, wherein the entity co-occurrence database includes an entity-to-fact index.
37. A method, comprising:
receiving, by an entity extraction computer, user input of partial search query parameters from a user interface, the partial search query parameters having at least one incomplete search query parameter;
extracting, by the entity extraction computer, one or more first entities from the partial search query parameters by comparing the partial search query parameters with an entity co-occurrence database and identifying at least one entity type corresponding to the one or more first entities in the partial search query parameters, wherein the entity co-occurrence database has instances of co-occurrence of the one or more first entities in an electronic data corpus;
selecting, by a fuzzy score matching computer, a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the partial search query parameters, wherein the fuzzy matching algorithm corresponds to the identified at least one entity type;
searching, by the fuzzy score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm, and forming one or more first suggested search query parameters from the one or more records based on the search;
presenting, by the fuzzy score matching computer, the one or more first suggested search query parameters via the user interface;
receiving, by the entity extraction computer, a user selection of the one or more first suggested search query parameters to form complete search query parameters;
extracting, by the entity extraction computer, one or more second entities from the complete search query parameters;
searching, by the entity extraction computer, the entity co-occurrence database to identify one or more entities related to the one or more second entities, thereby forming one or more second suggested search query parameters; and
presenting, by the entity extraction computer, the one or more second suggested search query parameters via the user interface.
38. The method according to claim 37, further comprising: before the user input is complete, searching, by the fuzzy score matching computer, the entity co-occurrence database using the selected fuzzy matching algorithm.
39. The method according to claim 37, wherein the one or more records associated with the partial search query parameter include concept features.
40. The method according to claim 37, wherein the one or more first suggested search query parameters include a plurality of first suggested search query parameters, the method further comprising: sorting, by the fuzzy score matching computer, the plurality of first suggested search query parameters in descending order based on closeness of match to the partial search query parameter in the user input.
41. The method according to claim 40, wherein the fuzzy score matching computer presents the sorted plurality of first suggested search query parameters in a drop-down list via the user interface.
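The descending sort by closeness of match described in claims 40 and 41 can be sketched as follows. This is only an illustrative sketch: the claims do not say how closeness is measured, so the similarity ratio from Python's standard `difflib.SequenceMatcher` is used here as an assumed stand-in, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def closeness(partial, candidate):
    """Assumed closeness measure: case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, partial.lower(), candidate.lower()).ratio()


def rank_suggestions(partial, candidates):
    """Return (score, suggestion) pairs sorted from closest to least close."""
    scored = [(closeness(partial, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

The sorted pairs could then be rendered top to bottom in a drop-down list, with the closest match first.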
42. The method according to claim 37, wherein the entity co-occurrence database is indexed.
43. The method according to claim 37, wherein the entity co-occurrence database includes an entity-to-entity index.
44. The method according to claim 37, wherein the entity co-occurrence database includes an entity-to-topic index.
45. The method according to claim 37, wherein the entity co-occurrence database includes an entity-to-fact index.
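One way to picture the three indexes named in claims 43 to 45 is as three in-memory maps keyed by entity. The sketch below is an assumption-laden illustration, not the structure the patent describes: the class name, the per-document counting scheme, and the choice to attach a document's topics and facts to every entity in that document are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations


class CoOccurrenceIndex:
    """Toy in-memory store with entity-to-entity, -topic, and -fact indexes."""

    def __init__(self):
        # entity -> {co-occurring entity -> number of shared documents}
        self.entity_to_entity = defaultdict(lambda: defaultdict(int))
        # entity -> set of topics of documents mentioning it
        self.entity_to_topic = defaultdict(set)
        # entity -> list of facts from documents mentioning it
        self.entity_to_fact = defaultdict(list)

    def add_document(self, entities, topics=(), facts=()):
        """Record one document's entities together with its topics and facts."""
        unique = sorted(set(entities))
        for a, b in permutations(unique, 2):
            self.entity_to_entity[a][b] += 1
        for entity in unique:
            self.entity_to_topic[entity].update(topics)
            self.entity_to_fact[entity].extend(facts)

    def related(self, entity):
        """Entities co-occurring with `entity`, most frequent first."""
        counts = self.entity_to_entity.get(entity, {})
        return sorted(counts, key=lambda other: (-counts[other], other))
```

Keeping the three maps separate lets a lookup for related entities, topics, or facts each be a single dictionary access.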
46. A system, comprising:
one or more server computers having one or more processors, the one or more processors executing computer-readable instructions for a plurality of computer modules, the plurality of computer modules comprising:
an entity extraction module configured to receive user input of a partial search query parameter from a user interface, the partial search query parameter having at least one incomplete search query parameter, the entity extraction module being further configured to:
extract one or more first entities from the partial search query parameter by comparing the partial search query parameter with an entity co-occurrence database and identifying at least one entity type corresponding to the one or more first entities in the partial search query parameter, wherein the entity co-occurrence database has instances of the one or more first entities co-occurring in an electronic data corpus; and
a fuzzy score matching module configured to select a fuzzy matching algorithm for searching the entity co-occurrence database to identify one or more records associated with the partial search query parameter, wherein the fuzzy matching algorithm corresponds to the at least one identified entity type, the fuzzy score matching module being further configured to:
search the entity co-occurrence database using the selected fuzzy matching algorithm, and form one or more first suggested search query parameters from the one or more records based on the search; and present the one or more first suggested search query parameters via the user interface;
wherein the entity extraction module is further configured to:
receive a user selection of the one or more first suggested search query parameters, so as to form a completed search query parameter;
extract one or more second entities from the completed search query parameter;
search the entity co-occurrence database to identify one or more entities related to the one or more second entities, thereby forming one or more second suggested search query parameters; and
present the one or more second suggested search query parameters via the user interface.
47. The system according to claim 46, wherein the fuzzy score matching module is further configured to: before the user input is complete, search the entity co-occurrence database using the selected fuzzy matching algorithm.
48. The system according to claim 46, wherein the one or more records associated with the partial search query parameter include concept features.
49. The system according to claim 46, wherein the one or more first suggested search query parameters include a plurality of first suggested search query parameters, and the fuzzy score matching module is further configured to: sort the plurality of first suggested search query parameters in descending order based on closeness of match to the partial search query parameter in the user input.
50. The system according to claim 49, wherein the fuzzy score matching computer is configured to present the sorted plurality of first suggested search query parameters in a drop-down list via the user interface.
51. The system according to claim 46, wherein the entity co-occurrence database is indexed.
52. The system according to claim 46, wherein the entity co-occurrence database includes an entity-to-entity index.
53. The system according to claim 46, wherein the entity co-occurrence database includes an entity-to-topic index.
54. The system according to claim 46, wherein the entity co-occurrence database includes an entity-to-fact index.
55. A computer-implemented method, comprising:
receiving, by a computer, from a search engine, a search query comprising one or more data strings, wherein each respective entity corresponds to a subset of the one or more data strings;
identifying, by the computer, one or more entities in the one or more data strings based on comparing the one or more entities against an entity database and a trend database;
identifying, by the computer, one or more features in the one or more data strings that are identified as not corresponding to at least one entity;
assigning, by the computer, each of the one or more features to at least one of the one or more entities based on a matching algorithm;
assigning, by the computer, an extraction score to each respective entity based on a score assigned to each respective feature assigned to the respective entity;
receiving, by the computer, from the entity database, a first search list containing one or more entities, the one or more entities having a score within a threshold distance of the extraction score of each respective entity;
receiving, by the computer, from the trend database, a second search list containing one or more entities, the one or more entities having a score within a threshold distance of the extraction score of each respective entity;
generating, by the computer, an aggregated list comprising the first search list and the second search list, wherein the entities of the aggregated list are ranked according to the score of each respective aggregated list; and
providing, by the computer, suggested searches according to the aggregated list.
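The list-building steps at the end of claim 55 can be sketched in a few lines of Python. The sketch below is a simplified reading under stated assumptions: each database is modelled as a plain dictionary of entity names to scores, "threshold distance" is taken to mean the absolute difference between scores, and duplicate entities are merged by keeping the higher score. None of these choices, nor the function names, are specified by the claim.

```python
def within_threshold(database, extraction_score, threshold):
    """Select entries whose score lies within `threshold` of the extraction score."""
    return {
        name: score
        for name, score in database.items()
        if abs(score - extraction_score) <= threshold
    }


def aggregate(first_list, second_list):
    """Merge two search lists and rank entities by score, highest first."""
    merged = dict(first_list)
    for name, score in second_list.items():
        # Assumed policy: keep the higher score when both lists contain an entity.
        merged[name] = max(score, merged.get(name, score))
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores for an entity database and a trend database.
entity_db = {"apple inc": 0.9, "apple pie": 0.4}
trend_db = {"apple event": 0.85, "apple inc": 0.95}

first = within_threshold(entity_db, extraction_score=0.9, threshold=0.1)
second = within_threshold(trend_db, extraction_score=0.9, threshold=0.1)
suggested = aggregate(first, second)
```

Here `suggested` holds the merged entities in rank order, which could then be offered as suggested searches.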
56. A computer-implemented method, comprising:
receiving, by a computer, a plurality of data streams respectively associated with a plurality of data sources;
generating, by the computer, an array of attributes associated with each respective data stream;
in response to the computer detecting a trigger condition associated with data of a data stream:
generating, by the computer, geodata associated with the data of the data stream;
in response to the computer not detecting a trigger condition for a data source:
mapping, by the computer, the array of attributes for the data source to a set of managed attributes associated with a search index; and
in response to determining that a type of content of the data source is image data:
performing, by the computer, an optical character recognition routine on metadata, the metadata being associated with data received from the data source; and
retrieving, by the computer, an updated data stream of the data source from a web service identified by the metadata, wherein the data source is associated with the web service identified by the metadata.
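The attribute-mapping step of claim 56 can be illustrated with a small Python sketch. It assumes that each data source's attribute array arrives as a dictionary and that the managed attributes of the search index are a fixed set of canonical names; the `MANAGED_ATTRIBUTES` table and the function name are hypothetical and are not taken from the claim.

```python
# Hypothetical table: source-specific attribute name -> managed attribute name.
MANAGED_ATTRIBUTES = {
    "title": "title",
    "headline": "title",
    "body": "text",
    "content": "text",
    "ts": "timestamp",
    "published": "timestamp",
}


def map_to_managed(attributes):
    """Map a source's attributes onto the managed set, dropping unknown ones."""
    mapped = {}
    for key, value in attributes.items():
        managed = MANAGED_ATTRIBUTES.get(key.lower())
        # Keep the first value seen for each managed attribute.
        if managed is not None and managed not in mapped:
            mapped[managed] = value
    return mapped
```

Normalizing every source to the same managed names means the search index only has to understand one schema, whatever the source called its fields.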
Applications Claiming Priority (11)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361910907P | 2013-12-02 | 2013-12-02 | |
US201361910900P | 2013-12-02 | 2013-12-02 | |
US201361910894P | 2013-12-02 | 2013-12-02 | |
US201361910905P | 2013-12-02 | 2013-12-02 | |
US61/910,900 | 2013-12-02 | ||
US61/910,907 | 2013-12-02 | ||
US61/910,894 | 2013-12-02 | ||
US61/910,905 | 2013-12-02 | ||
US201461947652P | 2014-03-04 | 2014-03-04 | |
US61/947,652 | 2014-03-04 | ||
PCT/US2014/067997 WO2015084759A1 (en) | 2013-12-02 | 2014-12-02 | Systems and methods for in-memory database search |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106164889A (en) | 2016-11-23 |
Family
ID=53274014
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201480072953.7A Pending CN106164889A (en) | 2013-12-02 | 2014-12-02 | System and method for in-memory database search |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP3077918A4 (en) |
JP (1) | JP2017504105A (en) |
KR (1) | KR20160124079A (en) |
CN (1) | CN106164889A (en) |
CA (1) | CA2932401A1 (en) |
WO (1) | WO2015084759A1 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991181A (en) * | 2017-04-07 | 2017-07-28 | 广州视源电子科技股份有限公司 | Method and device for extracting spoken sentences |
CN107643835A (en) * | 2017-10-19 | 2018-01-30 | 北京京东尚科信息技术有限公司 | Drop-down word determines method, apparatus, electronic equipment and storage medium |
CN107832459A (en) * | 2017-11-27 | 2018-03-23 | 公安部交通管理科学研究所 | The system and method that knowledge base content based on distributed network environment shares study |
CN108932248A (en) * | 2017-05-24 | 2018-12-04 | 苏宁云商集团股份有限公司 | A kind of search realization method and system |
CN109753517A (en) * | 2018-12-06 | 2019-05-14 | 北京明略软件系统有限公司 | A kind of method, apparatus, computer storage medium and the terminal of information inquiry |
CN110245357A (en) * | 2019-06-26 | 2019-09-17 | 北京百度网讯科技有限公司 | Principal recognition methods and device |
CN110347699A (en) * | 2019-06-26 | 2019-10-18 | 北京明略软件系统有限公司 | Determine the method and device of identity card related entities liveness |
CN110471886A (en) * | 2018-05-09 | 2019-11-19 | 富士施乐株式会社 | For based on detection desk around file and people come the system of search file and people |
CN112740196A (en) * | 2018-09-20 | 2021-04-30 | 华为技术有限公司 | Recognition model in artificial intelligence system based on knowledge management |
CN114900422A (en) * | 2021-01-26 | 2022-08-12 | 瞻博网络公司 | Enhanced chat interface for network management |
US12040934B1 (en) | 2021-12-17 | 2024-07-16 | Juniper Networks, Inc. | Conversational assistant for obtaining network information |
US12132622B2 (en) | 2022-10-28 | 2024-10-29 | Juniper Networks, Inc. | Enhanced conversation interface for network management |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10296627B2 (en) | 2015-08-18 | 2019-05-21 | Fiserv, Inc. | Generating integrated data records by correlating source data records from disparate data sources |
CN109964224A (en) * | 2016-09-22 | 2019-07-02 | 恩芙润斯公司 | System, method and the computer-readable medium that significant associated time signal is inferred between life science entity are visualized and indicated for semantic information |
CN106599547A (en) * | 2016-11-23 | 2017-04-26 | 中山健康医疗信息技术有限公司 | Intelligent medical knowledge base management system based on tags |
JP6971104B2 (en) * | 2017-09-20 | 2021-11-24 | ヤフー株式会社 | Information processing equipment, information processing methods, and programs |
WO2019235103A1 (en) * | 2018-06-07 | 2019-12-12 | 日本電信電話株式会社 | Question generation device, question generation method, and program |
US11487902B2 (en) | 2019-06-21 | 2022-11-01 | nference, inc. | Systems and methods for computing with private healthcare data |
CN112487214B (en) * | 2020-12-23 | 2024-06-04 | 中译语通科技股份有限公司 | Knowledge graph relation extraction method and system based on entity co-occurrence matrix |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101079070A (en) * | 2006-05-26 | 2007-11-28 | 国际商业机器公司 | Computer and method for response of information query |
US20080306908A1 (en) * | 2007-06-05 | 2008-12-11 | Microsoft Corporation | Finding Related Entities For Search Queries |
US20090327223A1 (en) * | 2008-06-26 | 2009-12-31 | Microsoft Corporation | Query-driven web portals |
CN103186556A (en) * | 2011-12-28 | 2013-07-03 | 北京百度网讯科技有限公司 | Method for obtaining and searching structural semantic knowledge and corresponding device |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6965900B2 (en) * | 2001-12-19 | 2005-11-15 | X-Labs Holdings, Llc | Method and apparatus for electronically extracting application specific multidimensional information from documents selected from a set of documents electronically extracted from a library of electronically searchable documents |
US8438142B2 (en) * | 2005-05-04 | 2013-05-07 | Google Inc. | Suggesting and refining user input based on original user input |
JP4922692B2 (en) * | 2006-07-28 | 2012-04-25 | 富士通株式会社 | Search query creation device |
EP2291778A4 (en) * | 2008-06-14 | 2011-09-21 | Corp One Ltd | Searching using patterns of usage |
US8631004B2 (en) * | 2009-12-28 | 2014-01-14 | Yahoo! Inc. | Search suggestion clustering and presentation |
JP5256273B2 (en) * | 2010-11-24 | 2013-08-07 | ヤフー株式会社 | Intention extraction apparatus, method and program |
US20120143875A1 (en) * | 2010-12-01 | 2012-06-07 | Yahoo! Inc. | Method and system for discovering dynamic relations among entities |
JP5426526B2 (en) * | 2010-12-21 | 2014-02-26 | 日本電信電話株式会社 | Probabilistic information search processing device, probabilistic information search processing method, and probabilistic information search processing program |
EP2788907A2 (en) * | 2011-12-06 | 2014-10-15 | Perception Partners Inc. | Text mining analysis and output system |
WO2013170343A1 (en) * | 2012-05-15 | 2013-11-21 | Whyz Technologies Limited | Method and system relating to salient content extraction for electronic content |
2014
- 2014-12-02 KR KR1020167017516A patent/KR20160124079A/en not_active Application Discontinuation
- 2014-12-02 WO PCT/US2014/067997 patent/WO2015084759A1/en active Application Filing
- 2014-12-02 EP EP14867913.7A patent/EP3077918A4/en not_active Withdrawn
- 2014-12-02 CA CA2932401A patent/CA2932401A1/en not_active Abandoned
- 2014-12-02 CN CN201480072953.7A patent/CN106164889A/en active Pending
- 2014-12-02 JP JP2016536900A patent/JP2017504105A/en not_active Ceased
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991181A (en) * | 2017-04-07 | 2017-07-28 | 广州视源电子科技股份有限公司 | Method and device for extracting spoken sentences |
CN106991181B (en) * | 2017-04-07 | 2020-04-21 | 广州视源电子科技股份有限公司 | Method and device for extracting spoken sentences |
CN108932248A (en) * | 2017-05-24 | 2018-12-04 | 苏宁云商集团股份有限公司 | A kind of search realization method and system |
CN107643835A (en) * | 2017-10-19 | 2018-01-30 | 北京京东尚科信息技术有限公司 | Drop-down word determines method, apparatus, electronic equipment and storage medium |
CN107832459A (en) * | 2017-11-27 | 2018-03-23 | 公安部交通管理科学研究所 | The system and method that knowledge base content based on distributed network environment shares study |
CN110471886A (en) * | 2018-05-09 | 2019-11-19 | 富士施乐株式会社 | For based on detection desk around file and people come the system of search file and people |
CN112740196A (en) * | 2018-09-20 | 2021-04-30 | 华为技术有限公司 | Recognition model in artificial intelligence system based on knowledge management |
CN109753517A (en) * | 2018-12-06 | 2019-05-14 | 北京明略软件系统有限公司 | A kind of method, apparatus, computer storage medium and the terminal of information inquiry |
CN110347699A (en) * | 2019-06-26 | 2019-10-18 | 北京明略软件系统有限公司 | Determine the method and device of identity card related entities liveness |
CN110245357A (en) * | 2019-06-26 | 2019-09-17 | 北京百度网讯科技有限公司 | Principal recognition methods and device |
CN110347699B (en) * | 2019-06-26 | 2022-01-28 | 北京明略软件系统有限公司 | Method and device for determining activity of entity related to identity card |
CN110245357B (en) * | 2019-06-26 | 2023-05-02 | 北京百度网讯科技有限公司 | Main entity identification method and device |
CN114900422A (en) * | 2021-01-26 | 2022-08-12 | 瞻博网络公司 | Enhanced chat interface for network management |
US12040934B1 (en) | 2021-12-17 | 2024-07-16 | Juniper Networks, Inc. | Conversational assistant for obtaining network information |
US12132622B2 (en) | 2022-10-28 | 2024-10-29 | Juniper Networks, Inc. | Enhanced conversation interface for network management |
Also Published As
Publication number | Publication date |
---|---|
EP3077918A1 (en) | 2016-10-12 |
JP2017504105A (en) | 2017-02-02 |
WO2015084759A1 (en) | 2015-06-11 |
KR20160124079A (en) | 2016-10-26 |
EP3077918A4 (en) | 2017-06-07 |
CA2932401A1 (en) | 2015-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106164889A (en) | System and method for in-memory database search | |
CN112437917B (en) | Natural language interface for databases using autonomous agents and thesaurus | |
Bizer et al. | Dbpedia-a crystallization point for the web of data | |
KR20160144384A (en) | Context-sensitive search using a deep learning model | |
Kaur et al. | Scholarometer: A social framework for analyzing impact across disciplines | |
AU2011269676A1 (en) | Systems of computerized agents and user-directed semantic networking | |
Nesi et al. | Geographical localization of web domains and organization addresses recognition by employing natural language processing, Pattern Matching and clustering | |
Van Hooland et al. | Evaluating the success of vocabulary reconciliation for cultural heritage collections | |
Weigl et al. | On providing semantic alignment and unified access to music library metadata | |
Lamba et al. | Text Mining for Information Professionals | |
Valentine et al. | EarthCube Data Discovery Studio: A gateway into geoscience data discovery and exploration with Jupyter notebooks | |
Hlava | The Taxobook: Applications, implementation, and integration in search: Part 3 of a 3-part series | |
Wang et al. | AceMap: Knowledge Discovery through Academic Graph | |
Charalabidis et al. | Open data interoperability | |
Musabeyezu | Comparative study of annotation tools and techniques | |
JP5380874B2 (en) | Information retrieval method, program and apparatus | |
Žumer | National bibliographies in the digital age: guidance and new directions | |
ElGindy et al. | Capturing place semantics on the geosocial web | |
JP5652519B2 (en) | Information retrieval method, program and apparatus | |
Kapadia | Web Search Engine Using Ontology | |
Beigzadeh | Component recommendation system | |
Gleich et al. | Some computational tools for digital archive and metadata maintenance | |
Hradec et al. | Semantic text analysis tool: SeTA | |
Elgindy | Extracting place semantics from geo-folksonomies | |
Wu et al. | Recommending Relevant Tutorial Fragments for API-Related Natural Language Questions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20161123 |