CN101894143A - Federated search and search result integrated display method and system - Google Patents

Info

Publication number
CN101894143A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010211359
Other languages
Chinese (zh)
Inventor
王仲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING UFIDA SOFTWARE CO LTD
Original Assignee
BEIJING UFIDA SOFTWARE CO LTD
Application filed by BEIJING UFIDA SOFTWARE CO LTD
Priority to CN 201010211359
Publication of CN101894143A

Abstract

The invention discloses a search and search result display method and system, in particular a federated search and integrated search result display method and system. The prior art lacks a method and system for uniformly searching structured, unstructured and semi-structured data and displaying the results in their native form. The method comprises: first, inputting a search term; second, sending a search request to structured, semi-structured and unstructured information sources, simultaneously searching for and locating information that matches the search term in those sources, and forming a search result from all matching information; third, pre-processing the search result by near-duplicate detection, sorting, classification and aggregation; and finally, displaying the pre-processed search result in an integrated manner in its native form. The method and system are applicable to any information system that stores and processes databases, XML files and text data.

Description

A federated search and integrated search result display method and system
Technical field
The present invention relates to a search and search result display method and system, and in particular to a federated search and integrated search result display method and system.
Background art
Existing search methods mostly target only structured data, unstructured data, or a combination of the two. There is as yet no unified search and display method that covers all three types of data, including semi-structured data.
Structured data usually refers to information managed in a database, including records of production, business, transactions and the like. Unstructured data covers a very wide range of information and usually exists as multimedia content in various forms: text content such as documents, contracts, invoices and letters; binary files such as spreadsheets, presentation files and e-mail; and multimedia data such as sound, graphics, images and video. Semi-structured data refers to text formatted in a markup language such as SGML or XML, and usually takes the form of a hierarchy of mutually nested semantic units. It differs from structured data in that it exists as text, and it differs from unstructured data in that it marks the value or content of each node with specific tags. Because markup languages are self-describing, semi-structured data is a data type that lies between structured and unstructured information. It can be built, parsed and searched with computer tools, and can therefore be used in some intelligent information systems.
Chinese patent application (publication number: CN101477568, publication date: 2009.7.8) discloses a method for the integrated retrieval of structured and unstructured data. The method configures, parses and indexes the unstructured data, stores it in a database, and then searches by keyword (not full text) and displays the results. The method cannot handle the particular characteristics of semi-structured data. In addition, storing unstructured data in a database before searching it incurs extra data-conversion time and considerable storage redundancy. More seriously, the method does not use full-text search, the mode best suited to unstructured data, and so cannot meet requirements for search accuracy. For example, a keyword search for "China" may return "... wherein national income accounts for a large share ..." or "big-and-middle Gome" (in the original Chinese, the characters of the keyword merely appear next to each other in these strings), but cannot return "People's Republic of China", a synonym of "China".
Chinese patent application (publication number: CN101341486, publication date: 2009.1.7) discloses a method and system for automatically generating multilingual electronic content from unstructured data. The method and system concern the presentation of data: information related to one or more preselected topics is extracted from unstructured data, and content is generated in a specified format through structured processing that merges the information. Its drawback is that the original presentation form of the data is lost.
Summary of the invention
In view of the defects in the prior art, the technical problem to be solved by the present invention is to provide a federated search method that combines database, full-text and XQuery search, together with a method for the integrated display, in native form, of the structured, semi-structured and unstructured information contained in the search result.
To solve the above technical problem, the present invention adopts the following technical solution:
A federated search and integrated search result display method, comprising the following steps:
(1) inputting a search term;
(2) sending search requests in parallel to structured, semi-structured and unstructured information sources, simultaneously searching for and locating information that matches the search term in those sources, and forming a search result from all information that matches the search term;
(3) displaying the search result in an integrated manner in its native form.
In the method described above, in step (2), the structured information source is searched with standard SQL, the semi-structured information source is searched with XQuery/XPath, and the unstructured information source is searched with full-text search.
In the method described above, step (2) further comprises deduplicating, sorting, classifying and aggregating the search result.
In the method described above, the integrated display of the search result in step (3) comprises the following steps:
1. analyzing the search result and extracting the path information it contains;
2. analyzing the path information to determine which invocation form the information source uses: interface, URL, functional fragment or callable system;
3. choosing a display mode according to the invocation form: for an interface, running the interface stub to call the corresponding interface; for a URL, going directly to the corresponding URL; for a functional fragment, loading the fragment into a container and running it; for a callable system, running it directly;
4. displaying the original information from the structured, semi-structured and unstructured information sources together in a unified multi-window, multi-tab form.
A federated search and integrated search result display system, comprising:
an input device for inputting a search term;
a retrieval device for sending search requests in parallel to structured, semi-structured and unstructured information sources, simultaneously searching for and locating information that matches the search term in those sources, and forming a search result from all information that matches the search term;
and a display device for displaying the search result in an integrated manner in its native form.
The system described above further comprises a deduplication device for deduplicating the search result, a sorting device for sorting the search result, a classification device for classifying the search result, and an aggregation device for aggregating the search result.
The effect of the present invention is as follows. Compared with existing search methods, the present invention fully covers data retrieval from heterogeneous information sources and applies the search technique best suited to each kind of source. Search requests are processed in parallel, so that the search performs well and efficiently and the search result has higher precision and recall. Specifically, the present invention uses standard SQL for structured information sources, XQuery/XPath for semi-structured information sources, and full-text search for unstructured information sources; the search request is sent to the three differently structured information sources at the same time for parallel processing, and the returned result sets are received synchronously. For result display, the present invention processes the search result more thoroughly: the result is deduplicated, sorted, classified and aggregated, and is finally displayed in an integrated manner in the native form that best reflects the characteristics of each information source. Existing methods do not apply this processing before display, and they show the information in a processed form, losing the distinctive layout and style of the heterogeneous information sources, which is often what users care about most.
Brief description of the drawings
Fig. 1 is a structural block diagram of the federated search and integrated search result display system described in the embodiment;
Fig. 2 is the main flowchart of the federated search and integrated search result display method described in the embodiment;
Fig. 3 is a detailed flowchart of the integrated display method for federated search results described in the embodiment.
Detailed description
The present invention is described below with reference to the embodiment and the accompanying drawings.
Fig. 1 shows the structure of the federated search and integrated search result display system of this embodiment. As shown in Fig. 1, the system comprises an input device 11; a retrieval device 12 connected to the input device 11; a structured information source, a semi-structured information source and an unstructured information source connected to the retrieval device 12; and a deduplication device 14, a sorting device 15, a classification device 16, an aggregation device 17 and a display device 13.
The input device 11 is used to input a search term.
The retrieval device 12 sends search requests in parallel to the structured, semi-structured and unstructured information sources, simultaneously searches for and locates information that matches the search term in those sources, and forms a search result from all information that matches the search term.
The deduplication device 14 deduplicates the search result. Deduplication means that if the search result contains repeated information, only one copy is kept.
The sorting device 15 sorts the search result. Sorting means arranging the search result in ascending or descending order by a designated key item, so that the results can be browsed quickly and in a regular order.
The classification device 16 classifies the search result. Classification means assigning the search result to categories under one or more catalogue systems, to facilitate later processing and use of the retrieved information.
The aggregation device 17 aggregates the search result. Aggregation means grouping the search result by its intrinsic features, forming sets in which the similarity of data content within a group is as large as possible and the similarity between groups is as small as possible.
The display device 13 displays the search result in an integrated manner in its native form.
Fig. 2 shows the flow of searching, and of displaying the search result in an integrated manner, with the system shown in Fig. 1. As shown in Fig. 2, the method comprises the following steps:
(1) A search term is input in the input device 11.
The search term may be a single word, such as "structured". It may also be a combination of words in the form of a Boolean expression, or a combination in another form, such as "structured and semi-structured and unstructured" or "structured or semi-structured or unstructured".
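The two Boolean forms mentioned above can be illustrated with a minimal sketch. The function name `matches` and the plain substring test are assumptions made for illustration; the source does not say how Boolean terms are parsed or evaluated.

```python
import re

def matches(term: str, text: str) -> bool:
    """Evaluate a flat Boolean search term against a piece of text.

    Only the two forms quoted above are handled: words joined by "and",
    or words joined by "or". Nesting and mixed operators are not supported.
    """
    if re.search(r"\s+and\s+", term):
        words = re.split(r"\s+and\s+", term)
        return all(word.strip() in text for word in words)
    words = re.split(r"\s+or\s+", term)
    return any(word.strip() in text for word in words)
```

A single word falls through to the "or" branch and is tested on its own.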
(2) The retrieval device 12 sends search requests in parallel to the structured, semi-structured and unstructured information sources, simultaneously searches for and locates information that matches the search term in those sources, and forms a search result from all information that matches the search term.
The structured information source is searched with standard SQL, the semi-structured information source is searched with XQuery/XPath, and the unstructured information source is searched with full-text search.
The input device 11 sends the search request to the three kinds of heterogeneous information source at the same time for parallel processing.
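The parallel dispatch described above might be sketched as follows. This is only an illustration: the three `search_*` functions are hypothetical stand-ins for the SQL, XQuery/XPath and full-text back ends, and a thread pool is one possible way to issue the requests concurrently, not necessarily the mechanism the invention uses.

```python
from concurrent.futures import ThreadPoolExecutor

def search_structured(term):       # hypothetical stand-in for an SQL query
    return [("structured", f"row matching {term}")]

def search_semi_structured(term):  # hypothetical stand-in for an XQuery/XPath query
    return [("semi-structured", f"node matching {term}")]

def search_unstructured(term):     # hypothetical stand-in for a full-text query
    return [("unstructured", f"document matching {term}")]

SOURCES = [search_structured, search_semi_structured, search_unstructured]

def federated_search(term):
    """Send the same term to every source concurrently and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(search, term) for search in SOURCES]
        results = []
        for future in futures:     # collect in submission order
            results.extend(future.result())
    return results
```

Collecting the futures in submission order keeps the merged result deterministic even though the three searches run at the same time.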
The request that the input device 11 sends to the structured information source follows the standard SQL specification. After the structured-source processing device (usually a database management system or database query engine) receives the search request packet, it either pre-processes the request or processes it directly, depending on the circumstances. Pre-processing mainly means adding SQL extensions specific to a particular database, or enhancements introduced in a particular database version, such as hints in Oracle or the memory model, in order to improve the performance and stability of the search. Structured search relies on the relational model of the database: indexes are built on database fields, and the search is then carried out with an SQL script. The database analyzes the SQL, forms a query plan and uses the appropriate index according to the indexes that have been built, thereby achieving efficient retrieval.
SQL (Structured Query Language) is a database query and programming language used to access data and to query, update and manage relational database systems. It originated in the 1970s and has passed through SQL/86, SQL/89, SQL/92 and SQL/99 up to SQL/2008, the latest version at the time of writing. It has the following characteristics: it is comprehensive and unified; it is highly non-procedural; it operates on sets; it offers two modes of use with a single syntax; and it is concise and easy to learn and use. In this embodiment, the search request packet for the structured information source uses the data manipulation language (DML) of the SQL standard, in a form such as select <col1, col2, ...> from <table1, table2, ...> where Title like 'Chinese%' and price < 30.
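A structured search of this kind can be tried out with a small sketch. It assumes an in-memory SQLite database in place of the database management system named in the text; the table `ReportLib` and its columns follow the example given later in this description, and the row with ReportID 200 is invented purely as a non-matching record.

```python
import sqlite3

def search_reports(conn, term):
    """Return (Title, ReportID) rows whose Title contains the search term."""
    sql = ("SELECT Title, ReportID FROM ReportLib "
           "WHERE Title LIKE ? ORDER BY ReportID")
    # The term is bound as a parameter rather than pasted into the SQL text.
    return conn.execute(sql, (f"%{term}%",)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ReportLib (ReportID INTEGER, Title TEXT)")
conn.executemany(
    "INSERT INTO ReportLib VALUES (?, ?)",
    [(101, "Software company 2009 annual financial report"),
     (110, "Manufacturing company first-quarter sales performance"),
     (200, "Meeting minutes")],  # hypothetical non-matching row
)
rows = search_reports(conn, "company")
```

Binding the search term as a parameter keeps user input out of the SQL text, which is the usual precaution when a search box feeds a query.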
The request that the input device 11 sends to the unstructured information source uses full-text search. After the unstructured-source processing device (usually a full-text search engine) receives the search request packet, it automatically parses the search condition and performs semantic analysis, splits the search term into word segments using a dictionary, submits the segments for retrieval against a pre-built full-text index, and finally filters, summarizes and formats the results before returning them to the requester.
Full-text search directly applies Chinese word segmentation and semantic analysis to build a full-text index over external unstructured data files (local or remote), and a search engine provides the search service at the interface level through function calls. Full-text search has clear advantages on large text fields. It can obtain the required information directly from the file system without going through a database, which greatly improves the adaptability and performance of the search. It also offers search mechanisms specific to full-text search engines, such as single-character search, phrase search, whole-sentence search, paragraph search, proximity search, weighted search, multi-field search, combined search, expression search, synonym search and antonym search. This embodiment uses an SQL-like full-text search request packet, in a form such as select Title, FileUrl, Frequency, PosInFile from IndexLib where Content:(China).
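The idea of searching a pre-built full-text index can be illustrated with a toy inverted index. This is a sketch under simplifying assumptions: tokens are split on word boundaries instead of using a dictionary-based Chinese segmenter, no semantic analysis is performed, and the returned frequency and position only loosely mirror the `Frequency` and `PosInFile` fields of the request shown above.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase word split; a real engine would use a dictionary-based segmenter."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each token to {doc_id: [positions]} for a {doc_id: text} dictionary."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(pos)
    return index

def search(index, term):
    """Return (doc_id, frequency, first_position) hits, most frequent first."""
    postings = index.get(term.lower(), {})
    hits = [(doc_id, len(positions), positions[0])
            for doc_id, positions in postings.items()]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

The index is built once and reused for every query, which is what lets the search avoid scanning the documents themselves.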
The request that the input device 11 sends to the semi-structured information source uses XQuery/XPath. After the semi-structured-source processing device (usually an XML Query engine) receives the search request packet, it automatically parses the search condition, submits the query against the XML document or repository, and finally returns the search result to the requester. In this embodiment, the search request packet for the semi-structured information source follows the W3C XQuery/XPath standard, in a form such as:
let $reportxml := doc('report.xml')
for $report in $reportxml//author[age lt 30]
order by $report/name
return $report/title/text()
XPath and XQuery are query languages published by the W3C. They are pattern-description languages for querying XML data: users describe the patterns they are interested in, the patterns are handed to an XML data processing system, and the system returns the results that match. Their main feature is the use of regular path expressions to obtain the structural relationships and content of XML data units. XPath is the basic language for traversing XML data; it is the foundation of XQuery and an integral part of it. XQuery is a high-level, strongly typed functional language that builds on XPath to handle more complex record-selection conditions, transform result sets and perform recursive queries. XQuery code consists entirely of expressions, with no statements, and every value is a sequence, which makes it the preferred way to query XML documents or large XML repositories.
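A comparable query can be approximated in Python as a rough sketch. The standard library's `xml.etree.ElementTree` supports only a limited XPath subset and no XQuery, so the age comparison and the ordering are done in Python here. The XML layout (`reports/report/title` and `author/name`, `author/age`) is a hypothetical structure chosen for illustration, not the layout of the `report.xml` in the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
REPORT_XML = """
<reports>
  <report><title>Annual plan</title><author><name>Wang</name><age>28</age></author></report>
  <report><title>Budget</title><author><name>Li</name><age>25</age></author></report>
  <report><title>Audit</title><author><name>Zhao</name><age>41</age></author></report>
</reports>
"""

def titles_by_young_authors(xml_text, max_age=30):
    """Titles of reports whose author is younger than max_age, ordered by author name."""
    root = ET.fromstring(xml_text)
    selected = []
    for report in root.findall(".//report"):   # XPath: every <report> element
        author = report.find("author")
        if author is not None and int(author.findtext("age")) < max_age:
            selected.append((author.findtext("name"), report.findtext("title")))
    return [title for _, title in sorted(selected)]
```

A full XQuery engine would express the filter and the ordering inside the query itself, as in the request shown above.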
The information matching the search term that is retrieved from the three differently structured information sources forms the search result. The search result can be pre-processed, for example by deduplication, sorting, classification and aggregation. Deduplication means that if the search result contains repeated information, only one copy is kept. Sorting means arranging the search result in ascending or descending order by a designated key item, so that the results can be browsed quickly and in a regular order. Classification means assigning the search result to categories under one or more catalogue systems, to facilitate later processing and use of the retrieved information. Aggregation means grouping the search result by its intrinsic features, forming sets in which the similarity of data content within a group is as large as possible and the similarity between groups is as small as possible.
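The first three pre-processing operations can be sketched as plain functions. The record layout (dictionaries with `ReportID` and `Title`) and the keyword-based classification tree are assumptions for illustration; the source does not specify how records are compared or how the catalogue system is represented.

```python
def deduplicate(results):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for record in results:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def sort_results(results, key, descending=False):
    """Order records by the designated key item."""
    return sorted(results, key=lambda record: record[key], reverse=descending)

def classify(results, tree, field="Title"):
    """Group ReportIDs by category; tree maps a category name to a keyword.

    Records that match no keyword are placed under "unclassified".
    """
    groups = {}
    for record in results:
        text = record[field].lower()
        category = next((name for name, keyword in tree.items() if keyword in text),
                        "unclassified")
        groups.setdefault(category, []).append(record["ReportID"])
    return groups
```

Records left under "unclassified" are the natural input for the aggregation step, which groups by similarity rather than by a preset tree.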
(3) The display device 13 displays the search result in an integrated manner in its native form.
The display device 13 displays the search result in native form by calling, for example, an interface of the system that holds the data or a function URL path. Fig. 3 shows the flow of the integrated display of search results.
How a search result is displayed depends on the technical architecture, presentation form and interface mode of its information source. The forms fall broadly into the following kinds:
A. A callable interface is provided. The interface is usually a Web Service, API, RMI or similar. It may be independent of platform and language, as with a Web Service, or bound to a particular platform or language, as with a Java API or C++ API.
B. A callable URL is provided. The URL is usually an HTTP URL or, more broadly, a protocol-independent URI.
C. A functional fragment is provided. A functional fragment may be a functional module of an application system and is usually embedded in a container or standard framework, such as a Portlet.
D. A callable system is provided. A callable system is usually a small application system that has been packaged and can run independently.
As shown in Fig. 3, the display process comprises the following steps:
1. The display device 13 analyzes the pre-processed search result and extracts the path information it contains.
2. The display device 13 analyzes the path information to determine which invocation form the information source uses: interface, URL, functional fragment or callable system.
3. The display device 13 chooses a display mode according to the invocation form: for an interface, it runs the interface stub to call the corresponding interface; for a URL, it redirects or forwards directly to the corresponding URL; for a functional fragment, it loads the fragment into the built-in container of the display device 13 and runs it; for a callable system, it runs the system directly.
4. The display device 13 displays the original information of the heterogeneous information sources together in a unified multi-window (nested or stacked), multi-tab (horizontal or vertical) form.
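Steps 2 and 3 above amount to a dispatch on the invocation form. The sketch below shows one way this could look. The source does not say how the form is recognized from the path information, so the scheme markers `ws:` and `fragment:` are purely hypothetical conventions, and the returned strings merely name the action instead of performing it.

```python
from urllib.parse import urlparse

def invocation_form(path):
    """Guess the invocation form from path information (hypothetical convention)."""
    scheme = urlparse(path).scheme
    if scheme in ("http", "https"):
        return "url"
    if scheme == "ws":         # hypothetical marker for a web-service/API call
        return "interface"
    if scheme == "fragment":   # hypothetical marker for an embeddable fragment
        return "functional fragment"
    return "callable system"   # anything else is treated as a standalone program

ACTIONS = {
    "interface": "call the interface through its stub",
    "url": "redirect or forward to the URL",
    "functional fragment": "load the fragment into the built-in container and run it",
    "callable system": "run the system directly",
}

def display_action(path):
    """Return the invocation form and the matching display action."""
    form = invocation_form(path)
    return form, ACTIONS[form]
```

Keeping the form detection separate from the action table means a new invocation form only needs one new branch and one new table entry.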
Example
(1) Input the search term.
In this example, the search term is "company", and the search is run against an Oracle relational database, an XBRL business report repository and a Microsoft Word document library. See the following table:
Information source | Information source type | Information source parameters | Search statement
Oracle relational database | Structured data | Data table: ReportLib; search field: Title; search condition: contains "company" | select Title, ReportID from ReportLib where Title like '%company%'
XBRL business report repository | Semi-structured data | Repository name: report.xml; search node: xbrl/context; search condition: doc("report.xml")/xbrl/context[substring(csrc-common, 1, 2)='company'] | let $reportxml := doc('report.xml') for $report in $reportxml/xbrl/context[substring(csrc-common, 1, 2)='company'] order by $report/xbrl/context/csrc-common return <td>{$report//context/csrc-common}</td><td>{$report/xbrl/csrc-pfs/text()}</td>
Microsoft Word document library | Unstructured data | Full-text index name: IndexLib; search object: body text; search condition: contains "company" | select Title, FileUrl, Frequency, PosInFile from IndexLib where Content:(company)
(2) Search requests are submitted in parallel to the Oracle relational database, the XBRL business report repository and the Microsoft Word document library. Information matching the search term is located in the three information sources, and all matching information forms the search result.
After the search statement is submitted to the relational database, the Oracle database returns the result set that satisfies the search condition, in the form of search result 1 in the following table:
Title | ReportID
Software company 2009 annual financial report | 101
Manufacturing company first-quarter sales performance | 110
Joint-stock company annual work plan | 150
Joint-stock company annual work plan | 150
After the search statement is submitted to the XBRL business report repository, the XML Query engine receives the search request packet, automatically parses the search condition, searches the repository and finally returns the search result to the requester, in the form of search result 2 below:
<td>Energy company of Shanghai Stock Exchange</td><td>2464527,202,500</td>
<td>Jiantou Energy Company</td><td>4,734,452,100</td>
After the search statement is submitted to the Microsoft Word document full-text index, the full-text search engine automatically parses the search condition and performs semantic analysis, splits the search term into word segments using a dictionary, submits the segments for retrieval against the pre-built full-text index, and finally filters, summarizes and formats the results before returning them to the requester, in the form of search result 3 below:
10 " " in listed companies' annual reports
By April 30, the 2009 annual reports of all companies listed on the Shanghai and Shenzhen stock markets had been published. The financial data are varied and the results are mixed. Behind the face of each annual report lies a different state of mind and a different situation ...
14 listed banks earned a combined 434.8 billion yuan last year
As of today, the 2009 annual reports of all listed banks have been released. According to the statistics, the 14 listed banks realized a combined net profit attributable to shareholders of the parent company of 434.833 billion yuan, up 16.45% year on year. ...
(3) The search result is pre-processed.
The system provides the deduplication device 14, the sorting device 15, the classification device 16 and the aggregation device 17 to pre-process the federated search result, that is, to process it before the integrated display. Taking search result 1 as an example:
Deduplication: search result 1 contains two duplicate records, ReportID=150 & Title=Joint-stock company annual work plan, which are merged into one by deduplication.
Sorting: after sorting in ascending dictionary order (the order follows the original Chinese titles), the three records are, in order: ReportID=150 & Title=Joint-stock company annual work plan; ReportID=101 & Title=Software company 2009 annual financial report; ReportID=110 & Title=Manufacturing company first-quarter sales performance.
Classification: according to the classification tree preset in the system, the above search results are assigned to software industry (ReportID=101 & Title=Software company 2009 annual financial report), manufacturing (ReportID=110 & Title=Manufacturing company first-quarter sales performance) and unclassified (ReportID=150 & Title=Joint-stock company annual work plan).
Aggregation: for the record that the preceding classification could not identify (ReportID=150 & Title=Joint-stock company annual work plan), the aggregation device 17 recomputes the similarity of the samples and groups them by the intrinsic features of the search result, forming a new category, the joint-stock company category. Finally, the record (ReportID=150 & Title=Joint-stock company annual work plan) is clustered into the joint-stock company category.
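The aggregation step can be illustrated with a small similarity-based grouping. The source does not name a similarity measure or a clustering algorithm, so the Jaccard similarity over title words and the greedy single-pass grouping below are substitutes chosen only for illustration, and the `threshold` value is arbitrary.

```python
def jaccard(a, b):
    """Similarity of two token sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def aggregate(titles, threshold=0.3):
    """Greedy single-pass grouping: join the first group whose seed is similar enough."""
    groups = []  # list of (seed_tokens, member_titles)
    for title in titles:
        tokens = set(title.lower().split())
        for seed, members in groups:
            if jaccard(seed, tokens) >= threshold:
                members.append(title)
                break
        else:
            # No existing group is similar enough, so this title seeds a new one.
            groups.append((tokens, [title]))
    return [members for _, members in groups]
```

With this measure, two titles that share most of their words end up in the same group, while an unrelated title starts a group of its own.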
(4) The pre-processed search result is displayed in an integrated manner.
The display device 13 analyzes the pre-processed search result, extracts the URL path information it contains, determines the invocation form of each information source and chooses the display mode accordingly. In this example, the integrated display of the search result takes the forms shown in the following table:
No. | Information source | Search result | Invocation form | Display method
1 | Oracle relational database | ReportID=150 & Title=Joint-stock company annual work plan | Interface | (new webservice()).invoke(150)
2 | Oracle relational database | ReportID=101 & Title=Software company 2009 annual financial report | Interface | (new webservice()).invoke(101)
3 | Oracle relational database | ReportID=110 & Title=Manufacturing company first-quarter sales performance | Interface | (new webservice()).invoke(110)
4 | XBRL business report repository | <td>Jiantou Energy Company</td><td>4,734,452,100</td> | URL | http://xbrl.com/XBRL/info.jsp?stkid=000600&year=2005&reportType=GB0110
5 | XBRL business report repository | <td>Energy company of Shanghai Stock Exchange</td><td>2464527,202,500</td> | URL | http://xbrl.com/XBRL/info.jsp?stkid=000032&year=2005&reportType=GB0110
6 | Microsoft Word document library | 14 listed banks earned a combined 434.8 billion yuan last year | Functional fragment | Open the document in an embedded OLE container
7 | Microsoft Word document library | 10 " " in listed companies' annual reports | Functional fragment | Open the document in an embedded OLE container
Following the display methods above, the display device 13 displays the original information of the heterogeneous information sources together in a unified multi-window (nested or stacked), multi-tab (horizontal or vertical) form.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A federated search and integrated search result display method, comprising the following steps:
(1) inputting a search term;
(2) sending search requests in parallel to structured, semi-structured and unstructured information sources, simultaneously searching for and locating, in the structured, semi-structured and unstructured information sources, the information that matches the search term, and composing a search result from all information that matches the search term;
(3) displaying the search result in an integrated manner in its natural form.
2. the method for claim 1, it is characterized in that: in the step (2), retrieval employing standard SQL retrieval mode in structured data sources, the XQuery/XPath retrieval mode is adopted in retrieval in semi-structured information source, and the full-text search mode is adopted in retrieval in the unstructured information source.
3. the method for claim 1 is characterized in that: also comprise the operation that result for retrieval is disappeared and heavily handles in the step (2).
4. the method for claim 1 is characterized in that: also comprise the operation that result for retrieval is sorted and handles in the step (2).
5. the method for claim 1 is characterized in that: also comprise the operation that result for retrieval is classified and handled in the step (2).
6. the method for claim 1, it is characterized in that: step also comprises the operation of result for retrieval being carried out aggregation processing in (2).
7. The method of any one of claims 1 to 6, characterized in that: in step (3), the integrated display of the search result comprises the following steps:
① analyzing the search result and extracting the path information it contains;
② analyzing the path information and determining from it which of the following invocation forms the information source uses: interface, URL, functional fragment or system call;
③ adopting a different presentation method according to the invocation form of the information source: for the interface form, running the interface stub to call the corresponding interface; for the URL form, navigating directly to the corresponding URL; for the functional fragment form, loading the functional fragment into a container and running it; for the system call form, running it directly;
④ integrating the raw information of the search results from the structured, semi-structured and unstructured information sources and displaying it in a unified way.
8. The method of claim 7, characterized in that: in step ④, the raw information of the search results is integrated and displayed in a unified way in multi-window, multi-tab form.
9. A federated search and integrated search result display system, comprising:
an input device (11) for inputting a search term;
a search device (12) for sending search requests in parallel to structured, semi-structured and unstructured information sources, simultaneously searching for and locating, in the structured, semi-structured and unstructured information sources, the information that matches the search term, and composing a search result from all information that matches the search term;
and a display device (13) for displaying the search result in an integrated manner in its natural form.
10. The system of claim 9, characterized in that: the system further comprises a deduplication device (14) for deduplicating the search result, a sorting device (15) for sorting the search result, a classification device (16) for classifying the search result, and an aggregation device (17) for aggregating the search result.
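The parallel retrieval and preprocessing recited in the claims can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed system. An in-memory SQLite table stands in for the relational source, a small XML string queried with ElementTree's limited XPath support stands in for the semi-structured source, and a substring match over a dictionary stands in for full-text search. All sample data, function names and result fields are hypothetical. Deduplication here is exact matching on normalized text, sorting is by term frequency, and grouping by source type stands in for classification and aggregation.

```python
import sqlite3
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data for the three kinds of information source.
XML_DOC = ("<reports><report><name>Energy Co</name></report>"
           "<report><name>Bank Co</name></report></reports>")
TEXT_DOCS = {"banks.doc": "Listed banks earned more last year",
             "energy.doc": "Energy prices and Energy Co results"}


def search_structured(term):
    """Structured source: standard SQL over an in-memory SQLite stand-in."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report (id INTEGER, title TEXT)")
    conn.executemany("INSERT INTO report VALUES (?, ?)",
                     [(150, "Energy Co annual plan"),
                      (101, "Software Co financial report")])
    rows = conn.execute("SELECT id, title FROM report WHERE title LIKE ?",
                        (f"%{term}%",)).fetchall()
    conn.close()
    return [{"source": "structured", "text": title, "path": f"ReportID={rid}"}
            for rid, title in rows]


def search_semi_structured(term):
    """Semi-structured source: path query over an XML document."""
    root = ET.fromstring(XML_DOC)
    names = [r.findtext("name", default="") for r in root.findall("./report")]
    return [{"source": "semi-structured", "text": n, "path": "/reports/report"}
            for n in names if term.lower() in n.lower()]


def search_unstructured(term):
    """Unstructured source: naive substring match standing in for full-text search."""
    return [{"source": "unstructured", "text": text, "path": name}
            for name, text in TEXT_DOCS.items() if term.lower() in text.lower()]


def preprocess(results, term):
    """Deduplicate, sort, then group the merged results by source type."""
    seen, unique = set(), []
    for hit in results:
        key = " ".join(hit["text"].lower().split())  # normalized text
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    # Most occurrences of the term first; the sort is stable for ties.
    unique.sort(key=lambda h: -h["text"].lower().count(term.lower()))
    groups = {}
    for hit in unique:
        groups.setdefault(hit["source"], []).append(hit)
    return groups


def federated_search(term):
    """Send the same term to all sources in parallel and merge the hits."""
    searchers = [search_structured, search_semi_structured, search_unstructured]
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        futures = [pool.submit(search, term) for search in searchers]
        merged = [hit for future in futures for hit in future.result()]
    return preprocess(merged, term)
```

Calling `federated_search("energy")` returns a dictionary keyed by source type. Each hit keeps its `path` field, which is the kind of information a display step could use to choose how to present it.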
CN 201010211359 2010-06-28 2010-06-28 Federated search and search result integrated display method and system Pending CN101894143A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010211359 CN101894143A (en) 2010-06-28 2010-06-28 Federated search and search result integrated display method and system

Publications (1)

Publication Number Publication Date
CN101894143A true CN101894143A (en) 2010-11-24

Family

ID=43103335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010211359 Pending CN101894143A (en) 2010-06-28 2010-06-28 Federated search and search result integrated display method and system

Country Status (1)

Country Link
CN (1) CN101894143A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1987853A (en) * 2005-12-23 2007-06-27 北大方正集团有限公司 Searching method for relational data base and full text searching combination
CN1924860A (en) * 2006-10-08 2007-03-07 网之易信息技术(北京)有限公司 Search engine based search result fast pre-reading device
CN101183376A (en) * 2007-12-07 2008-05-21 武汉达梦数据库有限公司 XML data-base enquiring method based on relation algebra range arithmetic
CN101719156A (en) * 2009-12-30 2010-06-02 南开大学 System of seamless integrated pure XML query engine in relational database

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020093A (en) * 2011-09-28 2013-04-03 上海证券交易所 Method and system for recording and displaying extensible business reporting language (XBRL) information disclosure report
CN103020093B (en) * 2011-09-28 2015-11-25 上海证券交易所 The recording of XBRL information announcing report and methods of exhibiting and system thereof
CN104160394B (en) * 2011-12-23 2017-08-15 亚马逊科技公司 Scalable analysis platform for semi-structured data
CN104160394A (en) * 2011-12-23 2014-11-19 阿米亚托股份有限公司 Scalable analysis platform for semi-structured data
CN107451225A (en) * 2011-12-23 2017-12-08 亚马逊科技公司 Scalable analysis platform for semi-structured data
WO2014033724A1 (en) * 2012-08-29 2014-03-06 Hewlett-Packard Development Company L.P. Querying structured and unstructured databases
CN104541267A (en) * 2012-08-29 2015-04-22 惠普发展公司,有限责任合伙企业 Querying structured and unstructured databases
CN103891244A (en) * 2012-09-04 2014-06-25 华为技术有限公司 Method and device for storing and retrieving data
CN103891244B (en) * 2012-09-04 2016-11-16 华为技术有限公司 A kind of method and device carrying out data storage and search
CN103455641A (en) * 2013-09-29 2013-12-18 方正国际软件有限公司 Crossing repeated retrieval system and method
CN103455641B (en) * 2013-09-29 2017-02-22 北大医疗信息技术有限公司 Crossing repeated retrieval system and method
CN104391945A (en) * 2014-11-28 2015-03-04 厦门市美亚柏科信息股份有限公司 Method and device for processing database file data index
CN104391945B (en) * 2014-11-28 2018-04-10 厦门市美亚柏科信息股份有限公司 The treating method and apparatus of database file data directory
CN104750812A (en) * 2015-03-30 2015-07-01 浪潮集团有限公司 Automatic data collecting method based on webpage label analysis
CN106649863A (en) * 2016-12-30 2017-05-10 天津市测绘院 Non-structured data management method and apparatus
CN107656909A (en) * 2017-10-30 2018-02-02 北京明朝万达科技股份有限公司 A kind of Documents Similarity decision method and device based on document composite character
CN109635160A (en) * 2018-12-03 2019-04-16 四川长虹电器股份有限公司 A kind of implementation method of the quick-searching based on XBRL
CN109635160B (en) * 2018-12-03 2022-05-03 四川长虹电器股份有限公司 Method for realizing rapid retrieval based on XBRL
CN111192690A (en) * 2019-12-24 2020-05-22 泰康保险集团股份有限公司 Medical data retrieval method, medical data retrieval device, electronic equipment and medium
CN111192690B (en) * 2019-12-24 2023-11-17 泰康保险集团股份有限公司 Medical data retrieval method, device, electronic equipment and medium
WO2021184572A1 (en) * 2020-03-20 2021-09-23 平安国际智慧城市科技股份有限公司 Query method and apparatus, computer device and storage medium
CN115577034A (en) * 2022-11-21 2023-01-06 中国电子信息产业集团有限公司 Federal computing system and method based on data system

Similar Documents

Publication Publication Date Title
CN101894143A (en) Federated search and search result integrated display method and system
Shigarov et al. Rule-based spreadsheet data transformation from arbitrary to relational tables
Kiryakov et al. Semantic annotation, indexing, and retrieval
Cafarella et al. Web-scale extraction of structured data
CN1845104B (en) System and method for intelligent retrieval and processing of information
US9064004B2 (en) Extensible surface for consuming information extraction services
CN101201838A (en) Method for improving searching engine based on keyword index using phrase index technique
CN103365914A (en) Database query system and method based on search engine
US9063957B2 (en) Query systems
Yafooz et al. Managing unstructured data in relational databases
Abramowicz et al. Filtering the Web to feed data warehouses
Kozaki et al. Understanding semantic web applications
CN102930030A (en) Ontology-based intelligent semantic document indexing reasoning system
Graubitz et al. The DIAsDEM framework for converting domain-specific texts into XML documents with data mining techniques
Manica et al. Handling temporal information in web search engines
Lehmberg Web table integration and profiling for knowledge base augmentation
Graubitz et al. Semantic tagging of domain-specific text documents with DIAsDEM
Wang et al. Normalized Storage Model Construction and Query Optimization of Book Multi-Source Heterogeneous Massive Data
Feng et al. An XML-enabled data mining query language: XML-DMQL
Nassis et al. A systematic design approach for XML-view driven web document warehouses
Kong et al. Word File Parsing Based On Python
Novello et al. Empowering Natural Language Interfaces to Databases with Aggregations
Tran Process-oriented Semantic Web Search
Becker Effective databases for text & document management
Tan et al. A Joint Entity-Relation Detection and Generalization Method Based on Syntax and Semantics for Chinese Intangible Cultural Heritage Texts

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20101124