US20080259084A1 - Method and apparatus for organizing data sources - Google Patents
- Publication number
- US20080259084A1 (application US12/163,485)
- Authority
- US
- United States
- Prior art keywords
- sources
- attributes
- communities
- vertices
- cliques
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99951—File or database maintenance
- Y10S707/99952—Coherency, e.g. same view to multiple users
- Y10S707/99953—Recoverability
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A method and apparatus for organizing deep Web services are provided. In one aspect, the method and apparatus obtain a collection of sources and their associated attributes and/or input modes, for instance, using a crawling algorithm. The method and apparatus use this information to organize the sources into communities. A mining algorithm such as the hyperclique mining algorithm is used to obtain cliques of highly correlated attributes. A clustering algorithm such as the hierarchical agglomerative clustering algorithm is used to further cluster the cliques of attributes into larger cliques, which in the present disclosure are referred to as signatures. The sources that are associated with each signature form a community, and a graph representation of the communities is constructed, where the vertices are communities and the edges are the shared attributes.
Description
- This application is a continuation of U.S. Ser. No. 11/503,713, filed on Aug. 14, 2006, the entire contents of which are incorporated herein by reference.
- The present disclosure relates to data processing, more specifically, to a method for collecting and organizing the metadata about on-line or World Wide Web (Web) databases, which for example can support search, navigation and exploration of these Web databases.
- As the Internet becomes increasingly driven by Web services and on-line databases, the simple keyword search interfaces provided by most Web search engines are no longer adequate for searching, navigating and exploring Web services and on-line databases. When navigating or searching for Web services, a user is often more interested in Web services from a particular domain, or Web services containing a particular set of input attributes, or Web services that are similar to a given example of a Web service. For example, a user may want to find all on-line bookstore databases, or all on-line databases with an input for ISBN and price, or all on-line databases similar to “amazon.com.” Current search engine technology does not support such queries.
- A method and apparatus for organizing data sources are provided. In one aspect, the method and apparatus organize Web services and allow access to deep Web databases. The method and apparatus in one aspect store, model and analyze the deep Web services to obtain knowledge that can be used to navigate, explore and query web services.
- The method of organizing data sources, in one aspect, includes grouping a plurality of items including input attributes, output attributes, or keywords or combination thereof from one or more sources into a plurality of cliques of highly correlated items. The method also includes clustering the plurality of cliques into one or more signatures. For each of the one or more signatures, the method includes selecting one or more sources that are associated with a signature and forming the selected sources into a community.
- The method in one aspect may also include constructing a graph representation of a plurality of communities. The graph representation includes at least a plurality of vertices representing the plurality of communities respectively and one or more edges connecting the plurality of vertices. The one or more edges represent one or more input attributes, output attributes, or keywords or combination thereof that are shared between the communities represented in the connecting vertices.
- The step of grouping may be performed using a hyperclique mining algorithm. The step of clustering may be performed using a hierarchical agglomerative clustering algorithm. The method may further include obtaining the plurality of items including input attributes, output attributes, or keywords or combination thereof from one or more sources using a crawling algorithm.
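- As a rough illustration of the grouping step, the sketch below treats each source as a transaction (the set of attributes and keywords found on its query interface) and keeps item sets whose h-confidence meets a threshold. The h-confidence measure (support of the item set divided by the largest support of any single item in it) comes from the hyperclique literature rather than from this disclosure, and the brute-force enumeration is a stand-in, not the hybrid mining algorithm cited later. The function names, the `min_hconf` and `max_size` parameters, and the sample sources are all hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sources) that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def h_confidence(itemset, transactions):
    """Support of the item set divided by the largest single-item support."""
    max_single = max(support(frozenset([i]), transactions) for i in itemset)
    return support(itemset, transactions) / max_single if max_single else 0.0


def mine_hypercliques(transactions, min_hconf=0.6, max_size=3):
    """Brute-force search for maximal item sets with h-confidence >= min_hconf.

    Assumes a non-empty list of transactions. Exponential in max_size, so it is
    only suitable for small illustrative inputs.
    """
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if h_confidence(candidate, transactions) >= min_hconf:
                found.append(candidate)
    # Keep only maximal patterns (those not strictly contained in another one).
    return [c for c in found if not any(c < other for other in found)]


# Hypothetical crawl result: one set of interface items per source.
sources = [
    frozenset({"isbn", "title", "author"}),
    frozenset({"isbn", "title", "price"}),
    frozenset({"departure city", "departure date"}),
    frozenset({"departure city", "departure date", "price"}),
]
cliques = mine_hypercliques(sources)
```

In this toy input, "price" appears on both bookstore and airline forms, so it does not join either clique, whereas items that reliably appear together do.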
- An apparatus for organizing data sources, in one aspect, includes a means for grouping a plurality of items including input attributes, output attributes, or keywords or combination thereof from one or more sources into a plurality of cliques of highly correlated items. The apparatus also includes a means for clustering the plurality of cliques into one or more signatures and a means for selecting one or more sources that are associated with a signature and forming the selected sources into a community for each of the one or more signatures. The apparatus further includes a means for constructing a graph representation of a plurality of communities, the graph representation including at least a plurality of vertices representing the plurality of communities respectively and one or more edges connecting the plurality of vertices, the one or more edges representing one or more input attributes, output attributes, or keywords or combination thereof that are shared between the communities represented in the connecting vertices.
- Further features as well as the structure and operation of various embodiments are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.
- FIG. 1 is a block diagram illustrating the data flow in one embodiment of the present disclosure.
- The present application discloses a method and apparatus for organizing deep Web services that provide access to online databases. The method and apparatus in one embodiment store interface information of deep Web services and perform analysis on the interface information in order to organize the information to support non-trivial queries. In one embodiment of the present disclosure, interface information of Web services is obtained by crawling the Web. Any other known or will-be-known method may be used to gather the interface information of Web services.
- An exemplary embodiment of the present disclosure models on-line databases using three entities: a source is the database itself; an attribute is one of the input attributes of the query interface for a source; and a community is a collection of sources that logically belong together. A community can contain many sources. A source can be associated with many attributes. An attribute can be associated with several input modes. For example, an attribute can be a text box or combo box on the query interface.
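- One way to picture the three entities and their one-to-many relationships is the minimal sketch below. The class and field names (`Attribute`, `Source`, `Community`, `input_modes`, `add_attribute`) and the example source are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Attribute:
    """An input attribute of a query interface, with the input modes seen for it."""
    name: str
    input_modes: Set[str] = field(default_factory=set)  # e.g. {"text box", "combo box"}


@dataclass
class Source:
    """An on-line database, described by the attributes of its query interface."""
    name: str
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def add_attribute(self, name: str, input_mode: str) -> None:
        # Reuse the existing Attribute if this name was already recorded.
        attr = self.attributes.setdefault(name, Attribute(name))
        attr.input_modes.add(input_mode)


@dataclass
class Community:
    """A collection of sources that logically belong together."""
    label: str
    sources: List[Source] = field(default_factory=list)


# Hypothetical example: one bookstore source with two attributes.
bookstore = Source("example-bookstore")
bookstore.add_attribute("isbn", "text box")
bookstore.add_attribute("format", "combo box")
bookstore.add_attribute("format", "text box")
books = Community("books", [bookstore])
```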
- A crawling technology may be used to obtain a collection of sources and their associated attributes and input modes. The method and apparatus in an exemplary embodiment use this information to organize the sources into communities. In an exemplary embodiment, a mining algorithm such as the hyperclique mining algorithm is used to obtain cliques of highly correlated attributes. A clustering algorithm such as the hierarchical agglomerative clustering algorithm is used to further cluster the cliques of attributes into larger cliques, which in the present disclosure are referred to as signatures. The sources that are associated with each signature form a community, and a graph representation of the communities is constructed, where the vertices are communities and the edges are the shared attributes.
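- The clustering and community-forming steps might look roughly like the sketch below. The disclosure does not fix a similarity measure, a stopping rule, or a rule for when a source is "associated with" a signature, so the Jaccard similarity between merged item sets, the `min_similarity` cutoff, and the `min_overlap` membership test are all assumptions, as are the function names and sample data.

```python
def jaccard(a, b):
    """Jaccard similarity of two non-empty item sets."""
    return len(a & b) / len(a | b)


def cluster_cliques(cliques, min_similarity=0.3):
    """Bottom-up merging: repeatedly union the two most similar clusters.

    Each cluster is represented by the union of its items, and merging stops
    once the best remaining similarity drops below min_similarity. The
    surviving clusters play the role of signatures.
    """
    clusters = [frozenset(c) for c in cliques]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i], clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        if sim < min_similarity:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def form_communities(signatures, sources, min_overlap=2):
    """Map each signature to the sources sharing at least min_overlap items with it."""
    return {
        sig: sorted(name for name, items in sources.items()
                    if len(sig & items) >= min_overlap)
        for sig in signatures
    }


# Hypothetical cliques (as produced by a mining step) and crawled sources.
cliques = [
    {"isbn", "title"},
    {"title", "author"},
    {"departure city", "departure date"},
    {"departure date", "return date"},
]
sources = {
    "bookstore-a": {"isbn", "title", "price"},
    "bookstore-b": {"title", "author"},
    "airline-a": {"departure city", "departure date", "price"},
}
signatures = cluster_cliques(cliques)
communities = form_communities(signatures, sources)
```

Because a source may overlap with more than one signature, this sketch allows a source to belong to several communities.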
- FIG. 1 is a block diagram illustrating the data flow of the present disclosure in one embodiment. The collection 110 of data sources 112, together with their associated attributes 114 and keywords 116, is obtained, for example, using a crawling algorithm or any other information gathering method. Attributes are, for example, input and output attributes appearing on Web query forms. Examples of attributes include but are not limited to departure data, departure city, airline flights, prices, etc., which for instance may appear on an airline query form. Keywords 116 are, for example, texts appearing on a Web query form or interface.
- In an exemplary embodiment, a mining algorithm such as the hyperclique mining algorithm 120 is used to obtain a collection 130 of cliques of highly correlated attributes 132 and/or keywords 134 from the sources 110. The hyperclique mining algorithm is described in detail in Y. Huang, H. Xiong, W. Wu, and Z. Zhang, A hybrid Approach for Mining Maximal Hyperclique Patterns. A hyperclique pattern refers to a type of association pattern that contains items that are highly affiliated with each other. The presence of an item in one transaction strongly implies the presence of every other item that belongs to the same clique.
- A clustering algorithm such as the hierarchical agglomerative clustering algorithm 140 is used to further cluster the cliques of attributes into larger cliques, which in the present disclosure are referred to as signatures. The hierarchical agglomerative clustering algorithm is described in detail in A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Comput. Surv., 31(3):264-323, 1999. A hierarchy 150 of sources and communities is generated, for instance, using the clustering algorithm. The sources 152 that are associated with each signature form a community 154.
- A graph construction method or tool is used at 160 to provide a graph representation of communities 170 where the edges represent one or more sets of shared attributes between the communities. For instance, community 176 and community 178 share the attributes 174. In one embodiment, the graph can be navigated by starting at a vertex (community) and following the outgoing edges to related communities. For instance, starting at a source, all associated vertices (communities) can be found and navigated by following the edges to related communities. In addition, navigation may start from an attribute or keyword, finding all associated edges and traversing to incident vertices.
- Unlike the known search engine techniques, the method and apparatus of the present disclosure in an exemplary embodiment provide searching capabilities, not only in Web pages and their content, but also in Web services and their interfaces. The method and apparatus of the present disclosure in an exemplary embodiment go beyond keyword-based and hyperlink-based processing by exploiting the co-occurrence information of the input attributes of deep Web services to “mine” out the non-trivial knowledge for answering queries. The method and apparatus of the present disclosure in an exemplary embodiment provide a common infrastructure to organize data sources by grouping similar data sources together and discovering relationships between data sources or groups, for example, shared attributes.
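- A community graph with shared attributes on its edges, together with the three navigation entry points mentioned above (a community, a source, or an attribute or keyword), can be sketched as follows. The edge representation (a dictionary keyed by community pairs), the function names, and the sample communities are hypothetical choices made for illustration only.

```python
from itertools import combinations


def build_community_graph(community_attrs):
    """Return edges as {(community_a, community_b): shared attributes}.

    Vertices are the community names; an edge exists only where two
    communities share at least one attribute or keyword.
    """
    edges = {}
    for a, b in combinations(sorted(community_attrs), 2):
        shared = community_attrs[a] & community_attrs[b]
        if shared:
            edges[(a, b)] = shared
    return edges


def neighbors(edges, community):
    """Start at a community and follow its edges to related communities."""
    related = {}
    for (a, b), shared in edges.items():
        if community == a:
            related[b] = shared
        elif community == b:
            related[a] = shared
    return related


def communities_for_source(memberships, source):
    """Start at a source and find every community (vertex) that contains it."""
    return sorted(c for c, members in memberships.items() if source in members)


def communities_for_attribute(edges, attribute):
    """Start at an attribute or keyword and collect vertices incident to its edges."""
    found = set()
    for (a, b), shared in edges.items():
        if attribute in shared:
            found.update((a, b))
    return sorted(found)


# Hypothetical communities, their attributes, and their member sources.
community_attrs = {
    "books": {"isbn", "title", "price"},
    "flights": {"departure city", "departure date", "price"},
    "hotels": {"city", "check-in date"},
    "rail": {"departure city", "departure date"},
}
memberships = {
    "books": {"bookstore-a"},
    "flights": {"airline-a", "travel-portal"},
    "rail": {"travel-portal"},
}
edges = build_community_graph(community_attrs)
```

With this data, `neighbors(edges, "flights")` leads to both "books" (via "price") and "rail" (via the two departure attributes), while "hotels" shares nothing and stays isolated.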
- The system and method of the present disclosure may be implemented and run on a general-purpose computer or computer system. The computer system may be any type of known or will be known systems and may typically include a processor, memory device, a storage device, input/output devices, internal buses, and/or a communications interface for communicating with other computer systems in conjunction with communication hardware and software, etc.
- The terms “computer system” as may be used in the present application may include a variety of combinations of fixed and/or portable computer hardware, software, peripherals, and storage devices. The computer system may include a plurality of individual components that are networked or otherwise linked to perform collaboratively, or may include one or more stand-alone components. The hardware and software components of the computer system of the present application may include and may be included within fixed and portable devices such as desktop, laptop, and server.
- The embodiments described above are illustrative examples and it should not be construed that the present invention is limited to these particular embodiments. Thus, various changes and modifications may be effected by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.
Claims (7)
1. A computer-implemented method of organizing data sources, comprising:
grouping a plurality of items including input attributes, output attributes, or keywords or combination thereof from one or more sources into a plurality of cliques of highly correlated items;
clustering the plurality of cliques into one or more signatures; and
for each of the one or more signatures,
selecting one or more sources that are associated with a signature; and
forming the selected sources into a community.
2. The method of claim 1 , further including:
constructing a graph representation of a plurality of communities, the graph representation including at least a plurality of vertices representing the plurality of communities respectively and one or more edges connecting the plurality of vertices, the one or more edges representing one or more input attributes, output attributes, or keywords or combination thereof that are shared between the communities represented in the connecting vertices.
3. The method of claim 2 , further including:
navigating the graph representation including at least one of:
starting at a vertex representing a community, following one or more edges to one or more second vertices representing one or more related communities;
starting at a source, traversing to one or more associated vertices; and
starting from an attribute or a keyword or combination thereof, traversing one or more associated edges and connected vertices.
4. The method of claim 1 , wherein the step of grouping is performed using a hyperclique mining algorithm.
5. The method of claim 1 , wherein the step of clustering is performed using a hierarchical agglomerative clustering algorithm.
6. The method of claim 1 , further including:
obtaining the plurality of items including input attributes, output attributes, or keywords or combination thereof from one or more sources using a crawling algorithm.
7. An apparatus for organizing data sources, comprising:
a means for grouping a plurality of items including input attributes, output attributes, or keywords or combination thereof from one or more sources into a plurality of cliques of highly correlated items;
a means for clustering the plurality of cliques into one or more signatures;
a means for selecting one or more sources that are associated with a signature and forming the selected sources into a community for each of the one or more signatures; and
a means for constructing a graph representation of a plurality of communities, the graph representation including at least a plurality of vertices representing the plurality of communities respectively and one or more edges connecting the plurality of vertices, the one or more edges representing one or more input attributes, output attributes, or keywords or combination thereof that are shared between the communities represented in the connecting vertices.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/163,485 US20080259084A1 (en) | 2006-08-14 | 2008-06-27 | Method and apparatus for organizing data sources |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/503,713 US7529740B2 (en) | 2006-08-14 | 2006-08-14 | Method and apparatus for organizing data sources |
US12/163,485 US20080259084A1 (en) | 2006-08-14 | 2008-06-27 | Method and apparatus for organizing data sources |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/503,713 Continuation US7529740B2 (en) | 2006-08-14 | 2006-08-14 | Method and apparatus for organizing data sources |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080259084A1 (en) | 2008-10-23 |
Family
ID=39052065
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/503,713 Active 2027-01-28 US7529740B2 (en) | 2006-08-14 | 2006-08-14 | Method and apparatus for organizing data sources |
US12/163,485 Abandoned US20080259084A1 (en) | 2006-08-14 | 2008-06-27 | Method and apparatus for organizing data sources |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/503,713 Active 2027-01-28 US7529740B2 (en) | 2006-08-14 | 2006-08-14 | Method and apparatus for organizing data sources |
Country Status (1)
Country | Link |
---|---|
US (2) | US7529740B2 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110087646A1 (en) * | 2009-10-08 | 2011-04-14 | Nilesh Dalvi | Method and System for Form-Filling Crawl and Associating Rich Keywords |
US20110113063A1 (en) * | 2009-11-09 | 2011-05-12 | Bob Schulman | Method and system for brand name identification |
US20120226463A1 (en) * | 2011-03-02 | 2012-09-06 | Nokomis, Inc. | System and method for physically detecting counterfeit electronics |
US8380735B2 (en) | 2001-07-24 | 2013-02-19 | Brightplanet Corporation II, Inc | System and method for efficient control and capture of dynamic database content |
CN107305490A (en) * | 2016-04-22 | 2017-10-31 | 中国移动通信集团湖南有限公司 | A kind of metadata groupings method and device |
US9887721B2 (en) | 2011-03-02 | 2018-02-06 | Nokomis, Inc. | Integrated circuit with electromagnetic energy anomaly detection and processing |
WO2018109243A1 (en) * | 2016-12-16 | 2018-06-21 | Telefonica Digital España, S.L.U. | Method, system and computer program products for recognising, validating and correlating entities in a communications darknet |
WO2018121854A1 (en) * | 2016-12-28 | 2018-07-05 | Khalifa University Of Science, Technology And Research | Methods and systems for searching |
CN109948019A (en) * | 2019-01-10 | 2019-06-28 | 中央财经大学 | A kind of deep layer Network Data Capture method |
US10448864B1 (en) | 2017-02-24 | 2019-10-22 | Nokomis, Inc. | Apparatus and method to identify and measure gas concentrations |
US11489847B1 (en) | 2018-02-14 | 2022-11-01 | Nokomis, Inc. | System and method for physically detecting, identifying, and diagnosing medical electronic devices connectable to a network |
US11586674B2 (en) | 2016-12-28 | 2023-02-21 | Khalifa University of Science and Technology | Methods and systems for searching |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006084102A2 (en) * | 2005-02-03 | 2006-08-10 | Musicstrands, Inc. | Recommender system for identifying a new set of media items responsive to an input set of media items and knowledge base metrics |
EP1844386A4 (en) | 2005-02-04 | 2009-11-25 | Strands Inc | System for browsing through a music catalog using correlation metrics of a knowledge base of mediasets |
US8429184B2 (en) | 2005-12-05 | 2013-04-23 | Collarity Inc. | Generation of refinement terms for search queries |
US7756855B2 (en) * | 2006-10-11 | 2010-07-13 | Collarity, Inc. | Search phrase refinement by search term replacement |
US8903810B2 (en) | 2005-12-05 | 2014-12-02 | Collarity, Inc. | Techniques for ranking search results |
BRPI0620084B1 (en) | 2005-12-19 | 2018-11-21 | Apple Inc | method for identifying individual users in a defined user community, based on comparing the first user's profile with other user profiles, for a first community member, and method for measuring individual user similarity for a first user in a defined user community. users |
US20070244880A1 (en) * | 2006-02-03 | 2007-10-18 | Francisco Martin | Mediaset generation system |
KR20080100342A (en) | 2006-02-10 | 2008-11-17 | 스트랜즈, 아이엔씨. | Dynamic interactive entertainment |
WO2007103923A2 (en) * | 2006-03-06 | 2007-09-13 | La La Media, Inc | Article trading process |
US8442972B2 (en) * | 2006-10-11 | 2013-05-14 | Collarity, Inc. | Negative associations for search results ranking and refinement |
US20080215416A1 (en) * | 2007-01-31 | 2008-09-04 | Collarity, Inc. | Searchable interactive internet advertisements |
US20090228296A1 (en) * | 2008-03-04 | 2009-09-10 | Collarity, Inc. | Optimization of social distribution networks |
US8438178B2 (en) * | 2008-06-26 | 2013-05-07 | Collarity Inc. | Interactions among online digital identities |
US8601003B2 (en) | 2008-09-08 | 2013-12-03 | Apple Inc. | System and method for playlist generation based on similarity data |
US20100169328A1 (en) * | 2008-12-31 | 2010-07-01 | Strands, Inc. | Systems and methods for making recommendations using model-based collaborative filtering with user communities and items collections |
US20110060738A1 (en) | 2009-09-08 | 2011-03-10 | Apple Inc. | Media item clustering based on similarity data |
EP2488960A4 (en) * | 2009-10-15 | 2016-08-03 | Hewlett Packard Entpr Dev Lp | Heterogeneous data source management |
US8875038B2 (en) | 2010-01-19 | 2014-10-28 | Collarity, Inc. | Anchoring for content synchronization |
US8983905B2 (en) | 2011-10-03 | 2015-03-17 | Apple Inc. | Merging playlists from multiple sources |
CN103257981B (en) * | 2012-06-12 | 2016-04-13 | 苏州大学 | Deep Web data surfacing method based on query interface attribute characteristics |
CN104281714A (en) * | 2014-10-29 | 2015-01-14 | 南通大学 | Hospital portal website clinic specialist information extracting system |
US10949387B1 (en) * | 2016-09-29 | 2021-03-16 | Triad National Security, Llc | Scalable filesystem enumeration and metadata operations |
US10936653B2 (en) | 2017-06-02 | 2021-03-02 | Apple Inc. | Automatically predicting relevant contexts for media items |
CN112560504B (en) * | 2021-02-24 | 2021-06-11 | 北京庖丁科技有限公司 | Method, electronic equipment and computer readable medium for extracting information in form document |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050246321A1 (en) * | 2004-04-30 | 2005-11-03 | Uma Mahadevan | System for identifying storylines that emegre from highly ranked web search results |
US20060089947A1 (en) * | 2001-08-31 | 2006-04-27 | Dan Gallivan | System and method for dynamically evaluating latent concepts in unstructured documents |
US7143091B2 (en) * | 2002-02-04 | 2006-11-28 | Cataphorn, Inc. | Method and apparatus for sociological data mining |
US20060271564A1 (en) * | 2005-05-10 | 2006-11-30 | Pekua, Inc. | Method and apparatus for distributed community finding |
US20070061319A1 (en) * | 2005-09-09 | 2007-03-15 | Xerox Corporation | Method for document clustering based on page layout attributes |
US20070168856A1 (en) * | 2006-01-13 | 2007-07-19 | Kathrin Berkner | Tree pruning of icon trees via subtree selection using tree functionals |
US20070174267A1 (en) * | 2003-09-26 | 2007-07-26 | David Patterson | Computer aided document retrieval |
US7346629B2 (en) * | 2003-10-09 | 2008-03-18 | Yahoo! Inc. | Systems and methods for search processing using superunits |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6012052A (en) * | 1998-01-15 | 2000-01-04 | Microsoft Corporation | Methods and apparatus for building resource transition probability models for use in pre-fetching resources, editing resource link topology, building resource link topology templates, and collaborative filtering |
US6947953B2 (en) * | 1999-11-05 | 2005-09-20 | The Board Of Trustees Of The Leland Stanford Junior University | Internet-linked system for directory protocol based data storage, retrieval and analysis |
US6820075B2 (en) * | 2001-08-13 | 2004-11-16 | Xerox Corporation | Document-centric system with auto-completion |
US7035877B2 (en) * | 2001-12-28 | 2006-04-25 | Kimberly-Clark Worldwide, Inc. | Quality management and intelligent manufacturing with labels and smart tags in event-based product manufacturing |
US20050210008A1 (en) * | 2004-03-18 | 2005-09-22 | Bao Tran | Systems and methods for analyzing documents over a network |
US20050210009A1 (en) * | 2004-03-18 | 2005-09-22 | Bao Tran | Systems and methods for intellectual property management |
- 2006-08-14: US11/503,713, patent US7529740B2 (en), status: Active
- 2008-06-27: US12/163,485, patent US20080259084A1 (en), status: Abandoned
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8380735B2 (en) | 2001-07-24 | 2013-02-19 | Brightplanet Corporation II, Inc | System and method for efficient control and capture of dynamic database content |
US8793239B2 (en) * | 2009-10-08 | 2014-07-29 | Yahoo! Inc. | Method and system for form-filling crawl and associating rich keywords |
US20110087646A1 (en) * | 2009-10-08 | 2011-04-14 | Nilesh Dalvi | Method and System for Form-Filling Crawl and Associating Rich Keywords |
US20110113063A1 (en) * | 2009-11-09 | 2011-05-12 | Bob Schulman | Method and system for brand name identification |
US9887721B2 (en) | 2011-03-02 | 2018-02-06 | Nokomis, Inc. | Integrated circuit with electromagnetic energy anomaly detection and processing |
US20120226463A1 (en) * | 2011-03-02 | 2012-09-06 | Nokomis, Inc. | System and method for physically detecting counterfeit electronics |
US10475754B2 (en) * | 2011-03-02 | 2019-11-12 | Nokomis, Inc. | System and method for physically detecting counterfeit electronics |
CN107305490A (en) * | 2016-04-22 | 2017-10-31 | 中国移动通信集团湖南有限公司 | A kind of metadata groupings method and device |
WO2018109243A1 (en) * | 2016-12-16 | 2018-06-21 | Telefonica Digital España, S.L.U. | Method, system and computer program products for recognising, validating and correlating entities in a communications darknet |
WO2018121854A1 (en) * | 2016-12-28 | 2018-07-05 | Khalifa University Of Science, Technology And Research | Methods and systems for searching |
US11586674B2 (en) | 2016-12-28 | 2023-02-21 | Khalifa University of Science and Technology | Methods and systems for searching |
US10448864B1 (en) | 2017-02-24 | 2019-10-22 | Nokomis, Inc. | Apparatus and method to identify and measure gas concentrations |
US11229379B2 (en) | 2017-02-24 | 2022-01-25 | Nokomis, Inc. | Apparatus and method to identify and measure gas concentrations |
US11489847B1 (en) | 2018-02-14 | 2022-11-01 | Nokomis, Inc. | System and method for physically detecting, identifying, and diagnosing medical electronic devices connectable to a network |
CN109948019A (en) * | 2019-01-10 | 2019-06-28 | 中央财经大学 | A kind of deep layer Network Data Capture method |
Also Published As
Publication number | Publication date |
---|---|
US7529740B2 (en) | 2009-05-05 |
US20080040326A1 (en) | 2008-02-14 |
Similar Documents
Publication | Title |
---|---|
US7529740B2 (en) | Method and apparatus for organizing data sources |
Chung et al. | A visual framework for knowledge discovery on the Web: An empirical study of business intelligence exploration |
Minghim et al. | Content-based text mapping using multi-dimensional projections for exploration of document collections |
US8473473B2 (en) | Object oriented data and metadata based search |
US8935249B2 (en) | Visualization of concepts within a collection of information |
CA2628930C (en) | System and method for information retrieval from object collections with complex interrelationships |
JP5368100B2 (en) | System, method, and computer program product for concept-based search and analysis |
US7171405B2 (en) | Systems and methods for organizing data |
US8311999B2 (en) | System and method for knowledge research |
Poelmans et al. | Text mining scientific papers: a survey on FCA-based information retrieval research |
US20050060287A1 (en) | System and method for automatic clustering, sub-clustering and cluster hierarchization of search results in cross-referenced databases using articulation nodes |
US6684218B1 (en) | Standard specific |
US20080222105A1 (en) | Entity recommendation system using restricted information tagged to selected entities |
US20090157618A1 (en) | Entity networking system using displayed information for exploring connectedness of selected entities |
US20100138414A1 (en) | Methods and systems for associative search |
Priss | A graphical interface for document retrieval based on formal concept analysis |
Liu et al. | Visualizing document classification: A search aid for the digital library |
Wittenburg et al. | An adaptive document management system for shared multimedia data |
Murata | Visualizing the structure of web communities based on data acquired from a search engine |
Saddal et al. | ISRE-Framework: nonlinear and multimodal exploration of image search result spaces |
Choudhary et al. | Exploring the Landscape of Web Data Mining: An In-depth Research Analysis |
Li et al. | Information mining: Integrating data mining and text mining for business intelligence |
Zeiller | A case study based approach to knowledge visualization |
Zhou et al. | Automobile, car and BMW: Horizontal and hierarchical approach in social tagging systems |
Abdulmunim et al. | Links Evaluation and Ranking Based on Semantic Metadata Analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |