CN102073734A - Method for providing structured query by search engine - Google Patents

Method for providing structured query by search engine

Info

Publication number
CN102073734A
Authority
CN
China
Prior art keywords
search engine
structured
data
structured query
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201110022749
Other languages
Chinese (zh)
Inventor
汪洋
凌世播
彭艳兵
廖闻剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING FIBERHOME INFORMATION DEVELOPMENT Co Ltd
Original Assignee
NANJING FIBERHOME INFORMATION DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING FIBERHOME INFORMATION DEVELOPMENT Co Ltd
Priority to CN 201110022749
Publication of CN102073734A

Abstract

The invention discloses a method by which a search engine provides structured queries. The attributes of popular network applications are organized by category into a general superset, and the page information of those applications is structurally extracted and stored in a database; indexes can also be built to speed up retrieval. The search engine exposes structured data retrieval through an application program interface (API) that supports Structured Query Language (SQL), so that it offers unified full-text search and structured data retrieval to external callers. Structured data analysis applications can therefore make full use of the search engine's mass of text information without complicated program porting or Chinese-language information processing.

Description

Method for providing structured query by a search engine
Technical field
The present invention relates to a data query method in the information field, and in particular to a method for providing structured queries by means of a search engine. It uses a general-purpose search engine to deliver structured data services, which makes it possible for structured applications to mine unstructured data.
Background
In general, a search engine provides query services over unstructured text, while a database engine provides query services over structured data. As a result, the data mining that structured applications perform on databases is hard to extend to unstructured data. For example, once a search engine has indexed a public website, it is usually not feasible to apply structured data analysis methods to the behavior of that website's registered users. Analyzing the people who bump threads on BBS forums, blogs and microblogs, to determine which accounts are fake celebrity fans and which are paid shills, is useful to some commercial companies, particularly advertising companies. At present there is no effective means of comprehensive analysis across websites; analyzers are generally designed for one specific website. If a search engine could provide a method for structured queries, many standard structured analysis programs could be used.
Summary of the invention
The search engine structurally parses text such as web pages. If high-speed access is required, an index is built in the manner of a database, and database access middleware is then used to emulate the behavior of a database engine. A structured application drives the search engine by calling the database access middleware, and thereby accesses the structured information contained in the text.
Text is classified by attribute, and each fine-grained category is handled as a table. For example, website types can be divided into sub-categories including, but not limited to, blogs, microblogs, forums, news and video. In some application scenarios, such as inside an ICP, e-mail messages can also be treated as a category.
General-purpose data tables are provided for the different categories. Each table gathers, as a superset, all the fields of a given kind of information that most popular network applications share, for example the author-related information of blogs. Different tables and fields can be defined for different data, and a master table is maintained that maps the superset to the field names of each website. For a blog, author information such as the author's name goes into an author information table, and the corresponding general superset field can be looked up by website name, field name and so on. Published blog posts, which have a title, a publication time, and the website and board where they were published, are handled in the same way.
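The mapping from site-specific field names to the superset can be sketched as a simple lookup. This is only an illustration under stated assumptions: the superset field names, the second website and every site-specific field name (`nick`, `subject`, `writer` and so on) are invented here and do not come from the patent.

```python
# Hypothetical superset of fields for the BLOG category.
BLOG_SUPERSET_FIELDS = ("WEBSITE", "AUTHOR", "TITLE", "POST_TIME", "BOARD")

# Hypothetical master mapping: website -> {site field name: superset field name}.
FIELD_MAP = {
    "blog.sina.com": {"nick": "AUTHOR", "subject": "TITLE",
                      "pubdate": "POST_TIME", "channel": "BOARD"},
    "blog.example.org": {"writer": "AUTHOR", "headline": "TITLE",
                         "created": "POST_TIME"},
}


def to_superset(website, record):
    """Translate one site-specific record into a row of the general BLOG table."""
    mapping = FIELD_MAP[website]
    # Start with every superset field empty so rows from all sites share one shape.
    row = {field: None for field in BLOG_SUPERSET_FIELDS}
    row["WEBSITE"] = website
    for site_field, value in record.items():
        target = mapping.get(site_field)
        if target is not None:  # fields without a mapping are dropped
            row[target] = value
    return row
```

Rows produced this way have the same columns regardless of the source website, which is what lets one general table serve every site in a category.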
The same can be done for unstructured data in other categories, such as BBS comments, news and e-mail. One general table describes the unstructured data that shares the same attribute across the whole network for a category (such as blog author information or information about the people who bump threads), and different general tables describe the different attributes of different categories.
The benefit of general tables that span data sets and websites is that structured applications, mining and analysis become easier. If the same attribute of a category used a different table for each network source (such as each website), the conversion overhead during mining would be larger. This patent nevertheless also supports using different tables for the same attribute of a category from different network sources, to ensure compatibility.
Once the structurable information in the unstructured data has been defined as fields, mature information extraction techniques can be used to extract structured information from the unstructured data. The extracted structured data is loaded into a conventional database (such as MySQL or Oracle) and indexed, and the conventional database provides the data query service. Alternatively, the search engine itself can build the index and take care of queries and storage management.
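A minimal sketch of this extract, store and index step follows. It makes several assumptions not found in the patent: SQLite (from the Python standard library) stands in for the conventional database, two regular expressions stand in for a mature information extraction tool, and the page markup, table layout and index name are invented for illustration.

```python
import re
import sqlite3

# Hypothetical page markup; real pages need a per-site extraction template.
PAGE = '<h1 class="title">First post</h1><span class="author">Xu</span>'

TITLE_RE = re.compile(r'<h1 class="title">(.*?)</h1>')
AUTHOR_RE = re.compile(r'<span class="author">(.*?)</span>')


def extract(website, html):
    """Pull the structured fields out of one page; missing fields become None."""
    title = TITLE_RE.search(html)
    author = AUTHOR_RE.search(html)
    return (website,
            author.group(1) if author else None,
            title.group(1) if title else None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BLOG (WEBSITE TEXT, AUTHOR TEXT, TITLE TEXT)")
# Index the columns that structured queries are expected to filter on.
conn.execute("CREATE INDEX idx_blog_site_author ON BLOG (WEBSITE, AUTHOR)")
conn.execute("INSERT INTO BLOG VALUES (?, ?, ?)", extract("blog.sina.com", PAGE))
conn.commit()
```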
The search engine uses the above data to provide an external service whose syntax conforms to standard SQL. An example is given below for illustration; it does not represent the final implementation of the patent:
select TITLE from BLOG where WEBSITE='blog.sina.com' and AUTHOR='Xu';
The SQL statement above queries the titles of all blog posts by the author 'Xu' on the website 'blog.sina.com' within the BLOG category.
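To show what such a statement returns, the sketch below runs it against a small in-memory SQLite table. The table layout, the column name TITLE and the sample rows are assumptions made for this illustration; the patent does not specify a schema or a database product for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BLOG (WEBSITE TEXT, AUTHOR TEXT, TITLE TEXT)")
# Invented sample rows: two authors on one site, and the same author name on another site.
conn.executemany("INSERT INTO BLOG VALUES (?, ?, ?)", [
    ("blog.sina.com", "Xu", "Post A"),
    ("blog.sina.com", "Li", "Post B"),
    ("blog.example.org", "Xu", "Post C"),
])

query = "select TITLE from BLOG where WEBSITE='blog.sina.com' and AUTHOR='Xu';"
titles = [row[0] for row in conn.execute(query)]
```

Only the row that matches both the website and the author is selected, so a post by the same author name on a different website is excluded.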
Support for conventional search engine retrieval and support for structured queries are integrated in the search engine's API. Through standard SQL query syntax, data access middleware can call the search engine's API, which gives structured applications the ability to access unstructured text data directly and also extends the service functions of the search engine.
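One way to picture an API that serves both kinds of request is a small facade with one method per query style. Everything here is hypothetical: the class name, the method names and the BODY column are invented, and a naive substring match stands in for a real inverted index.

```python
import sqlite3


class SearchEngineAPI:
    """Hypothetical facade offering keyword search and SQL over the same pages."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE BLOG (WEBSITE TEXT, AUTHOR TEXT, TITLE TEXT, BODY TEXT)")

    def add_page(self, website, author, title, body):
        self._db.execute("INSERT INTO BLOG VALUES (?, ?, ?, ?)",
                         (website, author, title, body))

    def search(self, keyword):
        """Conventional retrieval: titles of pages whose text contains the keyword."""
        cur = self._db.execute(
            "SELECT TITLE FROM BLOG WHERE instr(BODY, ?) > 0", (keyword,))
        return [row[0] for row in cur]

    def query(self, sql, params=()):
        """Structured retrieval: run a read-only SQL statement on the extracted fields."""
        if not sql.lstrip().lower().startswith("select"):
            raise ValueError("only SELECT statements are accepted")
        return self._db.execute(sql, params).fetchall()
```

A caller might use `api.search("search")` for full-text lookup and `api.query("SELECT TITLE FROM BLOG WHERE AUTHOR = ?", ("Li",))` for structured lookup against the same stored pages.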
An advanced search function is provided on the search engine's interface: the user selects field options on the interface, the selected fields are assembled into a query, and the search engine API is then used to query directly the structured data that the search engine has parsed.
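Assembling the selected fields into a query could look like the sketch below. The whitelist, the function name and the choice of equality-only filters are assumptions for illustration; checking names against the whitelist and passing values as parameters keeps form input out of the SQL text.

```python
# Hypothetical whitelist of tables and superset fields offered on the advanced search form.
SEARCHABLE = {"BLOG": {"WEBSITE", "AUTHOR", "TITLE"}}


def build_query(table, select_field, filters):
    """Assemble a parameterized SELECT from the field options chosen on the form."""
    fields = SEARCHABLE.get(table)
    if fields is None:
        raise ValueError(f"unknown table: {table}")
    # Both the selected column and every filter column must be known fields.
    for name in (select_field, *filters):
        if name not in fields:
            raise ValueError(f"unknown field: {name}")
    sql = f"SELECT {select_field} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name in filters)
    return sql, list(filters.values())
```

The returned pair can be handed to any SQL-capable API that accepts a statement and a parameter list.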
Analysis programs based on structured data can use this engine to access unstructured text seamlessly and complete their data analysis, without complicated porting or adjustment.
Description of drawings
Embodiment
An embodiment is as follows:
1. Collect and construct the structured information fields of text documents for categories of popular network applications such as blogs, forums, microblogs and comments (including but not limited to these types);
2. Construct the superset of these fields, and maintain the mapping from the field superset to the fields of each category;
3. Use an information extraction tool to extract the structured information from the crawled or received text, then either load it into a conventional database as structured data or build an index and let the search engine itself manage the storage of the structured data;
4. The search engine processes the above data and uses its API to provide an external structured query service in standard SQL syntax.
After the above steps, a structured data analysis program can access text data such as unstructured web pages through the search engine. Data access middleware can further mask the difference between structured and unstructured data sources: with the search engine and other database engines used together as a mixed data source, hybrid queries over structured and unstructured data provide a combined search engine and database query service.
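The masking role of the middleware can be sketched as a thin layer that sends one SQL statement to every source and concatenates the rows. This is a simplified illustration, not the patent's design: both sources are modeled as in-memory SQLite connections sharing a hypothetical BLOG table, and the class and function names are invented. Plain concatenation is only correct for simple row-returning queries; aggregates or ORDER BY would need merging logic that is not shown.

```python
import sqlite3


def make_source(rows):
    """Build one data source exposing the hypothetical general BLOG table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE BLOG (WEBSITE TEXT, AUTHOR TEXT, TITLE TEXT)")
    conn.executemany("INSERT INTO BLOG VALUES (?, ?, ?)", rows)
    return conn


class Middleware:
    """Hypothetical data access layer that hides which engine holds which rows."""

    def __init__(self, *sources):
        self._sources = sources

    def query(self, sql, params=()):
        results = []
        # Send the same statement to every source and concatenate the rows.
        for source in self._sources:
            results.extend(source.execute(sql, params).fetchall())
        return results


# One source plays the database engine, the other the search engine's extracted data.
db_engine = make_source([("intranet", "Xu", "Report")])
search_engine = make_source([("blog.sina.com", "Xu", "Post A")])
middleware = Middleware(db_engine, search_engine)
```

An analysis program then calls `middleware.query(...)` without needing to know which rows came from the database and which from parsed web pages.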

Claims (1)

1. A method for providing structured queries using a search engine, characterized in that: 1) the attributes of popular network applications are organized by category into a general superset; 2) the page information involved is structurally extracted and stored in a database, and an index may be built at the same time to speed up retrieval; 3) the search engine provides external, SQL-driven access to the structured data through an API.
CN 201110022749 2011-01-20 2011-01-20 Method for providing structured query by search engine Pending CN102073734A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110022749 CN102073734A (en) 2011-01-20 2011-01-20 Method for providing structured query by search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110022749 CN102073734A (en) 2011-01-20 2011-01-20 Method for providing structured query by search engine

Publications (1)

Publication Number Publication Date
CN102073734A true CN102073734A (en) 2011-05-25

Family

ID=44032273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110022749 Pending CN102073734A (en) 2011-01-20 2011-01-20 Method for providing structured query by search engine

Country Status (1)

Country Link
CN (1) CN102073734A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077307A (en) * 2013-03-29 2014-10-01 中国科学院青岛生物能源与过程研究所 Single-cell phenotype database system and search engine
CN105574086A (en) * 2015-12-10 2016-05-11 天津海量信息技术有限公司 Artificial intelligence extraction method of internet unstructured data fields
CN106021553A (en) * 2016-05-30 2016-10-12 深圳市华傲数据技术有限公司 Structuralized data matching method and system
CN111737336A (en) * 2020-07-30 2020-10-02 湖南中车时代通信信号有限公司 Database and rail transit signal system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1761962A (en) * 2003-03-21 2006-04-19 国际商业机器公司 Real-time aggregation of unstructured data into structured data for SQL processing by a relational database engine
US7543232B2 (en) * 2004-10-19 2009-06-02 International Business Machines Corporation Intelligent web based help system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077307A (en) * 2013-03-29 2014-10-01 中国科学院青岛生物能源与过程研究所 Single-cell phenotype database system and search engine
CN104077307B (en) * 2013-03-29 2017-08-29 中国科学院青岛生物能源与过程研究所 Unicellular phenotype Database Systems and search engine
CN105574086A (en) * 2015-12-10 2016-05-11 天津海量信息技术有限公司 Artificial intelligence extraction method of internet unstructured data fields
CN106021553A (en) * 2016-05-30 2016-10-12 深圳市华傲数据技术有限公司 Structuralized data matching method and system
CN111737336A (en) * 2020-07-30 2020-10-02 湖南中车时代通信信号有限公司 Database and rail transit signal system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Gu Jian

Inventor after: Wang Yang

Inventor after: Ling Shibo

Inventor after: Peng Yanbing

Inventor after: Liao Wenjian

Inventor before: Wang Yang

Inventor before: Ling Shibo

Inventor before: Peng Yanbing

Inventor before: Liao Wenjian

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: WANG YANG LING SHIBO PENG YANBING LIAO WENJIAN TO: GU JIAN WANG YANG LING SHIBO PENG YANBING LIAO WENJIAN

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110525