CN106354759B - Article and data retrieval and automatic download system based on a biological cloud platform - Google Patents

Article and data retrieval and automatic download system based on a biological cloud platform

Info

Publication number
CN106354759B
Authority
CN
China
Prior art keywords
data
module
retrieval
article
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610687029.0A
Other languages
Chinese (zh)
Other versions
CN106354759A (en)
Inventor
郑洪坤
刘祖明
杨峻
张增金
刘东源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hundred Cloud Technology Co Ltd
Original Assignee
Beijing Hundred Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hundred Cloud Technology Co Ltd
Priority to CN201610687029.0A
Publication of CN106354759A
Application granted
Publication of CN106354759B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

The present invention discloses a system for retrieving and automatically downloading articles and data based on a biological cloud platform. The system comprises a data download module, a data parsing module, a data storage module, a web graphical interface module, and a data retrieval module. After the raw data are downloaded, they are parsed into a standard format and integrated; the integrated data are then segmented according to a predetermined word segmentation strategy, indexed, and stored, and a retrieval interface is provided. By parsing disordered raw data into a standard format according to fixed rules and storing them in a retrieval cluster, the invention provides users with a web interface for retrieving and browsing articles and data, which makes the data easier to reuse and study.

Description

Article and data retrieval and automatic download system based on a biological cloud platform
Technical field
The present invention relates to the field of data downloading and retrieval, and in particular to a system for retrieving and automatically downloading articles and data based on a biological cloud platform.
Background
With the continuous development of sequencing technology, biological data are being generated ever faster. According to statistics, second-generation sequencing worldwide produces about 13 Pbp of data per year, and the rate is still accelerating; bioinformatics research has formally entered the era of big data. The rate at which articles are published is also increasing. However, retrieval across the public databases on the internet is isolated: after finding an article, for example, a user cannot directly obtain the SRA or GSM data associated with it and must search databases such as SRA and GEO DataSets again, which makes reusing internet data cumbersome and difficult.
Summary of the invention
In view of the defects of the prior art, the present invention provides a system for retrieving and automatically downloading articles and data based on a biological cloud platform.
An embodiment of the present invention proposes a system for retrieving and automatically downloading articles and data based on a biological cloud platform, comprising:
a data download module, a data parsing module, a data storage module, a web graphical interface module, and a data retrieval module, wherein
the data download module is configured to download all articles and data in the sequencing field from databases on the network;
the data parsing module is configured to parse the downloaded articles and data into data in a standard data format;
the data storage module is configured to process the data obtained by the data parsing module according to a preset word segmentation and indexing strategy, and to store the resulting data;
the web graphical interface module is configured to provide the user with a search interface for articles and data, and to display the search results of the article and data retrieval that the user performs through the search interface; and
the data retrieval module is configured to retrieve search results from the data stored in the data storage module according to the search conditions that the user sets through the search interface, and to feed the search results back to the web graphical interface module.
The system provided by the embodiment of the present invention downloads all data and articles in the sequencing field, parses them, associates and integrates them, segments them into words, builds an index, and stores them, so that users can retrieve articles and data from a single web page, which facilitates the reuse and study of public data.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of one embodiment of the system for retrieving and automatically downloading articles and data based on a biological cloud platform according to the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, this embodiment discloses a system for retrieving and automatically downloading articles and data based on a biological cloud platform, comprising:
a data download module 1, a data parsing module 2, a data storage module 3, a web graphical interface module 4, and a data retrieval module 5, wherein
the data download module 1 is configured to download all articles and data in the sequencing field from databases on the network;
the data parsing module 2 is configured to parse the downloaded articles and data into data in a standard data format;
in a specific application, the standard data format may be JSON;
the data storage module 3 is configured to segment the data parsed by the data parsing module 2 into words according to a preset word segmentation strategy to obtain segmented data, to build a search index over the segmented data, and to store the indexed segmented data;
the web graphical interface module 4 is configured to provide the user with a search interface for articles and data, and to display the search results of the article and data retrieval that the user performs through the search interface; and
the data retrieval module 5 is configured to retrieve search results from the data stored in the data storage module 3 according to the search conditions that the user sets through the search interface, and to feed the search results back to the web graphical interface module 4.
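As an illustration of the parsing step, the following is a minimal sketch of turning a raw downloaded record into JSON. The patent does not specify the raw layout or the field names; the "Key: value" input format, the function name parse_raw_record, and the sample accessions are assumptions made only for this example.

```python
import json


def parse_raw_record(raw_text):
    """Parse 'Key: value' lines into a dict; repeated keys become lists.

    The 'Key: value' layout is a hypothetical raw format, not one
    described in the patent.
    """
    record = {}
    for line in raw_text.splitlines():
        if ":" not in line:
            continue  # skip lines that carry no key/value pair
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key in record:
            existing = record[key]
            if not isinstance(existing, list):
                record[key] = [existing]  # promote to a list on first repeat
            record[key].append(value)
        else:
            record[key] = value
    return record


raw = """Title: Transcriptome of rice roots
Accession: SRR0000001
Accession: GSM0000001
Year: 2016"""

standard = parse_raw_record(raw)
print(json.dumps(standard, indent=2, sort_keys=True))
```

Keeping every record in one flat, serializable shape is what lets later steps treat articles and sequencing data uniformly.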
In this embodiment, the data storage module 3 may first associate and integrate the data parsed by the data parsing module 2, that is, strongly link related records from the various databases. This supports combined retrieval under multiple conditions in the data retrieval module and ensures that user queries are accurate. The integrated data can then be segmented into words, indexed, and stored.
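One way the association and integration step could work is to look for data accession numbers inside article text and attach the matching data records to the article. This is only a sketch under assumptions: the patent does not say how records are linked, and the regular expression, the function name link_articles_to_data, and the record fields are hypothetical.

```python
import re

# Assumed accession patterns: SRA-style runs/experiments/projects/samples
# (e.g. SRR..., ERX..., DRP...) and GEO samples/series (GSM..., GSE...).
ACCESSION_RE = re.compile(r"\b(?:[SED]R[RXPS]\d+|GSM\d+|GSE\d+)\b")


def link_articles_to_data(articles, datasets):
    """Attach to each article the data records whose accession it mentions."""
    by_accession = {d["accession"]: d for d in datasets}
    linked = []
    for article in articles:
        found = sorted(set(ACCESSION_RE.findall(article["text"])))
        merged = dict(article)  # copy so the input article is not modified
        merged["datasets"] = [by_accession[a] for a in found if a in by_accession]
        linked.append(merged)
    return linked


articles = [{"id": "a1", "text": "Reads were deposited under SRR200 and GSM100."}]
datasets = [
    {"accession": "SRR200", "source": "SRA"},
    {"accession": "GSM100", "source": "GEO"},
    {"accession": "SRR999", "source": "SRA"},
]
for item in link_articles_to_data(articles, datasets):
    print(item["id"], [d["accession"] for d in item["datasets"]])
```

With the link stored on the article record, a single search hit can lead directly to its data instead of requiring a second search in another database.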
In this embodiment, multiple search conditions can be displayed on the search interface. When searching, the user can retrieve articles and data by entering or selecting the corresponding search expression. Retrieval from the segmented data stored in the data storage module according to that search expression can use existing methods for retrieving documents from bibliographic databases; the specific retrieval process is not described here.
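The patent leaves the retrieval method to existing techniques. As one common example of such a technique, the sketch below segments text into words, builds an inverted index per field, and answers a combined query by intersecting the matches of every condition. The tokenizer is a simple stand-in for the unspecified segmentation strategy, and all names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a stand-in strategy)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(docs):
    """Build an inverted index mapping (field, token) to a set of document ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            for token in tokenize(str(value)):
                index[(field, token)].add(doc_id)
    return index


def search(index, conditions):
    """Return sorted ids of documents that satisfy every condition (logical AND)."""
    result = None
    for field, query in conditions.items():
        for token in tokenize(query):
            hits = index.get((field, token), set())
            result = hits if result is None else result & hits
    return sorted(result or [])


docs = {
    "a1": {"title": "Rice root transcriptome", "organism": "Oryza sativa"},
    "a2": {"title": "Maize root development", "organism": "Zea mays"},
}
index = build_index(docs)
print(search(index, {"title": "root"}))
print(search(index, {"title": "root", "organism": "oryza"}))
```

Indexing each token together with its field is what allows conditions on different fields to be combined in one query.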
The system provided by this embodiment downloads all data and articles in the sequencing field, parses them, associates and integrates them, segments them into words, builds an index, and stores them, so that users can retrieve articles and data from a single web page, which facilitates the reuse and study of public data.
Optionally, another embodiment of the system for retrieving and automatically downloading articles and data based on a biological cloud platform further comprises:
a scheduled update module, wherein
the scheduled update module is configured to periodically acquire the latest data and articles from the network and to send the latest data and articles to the data storage module.
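A scheduled update of this kind can be sketched as an incremental fetch: on each due run, only records newer than the previous run are requested and handed to storage. The class below is an assumption-laden illustration; the names ScheduledUpdater, fetch_since, and store, and the use of a publication timestamp, are not taken from the patent.

```python
from datetime import datetime, timedelta


class ScheduledUpdater:
    """Fetch records newer than the last run and pass them to storage."""

    def __init__(self, fetch_since, store, interval):
        self.fetch_since = fetch_since  # callable(datetime) -> list of records
        self.store = store              # callable(list of records)
        self.interval = interval        # minimum time between two runs
        self.last_run = None

    def due(self, now):
        """Return True if no run has happened yet or the interval has elapsed."""
        return self.last_run is None or now - self.last_run >= self.interval

    def tick(self, now):
        """Run one update if due; return the number of records stored."""
        if not self.due(now):
            return 0
        since = self.last_run or datetime.min
        records = self.fetch_since(since)
        if records:
            self.store(records)
        self.last_run = now
        return len(records)


# Usage with an in-memory stand-in for the remote database.
source = [{"id": 1, "published": datetime(2016, 8, 1)}]
stored = []
updater = ScheduledUpdater(
    fetch_since=lambda since: [r for r in source if r["published"] > since],
    store=stored.extend,
    interval=timedelta(days=1),
)
print(updater.tick(datetime(2016, 8, 18)))
```

In a deployment, tick would be driven by a timer or a job scheduler; passing the current time in explicitly keeps the logic easy to test.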
Optionally, in another embodiment of the system for retrieving and automatically downloading articles and data based on a biological cloud platform, the data retrieval module is further configured to push the latest data and articles to subscribed users.
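The patent does not describe how subscriptions are expressed. Assuming, purely for illustration, that each user subscribes with a list of keywords, the push step could select the new records whose titles contain all of a user's keywords:

```python
def match_subscriptions(new_records, subscriptions):
    """Return {user: [record ids]} for new records matching every keyword.

    The keyword-based subscription model is an assumption for this sketch.
    """
    notifications = {}
    for user, keywords in subscriptions.items():
        wanted = [k.lower() for k in keywords]
        hits = [
            r["id"]
            for r in new_records
            if all(k in r["title"].lower() for k in wanted)
        ]
        if hits:
            notifications[user] = hits
    return notifications


new_records = [
    {"id": "a7", "title": "Rice root RNA-seq under drought"},
    {"id": "a8", "title": "Maize leaf methylation"},
]
subscriptions = {"alice": ["rice", "drought"], "bob": ["wheat"]}
print(match_subscriptions(new_records, subscriptions))
```

Users with no matching records are left out of the result, so nothing is pushed to them.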
Optionally, in another embodiment of the system for retrieving and automatically downloading articles and data based on a biological cloud platform, the data download module is configured to download all articles and data in the sequencing field from databases on the network by means of an internet crawler.
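A crawler of the kind mentioned here can be sketched as a breadth-first traversal that follows links from a start page. The sketch uses only the Python standard library and takes the page-fetching function as a parameter, so it can be exercised without network access; the function names, the page limit, and the example URLs are assumptions, and a real crawler would also need politeness rules such as rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text or None on failure."""
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page unavailable; move on to the next URL
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Usage with an in-memory stand-in for a website.
site = {
    "https://example.org/": '<a href="/a">A</a><a href="/b">B</a>',
    "https://example.org/a": '<a href="/">home</a>',
    "https://example.org/b": "no links here",
}
print(sorted(crawl("https://example.org/", site.get)))
```

The seen set prevents pages that link back to each other from being downloaded repeatedly.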
Optionally, in another embodiment of the system for retrieving and automatically downloading articles and data based on a biological cloud platform, the data storage module is specifically configured to store the data processed according to the word segmentation and indexing strategy in an internal distributed elasticsearch cluster.
In this embodiment, the elasticsearch cluster offers high availability and high scalability. For retrieval, a search API can be exposed on top of the elasticsearch cluster for the web graphical interface module to call, making it convenient for users of the graphical interface to view and use the data. After logging in to the graphical interface, a user can run a variety of combined searches over the massive data in elasticsearch under different combinations of conditions and view the details of each result. Storing the data in a cluster helps ensure the integrity, security, availability, and fast response of the data.
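As a hedged illustration of a combined search against elasticsearch, the sketch below only builds a request body in the Elasticsearch Query DSL, using a bool query with match clauses for free-text conditions and term clauses for exact filters. It does not call any client library. The field names (title, organism), the helper name build_search_body, and the paging defaults are hypothetical; the patent does not describe the index mapping or the queries actually used.

```python
import json


def build_search_body(text_conditions, exact_filters, page=0, size=10):
    """Build a Query DSL body that combines all conditions with a bool query."""
    must = [{"match": {field: text}} for field, text in text_conditions.items()]
    filters = [{"term": {field: value}} for field, value in exact_filters.items()]
    return {
        "from": page * size,  # offset of the first hit to return
        "size": size,         # number of hits per page
        "query": {"bool": {"must": must, "filter": filters}},
    }


body = build_search_body(
    text_conditions={"title": "rice root"},
    exact_filters={"organism": "Oryza sativa"},
    page=2,
)
print(json.dumps(body, indent=2))
```

A search API exposed to the web interface could build such a body from the conditions the user selects and forward it to the cluster's search endpoint.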
Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and all such modifications and variations fall within the scope defined by the appended claims.

Claims (5)

1. A system for retrieving and automatically downloading articles and data based on a biological cloud platform, characterized by comprising:
a data download module, a data parsing module, a data storage module, a web graphical interface module, and a data retrieval module, wherein
the data download module is configured to download all articles and data in the sequencing field from databases on the network;
the data parsing module is configured to parse the downloaded articles and data into data in a standard data format;
the data storage module is configured to process the data obtained by the data parsing module according to a preset word segmentation and indexing strategy, and to store the resulting data;
the web graphical interface module is configured to provide the user with a search interface for articles and data, and to display the search results of the article and data retrieval that the user performs through the search interface; and
the data retrieval module is configured to retrieve search results from the data stored in the data storage module according to the search conditions that the user sets through the search interface, and to feed the search results back to the web graphical interface module.
2. The system according to claim 1, characterized by further comprising:
a scheduled update module, wherein
the scheduled update module is configured to periodically acquire the latest data and articles from the network and to send the latest data and articles to the data storage module.
3. The system according to claim 2, characterized in that the data retrieval module is further configured to push the latest data and articles to subscribed users.
4. The system according to claim 1, characterized in that the data download module is configured to download all articles and data in the sequencing field from databases on the network by means of an internet crawler.
5. The system according to claim 1, characterized in that the data storage module is specifically configured to store the data processed according to the word segmentation and indexing strategy in an internal distributed elasticsearch cluster.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610687029.0A CN106354759B (en) 2016-08-18 2016-08-18 Article and data retrieval and automatic download system based on a biological cloud platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610687029.0A CN106354759B (en) 2016-08-18 2016-08-18 Article and data retrieval and automatic download system based on a biological cloud platform

Publications (2)

Publication Number Publication Date
CN106354759A (en) 2017-01-25
CN106354759B (en) 2019-07-12

Family

ID=57843505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610687029.0A Active CN106354759B (en) 2016-08-18 2016-08-18 Article and data retrieval and automatic download system based on a biological cloud platform

Country Status (1)

Country Link
CN (1) CN106354759B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032436B (en) * 2021-04-16 2022-05-31 苏州臻璇数据信息技术有限公司 Searching method and device based on article content and title

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103412933A (en) * 2013-08-20 2013-11-27 南京物联网应用研究院有限公司 Cloud search platform
CN103699572A (en) * 2013-11-26 2014-04-02 北京航空航天大学 Digital media content and resource integration and sharing method in cloud environment
CN104462865A (en) * 2014-10-17 2015-03-25 北京百迈客生物科技有限公司 Article analysis system and method based on biological cloud platform
CN105159971A (en) * 2015-08-26 2015-12-16 成都布林特信息技术有限公司 Cloud platform data retrieval method
CN105183809A (en) * 2015-08-26 2015-12-23 成都布林特信息技术有限公司 Cloud platform data query method
CN105205104A (en) * 2015-08-26 2015-12-30 成都布林特信息技术有限公司 Cloud platform data acquisition method

Also Published As

Publication number Publication date
CN106354759A (en) 2017-01-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant