CN106354759B - Retrieval and automatic download system for articles and data based on a biological cloud platform - Google Patents
- Publication number: CN106354759B
- Application number: CN201610687029.0A
- Authority
- CN
- China
- Prior art keywords
- data
- module
- retrieval
- article
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
Abstract
The present invention discloses a retrieval and automatic download system for articles and data based on a biological cloud platform. The system comprises a data download module, a data parsing module, a data storage module, a web graphical interface module, and a data retrieval module. After the raw data are downloaded, they are parsed into a standard format and integrated; the standardized data are then segmented into words according to a predetermined word segmentation strategy, indexed, and stored, and a retrieval interface is provided. By parsing disordered raw data into a standard format according to fixed rules and storing them in a retrieval cluster, the invention gives users a web interface for retrieving and browsing articles and data, which facilitates the reuse and further study of the data.
Description
Technical field
The present invention relates to the field of data downloading and searching, and in particular to a retrieval and automatic download system for articles and data based on a biological cloud platform.
Background technique
With the continuous development of sequencing technology, biological data are being produced ever faster. According to statistics, second-generation sequencing worldwide produces about 13 Pbp of data per year, and the rate is still accelerating; bioinformatics research has formally entered the big data era. The rate at which articles are published is also increasing. However, retrieval across the public databases on the internet is isolated. After finding an article, for example, a user cannot directly obtain the SRA, GSM, or other data associated with it and must search databases such as SRA and GEO DataSets again, which makes reusing internet data unusually cumbersome and difficult.
Summary of the invention
In view of the defects in the prior art, the present invention provides a retrieval and automatic download system for articles and data based on a biological cloud platform.
An embodiment of the present invention proposes a retrieval and automatic download system for articles and data based on a biological cloud platform, comprising:
a data download module, a data parsing module, a data storage module, a web graphical interface module, and a data retrieval module; wherein
the data download module is configured to download all articles and data in the sequencing field from databases on the network;
the data parsing module is configured to parse the downloaded articles and data into data in a standard data format;
the data storage module is configured to process the data obtained by the data parsing module according to a preset word segmentation and indexing strategy, and to store the resulting data;
the web graphical interface module is configured to provide the user with a search interface for articles and data, and to display the results of the article and data searches that the user performs through the search interface; and
the data retrieval module is configured to retrieve search results from the data stored by the data storage module according to the search conditions set by the user through the search interface, and to feed the search results back to the web graphical interface module.
The retrieval and automatic download system for articles and data based on a biological cloud platform provided by the embodiments of the present invention downloads all data and articles in the sequencing field, parses them, associates and integrates them, segments them into words, builds an index, and stores them. Users can thus retrieve both articles and data from a single web page, which facilitates the reuse and further study of public data.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of one embodiment of the retrieval and automatic download system for articles and data based on a biological cloud platform of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. The described embodiments are evidently some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, the present embodiment discloses a retrieval and automatic download system for articles and data based on a biological cloud platform, comprising:
a data download module 1, a data parsing module 2, a data storage module 3, a web graphical interface module 4, and a data retrieval module 5; wherein
the data download module 1 is configured to download all articles and data in the sequencing field from databases on the network;
the data parsing module 2 is configured to parse the downloaded articles and data into data in a standard data format. In a specific application, the standard data format may be JSON;
the data storage module 3 is configured to segment the data parsed by the data parsing module 2 into words according to a preset word segmentation strategy, to build a search index over the resulting word segments, and to store the indexed word segments;
the web graphical interface module 4 is configured to provide the user with a search interface for articles and data, and to display the results of the article and data searches that the user performs through the search interface; and
the data retrieval module 5 is configured to retrieve search results from the data stored by the data storage module 3 according to the search conditions set by the user through the search interface, and to feed the search results back to the web graphical interface module 4.
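The flow from parsing through storage to retrieval can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the "key: value" raw record layout, the regular-expression tokenizer standing in for the preset word segmentation strategy, the names `parse_record`, `tokenize`, and `Store`, and the accession values are all assumptions made for illustration.

```python
import json
import re
from collections import defaultdict


def parse_record(raw: str) -> dict:
    """Parse a 'key: value' text record (assumed raw layout) into a dict."""
    record = {}
    for line in raw.strip().splitlines():
        key, _, value = line.partition(":")
        record[key.strip().lower()] = value.strip()
    return record


def tokenize(text: str) -> list[str]:
    """Stand-in segmentation strategy: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Store:
    """In-memory stand-in for the storage module: JSON documents plus an inverted index."""

    def __init__(self) -> None:
        self.docs = {}                 # doc id -> JSON string
        self.index = defaultdict(set)  # token -> set of doc ids

    def add(self, record: dict) -> int:
        doc_id = len(self.docs)
        self.docs[doc_id] = json.dumps(record, sort_keys=True)
        for value in record.values():
            for token in tokenize(value):
                self.index[token].add(doc_id)
        return doc_id

    def search(self, query: str) -> list[dict]:
        """Return the documents that contain every token of the query."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in tokens))
        return [json.loads(self.docs[i]) for i in sorted(hits)]
```

In this sketch a query matches only documents containing all of its tokens; a production system would normally delegate segmentation, indexing, and ranking to a search engine.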
In the embodiments of the present invention, the data storage module 3 may first perform data association and integration on the data parsed by the data parsing module 2, that is, strongly associate the different data held in the various databases. This supports retrieval by the data retrieval module under various combinations of conditions and helps ensure accurate queries. The integrated data can then be segmented into words, indexed, and stored.
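One way to picture the association step is a join on accession identifiers. The sketch below assumes, purely for illustration, that each parsed article carries an `accessions` list and each dataset record an `accession` field; neither field name nor the function `integrate` comes from the source.

```python
def integrate(articles: list[dict], datasets: list[dict]) -> list[dict]:
    """Attach to each article the dataset records it references by accession."""
    by_accession = {d["accession"]: d for d in datasets}
    merged = []
    for article in articles:
        linked = [
            by_accession[a]
            for a in article.get("accessions", [])
            if a in by_accession  # skip accessions that were not downloaded
        ]
        # Build a new record so the parsed input is left unchanged.
        merged.append({**article, "datasets": linked})
    return merged
```

With records linked this way, a search hit on an article can expose its SRA or GSM data directly, without a second lookup in another database.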
In the embodiments of the present invention, multiple search conditions can be displayed on the search interface. When searching, the user can retrieve articles and data by entering or selecting the corresponding search expression. Retrieval from the word-segmented data stored by the data storage module according to the search expression entered or selected by the user can use existing methods for retrieving documents from bibliographic databases; the specific retrieval process is not described further here.
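A combined search over several conditions can be sketched as a conjunction of per-field tests. The field names and the case-insensitive substring test below are illustrative assumptions; the source leaves the actual retrieval method to existing bibliographic search techniques.

```python
def matches(record: dict, conditions: dict[str, str]) -> bool:
    """True if every condition's term occurs in the matching field (case-insensitive)."""
    return all(
        term.lower() in str(record.get(field, "")).lower()
        for field, term in conditions.items()
    )


def combined_search(records: list[dict], conditions: dict[str, str]) -> list[dict]:
    """Return the records that satisfy all conditions; no conditions returns everything."""
    return [r for r in records if matches(r, conditions)]
```

For example, `combined_search(records, {"title": "root", "source": "SRA"})` would keep only records whose title mentions "root" and whose source field contains "SRA".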
The retrieval and automatic download system for articles and data based on a biological cloud platform provided by the embodiments of the present invention downloads all data and articles in the sequencing field, parses them, associates and integrates them, segments them into words, builds an index, and stores them. Users can thus retrieve both articles and data from a single web page, which facilitates the reuse and further study of public data.
Optionally, in another embodiment of the retrieval and automatic download system for articles and data based on a biological cloud platform of the present invention, the system further comprises:
a scheduled update module; wherein
the scheduled update module is configured to periodically obtain the latest data and articles from the network and to send them to the data storage module.
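A periodic update of this kind is often built around a watermark, the date of the newest record already seen. The sketch below assumes that approach; `fetch_since` and `send_to_storage` are hypothetical callables supplied by the caller, and each record is assumed to carry an ISO-formatted `date` field.

```python
import time
from typing import Callable, Iterable


def update_once(fetch_since: Callable[[str], Iterable[dict]],
                send_to_storage: Callable[[dict], None],
                last_date: str) -> str:
    """Fetch records newer than last_date, forward them, and return the new watermark."""
    newest = last_date
    for record in fetch_since(last_date):
        send_to_storage(record)
        newest = max(newest, record["date"])  # ISO dates order correctly as strings
    return newest


def run_periodically(fetch_since, send_to_storage, last_date: str,
                     interval_s: float, rounds: int) -> str:
    """Call update_once every interval_s seconds for a fixed number of rounds."""
    for i in range(rounds):
        last_date = update_once(fetch_since, send_to_storage, last_date)
        if i < rounds - 1:
            time.sleep(interval_s)
    return last_date
```

Keeping the watermark outside the loop body means a restarted process can resume from the last stored date instead of downloading everything again.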
Optionally, in another embodiment of the retrieval and automatic download system for articles and data based on a biological cloud platform of the present invention, the data retrieval module is further configured to push the latest data and articles to subscribed users.
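The source does not say how the system decides which new items go to which subscriber. As one possible reading, the sketch below matches each new record's keywords against keywords a user has subscribed to; the keyword-based rule, the record fields, and the function name `select_pushes` are assumptions.

```python
def select_pushes(subscriptions: dict[str, set[str]],
                  new_records: list[dict]) -> dict[str, list[str]]:
    """Map each subscriber to the ids of new records sharing a subscribed keyword."""
    pushes = {}
    for user, keywords in subscriptions.items():
        ids = [r["id"] for r in new_records if keywords & set(r["keywords"])]
        if ids:  # omit subscribers with nothing new to receive
            pushes[user] = ids
    return pushes
```

The returned mapping could then be handed to whatever delivery channel the platform uses, such as e-mail or an in-page notification.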
Optionally, in another embodiment of the retrieval and automatic download system for articles and data based on a biological cloud platform of the present invention, the data download module is configured to download all articles and data in the sequencing field from databases on the network by means of a web crawler.
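A web crawler of the kind mentioned here can be sketched as a breadth-first traversal that stays within one site. The code below is a generic illustration rather than the crawler used by the invention; the `fetch` callable is injected by the caller (so no network access is built in), and the rule of following only links under the start URL is an assumption.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url: str, fetch: Callable[[str], str],
          max_pages: int = 100) -> dict[str, str]:
    """Breadth-first crawl from start_url; fetch(url) must return the page HTML."""
    pages = {}
    queue = deque([start_url])
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue
        html = fetch(url)
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link.startswith(start_url) and link not in pages:
                queue.append(link)
    return pages
```

In practice `fetch` would wrap an HTTP client and should respect each database's access policy and rate limits; many public databases also offer bulk download or programmatic interfaces that are preferable to scraping pages.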
Optionally, in another embodiment of the retrieval and automatic download system for articles and data based on a biological cloud platform of the present invention, the data storage module is specifically configured to store the data processed according to the word segmentation and indexing strategy in an internal distributed elasticsearch cluster.
In the embodiments of the present invention, the elasticsearch cluster offers high availability and high scalability. For retrieval, a search API can be exposed on top of the elasticsearch cluster for the web graphical interface module to call, which makes viewing and use convenient for users of the graphical interface. After logging in to the graphical interface, a user can run a variety of combined searches over the mass of data in elasticsearch under different combinations of conditions and view the details of each result. Storing the data in a cluster helps ensure the integrity, security, availability, and fast response of the data.
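A combined search against elasticsearch is typically expressed as a `bool` query in its JSON query DSL. The sketch below only builds such a request body; the field names (`title` and the filter fields) and the helper `build_query` are hypothetical, and the source does not disclose the index layout or the queries actually used.

```python
import json


def build_query(text: str, filters: dict[str, str], size: int = 10) -> dict:
    """Build an Elasticsearch bool query: a full-text match plus exact-value filters."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text condition on an analyzed text field.
                "must": [{"match": {"title": text}}],
                # Unscored exact-value conditions, suited to keyword fields.
                "filter": [
                    {"term": {field: value}} for field, value in filters.items()
                ],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_query("maize drought", {"source": "SRA"}), indent=2))
```

The resulting body would be sent to the `_search` endpoint of the relevant index. The `term` filters assume that the filtered fields are mapped as keyword fields, so that values are compared without analysis.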
Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and all such modifications and variations fall within the scope defined by the appended claims.
Claims (5)
1. A retrieval and automatic download system for articles and data based on a biological cloud platform, characterized by comprising:
a data download module, a data parsing module, a data storage module, a web graphical interface module, and a data retrieval module; wherein
the data download module is configured to download all articles and data in the sequencing field from databases on the network;
the data parsing module is configured to parse the downloaded articles and data into data in a standard data format;
the data storage module is configured to process the data obtained by the data parsing module according to a preset word segmentation and indexing strategy, and to store the resulting data;
the web graphical interface module is configured to provide the user with a search interface for articles and data, and to display the results of the article and data searches that the user performs through the search interface; and
the data retrieval module is configured to retrieve search results from the data stored by the data storage module according to the search conditions set by the user through the search interface, and to feed the search results back to the web graphical interface module.
2. The system according to claim 1, characterized by further comprising:
a scheduled update module; wherein
the scheduled update module is configured to periodically obtain the latest data and articles from the network and to send them to the data storage module.
3. The system according to claim 2, characterized in that the data retrieval module is further configured to push the latest data and articles to subscribed users.
4. The system according to claim 1, characterized in that the data download module is configured to download all articles and data in the sequencing field from databases on the network by means of a web crawler.
5. The system according to claim 1, characterized in that the data storage module is specifically configured to store the data processed according to the word segmentation and indexing strategy in an internal distributed elasticsearch cluster.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610687029.0A CN106354759B (en) | 2016-08-18 | 2016-08-18 | Retrieval and automatic download system for articles and data based on a biological cloud platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610687029.0A CN106354759B (en) | 2016-08-18 | 2016-08-18 | Retrieval and automatic download system for articles and data based on a biological cloud platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106354759A CN106354759A (en) | 2017-01-25 |
CN106354759B true CN106354759B (en) | 2019-07-12 |
Family ID=57843505
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610687029.0A Active CN106354759B (en) | 2016-08-18 | 2016-08-18 | Retrieval and automatic download system for articles and data based on a biological cloud platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106354759B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113032436B (en) * | 2021-04-16 | 2022-05-31 | 苏州臻璇数据信息技术有限公司 | Searching method and device based on article content and title |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103412933A (en) * | 2013-08-20 | 2013-11-27 | 南京物联网应用研究院有限公司 | Cloud search platform |
CN103699572A (en) * | 2013-11-26 | 2014-04-02 | 北京航空航天大学 | Digital media content and resource integration and sharing method in cloud environment |
CN104462865A (en) * | 2014-10-17 | 2015-03-25 | 北京百迈客生物科技有限公司 | Article analysis system and method based on biological cloud platform |
CN105159971A (en) * | 2015-08-26 | 2015-12-16 | 成都布林特信息技术有限公司 | Cloud platform data retrieval method |
CN105183809A (en) * | 2015-08-26 | 2015-12-23 | 成都布林特信息技术有限公司 | Cloud platform data query method |
CN105205104A (en) * | 2015-08-26 | 2015-12-30 | 成都布林特信息技术有限公司 | Cloud platform data acquisition method |
-
2016
- 2016-08-18 CN CN201610687029.0A patent/CN106354759B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN106354759A (en) | 2017-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mehmood et al. | Implementing big data lake for heterogeneous data sources | |
US9189280B2 (en) | Tracking large numbers of moving objects in an event processing system | |
CN110533055B (en) | Point cloud data processing method and device | |
CN110622153B (en) | Method and system for query segmentation | |
US20170235726A1 (en) | Information identification and extraction | |
US11599591B2 (en) | System and method for updating a search index | |
US20190147090A1 (en) | Internet of Things Search and Discovery Using Graph Engine | |
CN110018982A (en) | Method, apparatus, equipment and the computer readable storage medium of locating file | |
JP2018531379A (en) | Route inquiry method, apparatus, device, and non-volatile computer storage medium | |
JP2018531379A6 (en) | Route inquiry method, apparatus, device, and non-volatile computer storage medium | |
US20190266030A1 (en) | System and Method for Processing of Events | |
CN106354759B (en) | Retrieval and automatic download system for articles and data based on a biological cloud platform | |
US20210397621A1 (en) | System and Method for Processing of Events | |
US8667008B2 (en) | Search request control apparatus and search request control method | |
CN106547803A (en) | The method and apparatus for crawling website incremental resource | |
CN114519061A (en) | Map data updating method, device, electronic equipment and medium | |
CN109739885A (en) | Data query method, apparatus, equipment and storage medium based on local cache | |
CN104462257B (en) | The method and apparatus of page information among a kind of verification | |
CN113094444A (en) | Data processing method, data processing apparatus, computer device, and medium | |
CN106325925A (en) | Browser service information updating method and device | |
CN106934007B (en) | Associated information pushing method and device | |
CN104636384B (en) | A kind of method and device handling document | |
CN102339292A (en) | Distributed searching method and system | |
CN112527388B (en) | GitHub large-scale open source code-oriented quick code file tracing method and device | |
Santos et al. | Comparative performance evaluation of relational and NoSQL databases for spatial and mobile applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||