CN106354759A - Retrieving and automatically downloading system of articles and data based on biological cloud platform - Google Patents

Info

Publication number
CN106354759A
Authority
CN
China
Prior art keywords
data
module
retrieval
article
retrieving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610687029.0A
Other languages
Chinese (zh)
Other versions
CN106354759B (en)
Inventor
郑洪坤
刘祖明
杨峻
张增金
刘东源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hundred Cloud Technology Co Ltd
Original Assignee
Beijing Hundred Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-08-18
Filing date
2016-08-18
Publication date
2017-01-25
Application filed by Beijing Hundred Cloud Technology Co Ltd
Priority to CN201610687029.0A
Publication of CN106354759A
Application granted
Publication of CN106354759B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

The invention discloses a retrieval and automatic download system for articles and data based on a biological cloud platform. The system comprises a data download module, a data parsing module, a data storage module, a web graphical interface module, and a data retrieval module. After the raw data is downloaded, it is parsed into a standard format and integrated; word segmentation, index building, and storage are then carried out according to a preset word segmentation strategy, and a retrieval interface is provided. The system parses disorganized raw data into a standard format according to fixed rules and stores it in a retrieval cluster, and it provides a web interface through which users can search and browse articles and data, which makes the data convenient to reuse and study.

Description

Retrieval and automatic download system for articles and data based on a biological cloud platform
Technical field
The present invention relates to the field of data download and retrieval, and in particular to a retrieval and automatic download system for articles and data based on a biological cloud platform.
Background
With the continuous development of sequencing technologies, biological data is being produced ever faster. According to statistics, next-generation sequencing worldwide produces data at a rate of 13 Pbp per year, and that rate is still accelerating; bioinformatics research has formally entered the big-data era. The rate at which articles are published is also increasing. However, the public databases on the Internet are searched in isolation from one another. For example, after finding an article, a user cannot directly obtain the SRA, GSM, or other data belonging to that article and must search databases such as SRA and GEO DataSets again, which makes reusing Internet data extremely tedious and difficult.
Summary of the invention
To address the defects of the prior art, the present invention provides a retrieval and automatic download system for articles and data based on a biological cloud platform.
An embodiment of the present invention proposes a retrieval and automatic download system for articles and data based on a biological cloud platform, comprising:
a data download module, a data parsing module, a data storage module, a web graphical interface module, and a data retrieval module; wherein,
the data download module is configured to download all articles and data in the sequencing field from databases on the network,
the data parsing module is configured to parse the downloaded articles and data into data in a standard data format,
the data storage module is configured to process the data obtained by the data parsing module according to a preset word segmentation and indexing strategy, and to store the resulting data,
the web graphical interface module is configured to provide the user with a search interface for articles and data, and to display the results of the retrievals that the user performs through the search interface,
the data retrieval module is configured to retrieve results from the data stored by the data storage module according to the search conditions that the user sets through the search interface, and to return the results to the web graphical interface module.
The retrieval and automatic download system for articles and data based on a biological cloud platform provided by the embodiment of the present invention downloads all data and articles in the sequencing field, then parses them, associates and integrates them, segments them into words, builds an index, and stores the result. Users can therefore search articles and data from a single web page, which makes public data easier to reuse and study.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of one embodiment of the retrieval and automatic download system for articles and data based on a biological cloud platform according to the present invention.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the present invention.
Referring to Fig. 1, this embodiment discloses a retrieval and automatic download system for articles and data based on a biological cloud platform, comprising:
a data download module 1, a data parsing module 2, a data storage module 3, a web graphical interface module 4, and a data retrieval module 5; wherein,
the data download module 1 is configured to download all articles and data in the sequencing field from databases on the network,
the data parsing module 2 is configured to parse the downloaded articles and data into data in a standard data format,
In a specific application, the standard data format may be the JSON format.
the data storage module 3 is configured to segment the data parsed by the data parsing module 2 into words according to a preset word segmentation strategy, to build a search index over the resulting segmented data, and to store the indexed segmented data,
the web graphical interface module 4 is configured to provide the user with a search interface for articles and data, and to display the results of the retrievals that the user performs through the search interface,
the data retrieval module 5 is configured to retrieve results from the data stored by the data storage module 3 according to the search conditions that the user sets through the search interface, and to return the results to the web graphical interface module 4.
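The parsing step performed by the data parsing module can be illustrated with a minimal sketch. It assumes, purely for illustration, that a downloaded record arrives as plain "Key: value" lines; the field names, the `parse_record` helper, and the accession shown are hypothetical and are not taken from the patent.

```python
import json


def parse_record(raw_text, source):
    """Parse 'Key: value' lines into a flat dict with normalized keys.

    The input layout is an assumption; real database dumps differ.
    """
    record = {"source": source}
    for line in raw_text.splitlines():
        if ":" not in line:
            continue  # skip lines that do not look like a field
        key, value = line.split(":", 1)
        key = key.strip().lower().replace(" ", "_")
        if key:
            record[key] = value.strip()
    return record


# Hypothetical raw record, used only to exercise the parser.
raw = """Accession: SRR000001
Title: Example sequencing run
Organism: Homo sapiens"""

record = parse_record(raw, source="sra")
as_json = json.dumps(record, sort_keys=True)  # the standard JSON form
```

Every source database would need its own parser of this kind, all emitting the same JSON keys so that later stages can treat the records uniformly.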
In this embodiment, the data storage module 3 may first associate and integrate the data parsed by the data parsing module 2: records from the various databases are strongly linked to one another across databases, which supports combined retrieval under multiple conditions by the data retrieval module and ensures that user queries are accurate. The integrated data can then be segmented into words, indexed, and stored.
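One way to picture the association step is as a join between article records and dataset records. The sketch below assumes that each parsed article carries a list of the accession numbers it cites; that field, the record layout, and the `integrate` helper are hypothetical, since the patent does not say how the links are established.

```python
def integrate(articles, datasets):
    """Attach to each article the dataset records whose accession it cites."""
    by_accession = {d["accession"]: d for d in datasets}
    merged = []
    for article in articles:
        linked = [
            by_accession[acc]
            for acc in article.get("accessions", [])
            if acc in by_accession  # ignore accessions we never downloaded
        ]
        merged.append({**article, "datasets": linked})
    return merged


# Hypothetical records from three different sources.
articles = [
    {"id": "A1", "title": "Liver transcriptome", "accessions": ["SRR000001", "GSM000001"]},
    {"id": "A2", "title": "Method note", "accessions": []},
]
datasets = [
    {"accession": "SRR000001", "source": "sra"},
    {"accession": "GSM000001", "source": "geo"},
]

merged = integrate(articles, datasets)
```

After this step a single stored document holds both the article and its data records, so one query can constrain fields from either side.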
In this embodiment, the search interface may display multiple search conditions. When searching, the user can retrieve articles and data by entering or selecting the corresponding search expression. Retrieval from the segmented data stored by the data storage module according to the search expression entered or selected by the user may use existing methods for retrieving documents from a bibliographic database; the specific retrieval process is not described further here.
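Word segmentation, indexing, and combined-condition retrieval can be sketched together with a small inverted index. The patent leaves the segmentation strategy and the retrieval method open, so the lowercase alphanumeric tokenizer and the AND semantics below are assumptions, and all names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Very simple segmentation: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs, fields):
    """Map (field, token) to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field in fields:
            for token in tokenize(doc.get(field, "")):
                index[(field, token)].add(doc_id)
    return index


def search(index, conditions):
    """conditions: list of (field, query text); every token must match."""
    result = None
    for field, text in conditions:
        for token in tokenize(text):
            hits = index.get((field, token), set())
            result = hits if result is None else result & hits
    return result or set()


docs = {
    "A1": {"title": "RNA-seq of human liver", "organism": "Homo sapiens"},
    "A2": {"title": "RNA-seq of mouse liver", "organism": "Mus musculus"},
}
index = build_index(docs, fields=["title", "organism"])
hits = search(index, [("title", "liver rna"), ("organism", "sapiens")])
```

Each condition the user sets in the search interface becomes one `(field, text)` pair; intersecting the posting sets gives the combined result.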
The retrieval and automatic download system for articles and data based on a biological cloud platform provided by the embodiment of the present invention downloads all data and articles in the sequencing field, then parses them, associates and integrates them, segments them into words, builds an index, and stores the result. Users can therefore search articles and data from a single web page, which makes public data easier to reuse and study.
Optionally, another embodiment of the retrieval and automatic download system for articles and data based on a biological cloud platform further comprises:
a timed update module; wherein,
the timed update module is configured to periodically obtain the latest data and articles from the network, and to send them to the data storage module.
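A timed update of this kind can be sketched as an incremental fetch that remembers when it last ran. The `fetch_since` and `store` callables below are hypothetical stand-ins for the network source and the data storage module, and the scheduling itself (for example a periodic job) is left outside the sketch.

```python
from datetime import datetime


class TimedUpdater:
    """Fetch only records newer than the previous run and hand them to storage."""

    def __init__(self, fetch_since, store):
        self.fetch_since = fetch_since  # hypothetical: datetime -> list of records
        self.store = store              # hypothetical: record -> None
        self.last_run = datetime.min

    def run_once(self, now):
        new_records = self.fetch_since(self.last_run)
        for record in new_records:
            self.store(record)
        self.last_run = now
        return len(new_records)


# In-memory stand-ins for the remote source and the storage module.
source = [
    {"id": "A1", "published": datetime(2016, 8, 1)},
    {"id": "A2", "published": datetime(2016, 8, 10)},
]
stored = []
updater = TimedUpdater(
    fetch_since=lambda since: [r for r in source if r["published"] > since],
    store=stored.append,
)

first = updater.run_once(now=datetime(2016, 8, 15))
source.append({"id": "A3", "published": datetime(2016, 8, 17)})
second = updater.run_once(now=datetime(2016, 8, 18))
```

Passing `now` in explicitly keeps the sketch deterministic; a deployed job would supply the current time on each scheduled run.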
Optionally, in another embodiment of the retrieval and automatic download system for articles and data based on a biological cloud platform, the data retrieval module is further configured to push the latest data and articles to subscribed users.
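The patent does not describe how subscriptions are expressed or matched. As one possible reading, the sketch below assumes that each subscriber registers a set of keywords and receives every new record whose title contains one of them; the data layout and the `match_subscriptions` helper are hypothetical.

```python
def match_subscriptions(new_records, subscriptions):
    """Return {user: [record ids]} for records matching a subscribed keyword."""
    outbox = {}
    for record in new_records:
        title = record.get("title", "").lower()
        for user, keywords in subscriptions.items():
            if any(keyword.lower() in title for keyword in keywords):
                outbox.setdefault(user, []).append(record["id"])
    return outbox


new_records = [
    {"id": "A3", "title": "Single-cell RNA-seq of zebrafish heart"},
    {"id": "A4", "title": "Wheat genome assembly"},
]
subscriptions = {
    "alice": {"RNA-seq", "wheat"},
    "bob": {"methylation"},
}

outbox = match_subscriptions(new_records, subscriptions)
```

Delivery (e-mail, in-site message, and so on) would consume `outbox`; users with no matches simply do not appear in it.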
Optionally, in another embodiment of the retrieval and automatic download system for articles and data based on a biological cloud platform, the data download module is configured to download all articles and data in the sequencing field from databases on the network by means of a web crawler.
Optionally, in another embodiment of the retrieval and automatic download system for articles and data based on a biological cloud platform, the data storage module is specifically configured to store the data processed according to the word segmentation and indexing strategy in an internal distributed Elasticsearch cluster.
In this embodiment, the Elasticsearch cluster offers high availability and high scalability. For retrieval, the Elasticsearch cluster can expose a search API that the web graphical interface module calls, which makes it convenient for users of the graphical interface to browse and use the data. After logging in to the graphical interface, a user can run various combined searches over the large volume of data in Elasticsearch using different combinations of conditions, and can view the details of individual records. Storing the data in a cluster ensures the integrity, security, and availability of the data and a fast response.
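A combined search of this kind maps naturally onto an Elasticsearch `bool` query. The sketch below only builds the JSON request body and does not contact a cluster; the field names (`title`, `abstract`, `organism`, `source`) and the `build_query` helper are assumptions, and the `term` filters presume that those fields are mapped as keyword fields.

```python
import json


def build_query(text, filters, size=10):
    """Build a search body: full-text match plus exact-value filters."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text condition over hypothetical text fields.
                "must": [
                    {"multi_match": {"query": text, "fields": ["title", "abstract"]}}
                ],
                # Unscored exact-match conditions, one per selected filter.
                "filter": [
                    {"term": {field: value}} for field, value in filters.items()
                ],
            }
        },
    }


body = build_query(
    "liver transcriptome",
    filters={"organism": "Homo sapiens", "source": "sra"},
)
payload = json.dumps(body)  # what the web module would send to the search API
```

The web graphical interface module would send `payload` to the index's `_search` endpoint and render the returned hits; each additional condition chosen by the user adds one clause to `must` or `filter`.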
Although embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present invention, and all such modifications and variations fall within the scope defined by the appended claims.

Claims (5)

1. A retrieval and automatic download system for articles and data based on a biological cloud platform, characterized by comprising:
a data download module, a data parsing module, a data storage module, a web graphical interface module, and a data retrieval module; wherein,
the data download module is configured to download all articles and data in the sequencing field from databases on the network,
the data parsing module is configured to parse the downloaded articles and data into data in a standard data format,
the data storage module is configured to process the data obtained by the data parsing module according to a preset word segmentation and indexing strategy, and to store the resulting data,
the web graphical interface module is configured to provide the user with a search interface for articles and data, and to display the results of the retrievals that the user performs through the search interface,
the data retrieval module is configured to retrieve results from the data stored by the data storage module according to the search conditions that the user sets through the search interface, and to return the results to the web graphical interface module.
2. The system according to claim 1, characterized by further comprising:
a timed update module; wherein,
the timed update module is configured to periodically obtain the latest data and articles from the network, and to send them to the data storage module.
3. The system according to claim 2, characterized in that the data retrieval module is further configured to push the latest data and articles to subscribed users.
4. The system according to claim 1, characterized in that the data download module is configured to download all articles and data in the sequencing field from databases on the network by means of a web crawler.
5. The system according to claim 1, characterized in that the data storage module is specifically configured to store the data processed according to the word segmentation and indexing strategy in an internal distributed Elasticsearch cluster.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610687029.0A CN106354759B (en) 2016-08-18 2016-08-18 Retrieval and automatic download system for articles and data based on a biological cloud platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610687029.0A CN106354759B (en) 2016-08-18 2016-08-18 Retrieval and automatic download system for articles and data based on a biological cloud platform

Publications (2)

Publication Number Publication Date
CN106354759A (en) 2017-01-25
CN106354759B (en) 2019-07-12

Family

ID=57843505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610687029.0A Active CN106354759B (en) 2016-08-18 2016-08-18 Retrieval and automatic download system for articles and data based on a biological cloud platform

Country Status (1)

Country Link
CN (1) CN106354759B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032436A (en) * 2021-04-16 2021-06-25 苏州臻璇数据信息技术有限公司 Searching method and device based on article content and title

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103412933A (en) * 2013-08-20 2013-11-27 南京物联网应用研究院有限公司 Cloud search platform
CN103699572A (en) * 2013-11-26 2014-04-02 北京航空航天大学 Digital media content and resource integration and sharing method in cloud environment
CN104462865A (en) * 2014-10-17 2015-03-25 北京百迈客生物科技有限公司 Article analysis system and method based on biological cloud platform
CN105159971A (en) * 2015-08-26 2015-12-16 成都布林特信息技术有限公司 Cloud platform data retrieval method
CN105183809A (en) * 2015-08-26 2015-12-23 成都布林特信息技术有限公司 Cloud platform data query method
CN105205104A (en) * 2015-08-26 2015-12-30 成都布林特信息技术有限公司 Cloud platform data acquisition method

Also Published As

Publication number Publication date
CN106354759B (en) 2019-07-12

Similar Documents

Publication Publication Date Title
EP3400540B1 (en) Database operation using metadata of data sources
US9703882B2 (en) Generating search results containing state links to applications
CN109344223B (en) Building information model management system and method based on cloud computing technology
US9600530B2 (en) Updating a search index used to facilitate application searches
CN103488781B (en) Method, the search engine server of information search are provided
CN102682090B (en) A kind of sensitive word matching treatment system and method based on polymerization word tree
CN105956123A (en) Local updating software-based data processing method and apparatus
CN107145496A (en) The method for being matched image with content item based on keyword
CN107861753B (en) APP generation index, retrieval method and system and readable storage medium
JP6554791B2 (en) Information processing system and information processing method for character input prediction
TW201241773A (en) Method and apparatus of determining product category information
WO2014183956A4 (en) Social media content analysis and output
US8862610B2 (en) Method and system for content search
CN105224658A (en) A kind of Query method in real time of large data and system
CN103365992A (en) Method for realizing dictionary search of Trie tree based on one-dimensional linear space
Lapi et al. Identification and utilization of components for a linked open data platform
CN108470296B (en) Business object information processing method and device
CN106650408B (en) Method and system for judging whether android system has root permission
CN110018982A (en) Method, apparatus, equipment and the computer readable storage medium of locating file
US20150294005A1 (en) Method and device for acquiring information
CN101957860B (en) Method and device for releasing and searching information
CN105095436A (en) Automatic modeling method for data of data sources
CN106354759A (en) Retrieving and automatically downloading system of articles and data based on biological cloud platform
US8667008B2 (en) Search request control apparatus and search request control method
KR20140026796A (en) System and method for providing customized patent analysis service

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant