CN106295252B - Search method for gene products - Google Patents
- Publication number
- CN106295252B
- Authority
- CN
- China
- Prior art keywords
- gene
- keyword
- product
- unique feature
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Abstract
The present invention provides a search method for gene products and belongs to the field of information retrieval. The method includes constructing a homologous gene database, obtaining a keyword to be searched, determining the unique feature label corresponding to the keyword, expanding the keyword according to the unique feature label to obtain an expanded keyword, and performing a network search according to the expanded keyword. Because the unique feature label is obtained from the keyword to be searched, the keyword is expanded on the basis of that label, and the whole-network search is finally performed with the resulting expanded keyword, the expanded keyword carries multiple constraints corresponding to the keyword to be searched. This helps ensure that the resources most relevant to the keyword are found on the Internet and reduces interference from unrelated resources in the search results.
Description
Technical field
The invention belongs to the field of information retrieval and in particular relates to a search method for gene products.
Background art
With the development of sequencing technology, genome sequencing has been completed for many species in succession. Owing to the rapid development of Internet technology, searching the Internet for gene-related material such as gene literature and gene products has become an industry trend.
To date, the number of genes recorded in the NCBI gene database of the U.S. National Institutes of Health exceeds 1,003,000,000 entries. However, because of historical naming conventions and the existence of homologous genes, each gene may have, in addition to its gene number (gene ID), a gene full name (gene full name), a gene symbol (gene symbol), and aliases (alias, synonym) used in the industry, so gene literature and gene products cannot be catalogued under one unified name. As a result, when information and products related to a specific gene are queried by searching on a single gene name, search efficiency is low and the results are prone to irrelevant or missing data. This makes later searching very difficult.
Summary of the invention
To address the shortcomings and defects of the prior art, the present invention provides a search method for gene products that improves retrieval efficiency.
To achieve this technical purpose, the search method for gene products provided by the present invention includes:
Constructing a homologous gene database from gene IDs, gene symbols, gene full names, and aliases;
Obtaining a keyword to be searched and determining, from the homologous gene database, the unique feature label corresponding to the keyword;
Expanding the keyword according to the unique feature label, in combination with the gene ID, gene symbol, gene full name, and aliases, to obtain an expanded keyword;
Performing a network search according to the expanded keyword and outputting the search result.
Optionally, the search method further includes:
Constructing a search database containing gene literature and gene products, in which each gene document and each gene product is assigned a corresponding unique feature label.
Optionally, the search method further includes:
Selecting, in the search database, the search results corresponding to the unique feature label, including gene literature and/or gene products;
Outputting the search results.
Optionally, expanding the keyword according to the unique feature label, in combination with the gene ID, gene symbol, gene full name, and aliases, to obtain the expanded keyword includes:
Determining, according to the unique feature label, the target gene ID, target gene symbol, target gene full name, and aliases corresponding to the label;
Starting from the keyword, combining the target gene ID, the target gene symbol, the target gene full name, and the aliases with OR logic to obtain the expanded keyword.
Optionally, the unique feature label is a character string containing sequence bytes and verification bytes.
Optionally, the homologous gene database contains labels corresponding to the gene IDs, gene symbols, gene full names, and aliases.
Optionally, the expanded keyword is a character string that includes at least the gene ID, gene symbol, gene full name, and aliases.
Optionally, the search method further includes:
Obtaining species gene data from a gene lexicon and screening the species gene data in combination with a comparison database to obtain cross-species direct homologous (orthologous) genes;
Based on the cross-species orthologous genes, performing expansion matching in the gene lexicon using an identical gene full name or gene ID as the criterion to obtain an orthologous gene keyword dataset, and establishing a non-redundant database from the obtained dataset;
Selecting, in the non-redundant database, the expanded keyword that matches the keyword.
Optionally, screening the species gene data in combination with the comparison database to obtain the cross-species orthologous genes includes:
Extracting, from the comparison database, sample gene data corresponding to the species gene data, and performing duplicate-removal screening on the species gene data based on the sample gene data to obtain the screened cross-species orthologous genes.
Optionally, the orthologous gene keyword data stored in the non-redundant database are unique.
The technical solution provided by the present invention has the following beneficial effects:
The unique feature label is obtained from the keyword to be searched, the keyword is expanded on the basis of that label, and the whole-network search is finally performed with the resulting expanded keyword. Because the expanded keyword carries multiple constraints corresponding to the keyword to be searched, the resources most relevant to the keyword are more likely to be found on the Internet, and interference from unrelated resources in the search results is reduced.
Brief description of the drawings
To illustrate the technical solution of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the search method for gene products provided by the present invention;
Fig. 2 is a flow diagram of the way of obtaining the expanded keyword provided by the present invention.
Detailed description of the embodiments
To make the structure and advantages of the present invention clearer, the invention is further described below with reference to the drawings.
Embodiment one
The present invention provides a search method for gene products. As shown in Fig. 1, the search method includes:
11. Constructing a homologous gene database from gene IDs, gene symbols, gene full names, and aliases.
12. Obtaining a keyword to be searched and determining, from the homologous gene database, the unique feature label corresponding to the keyword.
13. Expanding the keyword according to the unique feature label, in combination with the gene ID, gene symbol, gene full name, and aliases, to obtain an expanded keyword.
14. Performing a network search according to the expanded keyword and outputting the search result.
In implementation, in order to obtain from a keyword search results that are as rich as possible and relevant to the gene, the search method first constructs a homologous gene database containing a large number of gene IDs, gene symbols, gene full names, and aliases, so that in later steps the gene ID, gene symbol, gene full name, and aliases associated with the keyword can be determined in the homologous gene database from the keyword's specific content. Next, once the keyword to be searched has been obtained, the unique feature label corresponding to the keyword is determined from the homologous gene database constructed in the previous step. The keyword is then expanded according to the content associated with the unique feature label, such as the gene ID, to obtain the expanded keyword. Finally, a whole-network search is performed with the expanded keyword to obtain the search results.
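Steps 11 to 14 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the `GeneRecord` class, the function names, the in-memory dictionary standing in for the homologous gene database, and the caller-supplied `web_search` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GeneRecord:
    """One entry of a hypothetical homologous gene database."""
    label: str        # unique feature label (format assumed)
    gene_id: str
    symbol: str
    full_name: str
    aliases: List[str] = field(default_factory=list)

    def names(self) -> List[str]:
        """All names under which this gene may be searched."""
        return [self.gene_id, self.symbol, self.full_name, *self.aliases]


def build_index(records: List[GeneRecord]) -> Dict[str, GeneRecord]:
    """Step 11: map every name (case-insensitive) to its record."""
    index: Dict[str, GeneRecord] = {}
    for rec in records:
        for name in rec.names():
            index.setdefault(name.lower(), rec)
    return index


def find_label(index: Dict[str, GeneRecord], keyword: str) -> Optional[str]:
    """Step 12: look up the unique feature label for the keyword."""
    rec = index.get(keyword.strip().lower())
    return rec.label if rec else None


def expand(records: List[GeneRecord], label: str) -> List[str]:
    """Step 13: gather every name stored under the label."""
    for rec in records:
        if rec.label == label:
            return rec.names()
    return []


def search(records: List[GeneRecord], keyword: str,
           web_search: Callable[[List[str]], list]) -> list:
    """Step 14: search with the expanded terms; fall back to the raw keyword."""
    label = find_label(build_index(records), keyword)
    terms = expand(records, label) if label else [keyword]
    return web_search(terms)
```

With a record for TP53 (gene ID 7157, full name "tumor protein p53", aliases p53 and LFS1), a query for `p53` would be expanded to all five names before being handed to `web_search`.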
The step of obtaining the unique feature label is included in the above steps so that the resources in the homologous gene database (gene IDs, gene symbols, gene full names, and aliases) can be used to expand the keyword and constrain it precisely. This helps ensure that the resources most relevant to the keyword are found on the Internet and reduces interference from unrelated resources in the search results.
It is worth noting that, when the unique feature label is determined in step 12, the keyword may correspond one-to-one with a keyword group in the homologous gene database, in which case the unique feature label can be determined directly from that keyword group. If more than one keyword group in the homologous gene database corresponds to the keyword to be searched, the closest keyword group must be chosen from among them, and the unique feature label corresponding to the chosen group is then determined, so that the subsequent processing steps can be completed according to that label.
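The text does not say how the "closest" keyword group is chosen. As one possible reading, the sketch below scores each candidate group by plain string similarity to the keyword, using `difflib.SequenceMatcher` from the Python standard library. The similarity measure, the function name `closest_group`, and the `groups` mapping are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def closest_group(keyword, groups):
    """Return the label whose keyword group best matches the keyword.

    groups maps a unique feature label to the list of names in its
    keyword group. A group's score is the highest string similarity
    between the keyword and any of its names (case-insensitive).
    Ties keep the first group in iteration order.
    """
    def score(names):
        return max(
            SequenceMatcher(None, keyword.lower(), name.lower()).ratio()
            for name in names
        )

    candidates = {label: names for label, names in groups.items() if names}
    if not candidates:
        return None
    return max(candidates, key=lambda label: score(candidates[label]))
```

A production system would more likely rank candidates with domain knowledge (species, usage frequency, curated synonyms) than with raw string similarity; the sketch only shows where such a ranking would plug in.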
The step of obtaining the expanded keyword in step 13 specifically includes:
Determining, according to the unique feature label, the target gene ID, target gene symbol, target gene full name, and aliases corresponding to the label;
Starting from the keyword, combining the target gene ID, the target gene symbol, the target gene full name, and the aliases with OR logic to obtain the expanded keyword.
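The OR-logic expansion can be illustrated with a short helper. The query syntax (the literal word `OR` and double quotes around multi-word terms) is an assumption about the target search engine, and `build_or_query` is a hypothetical name.

```python
def build_or_query(keyword, gene_id, symbol, full_name, aliases):
    """Join the keyword and all target names with OR.

    Empty terms and case-insensitive duplicates are dropped, and
    multi-word terms are wrapped in double quotes so they are
    treated as phrases.
    """
    terms = []
    seen = set()
    for term in [keyword, gene_id, symbol, full_name, *aliases]:
        term = term.strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)
```

For the keyword `p53` with target names 7157, TP53, "tumor protein p53", and aliases P53 and LFS1, the helper drops the duplicate alias and returns `p53 OR 7157 OR TP53 OR "tumor protein p53" OR LFS1`.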
The unique feature label is a character string containing sequence bytes and verification bytes, so that once the label has been determined, the computed sequence bytes can be checked against the verification bytes. In addition, the homologous gene database contains labels corresponding to the gene IDs, gene symbols, gene full names, and aliases. The expanded keyword obtained is a character string that includes at least the gene ID, gene symbol, gene full name, and aliases.
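The patent does not specify the layout of the sequence bytes or how the verification bytes are computed. The sketch below assumes, purely for illustration, an eight-digit sequence number followed by two check digits computed in the style of the ISO 7064 MOD 97-10 scheme; `make_label` and `verify_label` are hypothetical names.

```python
def make_label(sequence_number: int) -> str:
    """Build a label: 8 sequence digits followed by 2 check digits."""
    if not 0 <= sequence_number < 10**8:
        raise ValueError("sequence number out of range")
    seq = f"{sequence_number:08d}"
    # Choose check digits so that the full 10-digit number is 1 modulo 97.
    check = 98 - (int(seq) * 100) % 97
    return f"{seq}{check:02d}"


def verify_label(label: str) -> bool:
    """Check that the verification digits agree with the sequence digits."""
    if len(label) != 10 or not (label.isascii() and label.isdigit()):
        return False
    return int(label) % 97 == 1
```

Any checksum with good single-digit error detection would serve the same purpose; the point is only that a label can be validated locally before it is used to query the database.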
Specifically, the search method further includes: constructing a search database containing gene literature and gene products, in which each gene document and each gene product is assigned a corresponding unique feature label.
In implementation, besides expanding the keyword and performing a whole-network search with the expanded keyword as proposed above, the method also includes constructing a search database and then searching it according to the unique feature label to obtain the search results.
The search database in this step is a database containing gene literature and gene products. The gene literature and gene products that may serve as search results are assembled into a database in advance, and the content corresponding to each gene in the search database is assigned a unique feature label. In this way, after the unique feature label has been determined from the keyword, the search results corresponding to that label, including gene literature and/or gene products, can be selected in the search database and output. Selecting the content corresponding to the keyword in the search database by unique feature label allows faster and more accurate retrieval than a whole-network search over the Internet.
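A lookup by unique feature label reduces to an index keyed on the label. The class below is a minimal in-memory stand-in for the search database; the class name, the `kind` values, and the record layout are assumptions for illustration only.

```python
from collections import defaultdict


class RetrievalDatabase:
    """In-memory stand-in for the search database, indexed by label."""

    def __init__(self):
        self._by_label = defaultdict(list)

    def add(self, label, kind, title):
        """Store a gene document or gene product under its label."""
        if kind not in ("document", "product"):
            raise ValueError("kind must be 'document' or 'product'")
        self._by_label[label].append({"kind": kind, "title": title})

    def lookup(self, label, kinds=("document", "product")):
        """Return the items stored under a label, optionally filtered by kind."""
        return [
            item for item in self._by_label.get(label, [])
            if item["kind"] in kinds
        ]
```

Because every item is filed under its label at insertion time, a query costs one dictionary access regardless of which name the user typed, which is the speed advantage the paragraph above describes.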
The first retrieval mode above performs a whole-network search according to the expanded keyword. Another way of obtaining the expanded keyword is described below; the detailed process is shown in Fig. 2.
21. Obtaining species gene data from a gene lexicon and screening the species gene data in combination with a comparison database to obtain cross-species direct homologous (orthologous) genes.
22. Based on the cross-species orthologous genes, performing expansion matching in the gene lexicon using an identical gene full name or gene ID as the criterion to obtain an orthologous gene keyword dataset, and establishing a non-redundant database from the obtained dataset.
23. Selecting, in the non-redundant database, the expanded keyword that matches the keyword.
In implementation, gene data for multiple species are compiled from the gene lexicon of the National Center for Biotechnology Information (NCBI). Cross-species orthologous genes are screened in combination with the HomoloGene database. The orthologous gene data are then expanded by matching within the gene lexicon, using an identical gene symbol (Symbol) or full name (full name) as the criterion. Finally, an orthologous gene keyword dataset is generated, a non-redundant database of gene symbol (Symbol) names is established, and the expanded keyword matching the keyword is selected.
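Steps 22 and 23 can be sketched as follows, assuming the orthologous genes from step 21 are already known. The record layout (dictionaries with `gene_id`, `symbol`, `full_name`, `aliases`), the choice of the upper-cased symbol as the non-redundant key, and both function names are illustrative assumptions; the matching criterion follows the "identical symbol or full name" wording of the paragraph above.

```python
def build_nonredundant(genes, ortholog_ids):
    """Build {upper-cased symbol: sorted unique keywords}.

    genes: list of dicts with gene_id, symbol, full_name, aliases.
    ortholog_ids: gene IDs already identified as cross-species orthologs.
    Every lexicon entry whose symbol or full name is identical
    (ignoring case) to that of an ortholog contributes its names.
    """
    result = {}
    for seed in genes:
        if seed["gene_id"] not in ortholog_ids:
            continue
        key = seed["symbol"].upper()
        keywords = result.setdefault(key, set())
        for g in genes:
            same_symbol = g["symbol"].upper() == key
            same_name = g["full_name"].lower() == seed["full_name"].lower()
            if same_symbol or same_name:
                keywords.update(
                    [g["gene_id"], g["symbol"], g["full_name"], *g["aliases"]]
                )
    # Sets guarantee that each keyword is stored once per key.
    return {key: sorted(names) for key, names in result.items()}


def expanded_keywords(nonredundant, keyword):
    """Step 23: return the keyword set that contains the keyword."""
    wanted = keyword.strip().lower()
    for names in nonredundant.values():
        if wanted in (n.lower() for n in names):
            return names
    return []
```

Keying the result on a normalized symbol and storing names in a set is one simple way to obtain the uniqueness that the non-redundant database requires.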
In step 21, the species gene data are screened in combination with the comparison database to obtain the cross-species orthologous genes as follows: sample gene data corresponding to the species gene data are extracted from the comparison database, and duplicate-removal screening is performed on the species gene data based on the sample gene data to obtain the screened cross-species orthologous genes.
In addition, the orthologous gene keyword data stored in the non-redundant database are unique.
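One way to read the duplicate-removal screening of step 21 is sketched below. The input shapes are assumptions: `species_genes` is a sequence of `(tax_id, gene_id)` pairs that may contain repeats, and `comparison_rows` is a sequence of `(group_id, tax_id, gene_id)` rows loosely modeled on a homology-group table. The function name and the rule that a group must span at least two species are likewise illustrative.

```python
def screen_orthologs(species_genes, comparison_rows):
    """Return {group_id: [(tax_id, gene_id), ...]} for cross-species groups.

    Genes absent from the comparison data and repeated genes are
    dropped; only groups covering two or more species are kept.
    """
    group_of = {(tax, gid): grp for grp, tax, gid in comparison_rows}
    seen = set()
    groups = {}
    for tax, gid in species_genes:
        key = (tax, gid)
        if key in seen or key not in group_of:
            continue  # skip repeats and genes without a homology group
        seen.add(key)
        groups.setdefault(group_of[key], []).append(key)
    return {
        grp: members
        for grp, members in groups.items()
        if len({tax for tax, _ in members}) >= 2
    }
```

In this toy form, the "sample gene data" are simply the comparison rows that share a `(tax_id, gene_id)` key with the species data.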
The present invention provides a search method for gene products, which includes constructing a homologous gene database, obtaining a keyword to be searched, determining the unique feature label corresponding to the keyword, expanding the keyword according to the unique feature label to obtain an expanded keyword, and performing a network search according to the expanded keyword. The unique feature label is obtained from the keyword to be searched, the keyword is expanded on the basis of that label, and the whole-network search is finally performed with the resulting expanded keyword. Because the expanded keyword carries multiple constraints corresponding to the keyword to be searched, the resources most relevant to the keyword are more likely to be found on the Internet, and interference from unrelated resources in the search results is reduced.
The serial numbers in the above embodiments are for description only and do not represent the order in which components are assembled or used.
The above describes only embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (9)
1. A search method for gene products, characterized in that the search method includes:
Constructing a homologous gene database from gene IDs, gene symbols, gene full names, and aliases;
Obtaining a keyword to be searched and determining, from the homologous gene database, the unique feature label corresponding to the keyword;
Expanding the keyword according to the unique feature label, in combination with the gene ID, gene symbol, gene full name, and aliases, to obtain an expanded keyword;
Performing a network search according to the expanded keyword and outputting the search result;
Wherein obtaining the expanded keyword further includes:
Obtaining species gene data from a gene lexicon and screening the species gene data in combination with a comparison database to obtain cross-species direct homologous (orthologous) genes;
Based on the cross-species orthologous genes, performing expansion matching in the gene lexicon using an identical gene full name or gene ID as the criterion to obtain an orthologous gene keyword dataset, and establishing a non-redundant database from the obtained dataset;
Selecting, in the non-redundant database, the expanded keyword that matches the keyword.
2. The search method for gene products according to claim 1, characterized in that the search method further includes:
Constructing a search database containing gene literature and gene products, in which each gene document and each gene product is assigned a corresponding unique feature label.
3. The search method for gene products according to claim 2, characterized in that the search method further includes:
Selecting, in the search database, the search results corresponding to the unique feature label, including gene literature and/or gene products;
Outputting the search results.
4. The search method for gene products according to claim 1, characterized in that expanding the keyword according to the unique feature label, in combination with the gene ID, gene symbol, gene full name, and aliases, to obtain the expanded keyword includes:
Determining, according to the unique feature label, the target gene ID, target gene symbol, target gene full name, and aliases corresponding to the label;
Starting from the keyword, combining the target gene ID, the target gene symbol, the target gene full name, and the aliases with OR logic to obtain the expanded keyword.
5. The search method for gene products according to claim 1, characterized in that the unique feature label is a character string containing sequence bytes and verification bytes.
6. The search method for gene products according to claim 1, characterized in that the homologous gene database contains labels corresponding to the gene IDs, gene symbols, gene full names, and aliases.
7. The search method for gene products according to claim 1 or 5, characterized in that the expanded keyword is a character string that includes at least the gene ID, gene symbol, gene full name, and aliases.
8. The search method for gene products according to claim 1, characterized in that screening the species gene data in combination with the comparison database to obtain the cross-species orthologous genes includes:
Extracting, from the comparison database, sample gene data corresponding to the species gene data, and performing duplicate-removal screening on the species gene data based on the sample gene data to obtain the screened cross-species orthologous genes.
9. The search method for gene products according to claim 1, characterized in that the orthologous gene keyword data stored in the non-redundant database are unique.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610687440.8A CN106295252B (en) | 2016-08-18 | 2016-08-18 | Search method for gene products |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610687440.8A CN106295252B (en) | 2016-08-18 | 2016-08-18 | Search method for gene products |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106295252A (en) | 2017-01-04 |
CN106295252B (en) | 2019-05-07 |
Family
ID=57661318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610687440.8A Active CN106295252B (en) | Search method for gene products | 2016-08-18 | 2016-08-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106295252B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108428137A (en) * | 2017-02-14 | 2018-08-21 | 阿里巴巴集团控股有限公司 | Generate the method and device of abbreviation, verification electronic banking rightness of business |
CN110349632B (en) * | 2019-06-28 | 2020-06-16 | 南方医科大学 | Method for screening gene keywords from PubMed literature |
CN111540472B (en) * | 2020-05-18 | 2023-06-20 | 霓蝶(上海)医疗科技有限公司 | Intelligent risk assessment system and method for health activities |
CN111739585B (en) * | 2020-06-24 | 2022-10-18 | 胡嘉欣 | Information extraction method based on NCBI database and related equipment thereof |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1744080A (en) * | 2005-09-27 | 2006-03-08 | 南方医科大学 | Specific function-related gene information searching system and method for building database of searching words thereof |
CN101201847A (en) * | 2007-12-26 | 2008-06-18 | 北京东方灵盾科技有限公司 | System and method for searching conventional medicament patent information |
CN101266601A (en) * | 2007-03-14 | 2008-09-17 | 沈诗昊 | Gene chip data search engine |
CN101539916A (en) * | 2008-03-17 | 2009-09-23 | 亿维讯软件(北京)有限公司 | Initial patent retrieving device, secondary patent retrieving device and patent retrieving system |
CN101738196A (en) * | 2009-12-10 | 2010-06-16 | 东软集团股份有限公司 | Method and device of navigation equipment for information retrieval |
CN102043812A (en) * | 2009-10-13 | 2011-05-04 | 北京大学 | Method and system for retrieving medical information |
CN104090890A (en) * | 2013-12-12 | 2014-10-08 | 深圳市腾讯计算机系统有限公司 | Method, device and server for obtaining similarity of key words |
CN105589936A (en) * | 2015-12-11 | 2016-05-18 | 航天恒星科技有限公司 | Data query method and system |
CN105630813A (en) * | 2014-10-30 | 2016-06-01 | 苏宁云商集团股份有限公司 | Keyword recommendation method and system based on user-defined template |
CN105740243A (en) * | 2014-12-08 | 2016-07-06 | 深圳华大基因研究院 | Method and device for constructing biological information database |
Also Published As
Publication number | Publication date |
---|---|
CN106295252A (en) | 2017-01-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||