US20140089328A1 - Association of data to a biological sequence - Google Patents

Association of data to a biological sequence

Info

Publication number
US20140089328A1
Authority
US
Grant status
Application
Legal status
Abandoned
Application number
US13628967
Inventor
James R. Kozloski
Clifford A. Pickover
Jacinta M. Wubben
Ruhong Zhou
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30386 Retrieval requests
    • G06F17/30424 Query processing
    • G06F17/30522 Query processing with adaptation to user needs
    • G06F17/3053 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/22 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or Single-Nucleotide Polymorphism [SNP] discovery or sequence alignment

Abstract

A computer assembly includes a processor configured to access data on a network and to perform a method. The method includes identifying, in the network, one or more references having a relevance level greater than a predetermined threshold. The one or more references are associated with one or more probe sequences corresponding to one or more biological sequences. The one or more probe sequences are ranked based on one or more criteria corresponding to a target biological sequence. Each of the one or more probe sequences is assigned a level of affinity to one or more segments of the target biological sequence based at least on its ranking.

Description

    BACKGROUND
  • [0001]
    The present disclosure relates to a simulated binding of data to a biological sequence, and in particular to identifying data that is relevant to a biological sequence, ranking the data according to its importance, and providing the data to a user according to the ranking.
  • [0002]
    Analysis of biological data, including biological sequences, may require large amounts of data stored on different computers. Biological data being researched may be annotated by a program to refer to related data, such as research publications, which allows a researcher to see other data related to the present research. While research articles and other publications are useful for analyzing biological data, other resources, such as analysis tools and software programs, are also useful. In addition, the amount of information regarding the biological data being researched grows over time. When biological data is annotated to refer to related publications, the annotations grow as well, which may make it more difficult for a researcher to identify the information most important to present research.
  • SUMMARY
  • [0003]
    Exemplary embodiments include a computer assembly for associating data with a biological sequence. The computer assembly includes a processor configured to access data on a network and to perform a method. The method includes identifying, in the network, one or more references having a relevance level greater than a predetermined threshold. Each reference of the one or more references is associated with a probe sequence corresponding to a segment of a biological sequence. The method includes ranking one or more probe sequences based on one or more criteria and assigning each of the one or more probe sequences a level of affinity to a segment of a target biological sequence based at least on the ranking of that probe sequence.
  • [0004]
    Embodiments further include a system for simulating annealing to a biological sequence. The system includes a host computer and one or more network computers having data stored therein. The host computer has stored therein a biological sequence and is connected to the one or more network computers via a communications network. The host computer is configured to identify data in the one or more network computers that is relevant to the biological sequence and to associate the relevant data with a segment of the biological sequence. The host computer is further configured to rank the relevant data based on predetermined criteria to determine a level of affinity of the relevant data with the segment of the biological sequence.
  • [0005]
    Embodiments further include a computer program product for simulating annealing to a biological sequence. The computer program product includes a processor and a non-transitory computer readable medium having stored thereon code to perform a method. The method includes identifying, by the processor, references to data in a network as relevant references that are relevant to a biological sequence and associating the relevant references with a segment of the biological sequence. The method includes ranking the relevant references based on predetermined criteria to determine a level of affinity of the relevant references with the segment of the biological sequence.
  • [0006]
    Additional features and advantages are realized by implementation of embodiments of the present disclosure. Other embodiments and aspects of the present disclosure are described in detail herein and are considered a part of the claimed invention. For a better understanding of the embodiments, including advantages and other features, refer to the description and to the drawings.
  • BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
  • [0007]
    The subject matter regarded as embodiments of the present disclosure is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the embodiments are apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
  • [0008]
    FIG. 1 illustrates a network system according to embodiments of the present disclosure;
  • [0009]
    FIG. 2 illustrates a simulated annealing module according to embodiments of the disclosure;
  • [0010]
    FIG. 3 illustrates a user customization display according to embodiments of the disclosure;
  • [0011]
    FIG. 4A illustrates an annealing display according to an embodiment of the disclosure;
  • [0012]
    FIG. 4B illustrates an annealing display according to an embodiment of the disclosure;
  • [0013]
    FIG. 5 illustrates an annealing display according to another embodiment of the disclosure;
  • [0014]
    FIG. 6 illustrates a table according to embodiments of the disclosure;
  • [0015]
    FIG. 7 illustrates a flowchart of a method according to embodiments of the disclosure;
  • [0016]
    FIG. 8 illustrates a computer system according to embodiments of the disclosure; and
  • [0017]
    FIG. 9 illustrates a computer program product according to embodiments of the disclosure.
  • DETAILED DESCRIPTION
  • [0018]
    The large volume of data that may be annotated to a biological sequence may make it difficult for a researcher to identify important data. Embodiments of the present disclosure relate to displaying a simulated annealing of data and references to a biological sequence to allow researchers to quickly identify important information.
  • [0019]
    FIG. 1 illustrates a network system 100 according to an embodiment of the present disclosure. The system 100 includes a host computer 110 including a simulated annealing module 111, a biological sequence 112 (also referred to as a target biological sequence 112) and data 113. The simulated annealing module 111 is configured to analyze data and references to the data, to determine which data is relevant to the biological sequence 112 (relevant data 114), and to determine a level of affinity of the relevant data 114 to the biological sequence 112 based on predetermined ranking criteria. The host computer 110 may display the determined level of affinity by displaying the biological sequence 112 together with the relevant data 114, symbols representing the relevant data 114, or symbols representing references to the relevant data 114, and by adjusting the distance of the relevant data 114 (or the corresponding symbols) from the biological sequence 112 according to the level of affinity of the relevant data 114 with respect to the biological sequence 112.
  • [0020]
    The host computer 110 may be connected to a network 120. The network 120 may communicate with one or more network computers 130, which in the present specification and claims refer to computers connected to the network 120 to communicate via the network 120. The network computers 130 may include data 131, such as documents 132, analysis tools 133 and biographical data 134. While only a few types of data are illustrated for purposes of description, any type of data may be stored by the network computers 130. The simulated annealing module 111 may access the data 131 to determine which data 131 is relevant to the biological sequence 112. The simulated annealing module 111 may rank the relevant data 131 based on the predetermined ranking criteria to determine a level of affinity of the data 131 to the biological sequence 112.
  • [0021]
    The host computer 110 may also be connected to one or more storage devices 140, and the storage devices may store one or more references 141 pointing to data 142, and one or more biological sequences 143 that may be target biological sequences, or biological sequences against which data is compared to determine a level of affinity. For example, the storage 140 may contain a database of biological sequences 143 and a user of the host computer 110 may upload a biological sequence 143 from the storage 140 to the host computer 110 to allow the simulated annealing module 111 to perform an analysis of data, such as data 113, data 131 and data 142 with respect to the biological sequence 143, to determine the level of affinity of the data to the biological sequence 143.
  • [0022]
    In addition, the network 120 may access one or more of storage 170 and network computers 180 via a server 150 connected to the Internet 160. Alternatively, the host computer 110 may connect directly to the Internet 160. The storage 170 and network computers 180 may include data, references and biological sequences accessible by the host computer 110 to perform the analysis.
  • [0023]
    In embodiments of the present disclosure, the biological sequence 112 or 143 may include any type of biological sequence, including deoxyribonucleic acid (DNA), ribonucleic acid (RNA), an amino acid sequence of a protein, or any other biological sequence. Data includes documents, files, stored biographical information about a person or information about an organization, stored publications, data regarding a number of queries of the simulated annealing module 111 or other systems, data regarding previous analyses performed on the biological sequence 112, analysis tools, algorithms or programs, medical treatments associated with the biological sequence 112, data regarding comments or reviews of publications or tools, or any other data. References include any pointer or address that indicates a location of data or provides additional information regarding the data. Examples include uniform resource locators (URLs), uniform resource names (URNs), hyperlinks, JavaScript pointers to data, XML pointers to data, or any other type of reference to data.
  • [0024]
    An operation of the host computer 110 including the simulated annealing module 111 will be described below with reference to FIGS. 1 and 2. The simulated annealing module 111 may include a biological sequence identifier 206 to identify a target biological sequence 112. For example, a user accessing the host computer 110 may display the biological sequence 112, or data corresponding to the biological sequence 112, on a display device. Alternatively, the simulated annealing module 111 may search one or more of the host computer 110, storage 140 and 170, and network computers 130 and 180 to identify biological sequences to serve as target biological sequences 112, either automatically or based on predetermined commands to identify a predetermined biological sequence or a predetermined class or group of biological sequences. In the present specification and claims, a “target biological sequence” is defined as a biological sequence that is selected by a user or program to be subject to ranking of related data, and in some embodiments to simulated annealing, as described in embodiments of the disclosure.
  • [0025]
    The simulated annealing module 111 may include a reference identifier 201, a reference generator 202, a relevance identifier 203 and a reference/data associator 204. The reference identifier 201 may search memory of a device, such as the host computer 110, of connected storage devices 140, of devices 130 connected to a network 120, or of devices 170 and 180 connected to the Internet 160 for references to data, such as URLs that refer to data at a particular location. In addition, in circumstances in which data does not correspond to a reference, or the reference is not in a format usable by the simulated annealing module 111, the reference generator 202 may search memory of a device, such as the host computer 110, of connected storage devices 140, of devices 130 connected to a network 120, or of devices 170 and 180 connected to the Internet 160 for data, such as documents, biographical data, data related to analysis tools, and any other data. The reference generator 202 may then generate a reference, such as a URL that points to a location of the data.
  • [0026]
    The relevance identifier 203 analyzes the data, such as the data pointed to by the searched references or the data identified by the reference generator 202, to determine whether the data meets a threshold level of relevance. The threshold level of relevance may be based on predetermined criteria, such as a similarity of the data to a target biological sequence; a source of the data, such as an organization supplying the data (e.g., a university or company); an author of the data; a publisher of the data; and a type of operation performed by execution of the data (such as in the case of an analysis tool for analyzing biological sequences). The threshold level of relevance may also be based on a frequency with which the data is accessed or referenced, a frequency with which the data is accessed or referenced by predetermined classes, such as researchers, scientists, or professional organizations, or a frequency with which the data is associated with a target biological sequence. In other words, the threshold level of relevance may be related to a target sequence or may include criteria unrelated to the target sequence. The threshold level of relevance may be based on the content of the data, such as an identity of a person or organization that is the subject of biographical information data, or the content of a document or file. In addition, the threshold level of relevance may be based on usage of the data, such as how often, or by whom, the data is accessed or referenced.
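    As a minimal sketch of the threshold test, the Python below assumes each candidate carries a similarity score and an access count, blends them with equal weight, and keeps candidates whose blended relevance is greater than the threshold (matching the Abstract's "greater than a predetermined threshold"). The `Candidate` record, the 50/50 blend and the normalization by the most-accessed item are illustrative assumptions, not the disclosed method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one piece of data found during the search."""
    ref: str            # reference (e.g. a URL) pointing at the data
    similarity: float   # assumed 0..1 similarity to the target sequence
    access_count: int   # how often the data has been accessed or referenced


def relevance(c: Candidate, max_access: int) -> float:
    """Assumed relevance level: equal blend of similarity and normalized usage."""
    usage = c.access_count / max_access if max_access else 0.0
    return 0.5 * c.similarity + 0.5 * usage


def filter_relevant(cands: list[Candidate], threshold: float) -> list[Candidate]:
    """Keep candidates whose relevance level is greater than the threshold."""
    max_access = max((c.access_count for c in cands), default=0)
    return [c for c in cands if relevance(c, max_access) > threshold]
```

    Usage-based and content-based signals are combined here only to show that both kinds of criteria named above can feed one threshold; any other scoring function could be substituted.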
  • [0027]
    Based on a determination that the data meets a threshold level of relevance, the reference/data associator 204 associates a reference, either identified or generated, with the data. For example, the reference and data, or information identifying the data, may be stored in a reference table 205. The probe generator 207 may generate a probe or probe sequence and the probe/reference associator 208 may associate the probe sequence with the reference, such as by adding the probe sequence to the reference table 205. In one embodiment, the probe represents a degree to which the reference, or the data associated with the reference, corresponds to a particular segment of a biological sequence to which the data pertains. The segment of the biological sequence to which the data pertains may be a portion less than the entire biological sequence, but in some examples the segment could correspond to the entire biological sequence. In one embodiment, the probe is identified by a sequence that is complementary to a sequence of a segment of the biological sequence to which the data pertains. For example, if a segment of the biological sequence to which the data pertains has a configuration of “GGGGAAAATT,” the probe may correspond to a complementary probe sequence, or “CCCCTTTTAA.” Accordingly, the host computer 110 may match references and data to portions of biological sequences pertaining to each reference according to a sequence indicated by a probe sequence. In other embodiments, the probe identifies spatially, numerically or graphically a portion of the biological sequence to which it pertains.
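    The complementary-probe example above can be reproduced with a base-pairing lookup. This sketch assumes DNA bases and follows the paragraph in pairing base by base; real hybridization is antiparallel, so a laboratory probe would normally be the reverse complement. The dict-based `reference_table` and the `associate` helper are hypothetical stand-ins for reference table 205 and the probe/reference associator 208.

```python
# Watson-Crick pairing for DNA bases (RNA would pair A with U instead).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical reference table: probe sequence -> references that use it.
reference_table: dict[str, list[str]] = {}


def complement_probe(segment: str) -> str:
    """Return the base-by-base complement of a DNA segment (no reversal)."""
    return "".join(COMPLEMENT[base] for base in segment.upper())


def associate(segment: str, reference: str) -> str:
    """Generate a probe for the segment and record the reference under it."""
    probe = complement_probe(segment)
    reference_table.setdefault(probe, []).append(reference)
    return probe
```

    For the segment "GGGGAAAATT", `complement_probe` returns "CCCCTTTTAA", the probe sequence given in the text.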
  • [0028]
    The rank calculator 209 may calculate a rank of each probe sequence, or each reference, or of the data corresponding to each reference and probe sequence. The ranking may be based on one or more criteria, and the criteria may be weighted so that different criteria affect the ranking more than other criteria. For example, a user may set up a profile in the host computer 110 or the simulated annealing module 111, and the user may indicate which indicia the user would prefer to be given the most weight. In one embodiment, the weighting criteria are analogous to biological characteristics of a probe and biological sequence in a biological annealing process.
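    One simple way to realize weighted criteria is a normalized weighted sum. The sketch below assumes each criterion has already been scored on a 0 to 1 scale; the criterion names and the weighted-mean formula are assumptions for illustration, not the rank calculator 209 itself.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-criterion scores; criteria without a weight count as 0."""
    total_w = sum(weights.values())
    if total_w == 0:
        return 0.0
    return sum(weights.get(k, 0.0) * v for k, v in scores.items()) / total_w


def rank(probes: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Return probe identifiers ordered from highest to lowest weighted score."""
    return sorted(probes, key=lambda p: weighted_score(probes[p], weights), reverse=True)
```

    Changing only the weights can reorder the same probes, which is how a user profile that favors one indicium over another would change the result.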
  • [0029]
    One criterion, which may be analogous to a biological complementarity requirement, is a determination of a similarity, resemblance, or overlap of the data or the probe with a segment of the target biological sequence. For example, a document may explicitly describe the segment of the target biological sequence or an analysis tool may have been used to analyze the segment of the target biological sequence. Alternatively, a document may describe a similar, but not identical, biological sequence, which may result in a lower ranking.
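    The complementarity criterion can be scored, assuming an equal-length, ungapped alignment, as the fraction of positions at which the probe and the target segment form Watson-Crick pairs. The function name and the ungapped comparison are illustrative choices; a document describing a similar but not identical sequence would correspond to a score below 1.

```python
# Base pairs that count as complementary (DNA).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def complementarity(probe: str, segment: str) -> float:
    """Fraction of aligned positions where probe and segment form a base pair."""
    if len(probe) != len(segment):
        raise ValueError("probe and segment must have equal length")
    if not probe:
        return 0.0
    matches = sum((p, s) in PAIRS for p, s in zip(probe.upper(), segment.upper()))
    return matches / len(probe)
```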
  • [0030]
    Another criterion, which may be analogous to a binding affinity of a probe in a biological annealing, is a determination of importance or prestige of a probe, data or reference. The importance of the data may be determined based on information about the data that does not necessarily relate to the target biological information. For example, the importance of the data may be based on one or more of a number of citations the authors of a document have received, the identities of people or organizations that have referenced a document, a university or organization where the data was generated, such as where research was conducted, a type of analysis performed in a document, a name of an author of a document, or any other factors that may provide information regarding the prestige or importance of data in a field. When the data relates to an analysis tool, for example, the importance of the data may be determined based on a type of analysis performed by the tool, a university or organization where the analysis tool was developed, a creator of the analysis tool, or any other factors that may provide information regarding the prestige or importance of the analysis tool.
  • [0031]
    Another criterion, which may be analogous to probe mobility in a biological annealing, is a determination of a popularity of data. The determination may take into account a frequency with which the data is accessed or cited, such as a frequency with which a document is cited in other publications or a frequency with which an analysis tool is used in a field. In an embodiment in which the data is biographical information, such as information about a researcher, the determination may consider a frequency with which the researcher is cited. In other words, while determining the importance or prestige of the data relies on factors that are not directly associated with the data (such as a prestige of the source of the data), the popularity of the data may relate to the data itself.
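    Popularity is described only as a frequency of access or citation. A hedged sketch, assuming citation counts and the number of years each item has been available are known, computes citations per year and scales the results so the most popular item scores 1. The helper names and the per-year rate are hypothetical choices.

```python
def popularity(citations: int, years_available: float) -> float:
    """Citations per year; items younger than one year are treated as one year old."""
    return citations / max(years_available, 1.0)


def normalize(values: dict[str, float]) -> dict[str, float]:
    """Scale scores to 0..1 relative to the most popular item."""
    top = max(values.values(), default=0.0)
    return {k: (v / top if top else 0.0) for k, v in values.items()}
```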
  • [0032]
    Another criterion, which may be analogous to occupancy constraints in a biological annealing, is a determination of a historical applicability of data to a target biological sequence, to the probe or to other probe sequences. For example, the determination may be based on a frequency with which an analysis tool has been used to analyze a target segment of the biological sequence, or a frequency with which a document has been cited in reference to the target segment of the biological sequence.
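    Historical applicability can be approximated by counting prior uses. This sketch assumes a hypothetical usage log of `(item, segment_id)` records and counts how often each tool or document was applied to a given target segment.

```python
from collections import Counter


def historical_applicability(usage_log: list[tuple[str, str]], segment_id: str) -> Counter:
    """Count prior applications of each item (tool or document) to segment_id."""
    return Counter(item for item, seg in usage_log if seg == segment_id)
```

    `Counter.most_common()` then yields the items most often applied to that segment first.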
  • [0033]
    Although a few examples of ranking criteria have been provided, embodiments of the present disclosure encompass any ranking criteria including prior uses of a tool, prior citations of a document, a quality of citations to the data, affiliations of an author of data, ease-of-use of a tool, cost to implement an analysis tool, number of software or tool citations, a date of citations or references to the data, votes or other determinations of importance of data by crowd sourcing, content of user comments about the data, etc. In addition, in one embodiment a stochastic element may be introduced to the rankings to allow for optimization away from local minima.
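    The stochastic element is not specified further. Two common options, shown here as assumptions rather than the disclosed approach, are the Metropolis acceptance rule from classical simulated annealing and temperature-scaled Gaussian noise added to scores before sorting. Lowering the temperature toward zero recovers the deterministic ranking.

```python
import math
import random


def accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis rule: always accept an improvement (delta <= 0); accept a
    worse move with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / temperature)


def noisy_rank(scores: dict[str, float], temperature: float, seed: int = 0) -> list[str]:
    """Rank items by score plus Gaussian noise with standard deviation equal to
    the temperature (clamped at 0, which gives the plain deterministic order)."""
    rng = random.Random(seed)
    sigma = max(temperature, 0.0)
    perturbed = {k: v + rng.gauss(0.0, sigma) for k, v in scores.items()}
    return sorted(perturbed, key=perturbed.get, reverse=True)
```

    A caller could start with a high temperature and reduce it step by step, so early rankings explore alternatives and later ones settle.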
  • [0034]
    In one embodiment, the probe sequence includes or has attached to it the data associated with the ranking criteria. When a target biological sequence is identified, it may be compared with all of the available probe sequences, and the available probe sequences may be ranked based on a similarity with segments of the target biological sequence. Accordingly, in one embodiment, the data and references identified in the system are not directly compared with data associated with the target biological sequence. Instead, a probe sequence having stored therein, attached thereto, or which inherently includes the ranking criteria data is compared to segments of the target biological sequence.
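    Comparing a target sequence with all available probe sequences might look like the following sketch, which slides each probe along the target without gaps and keeps the window with the highest fraction of complementary bases. Ungapped sliding and this scoring rule are simplifying assumptions; an established alignment method could be substituted.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def best_binding_site(probe: str, target: str) -> tuple[int, float]:
    """Return (start index, complementary fraction) of the best window.
    Returns (-1, 0.0) if the probe is empty or longer than the target."""
    n = len(probe)
    if n == 0 or n > len(target):
        return (-1, 0.0)
    probe, target = probe.upper(), target.upper()
    best = (-1, -1.0)
    for i in range(len(target) - n + 1):
        window = target[i:i + n]
        score = sum((p, t) in PAIRS for p, t in zip(probe, window)) / n
        if score > best[1]:  # strict '>' keeps the earliest window on ties
            best = (i, score)
    return best


def rank_probes(probes: list[str], target: str) -> list[tuple[str, int, float]]:
    """Rank probes by how well their best window pairs with the target."""
    hits = [(p, *best_binding_site(p, target)) for p in probes]
    return sorted(hits, key=lambda h: h[2], reverse=True)
```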
  • [0035]
    The probe sequences may be generated by systems or users that generated the data and/or references, or the probe sequences may be generated by a system or user that identifies a target biological sequence. For example, in one embodiment, a system that identifies a target biological sequence searches a network for previously generated probe sequences and performs the ranking operation. In another embodiment, the system that identifies the target biological sequence may search the network, identify relevant data, and generate the probe sequences corresponding to the relevant data. Then the data, references or generated probe sequences may be compared to predetermined criteria and to the target biological sequence. In yet another embodiment, a system may combine the analysis of pre-generated probe sequences with the generation of new probe sequences based on newly identified data or references in a network.
  • [0036]
    Once the data, references or probe sequences have been ranked by the rank calculator 209, the annealing display generator 210 generates a graphical display of the ranking. The graphical display may display an icon or other representation of the data, reference, or probe and of the target biological sequence, or of one or more target segments of the biological sequence. The annealing display generator 210 may display the ranking by locating icons associated with higher-ranked data closer to a corresponding segment of the target biological sequence, and icons associated with lower-ranked data farther from that segment.
  • [0037]
    In embodiments of the present disclosure, the data may be analyzed by the simulated annealing module 111 to determine relevant data and to rank the data by analyzing one or more of key words in the data, frequency of keywords, groups of key words, frequency of groups of keywords, metadata associated with the data, or any other content of the data or content related to the data.
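    Keyword analysis can be as simple as counting keyword occurrences among a document's word tokens. The sketch assumes single-word, case-insensitive keywords and returns the fraction of tokens that are keywords; groups of keywords and metadata are not handled here.

```python
import re
from collections import Counter


def keyword_score(text: str, keywords: set[str]) -> float:
    """Fraction of alphanumeric word tokens in text that are keywords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[k.lower()] for k in keywords) / len(tokens)
```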
  • [0038]
    According to embodiments of the present disclosure, data, references and probe sequences may be competitively ranked to ensure that data, references and probe sequences determined to be of highest interest to a user are more closely associated with a target biological sequence being analyzed by the user.
  • [0039]
    In one embodiment, in addition to annealing one or more probe sequences to a target biological sequence, the simulated annealing module 111 may anneal one or more probe sequences to one or more other probe sequences. For example, if one probe sequence represents an analysis tool, another probe sequence representing a program or tool for improving the efficiency of the analysis tool may be simulated as being annealed to the first probe sequence. In another example, if a first probe sequence represents a software application, one or more probe sequences representing journal citations including formulas or analysis using the software application may be simulated as being annealed to the first probe sequence.
  • [0040]
    In embodiments of the present disclosure, a user may determine settings, or may generate a profile, to adjust or alter the ranking and display of data, references, and probe sequences. FIG. 3 illustrates a display 300 or graphical user interface (GUI) 300 which may be displayed on an electronic display device, such as a computer monitor, to allow a user to set preferred weights. The display 300 includes rank criteria 301a, 301b, 301c and 301d. In FIG. 3, the rank criteria include “similarity to target sequence” 301a, “importance of reference” 301b, “popularity of reference” 301c, and “historical applicability to target sequence” 301d. However, embodiments of the present disclosure encompass any criteria, including pre-set criteria or criteria generated by a user.
  • [0041]
    FIG. 3 further illustrates sub-ranking icons 302a to 302d, which may allow a user to further specify ranking preferences. For example, under the sub-ranking icon 302b, a user may specify that, in determining an importance of a reference, the organization with which an author or creator is affiliated is more important, and receives a higher weight, than a total number of citations to the author. A user may also set minimum standards, such as a minimum number of citations to data or a reference that is required for the data or reference to obtain any ranking.
  • [0042]
    The display 300 may further include fields 303a to 303d that are able to be modified by a user to adjust the weighting desired by the user. In one embodiment, the rank calculator 209 utilizes an algorithm to combine a user's selected weight of one or more criteria with information contained within the relevant data or references, or metadata associated with the data or references, to calculate a final ranking of the data, reference, or probe sequence.
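    The sub-ranking and minimum-standard settings of FIG. 3 could be combined as follows. This sketch assumes an importance score built from the two sub-criteria in the example above (author affiliation and citations to the author), each pre-scaled to 0 to 1, and drops items below a user-set minimum citation count before sorting. The `Item` fields and weight keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record for one reference under consideration."""
    name: str
    citations: int           # total citations to the data or reference
    affiliation: float       # assumed 0..1 score for the author's organization
    author_citations: float  # assumed 0..1 normalized citations to the author


def importance(item: Item, sub_weights: dict[str, float]) -> float:
    """Sub-ranking for 'importance of reference': weighted mean of sub-scores."""
    wa = sub_weights.get("affiliation", 0.0)
    wc = sub_weights.get("author_citations", 0.0)
    if wa + wc == 0:
        return 0.0
    return (wa * item.affiliation + wc * item.author_citations) / (wa + wc)


def rank_by_importance(items: list[Item], sub_weights: dict[str, float],
                       min_citations: int) -> list[str]:
    """Drop items below the minimum citation standard, then sort by importance."""
    eligible = [it for it in items if it.citations >= min_citations]
    eligible.sort(key=lambda it: importance(it, sub_weights), reverse=True)
    return [it.name for it in eligible]
```

    Weighting affiliation three times as heavily as author citations mirrors the example preference described for sub-ranking icon 302b.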
  • [0043]
    FIGS. 4A and 4B illustrate displays 400a and 400b of icons 402, 403 and 404 representing data, references or probe sequences annealing to the target biological sequence 401 based on a ranking of the data, references or probe sequences. In addition to annealing to the target biological sequence 401, one or more of the icons 402, 403 and 404 may be annealed to other icons among 402, 403 and 404, representing the annealing of a probe sequence to another probe sequence that is annealed to the target biological sequence 401. Referring to FIG. 4A, icons 402, 403 and 404 having different visual characteristics, such as different cross-hatching, colors, shapes or graphic representations, may correspond to different types of data or references. For example, an icon 402 having a first type of cross-hatching may correspond to a document, an icon 403 having a second type of cross-hatching may correspond to an application, and an icon 404 having a third type of cross-hatching may correspond to biographical information about a person, such as a researcher. Other types of data that may be represented by the icons 402, 403 and 404 include data about an organization, such as a company or university, computer program information, project information about a research project, information about web pages that may contain relevant information, etc.
  • [0044]
    The icons are displayed to be vertically (in FIGS. 4A and 4B) aligned with a segment of the target biological sequence 401 according to the probe sequence associated with the data and reference represented by the icons 402, 403 and 404. In addition, the icons are displayed as being a distance away from a segment of the target biological sequence 401 (in a horizontal direction in FIGS. 4A and 4B) based on a ranking of the data or reference associated with the icons. For example, the icons labeled 402 and 404 may be associated with the same or a very similar segment of the biological sequence, as indicated by the close vertical alignment of icons 402 and 404. However, icon 404 may be associated with data having a higher ranking than icon 402, as illustrated by icon 404 being located closer to the target biological sequence than icon 402 in the horizontal direction.
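    The layout rule of FIGS. 4A and 4B maps two quantities to two axes. In this sketch the target sequence is assumed to be drawn as a vertical line at x = 0; an icon's y coordinate follows the start of its segment and its horizontal distance grows with rank, so rank 1 sits closest. The function name and spacing constants are arbitrary illustrative values.

```python
def icon_position(segment_start: int, rank: int, row_height: float = 12.0,
                  base_gap: float = 20.0, step: float = 15.0) -> tuple[float, float]:
    """Return (x, y) for an icon next to a vertically drawn target sequence.
    y aligns the icon with its segment; x is its distance from the sequence."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    x = base_gap + (rank - 1) * step
    y = segment_start * row_height
    return (x, y)
```

    Two icons bound to the same segment share a y coordinate, and the higher-ranked one gets the smaller x, as with icons 402 and 404.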
  • [0045]
    In one embodiment, a user may retrieve the data represented by the icons 402, 403 and 404, or may be provided with information regarding where the data is located, by selecting the icons 402, 403 or 404 with a cursor, touch, or any other user interface. In one embodiment, different ranking characteristics may change an appearance of the icons 402, 403 and 404. For example, an icon representing data that is referenced often may be larger than an icon representing data that is seldom referenced. An icon representing data that can be retrieved by clicking the icon may have a different outline than an icon representing data that cannot. An icon representing a person may include an image of the person. An icon representing a product, such as an analysis tool or program, may include an icon or image associated with the tool or program, such as a trademark.
  • [0046]
    In one embodiment, if a segment or adjacent segments of the target biological sequence 401 include a relatively large number of icons, the display 400a may generate a blob 405. When a user moves a cursor over the blob 405 or performs any other action for selecting the blob 405, the individual icons may be shown and selected.
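    Deciding when to collapse icons into a blob 405 requires some notion of a crowded region. This sketch assumes fixed-width windows along the sequence and a hypothetical `max_icons` limit; any window holding more icons than the limit is reported as a blob.

```python
from collections import defaultdict


def group_into_blobs(icons: list[tuple[str, int]], window: int,
                     max_icons: int) -> dict[int, list[str]]:
    """Bucket (icon_id, segment_start) pairs into fixed-width windows and return
    window index -> icon ids for windows crowded enough to draw as one blob."""
    buckets: dict[int, list[str]] = defaultdict(list)
    for icon_id, start in icons:
        buckets[start // window].append(icon_id)
    return {w: ids for w, ids in buckets.items() if len(ids) > max_icons}
```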
  • [0047]
    FIG. 4B illustrates a display 400b of the same target biological sequence 401 as in FIG. 4A, but with different ranking preferences, corresponding, for example, to preferences selected by a different user. Accordingly, the icons 402, 403 and 404 may be arranged differently, and may be present in different numbers, than in FIG. 4A. In embodiments of the present disclosure, a user may modify preferences indicating which information the user considers important, thereby personalizing the information displayed to the user in relation to the target biological sequence 401.
  • [0048]
    FIG. 5 illustrates a display 500 according to another embodiment of the present disclosure. In FIG. 5, the target biological sequence 501 is displayed as letters representing, for example, nucleotides, and the corresponding probe sequences 502 are represented by complementary letters, that is, nucleotides that may bond with the nucleotides of the target biological sequence 501. Icons 503, 504, 505 and 506 represent different types of data, such as publications, analysis tools, biographical information, and web page information. Each of the icons 503, 504, 505 and 506 may be positioned in the horizontal direction (in FIG. 5) according to the segment of the target biological sequence 501 to which the data represented by that icon is most closely related, and may be separated from the target biological sequence 501 by a distance determined by the ranking of the data or reference associated with that icon.
  • [0049]
    As illustrated in FIG. 5, the icons 503 to 506 may contain information related to the data that the icons represent. For example, the icons may contain numbers indicating the number of citations to the data by particular sources. While examples of displays have been provided for purposes of description, embodiments encompass any type of display in which a user may see the importance of data relative to a target biological sequence based on the distance of an icon representing the data from the target biological sequence. In one embodiment, a probe sequence may be further bonded to one or more additional probe sequences. For example, when a user moves a cursor over an icon or selects the icon, one or more additional linked icons may appear, corresponding to additional data related to the data represented by the selected icon. In one embodiment, the probe sequence may be treated as a target biological sequence, and data may be analyzed and ranked with reference to the probe sequence in the same manner as for the original target biological sequence.
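Chaining probes, where a bound probe is in turn treated as a target, can be sketched as a recursive binding step. This is a hypothetical illustration: the `Probe` class, the fractional pairing score, the `min_score` cutoff and the `depth` limit are assumptions. Pairing is modeled letter by letter as drawn in FIG. 5, which ignores the antiparallel orientation of real nucleic-acid hybridization.

```python
from dataclasses import dataclass, field
from typing import List

# Letter-by-letter pairing (A-T, C-G), as drawn in FIG. 5.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Probe:
    label: str     # the data or reference the probe carries
    sequence: str
    children: List["Probe"] = field(default_factory=list)


def score(probe_seq: str, target_seq: str) -> float:
    """Fraction of positions at which probe_seq pairs with target_seq."""
    paired = target_seq.translate(COMPLEMENT)
    n = min(len(probe_seq), len(paired))
    if n == 0:
        return 0.0
    return sum(a == b for a, b in zip(probe_seq, paired)) / n


def bind(target_seq: str, pool: List[Probe], depth: int,
         min_score: float = 0.75) -> List[Probe]:
    """Bind probes from pool to target_seq, then treat each bound probe as a new target."""
    if depth <= 0:
        return []
    bound = []
    for candidate in pool:
        if score(candidate.sequence, target_seq) >= min_score:
            node = Probe(candidate.label, candidate.sequence)
            # A probe is never offered again to its own descendants.
            remaining = [p for p in pool if p is not candidate]
            node.children = bind(candidate.sequence, remaining, depth - 1, min_score)
            bound.append(node)
    # Most strongly paired probes first.
    bound.sort(key=lambda p: score(p.sequence, target_seq), reverse=True)
    return bound


pool = [Probe("p1", "TGCA"), Probe("p2", "ACGT"), Probe("p3", "GGGG")]
tree = bind("ACGT", pool, depth=2)
```

In this example, probe `p1` binds to the target, and `p2`, which pairs with `p1` but not with the target, appears as a linked probe beneath it.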
  • [0050]
    FIG. 6 illustrates an example of a table 600 according to an embodiment of the present disclosure. The table 600 may correspond to the reference table 205 of FIG. 2, for example. The table 600 associates a reference, such as a URL, URN, or other address, link or locator, with relevant data and a probe sequence. Examples of relevant data have been discussed previously, and in FIG. 6 the probe sequence corresponds to a biological sequence that is complementary to a segment of a target biological sequence. The table 600 may further include icon information for displaying an icon representing the data or reference, or any other information to be associated with the relevant data and reference. While FIGS. 2 and 6 illustrate tables to associate data, references to the data and probe sequences, embodiments of the present disclosure encompass any data structures for associating data, such as arrays, pointers, or any other types of data structures with which a person or system could associate data with references to the data and with probe sequences.
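A table such as table 600 can be held in memory as a pair of indexes, one keyed by reference and one keyed by probe sequence. The following is a sketch under assumptions: the `TableEntry` fields mirror the columns described for FIG. 6, while the class names and the behavior of replacing an entry that reuses a reference are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TableEntry:
    reference: str              # URL, URN, or other address, link or locator
    data: str                   # description of the relevant data
    probe_sequence: str         # e.g. complement of a target segment
    icon: Optional[str] = None  # icon information for the display


class ReferenceTable:
    """Associates references, relevant data and probe sequences."""

    def __init__(self) -> None:
        self._by_reference: Dict[str, TableEntry] = {}
        self._by_probe: Dict[str, List[TableEntry]] = {}

    def add(self, entry: TableEntry) -> None:
        # Replace any earlier entry for the same reference.
        old = self._by_reference.get(entry.reference)
        if old is not None:
            self._by_probe[old.probe_sequence].remove(old)
        self._by_reference[entry.reference] = entry
        self._by_probe.setdefault(entry.probe_sequence, []).append(entry)

    def lookup_reference(self, reference: str) -> Optional[TableEntry]:
        return self._by_reference.get(reference)

    def entries_for_probe(self, probe_sequence: str) -> List[TableEntry]:
        return list(self._by_probe.get(probe_sequence, []))


table = ReferenceTable()
table.add(TableEntry("https://example.org/paper1", "publication", "TGCA"))
table.add(TableEntry("https://example.org/tool", "analysis tool", "TGCA", "wrench"))
```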
  • [0051]
    FIG. 7 illustrates a flow diagram of a method according to an embodiment of the present disclosure. In block 701, a reference to data, such as an address of or a pointer to the data, may be found by, for example, searching memory in a computer, searching storage devices, or searching devices connected to a host device over a network such as the Internet. In addition, references to data found in one or more devices may be generated when no existing reference is found, or when a particular type or format of reference is desired.
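Where no usable reference exists, block 701 allows one to be generated. The following is a minimal sketch that assumes references take the form of URNs. Python's standard `uuid.uuid5` with the `NAMESPACE_URL` namespace derives a stable, name-based UUID from a location string, so the same data location always receives the same reference. The location strings and helper names are hypothetical.

```python
import uuid
from typing import Dict, Optional


def ensure_reference(location: str, existing: Optional[str] = None) -> str:
    """Return the existing reference, or generate a URN for data that has none."""
    if existing:
        return existing
    # Name-based UUID: deterministic for a given location string.
    return uuid.uuid5(uuid.NAMESPACE_URL, location).urn


def build_references(found: Dict[str, Optional[str]]) -> Dict[str, str]:
    """found maps each discovered data location to its existing reference, if any."""
    return {location: ensure_reference(location, ref) for location, ref in found.items()}


refs = build_references({
    "host-a:/data/seq1.txt": None,                      # no reference found
    "host-b:/pub/42": "https://example.org/pub/42",     # existing reference kept
})
```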
  • [0052]
    In block 702, the data is associated with the reference. For example, an entry may be formed in a table, or another data structure may be formed, to associate the reference with the data to which the reference points. In block 703, it may be determined whether the data is relevant; in other words, a threshold determination of the relevance of the data may be made. The threshold level of relevance may be based on predetermined criteria, such as a similarity of the data to a target biological sequence, a source of the data, such as an organization supplying the data (e.g., a university or company), an author of the data, a publisher of the data, and a type of operation performed by execution of the data (such as in the case of an analysis tool for analyzing biological sequences). The threshold level of relevance may also be based on a frequency with which the data is accessed or referenced, a frequency with which the data is accessed or referenced by predetermined classes of users, such as researchers, scientists, or professional organizations, or a frequency with which the data is associated with a target biological sequence. In other words, the threshold level of relevance may be related to a target sequence or may include criteria unrelated to the target sequence. The threshold level of relevance may be based on the content of the data, such as an identity of a person or organization based on stored biographical information, or the content of a document or file. In addition, the threshold level of relevance may be based on usage of the data, such as how often the data is accessed or referenced or by whom the data is accessed or referenced.
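The threshold test of block 703 can be sketched as a set of named predicates over a candidate's properties, with the data passing when enough of them hold. The specific cutoffs, the `TRUSTED_SOURCES` set, and the "at least two criteria" rule are illustrative assumptions; the disclosure leaves both the criteria and how they are combined open.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Candidate:
    similarity: float          # similarity to the target sequence, 0..1
    organization: str          # source of the data (e.g. university, company)
    accesses_per_month: int    # overall usage of the data
    researcher_accesses: int   # accesses by a predetermined class of users


TRUSTED_SOURCES = {"university", "research institute"}  # hypothetical

# Hypothetical criteria; some depend on the target sequence, others do not.
CRITERIA: Dict[str, Callable[[Candidate], bool]] = {
    "similar": lambda c: c.similarity >= 0.8,
    "trusted_source": lambda c: c.organization in TRUSTED_SOURCES,
    "popular": lambda c: c.accesses_per_month >= 100,
    "used_by_researchers": lambda c: c.researcher_accesses >= 10,
}


def is_relevant(candidate: Candidate, min_hits: int = 2) -> bool:
    """Pass the data on to block 705 only if enough criteria are met."""
    hits = sum(1 for check in CRITERIA.values() if check(candidate))
    return hits >= min_hits
```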
  • [0053]
    If it is determined in block 703 that the data does not meet the threshold level of relevance, the process with respect to that data ends in block 704. On the other hand, if the data is determined to be sufficiently relevant, the data and reference may be associated with a probe sequence in block 705. The probe sequence may identify at least a portion of a target biological sequence to which the data pertains. In one embodiment, the probe sequence is represented by a complementary sequence to a corresponding segment of the target biological sequence. For example, if the biological sequence is a series of nucleotides, the probe sequence may be a complementary series of nucleotides. In another embodiment, the probe sequence is merely data identifying the portion of the target biological sequence to which the data most closely pertains. In embodiments of the present disclosure, the data may pertain to one segment of the target biological sequence or to more than one segment.
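For nucleotide sequences, forming a complementary probe and finding the target segments it identifies can be sketched as follows. The sketch assumes uppercase DNA letters and the letter-by-letter pairing shown in FIG. 5 (a real hybridization probe would be the reverse complement), and the function names are hypothetical. The example also shows a single probe identifying more than one segment.

```python
from typing import List

# Letter-by-letter pairing (A-T, C-G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def make_probe(segment: str) -> str:
    """Probe sequence for a target segment: its complementary series of nucleotides."""
    return segment.translate(COMPLEMENT)


def find_segments(target: str, probe: str) -> List[int]:
    """Start positions of every target segment that the probe pairs with."""
    if not probe:
        return []
    wanted = probe.translate(COMPLEMENT)  # the segment this probe identifies
    k = len(wanted)
    return [i for i in range(len(target) - k + 1) if target[i:i + k] == wanted]


probe = make_probe("ACGT")
positions = find_segments("GGACGTTACGTA", probe)
```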
  • [0054]
    In block 706, the data is ranked based on predetermined criteria to determine an affinity or bond between the data, reference, or probe sequence and a segment of the target biological sequence. The ranking may be based on one or more criteria, and the criteria may be weighted so that some criteria affect the ranking more than others. Examples of ranking criteria include a similarity of the data to the target biological sequence, a relevance of the content of the data to the target biological sequence, an importance or prestige of the data or reference, a popularity of the data or reference, and a historical applicability of the data or reference to target biological sequences.
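The weighted ranking of block 706 can be sketched as a weighted sum of per-criterion scores. The criterion names, the assumption that each score is already normalized to [0, 1], and the weights used below are hypothetical; a weighted sum is only one of many ways the criteria could be combined.

```python
from typing import Dict, List, Tuple


def rank(items: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """items: label -> {criterion: score in [0, 1]}.

    Returns (label, weighted score) pairs, best first.
    """
    scored = []
    for label, scores in items.items():
        # Criteria without a weight contribute nothing.
        total = sum(weights.get(name, 0.0) * value for name, value in scores.items())
        scored.append((label, total))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


items = {
    "paper": {"similarity": 0.9, "prestige": 0.4, "popularity": 0.7},
    "tool": {"similarity": 0.6, "prestige": 0.9, "popularity": 0.9},
}
# Similarity weighted twice as heavily as the other criteria (hypothetical).
ranked = rank(items, {"similarity": 2.0, "prestige": 1.0, "popularity": 1.0})
```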
  • [0055]
    In one embodiment, a user may add criteria, remove criteria, and adjust a weight of criteria used to rank relevant data, references to the data and probe sequences associated with the data. In embodiments of the present disclosure, different users may generate different profiles or may otherwise indicate different preferences for ranking information related to a target biological sequence. In block 707, the relevant data may be bound to the target biological sequence, or segments of the target biological sequence, based on the ranking. In particular, the target biological sequence may be displayed and icons may be displayed with the target biological sequence representing the data, references and probe sequences.
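Per-user ranking preferences, such as those behind the different arrangements of FIGS. 4A and 4B, can be sketched as a profile object whose criteria can be added, removed and re-weighted. The class, the default weights and the scoring rule are assumptions for illustration.

```python
from typing import Dict, List, Optional

DEFAULT_WEIGHTS = {"similarity": 2.0, "prestige": 1.0, "popularity": 1.0}  # hypothetical


class RankingProfile:
    """One user's ranking preferences: which criteria count and how much."""

    def __init__(self, weights: Optional[Dict[str, float]] = None) -> None:
        self.weights = dict(DEFAULT_WEIGHTS if weights is None else weights)

    def add_criterion(self, name: str, weight: float) -> None:
        self.weights[name] = weight

    def remove_criterion(self, name: str) -> None:
        self.weights.pop(name, None)

    def set_weight(self, name: str, weight: float) -> None:
        if name not in self.weights:
            raise KeyError(f"unknown criterion: {name}")
        self.weights[name] = weight

    def score(self, scores: Dict[str, float]) -> float:
        # Criteria this profile does not use contribute nothing.
        return sum(self.weights.get(name, 0.0) * value for name, value in scores.items())

    def order(self, items: Dict[str, Dict[str, float]]) -> List[str]:
        """Labels ordered from highest to lowest affinity for this user."""
        return sorted(items, key=lambda label: self.score(items[label]), reverse=True)


items = {
    "paper": {"similarity": 0.9, "popularity": 0.2},
    "tool": {"similarity": 0.5, "popularity": 0.9},
}
user_a = RankingProfile()
user_b = RankingProfile()
user_b.remove_criterion("similarity")
```

Here, removing the similarity criterion from the second user's profile reverses the order of the two items, as a different user's preferences might.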
  • [0056]
    A graphical display may display an icon or other representation of the data, reference, or probe and of the target biological sequence, or one or more target segments of the biological sequence. The ranking of the data, references or probe sequences may be displayed by displaying icons associated with data having a higher ranking as being located closer to a corresponding segment of the target biological sequence, while icons representing data having a lower ranking are located farther from the segment of the target biological sequence.
  • [0057]
    FIG. 8 illustrates a block diagram of a computer system 800 according to another embodiment of the present disclosure. The computer system 800 may correspond to the host computer 110 of FIG. 1, for example. The methods described herein can be implemented in hardware, software (e.g., firmware), or a combination thereof. In an exemplary embodiment, the methods described herein are implemented in hardware as part of the microprocessor of a special-purpose or general-purpose digital computer, such as a personal computer, workstation, minicomputer, or mainframe computer. The system 800 therefore may include a general-purpose computer or mainframe 801 capable of performing the methods described herein.
  • [0058]
    In an exemplary embodiment, in terms of hardware architecture, as shown in FIG. 8, the computer 801 includes one or more processors 805, memory 810 coupled to a memory controller 815, and one or more input and/or output (I/O) devices 840, 845 (or peripherals) that are communicatively coupled via a local input/output controller 835. The input/output controller 835 can be, for example, one or more buses or other wired or wireless connections, as is known in the art. The input/output controller 835 may have additional elements, which are omitted for simplicity in description, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications. Further, the local interface may include address, control, and/or data connections to enable appropriate communications among the aforementioned components. The input/output controller 835 may include a plurality of sub-channels configured to access the output devices 840 and 845. The sub-channels may include, for example, fiber-optic communications ports.
  • [0059]
    The processor 805 is a hardware device for executing software, particularly that stored in storage 820, such as cache storage, or memory 810. The processor 805 can be any custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the computer 801, a semiconductor based microprocessor (in the form of a microchip or chip set), a macroprocessor, or generally any device for executing instructions.
  • [0060]
    The memory 810 can include any one or combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, erasable programmable read only memory (EPROM), electronically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), tape, compact disc read only memory (CD-ROM), disk, diskette, cartridge, cassette or the like, etc.). Moreover, the memory 810 may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory 810 can have a distributed architecture, where various components are situated remote from one another, but can be accessed by the processor 805.
  • [0061]
    The instructions in memory 810 may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. In the example of FIG. 8, the instructions in the memory 810 include a suitable operating system (O/S) 811. The operating system 811 essentially controls the execution of other computer programs and provides scheduling, input-output control, file and data management, memory management, and communication control and related services.
  • [0062]
    In an exemplary embodiment, a conventional keyboard 850 and mouse 855 can be coupled to the input/output controller 835. The I/O devices 840, 845 may include other input and output devices, for example, but not limited to, a printer, a scanner, a microphone, and the like. Finally, the I/O devices 840, 845 may further include devices that communicate both inputs and outputs, for instance, but not limited to, a network interface card (NIC) or modulator/demodulator (for accessing other files, devices, systems, or a network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, and the like. The system 800 can further include a display controller 825 coupled to a display 830. In an exemplary embodiment, the system 800 can further include a network interface 860 for coupling to a network 865. The network 865 can be an IP-based network for communication between the computer 801 and any external server, client and the like via a broadband connection. The network 865 transmits and receives data between the computer 801 and external systems. In an exemplary embodiment, the network 865 can be a managed IP network administered by a service provider. The network 865 may be implemented in a wireless fashion, e.g., using wireless protocols and technologies, such as WiFi, WiMax, etc. The network 865 can also be a packet-switched network such as a local area network, wide area network, metropolitan area network, the Internet, or other similar type of network environment. The network 865 may be a fixed wireless network, a wireless local area network (LAN), a wireless wide area network (WAN), a personal area network (PAN), a virtual private network (VPN), an intranet or other suitable network system, and includes equipment for receiving and transmitting signals.
  • [0063]
    When the computer 801 is in operation, the processor 805 is configured to execute instructions stored within the memory 810, to communicate data to and from the memory 810, and to generally control operations of the computer 801 pursuant to the instructions.
  • [0064]
    In an exemplary embodiment, the methods described herein can be implemented with any or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
  • [0065]
    In embodiments of the present disclosure, the simulated annealing module 111 may comprise program code stored in the memory 810 and executed by the processor 805. The data and references pointing to the data may be stored in the computer 801 or may be stored on other computers, servers, databases, or other network devices connected to the computer 801 via a network. The simulated annealing module 111 may further include hardware components, such as processors, memory and logic chips or structures for implementing the simulated annealing.
  • [0066]
    As described above, embodiments can be embodied in the form of computer-implemented processes and apparatuses for practicing those processes. An embodiment may include a computer program product 900 as depicted in FIG. 9 on a computer readable/usable medium 902 with computer program code logic 904 containing instructions embodied in tangible media as an article of manufacture. Exemplary articles of manufacture for computer readable/usable medium 902 may include floppy diskettes, CD-ROMs, hard drives, universal serial bus (USB) flash drives, or any other computer-readable storage medium, wherein, when the computer program code logic 904 is loaded into and executed by a computer, the computer becomes an apparatus for practicing the embodiments. Embodiments include computer program code logic 904, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code logic 904 is loaded into and executed by a computer, the computer becomes an apparatus for practicing the embodiments. When implemented on a general-purpose microprocessor, the computer program code logic 904 segments configure the microprocessor to create specific logic circuits.
  • [0067]
    As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • [0068]
    Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • [0069]
    Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • [0070]
    Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • [0071]
    Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the present disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • [0072]
    These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • [0073]
    The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • [0074]
    The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
  • [0075]
    The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention to the particular embodiments described. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
  • [0076]
    The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the embodiments of the present disclosure.
  • [0077]
    While embodiments of the present disclosure have been described above, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow.

Claims (25)

  1. A computer assembly for associating data with a target biological sequence, comprising:
    a processor configured to access data on a network and to perform a method, the method comprising:
    identifying, in the network, one or more references having a relevance level greater than a predetermined threshold, said references being at least one of a pointer and an address indicating a location of data or providing information regarding the data;
    associating each reference of the one or more references to one or more probe sequences corresponding to one or more biological sequences;
    ranking the one or more probe sequences based on one or more criteria corresponding to a target biological sequence; and
    assigning the one or more probe sequences with a level of affinity to one or more segments of the target biological sequence based at least on the ranking of each of the one or more probe sequences.
  2. The computer assembly of claim 1, wherein the references are uniform resource locators (URLs).
  3. The computer assembly of claim 1, wherein associating the one or more probe sequences to the one or more references having a relevance level greater than a predetermined threshold includes analyzing the one or more references to detect the presence of one or more of key words, phrases, symbols and sources of the one or more references.
  4. The computer assembly of claim 1, wherein associating each reference of the one or more references to one or more probe sequences includes associating each reference with a biological sequence that is complementary to a biological sequence referenced by the reference.
  5. The computer assembly of claim 1, wherein ranking the one or more probe sequences comprises determining a similarity between the one or more probe sequences and the one or more segments of the target biological sequence, and
    ranking the one or more probe sequences further comprises at least one of determining an importance of a source of each reference of each probe sequence, determining a popularity of each reference of each probe sequence, and determining a historical applicability of each reference of each probe sequence to the one or more segments of the target biological sequence.
  6. The computer assembly of claim 5, wherein determining the similarity between the one or more probe sequences and the one or more segments of the target biological sequence includes determining a match between the one or more probe sequences and a complement of the one or more segments of the target biological sequence.
  7. The computer assembly of claim 6, wherein the method further comprises associating the one or more probe sequences with at least one of a document, an analysis tool, and biographical information of a person.
  8. The computer assembly of claim 7, wherein determining an importance of a source of each reference includes at least one of determining a number of citations of an author of the document, determining an organization to which an author of the document belongs, and determining a type of analysis performed by the analysis tool,
    determining a popularity of each reference includes at least one of determining a number of citations of the document and a frequency of use of the analysis tool, and
    determining a historical applicability of each reference includes at least one of determining a frequency with which the document has been cited in association with the segment of the target biological sequence and determining a frequency with which the analysis tool has been used to analyze the segment.
  9. The computer assembly of claim 1, wherein the one or more probe sequences includes at least two probe sequences corresponding to a same segment of the target biological sequence, and
    assigning the at least two probe sequences a level of affinity to the segment of the target biological sequence includes competitively comparing the at least two probe sequences such that a reference having a higher ranking is assigned a higher level of affinity than a reference having a lower ranking.
  10. The computer assembly of claim 1, further comprising a display,
    wherein the method further comprises displaying a graphical representation of the segment of the target biological sequence on the display, and displaying a graphical representation of the level of affinity of each probe sequence to the segment of the target biological sequence on the display by adjusting a physical distance of a graphical representation of the probe sequence from the graphical representation of the segment based on the level of affinity of the probe sequence.
  11. A system for simulating annealing to a biological sequence, comprising:
    one or more network computers having stored therein data; and
    a host computer having stored therein a biological sequence, the host computer connected to the one or more network computers via a communications network, the host computer configured to identify data in the one or more network computers as relevant data that is relevant to the biological sequence, to perform one of identifying references to the data in the one or more network computers and generating references to the data in the one or more network computers, said references being at least one of a pointer and an address indicating a location of the data, to associate the relevant data with a segment of the biological sequence, and to rank the relevant data based on predetermined criteria applied to functions of the associated segment of the biological sequence to determine a level of affinity of the relevant data with the segment of the biological sequence.
  12. The system of claim 11, wherein the host computer is configured to search the network for uniform resource locators (URLs) pointing to the data stored in the one or more network computers, to associate the URLs with the data.
  13. The system of claim 11, wherein the host computer is configured to associate the relevant data with one or more probe sequences, the one or more probe sequences corresponding to one or more respective segments of the biological sequence.
  14. The system of claim 13, wherein the host computer is configured to competitively rank the one or more probe sequences corresponding to a same segment of the biological sequence, such that a probe sequence having a higher ranking has a higher level of affinity to the segment of the biological sequence than a probe sequence having a lower ranking.
  15. The system of claim 13, wherein the host computer is configured to rank the one or more probe sequences based on a correspondence between the one or more probe sequences and the segment of the biological sequence, and
    the host computer is configured to further rank the one or more probe sequences based on at least one of an importance of a source of data associated with the one or more probe sequences, a popularity of the data associated with the one or more probe sequences, and a historical applicability of the data associated with the one or more probe sequences to the segment of the biological sequence.
  16. The system of claim 15, wherein the data includes an analysis tool relevant to the segment of the biological sequence,
    the importance of the source of data associated with the one or more probe sequences is based on at least one of a source of the analysis tool and a type of analysis performed by the analysis tool,
    a popularity of data associated with the one or more probe sequences is based on a frequency of use of the analysis tool, and
    a historical applicability of data associated with the one or more probe sequences is based on a frequency with which an analysis tool has been used to analyze the segment of the biological sequence.
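For the analysis-tool case of claim 16, popularity and historical applicability can be read off a usage log. This sketch assumes a log of `(tool_name, segment_id)` run events. It also assumes hypothetical lookup tables that score the tool's source and analysis type. None of these names come from the application.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical importance weights for a tool's source and analysis type.
SOURCE_WEIGHT = {"curated": 1.0, "community": 0.5}
TYPE_WEIGHT = {"structure": 1.0, "alignment": 0.8, "other": 0.3}


def tool_criteria(tool: Dict[str, str], usage_log: List[Tuple[str, str]],
                  segment_id: str) -> Tuple[float, float, float]:
    """Return (importance, popularity, historical applicability) for a tool.

    `usage_log` holds one (tool_name, segment_id) entry per run. Popularity is
    the tool's share of all runs; historical applicability is its share of the
    runs made on `segment_id`.
    """
    all_runs = Counter(name for name, _ in usage_log)
    seg_runs = Counter(name for name, seg in usage_log if seg == segment_id)
    importance = SOURCE_WEIGHT.get(tool["source"], 0.0) * TYPE_WEIGHT.get(tool["type"], 0.0)
    popularity = all_runs[tool["name"]] / max(len(usage_log), 1)
    history = seg_runs[tool["name"]] / max(sum(seg_runs.values()), 1)
    return importance, popularity, history


log = [("aligner", "seg1"), ("aligner", "seg2"), ("folder", "seg1"), ("aligner", "seg1")]
imp, pop, hist = tool_criteria(
    {"name": "aligner", "source": "curated", "type": "alignment"}, log, "seg1")
```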
  17. The system of claim 15, wherein the data includes a document relevant to the segment of the biological sequence,
    the importance of the source of data associated with the one or more probe sequences is based on at least one of a number of citations of an author of the document and an organization with which the author of the document is associated,
    a popularity of data associated with the one or more probe sequences is based on a number of citations to the document, and
    a historical applicability of data associated with the one or more probe sequences is based on a frequency with which the document has been cited in association with the segment of the biological sequence.
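The document case of claim 17 can be sketched in the same way, using citation counts. Each citing record is assumed to carry the set of segment ids mentioned alongside the citation. Counts are log-scaled with `math.log1p` so that a few heavily cited items do not dominate. The weights are arbitrary placeholders. The author's organization, which the claim also allows as a criterion, is omitted for brevity.

```python
import math
from typing import Dict, List, Sequence


def document_scores(docs: List[Dict], segment_id: str,
                    weights: Sequence[float] = (0.3, 0.3, 0.4)) -> Dict[str, float]:
    """Combine citation-based criteria into one score per document.

    Hypothetical fields per document:
      author_citations: total citations of the author (importance)
      citations: one set per citing record, holding the segment ids
                 mentioned with the citation (popularity, history)
    """
    w_imp, w_pop, w_hist = weights
    scores = {}
    for d in docs:
        importance = math.log1p(d["author_citations"])
        popularity = math.log1p(len(d["citations"]))
        history = math.log1p(sum(segment_id in c for c in d["citations"]))
        scores[d["id"]] = w_imp * importance + w_pop * popularity + w_hist * history
    return scores


docs = [
    {"id": "doc1", "author_citations": 500, "citations": [{"seg1"}, {"seg1", "seg2"}, set()]},
    {"id": "doc2", "author_citations": 50, "citations": [{"seg2"}]},
]
scores = document_scores(docs, "seg1")
```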
  18. The system of claim 11, further comprising a display,
    wherein the host computer is configured to display a graphical representation of the segment of the biological sequence on the display and a graphical representation of the level of affinity of relevant data to the segment of the biological sequence on the display by adjusting a physical distance of a graphical representation of the relevant data from the graphical representation of the segment based on the level of affinity of the relevant data.
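Claim 18 conveys affinity through on-screen distance. Here is a minimal sketch. It assumes affinity is normalized to [0, 1] and mapped linearly to a radius around the segment's position, so a closer item has higher affinity. Items are spread at equal angles. The pixel bounds are hypothetical defaults.

```python
import math
from typing import Dict, Tuple


def layout_positions(segment_xy: Tuple[float, float], affinities: Dict[str, float],
                     min_r: float = 20.0, max_r: float = 200.0) -> Dict[str, Tuple[float, float]]:
    """Place each item around the segment's on-screen position.

    An affinity of 1 puts the item `min_r` pixels from the segment and an
    affinity of 0 puts it `max_r` pixels away. Items are spread at equal
    angles so they do not overlap.
    """
    cx, cy = segment_xy
    n = max(len(affinities), 1)
    positions = {}
    for i, (item, a) in enumerate(sorted(affinities.items())):
        a = min(max(a, 0.0), 1.0)  # clamp out-of-range affinities
        r = max_r - a * (max_r - min_r)
        theta = 2 * math.pi * i / n
        positions[item] = (cx + r * math.cos(theta), cy + r * math.sin(theta))
    return positions


pos = layout_positions((0.0, 0.0), {"docA": 1.0, "toolB": 0.0})
```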
  19. The system of claim 11, wherein the host computer is further configured to associate the relevant data with a first probe sequence, simulate annealing of the first probe sequence with the segment of the biological sequence based on the determined level of affinity of the relevant data with the segment of the biological sequence, and simulate annealing of a second probe sequence with the first probe sequence based on a determined level of affinity of the second probe sequence with the first probe sequence.
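In claim 19, "annealing" refers to probe hybridization, not the simulated-annealing optimization algorithm. The sketch below is a stand-in built on several assumptions. A logistic curve converts affinity into a binding probability, which is 0.5 at a hypothetical `threshold` and changes more sharply at lower `temperature`. The first probe is tested against the segment, and each later probe against the probe bound just before it. This is not a thermodynamic model of hybridization.

```python
import math
import random
from typing import Dict, List, Tuple


def binding_probability(affinity: float, threshold: float = 0.5,
                        temperature: float = 0.1) -> float:
    """Logistic stand-in for hybridization: 0.5 at `threshold`, rising with
    affinity; lower `temperature` makes the transition sharper."""
    z = (affinity - threshold) / temperature
    if z >= 0:  # two branches keep math.exp from overflowing
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_annealing_chain(segment: str, probes: List[str],
                             affinity: Dict[Tuple[str, str], float],
                             rng: random.Random, **kw) -> List[str]:
    """Anneal probes one at a time onto the end of a growing chain.

    The first probe is tested against `segment`, each later probe against the
    probe bound just before it. `affinity` maps (target, probe) pairs to levels
    in [0, 1]; missing pairs count as 0. Stops at the first probe that fails.
    """
    chain = [segment]
    for probe in probes:
        p = binding_probability(affinity.get((chain[-1], probe), 0.0), **kw)
        if rng.random() >= p:
            break
        chain.append(probe)
    return chain


aff = {("seg1", "probe1"): 0.95, ("probe1", "probe2"): 0.9}
chain = simulate_annealing_chain("seg1", ["probe1", "probe2"], aff, random.Random(42))
```

Passing a seeded `random.Random` makes a run reproducible. Repeating the simulation with fresh seeds estimates how often each chain forms.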
  20. A computer program product for simulating annealing to a biological sequence, comprising:
    a processor; and
    a non-transitory computer readable medium having stored thereon code to perform a method, comprising:
    identifying, by the processor, references to data in a network as relevant references that are relevant to a biological sequence, said references being at least one of a pointer and an address indicating a location of data or providing information regarding the data;
    associating, by the processor, the relevant references with a segment of the biological sequence; and
    ranking, by the processor, the relevant references based on predetermined criteria to determine a level of affinity of the relevant references with the segment of the biological sequence.
  21. The computer program product of claim 20, wherein the method comprises associating the relevant references with one or more probe sequences, the one or more probe sequences corresponding to one or more respective segments of the biological sequence, and
    competitively ranking the one or more probe sequences corresponding to a same segment of the biological sequence, such that a probe sequence having a higher ranking has a higher level of affinity to the segment of the biological sequence than a probe sequence having a lower ranking.
  22. The computer program product of claim 20, wherein ranking the relevant references includes ranking the one or more probe sequences based on a correspondence between the one or more probe sequences and the segment of the biological sequence, and
    ranking the one or more probe sequences further includes ranking the one or more probe sequences based on at least one of an importance of a source of data associated with the one or more probe sequences, a popularity of the data associated with the one or more probe sequences, and a historical applicability of the data associated with the one or more probe sequences to the segment of the biological sequence.
  23. The computer program product of claim 20, wherein the data includes at least one of a document, an analysis tool, and biographical information of a person,
    determining an importance of a source of each reference includes at least one of determining a number of citations of an author of the document, determining an organization to which an author of the document belongs, and determining a type of analysis performed by the analysis tool,
    determining a popularity of each reference includes at least one of determining a number of citations of the document and a frequency of use of the analysis tool, and
    determining a historical applicability of each reference to the segment includes at least one of determining a frequency with which the document has been cited in association with the segment of the biological sequence and determining a frequency with which the analysis tool has been used to analyze the segment.
  24. The computer program product of claim 20, wherein the references correspond to a same segment of the biological sequence, and
    wherein determining the level of affinity of the relevant references with the segment of the biological sequence includes competitively comparing the relevant references such that a reference having a higher ranking is assigned a higher level of affinity than a reference having a lower ranking.
  25. The computer program product of claim 20, the method further comprising:
    displaying a graphical representation of the segment of the biological sequence, and displaying a graphical representation of the level of affinity of each reference to the segment of the biological sequence on the display by adjusting a physical distance of a graphical representation of the reference from the graphical representation of the segment based on the level of affinity of the reference.
US13628967 2012-09-27 2012-09-27 Association of data to a biological sequence Abandoned US20140089328A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13628967 US20140089328A1 (en) 2012-09-27 2012-09-27 Association of data to a biological sequence

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13628967 US20140089328A1 (en) 2012-09-27 2012-09-27 Association of data to a biological sequence
US13689157 US9311360B2 (en) 2012-09-27 2012-11-29 Association of data to a biological sequence
CN 201310445223 CN103699558B (en) 2012-09-27 2013-09-26 Method and apparatus for data associated with the biological sequences

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13689157 Continuation US9311360B2 (en) 2012-09-27 2012-11-29 Association of data to a biological sequence

Publications (1)

Publication Number Publication Date
US20140089328A1 (en) 2014-03-27

Family

ID=50339940

Family Applications (2)

Application Number Title Priority Date Filing Date
US13628967 Abandoned US20140089328A1 (en) 2012-09-27 2012-09-27 Association of data to a biological sequence
US13689157 Active 2033-02-26 US9311360B2 (en) 2012-09-27 2012-11-29 Association of data to a biological sequence

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13689157 Active 2033-02-26 US9311360B2 (en) 2012-09-27 2012-11-29 Association of data to a biological sequence

Country Status (2)

Country Link
US (2) US20140089328A1 (en)
CN (1) CN103699558B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9977876B2 (en) 2012-02-24 2018-05-22 Perkinelmer Informatics, Inc. Systems, methods, and apparatus for drawing chemical structures using touch and gestures
US9535583B2 (en) * 2012-12-13 2017-01-03 Perkinelmer Informatics, Inc. Draw-ahead feature for chemical structure drawing applications
US9430127B2 (en) * 2013-05-08 2016-08-30 Cambridgesoft Corporation Systems and methods for providing feedback cues for touch screen interface interaction with chemical and biological structure drawing applications
US9751294B2 (en) 2013-05-09 2017-09-05 Perkinelmer Informatics, Inc. Systems and methods for translating three dimensional graphic molecular models to computer aided design format
US9436353B2 (en) * 2014-03-25 2016-09-06 Toyota Motor Engineering & Manufacturing North America, Inc. Systems and methods for providing a dynamic application menu

Family Cites Families (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5866363A (en) * 1985-08-28 1999-02-02 Pieczenik; George Method and means for sorting and identifying biological information
US6600996B2 (en) * 1994-10-21 2003-07-29 Affymetrix, Inc. Computer-aided techniques for analyzing biological sequences
GB9914210D0 (en) * 1999-06-17 1999-08-18 Danisco Promoter
CN1390332A (en) * 1999-09-14 2003-01-08 伊拉根生物科学公司 Graphical user interface for display and analysis of biological sequence data
US6941317B1 (en) * 1999-09-14 2005-09-06 Eragen Biosciences, Inc. Graphical user interface for display and analysis of biological sequence data
US20020183936A1 (en) * 2001-01-24 2002-12-05 Affymetrix, Inc. Method, system, and computer software for providing a genomic web portal
US7085766B2 (en) * 2000-03-09 2006-08-01 The Web Access, Inc. Method and apparatus for organizing data by overlaying a searchable database with a directory tree structure
US6741986B2 (en) * 2000-12-08 2004-05-25 Ingenuity Systems, Inc. Method and system for performing information extraction and quality control for a knowledgebase
WO2002045323A3 (en) * 2000-12-01 2003-02-06 Peter Karp Data relationship model
US7133780B2 (en) * 2001-04-19 2006-11-07 Affymetrix, Inc. Computer software for automated annotation of biological sequences
WO2002103030A3 (en) * 2001-06-14 2003-07-03 Rigel Pharmaceuticals Inc Multidimensional biodata integration and relationship inference
WO2003017138A1 (en) * 2001-08-21 2003-02-27 Institute Of Medicinal Molecular Design. Inc. Biological sequence information reading method and storing method
US7371580B2 (en) * 2001-08-24 2008-05-13 Agilent Technologies, Inc. Use of unstructured nucleic acids in assaying nucleic acid molecules
US7650343B2 (en) * 2001-10-04 2010-01-19 Deutsches Krebsforschungszentrum Stiftung Des Offentlichen Rechts Data warehousing, annotation and statistical analysis system
US6836733B1 (en) * 2002-01-22 2004-12-28 Vizx Labs, Llc Biological sequence pattern probe
US20040030504A1 (en) * 2002-04-26 2004-02-12 Affymetrix, Inc. A Corporation Organized Under The Laws Of Delaware System, method, and computer program product for the representation of biological sequence data
US20040012633A1 (en) * 2002-04-26 2004-01-22 Affymetrix, Inc., A Corporation Organized Under The Laws Of Delaware System, method, and computer program product for dynamic display, and analysis of biological sequence data
JP2004178315A (en) * 2002-11-27 2004-06-24 Hitachi Software Eng Co Ltd Data distribution method, data search method, and data search system
WO2005003308A3 (en) * 2003-06-25 2006-08-31 Pankaj Agarwal Biological data set comparison method
CA2535400A1 (en) * 2003-08-12 2005-02-24 Cognia Corporation An advanced databasing system for chemical, molecular and cellular biology
US7623997B2 (en) * 2004-07-02 2009-11-24 The United States Of America As Represented By The Secretary Of The Navy Computer-implemented biological sequence identifier system and method
US20060210967A1 (en) * 2004-07-02 2006-09-21 Agan Brian K Re-sequencing pathogen microarray
US7424371B2 (en) * 2004-12-21 2008-09-09 Helicos Biosciences Corporation Nucleic acid analysis
EP2100246A4 (en) * 2006-11-17 2010-01-20 Motif Biosciences Inc Biometric analysis of populations defined by homozygous marker track length
WO2008138087A3 (en) * 2007-05-15 2010-06-10 Fundação De Amparo À Pesquisa Do Estado De São Paulo-Fapesp Ternary matrix adapter and retrieval method for molecular biological information
US7809765B2 (en) * 2007-08-24 2010-10-05 General Electric Company Sequence identification and analysis
WO2011109864A2 (en) * 2010-03-08 2011-09-15 National Ict Australia Limited Performance evaluation of a classifier
WO2011139797A3 (en) * 2010-04-27 2012-01-26 Bruestle, Jeremy Method and system for analysis and error correction of biological sequences and inference of relationship for multiple samples
US8412462B1 (en) * 2010-06-25 2013-04-02 Annai Systems, Inc. Methods and systems for processing genomic data
JP5825790B2 (en) * 2011-01-11 2015-12-02 日本ソフトウェアマネジメント株式会社 Nucleic Acids information processing apparatus and processing method
WO2012122546A3 (en) * 2011-03-09 2013-01-03 Lawrence Ganeshalingam Biological data networks and methods therefor
US20130218733A1 (en) * 2011-05-05 2013-08-22 Carlo RAGO Method and system for data management and monetization
CN102637244B (en) * 2011-12-31 2016-04-20 苏州金唯智生物科技有限公司 Biological Sequence Analysis platform and its use
US9920361B2 (en) * 2012-05-21 2018-03-20 Sequenom, Inc. Methods and compositions for analyzing nucleic acid

Also Published As

Publication number Publication date Type
US9311360B2 (en) 2016-04-12 grant
CN103699558B (en) 2017-04-05 grant
US20140089329A1 (en) 2014-03-27 application
CN103699558A (en) 2014-04-02 application

Similar Documents

Publication Publication Date Title
Bauer et al. Ontologizer 2.0—a multifunctional tool for GO term enrichment analysis and data exploration
Whittemore Combining evidence in nursing research: methods and implications
Culman et al. T-REX: software for the processing and analysis of T-RFLP data
Gajria et al. ToxoDB: an integrated Toxoplasma gondii database resource
Levine CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0)
Clamp et al. Ensembl 2002: accommodating comparative genomics
Rustici et al. ArrayExpress update—trends in database growth and links to data analysis tools
Suthram et al. eQED: an efficient method for interpreting eQTL associations using protein networks
Chen Using RepeatMasker to identify repetitive elements in genomic sequences
Lee et al. Web Apollo: a web-based genomic annotation editing platform
Adzhubei et al. A method and server for predicting damaging missense mutations
US8498984B1 (en) Categorization of search results
US20060020398A1 (en) Integration of gene expression data and non-gene data
US20090006442A1 (en) Enhanced browsing experience in social bookmarking based on self tags
Cao et al. Facetatlas: Multifaceted visualization for rich text corpora
US20130246404A1 (en) Display of Dynamic Interference Graph Results
Downey et al. Models of Searching and Browsing: Languages, Studies, and Application.
Kim et al. PubChem substance and compound databases
Pafilis et al. Reflect: augmented browsing for the life scientist
Raney et al. Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser
Shen et al. BarleyBase—an expression profiling database for plant genomics
Howe et al. WormBase 2016: expanding to enable helminth genomic research
Su et al. GLay: community structure analysis of biological networks
Yang et al. The I-TASSER Suite: protein structure and function prediction
Haas et al. The Protein Model Portal—a comprehensive resource for protein structure and model information

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KOZLOSKI, JAMES R.;PICKOVER, CLIFFORD A.;WUBBEN, JACINTA M.;AND OTHERS;SIGNING DATES FROM 20120917 TO 20120926;REEL/FRAME:029039/0396