WO2002017190A9 - Method and system for sharing biological information - Google Patents

Method and system for sharing biological information

Info

Publication number
WO2002017190A9
Authority
WO
WIPO (PCT)
Prior art keywords
biological information
stored
computer
network
sharing
Prior art date
Application number
PCT/US2001/025956
Other languages
French (fr)
Other versions
WO2002017190A1 (en)
Inventor
Chester Hedgepath Iii
David Sullivan Shin
Jay Raman Venkatesan
Original Assignee
Varro Technologies Inc
Priority date
Filing date
Publication date
Application filed by Varro Technologies Inc
Priority to AU2001290545A
Publication of WO2002017190A1
Publication of WO2002017190A9

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/104 Grouping of entities

Definitions

  • the present invention pertains to the sharing of information related to biological research, and more particularly, to the sharing of genetic sequence information accumulated on individual computers through client computer to client computer file transfers.
  • DNA is a molecule that carries genetic information, encoded in a linear sequence of nucleotide bases. There are four such bases, generally abbreviated A, C, T and G, so that DNA is generally represented as a long sequence drawn on a four letter alphabet. Acting in concert with the environment, these DNA molecules (called the genome or genotype) determine the structure, function and behavior (called the phenotype) of all living things.
  • Public and private databases are consistent in that each database includes only a limited amount of biological information, necessitated by the limited storage capacity of the database operator, the requirements for submitting information for the database, and proprietary interests in maintaining a database for information owned by the database host.
  • the host database includes a description of material available on individual member's hard drives, and the address of the member (which can be a floating IP address), such that other members may search for specific music by querying the host database for storage locations of music meeting parameters identified by the searching member.
  • peer-to-peer sharing software allows an individual member to generate a query which is passed to the computer of each member in the peer-to-peer network, where the query is compared to material shared on the member computers.
  • the use of sharing technology has several disadvantages.
  • the sharing technologies create security risks for both the requesting party and the sharing party in a transaction.
  • the requesting member is able to place data on the sharing member's hard drive during the transfer, resulting in the potential for computer viruses to be distributed through the sharing network. Such viruses can also be transferred from the sharing member's computer to the requesting member's computer.
  • the existing sharing technology is also limited in that a requesting member does not receive any quantification or qualification of the accuracy of the information shared.
  • While major biological information databases screen submissions for accuracy, information shared by members in a peer-to-peer sharing network, either with or without a centralized database, becomes available for sharing without undergoing peer review. This is both a benefit and a disadvantage, in that the reduced review requirement means less effort is required to share the data, while at the same time allowing inaccurate data to be published.
  • the present invention is a system and method for sharing biological information over a network.
  • the system includes members who are communicably connected over the network to a network host. At least one of the members has biological information stored on a computer that they are willing to share over the network.
  • the network host has a database, which contains criteria identifying the biological information stored on member's computers. Associated with the criteria identifying the biological information stored on member's computers are references which identify where the information is stored.
  • the network host provides a search tool for the members, so that the members may search the biological information shared by other members to find biological information of interest. Once a searching member has identified biological information of interest, a copy of the information may be transferred from the computer on which the information is stored onto the computer of the searching member.
  • the members are provided with software for installation on their computers.
  • the software assists in formatting biological information which individual members are willing to share, and reports to the network host which biological information has been shared by each member.
  • the software further comprises features for enhancing the security of individual members associated with the network.
  • the software in one embodiment includes a virus scanner which determines whether biological information made available for sharing is infected with a computer virus, and if infected reports the infection to the network host.
  • the software encrypts the biological information being transferred so that it may not be intercepted during transmission.
  • digital rights management technologies are used to limit the use of the biological information by the searching member who receives the information.
  • the invention is also embodied in a method for exchanging biological information.
  • the method is based on providing at least one host computer, and subscribing several members to form a biological information exchange network.
  • Each member has a network device capable of storing biological information, and communicating with the host computer.
  • a database is provided on the host computer. The database stores information describing biological information made available for sharing on the individual member's network devices and information regarding where shared biologic material is stored.
  • the host computer receives search parameters from a member seeking biological information, and searches the criteria provided by the subscribing members describing biological information which is available for sharing. In one embodiment, the host computer also determines whether biological information which meets the search criteria is stored on a member's network device that is communicably connected to the host computer, and generates a list of biological information available for sharing for the searching member, wherein the list contains only references to biological information on network devices that are communicably connected to the host computer. In this embodiment, only member-to-member transfers of copies of biological information are enabled. In another embodiment, the host computer generates a list of all biological information which is shared, and allows delayed transfer of a copy of the biological information when both searching and sharing members are communicably connected to the network host. Alternatively, the host computer receives a copy of the biological information when the sharing member is communicably connected to the host computer, and transfers the copy to the searching member when the searching member is communicably connected with the host computer.
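  • The two listing embodiments described above can be illustrated with a minimal sketch. The names `CatalogEntry`, `search_catalog`, and `online_only` are hypothetical and do not appear in the original; the sketch assumes a single search criterion (organism) and a set of currently connected member IDs.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalog row held by the host computer."""
    tag_name: str
    organism: str
    member_id: str


def search_catalog(entries, connected_members, organism, online_only=True):
    """Return catalog entries matching the organism.

    online_only=True models the embodiment that lists only information on
    devices currently connected to the host; online_only=False models the
    embodiment that lists everything and defers the transfer.
    """
    hits = [e for e in entries if e.organism == organism]
    if online_only:
        hits = [e for e in hits if e.member_id in connected_members]
    return hits
```

In the second mode the caller would still need to wait until both members are connected before a copy is transferred; that scheduling is outside this sketch.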
  • the present invention is also embodied in a business which provides a biological information exchange.
  • the biological information exchange charges a fee for membership in the biological information exchange.
  • the fee may be a flat fee based on membership in the exchange network, or based on the number of files transferred by an individual searching member.
  • Figure 1 shows a system diagram of a system embodying the present invention.
  • Figures 2A-2B show a database structure as used with the present embodiment.
  • Figures 2C-2D show an alternative database structure consistent with the present invention.
  • Figure 3 shows an illustrative embodiment of a client side software member interface, showing a search entry display.
  • Figure 4 shows an illustrative embodiment of a client side software member interface showing a file transfer display.
  • Figure 5 shows a process flowchart for providing a genetic information sharing system.
  • Figure 1 is a diagram of a genetic biological information sharing system 100 according to the present invention, implemented to show genetic sequence information being exchanged.
  • the genetic sequence sharing server 102 (hereafter "GSSS") is connected to a network 104 such as the Internet or a local intranet via a communications connection.
  • the GSSS 102 of the embodiment illustrated is connected to the Internet.
  • Such connections are commonly known, and may include dial-up connections, cable or DSL interfaces, or a dedicated interface such as a T1 connection.
  • the GSSS 102 acts as a repository or database of references to members 106 who have indicated a willingness to share genetic information, and also hosts server side software 108 (hereafter "SSS") which controls upkeep and access to the database 110.
  • the GSSS 102 is generally a computer which includes memory 112, a processor
  • the network is the Internet, allowing researchers to become members of the system without requiring significant investment in communications infrastructure.
  • Although the GSSS 102 is described in the illustrated embodiment as a single server, a cluster of servers 102a, 102b, ... 102n may be used to provide adequate system performance if the number of members 106 so requires. In such a configuration, each server is a copy of the others, with one server being designated as the master server.
  • the memory of the GSSS 102 stores sharing system software 108, described below.
  • a database 110 of information regarding available genetic sequences is also connected to the GSSS 102, either resident in memory or housed on a separate machine.
  • the database 110 may use a relational database for storing information related to individual shared genetic sequences, as described below.
  • the database 110 may be a central database, with multiple sharing servers 102a, 102b, ... 102n able to access the database 110, or it may include mirrored copies of a database, with a copy stored on each server 102a, 102b, ... 102n.
  • Client computers 118 (hereafter "CC's") associated with members 106 who desire to share or search for shared genetic sequence information are also connected to the network 104.
  • the Internet connection allows the client computers to communicate with the GSSS 102.
  • the client computers 118 may be associated with an individual researcher, or with a research establishment 124.
  • each client computer 118 includes client side software 122 (hereafter "CSS").
  • Client computers 118 used for sharing genetic sequences also may include memory 124 in which genetic sequence information to be shared is stored.
  • a library 126 of such material may be created in the memory 124 to provide a boundary between shared and non-shared files on a member's computer. Such a boundary may be created by using a sub-directory or folder, or may involve the creation of a separate logical drive dedicated to shared information.
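  • The sub-directory form of this boundary can be sketched as a path check. This is an illustrative assumption only: the function name `is_shared` and the example folder names are hypothetical, and the original does not specify how the client side software enforces the boundary.

```python
from pathlib import Path


def is_shared(path, library_root):
    """Return True if `path` resolves to a location inside the shared library folder.

    Resolving both paths first means that ".." segments cannot be used to
    step outside the library folder.
    """
    root = Path(library_root).resolve()
    target = Path(path).resolve()
    return target == root or root in target.parents
```

A client could run such a check before answering any transfer request, so that only files inside the library folder are ever served.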
  • the CSS 122 allows each CC 118 to either search for or share genetic information; however, different versions of the CSS 122 may be limited to searching only, limited to sharing only, or capable of both searching and sharing.
  • Each version of CSS 122 which is capable of sharing genetic information includes a formatting tool for converting genetic information on a member's computer into a standardized format, ensuring that other members will be able to access the data.
  • the server side database 110 includes catalog entries 202 for searching for shared biological information.
  • Each unit of biological information shared by a member may be represented by a catalog entry 202a, 202b, ... 202n in the database on the GSSS.
  • the catalog records may also include a flag indicating whether or not a sharing member is presently available on the network.
  • the entries in the database are associated with member information records 204a, 204b, ... 204n.
  • the biological information 206 stored in a member's library may include a tag name 208 for the information, the biological information itself 210, a chromatograph file 212, research notes 214, journal articles 216 associated with the biological information, and other information 218 associated with the shared biological information.
  • the information may include multiple files associated with the information, or may utilize an information file which includes one or more of the above associated materials.
  • the catalog entries 202a, 202b, ... 202n stored in the GSSS database may include the tag name 208 selected by the sharing member to describe the biological information, a reference ID number 220, an ID field for identifying the sharing member 222, an accession number 224, the tissue type 226 from which the information was generated, the organism type 228 from which the biological information was derived, and a sequence 230 associated with the biological information.
  • the GSSS database 110 may also contain information associated with the sharing member.
  • This information may include the member ID 222, a reference ID for the member 232, information concerning a member's affiliation 234 with a research organization, reputational information 236 associated with the sharing member, the member's transfer record 238, subscription information 240 associated with the member, and the member's Internet address 242 when the Internet is used as the network communicably connecting the member with the GSSS.
  • Member records may also be established for members who do not share any information.
  • the member records are preferably separate records which are linked to catalog entries, allowing a single member record to be associated with multiple catalog entries, reducing the amount of memory required to store the member's records.
  • elements of the catalog information may be stored in individual member's libraries where associated shared biological information is stored, further reducing the amount of memory required for the GSSS.
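  • The linked catalog and member records can be sketched as two record types joined on the member ID. The field names below paraphrase the items listed in the text; the class names, the `Catalog` container, and its methods are hypothetical, and an in-memory structure stands in for the relational database the text mentions.

```python
from dataclasses import dataclass


@dataclass
class MemberRecord:
    member_id: str
    affiliation: str
    reputation: float
    internet_address: str


@dataclass
class CatalogEntry:
    reference_id: int
    tag_name: str
    member_id: str  # link to exactly one MemberRecord
    accession_number: str
    tissue_type: str
    organism_type: str
    sequence: str


class Catalog:
    """Stores each member record once and links many catalog entries to it."""

    def __init__(self):
        self.members = {}
        self.entries = []

    def add_member(self, member):
        self.members[member.member_id] = member

    def add_entry(self, entry):
        # Refuse entries whose sharing member is unknown.
        if entry.member_id not in self.members:
            raise KeyError(entry.member_id)
        self.entries.append(entry)

    def entries_for(self, member_id):
        return [e for e in self.entries if e.member_id == member_id]

    def member_for(self, entry):
        return self.members[entry.member_id]
```

Because entries hold only the member ID, the member's details are stored once regardless of how many sequences that member shares.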
  • the database 110 of the illustrated embodiment stores cataloging information of genetic sequences as the genetic sequences are shared.
  • the database 110 also maintains a flag 242 associated with shared genetic sequences which indicates whether the shared sequence is presently available for file transfer.
  • the determining factor as to whether a sequence is presently available for file transfer is the existence of an open connection between the GSSS 102 and the CC 118 on which the sequence is stored.
  • when a member 106 opens a connection to the GSSS 102, the GSSS 102 updates the database 110 to show that genetic sequences shared by that member 106 are available for transfer.
  • when the member 106 disconnects, the GSSS 102 updates the database 110 to show that the genetic sequences shared by that member 106 are not presently available.
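  • The availability flag described above can be sketched as a small index that is updated on connect and disconnect events. The class and method names (`AvailabilityIndex`, `on_connect`, `on_disconnect`) are hypothetical; the original states only that the flag follows the state of the connection between the GSSS and the CC.

```python
class AvailabilityIndex:
    """Tracks, per shared sequence, whether its sharing member is connected."""

    def __init__(self):
        self.owner = {}  # reference_id -> member_id
        self.flags = {}  # reference_id -> True if available for transfer

    def register(self, member_id, reference_id, connected=False):
        self.owner[reference_id] = member_id
        self.flags[reference_id] = connected

    def _set_member(self, member_id, value):
        # Update the flag on every sequence shared by this member.
        for reference_id, owner in self.owner.items():
            if owner == member_id:
                self.flags[reference_id] = value

    def on_connect(self, member_id):
        self._set_member(member_id, True)

    def on_disconnect(self, member_id):
        self._set_member(member_id, False)
```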
  • the SSS 108 has several functions. As shown in Figure 1, to accomplish these functions, the software includes several modules, with each module responsible for a given function.
  • a sharing coordinator module 128 interacts with the individual modules to direct the individual modules to accomplish a required task.
  • the individual modules include a search module 130, a module for receiving new catalog information 132 on newly shared files, indexing 134 the database to reflect file availability, subscribing new members 136, distributing copies of CSS 138, and member authentication modules 140.
  • Other modules, as described below, may be implemented as required.
  • a principal function of the SSS 108 is to control a repository for references to shared genetic information stored on individual CC's.
  • the GSSS 102 is notified by a transmission from the CC 118 to the GSSS 102 of the catalog information for the newly shared files.
  • the GSSS 102 also maintains a module for administering member subscriptions 136, such as described below.
  • the administration of subscriptions allows the GSSS 102 to maintain access control over the searching capabilities, and over requested transfers of genetic sequence information.
  • the GSSS 102 may also include an authentication module 140, which allows the GSSS 102 to confirm the validity of a requested CC-to-CC transfer of genetic sequence information.
  • the implementation of an authentication cycle also increases security regarding the accessing of genetic sequence data.
  • the client side software (CSS) 122 is responsible for allowing a member to interface with the GSSS 102 through a CC 118, and to facilitate the transfer of information to and from the member's CC 118. To accomplish this, the CSS 122 also includes an interface generating module 146.
  • Figure 3 shows a member graphical user interface display 300 for illustrative purposes.
  • the interface has four primary function groups which a member may access through the illustrated interface. These are a chat 302 function, a set of library functions 304, a search function 306, and a data transfer function 308. Each function may be invoked by clicking on the tab representing each function.
  • the chat function 302 allows a member to enter into a conversation with another member over the Internet.
  • the presently illustrated embodiment uses a typed chat function, wherein the members are able to enter messages using the keyboard of their associated client computer 118.
  • the chat function may also be implemented by Internet telephony, or Internet videoconferencing.
  • the chat function allows researchers to correspond regarding shared genetic sequence information. For example, a researcher who has identified what he or she believes to be a relevant genetic sequence may have questions regarding how the sequence was derived. The researcher may contact the member who has shared the data to inquire how the information was derived. Since the database 110 already has implemented a signal regarding the availability of the sharing member, the availability flag may also be used to only present a sharing member's contact address when the sharing member is connected to the GSSS 102.
  • the library function 304 allows a member to set up a library 126 for storing genetic sequences which the member is willing to share, assists the member in creating formatted copies of the genetic sequences which are to be shared, assists the member in providing cataloging information regarding genetic sequences to be shared, and allows the member to remove a shared genetic sequence from the library, thus removing its availability for sharing.
  • the searching function 306 allows a member to define parameters for a genetic sequence which the member is searching for.
  • the search may be for a specific sequence string, and may also be further refined based on other search criteria.
  • Search criteria illustrated in Figure 3 include gene name 312, member code 314, accession number, organism, and tissue type.
  • the member code may be used to search for shared genetic sequences shared by a specific researcher.
  • Other search parameters may include, but are not limited to, dates when a genetic sequence was initially shared and researcher type information, such as whether the source of the shared file is a private or public organization, and whether the source of the genetic sequence places additional conditions on the use of the genetic sequence.
  • the presently preferred search method compares a sought after sequence to sequences associated with shared biological information.
  • a searching member may specify a tolerance for the search, where the tolerance is based on the number of positions in a sequence which differ from the sought after sequence. The tolerance may be expressed as a percentage of positions which match the search criteria.
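  • A percentage-based tolerance of this kind can be sketched with a simple position-by-position comparison. This is a simplified illustration, not the search method of the original: it slides the query along the shared sequence without gaps, and the function names are hypothetical.

```python
def percent_match(query, candidate):
    """Percentage of positions at which two equal-length strings agree."""
    if not query or len(query) != len(candidate):
        raise ValueError("query and candidate must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(query, candidate))
    return 100.0 * matches / len(query)


def best_window(query, sequence):
    """Slide the query along the sequence; return (best percentage, offset).

    Returns (0.0, -1) when no window scores above zero or the sequence is
    shorter than the query.
    """
    best = (0.0, -1)
    for offset in range(len(sequence) - len(query) + 1):
        score = percent_match(query, sequence[offset:offset + len(query)])
        if score > best[0]:
            best = (score, offset)
    return best


def meets_tolerance(query, sequence, min_percent):
    """True if some window of the sequence matches at least min_percent of positions."""
    return best_window(query, sequence)[0] >= min_percent
```

For example, the query `ACGT` against the shared sequence `TTACGATT` scores 75% at offset 2, so it passes a 70% tolerance but fails an 80% tolerance.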
  • Searches may be performed using the GSSS database only, the GSSS database in conjunction with direct review of shared material, or direct review of shared material only.
  • the GSSS reviews a member's search parameters against the catalog information stored in the database, and returns hits to the member based on the limited information stored in the database. This may result in less than optimal search results, as additional characteristics of the shared information may not be apparent from the catalog information.
  • Direct review of shared information may utilize the transmission of the search parameters to sharing members' computers, such that the CSS installed on each machine may perform a comparison of the parameters to the material shared in the member's library.
  • the sequence and accession number may be stored in a member's library, reducing the amount of information which needs to be stored in the database 110. This method allows for richer searching, as more information than can be efficiently stored on the GSSS or database can be examined, using the distributed processing power of each sharing member's computer.
  • the CC reports to the GSSS whether any material meeting the search parameters is stored on the computer, and if so, the identity of the shared information which meets the parameters.
  • the CC may report only positive results to reduce the amount of information transmitted to the GSSS.
  • the search method may also be a combination of database and direct searching, utilizing the database to narrow the searched files or libraries, and thus compromising on the required data transmissions while still allowing the richer direct comparison results.
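  • The combined approach can be sketched in two steps: the host narrows candidates using catalog fields, then each candidate is compared directly against the sequence held in the sharing member's library, and only positive results are reported. The data layout (dictionaries keyed by member ID and tag name) and the function names are assumptions for illustration; here both steps run in one process, whereas the text places the direct review on each sharing member's computer.

```python
def client_review(library, tag_names, query):
    """Stands in for the CSS on a sharing member's computer.

    Returns only the tag names whose stored sequence contains the query,
    that is, only positive results.
    """
    return [t for t in tag_names if t in library and query in library[t]]


def combined_search(catalog, libraries, organism, query):
    """Narrow by catalog field first, then review the narrowed files directly."""
    # Step 1: database narrowing on the host.
    candidates = {}
    for row in catalog:
        if row["organism"] == organism:
            candidates.setdefault(row["member_id"], []).append(row["tag_name"])
    # Step 2: direct review, limited to the narrowed files.
    results = {}
    for member_id, tag_names in candidates.items():
        hits = client_review(libraries.get(member_id, {}), tag_names, query)
        if hits:
            results[member_id] = hits
    return results
```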
  • the search parameters for genetic sequences may include a known string of values, wherein a member seeks genetic sequences containing the string.
  • a searching member may also tolerate a threshold level of matching, such as, for example, that 70% or 80% of the values in the string are matched in a shared file. This threshold value may be specified by a searching member, and used in conjunction with other search parameters to narrow or broaden a search.
  • the transfer function 308 is shown in Figure 4 as embodied in a graphical user interface between a member and the member's associated client computer.
  • the transfer interface 400 may include, but is not limited to, a display of the catalog information 402 associated with the source of a shared file being transferred, a summary value for reputational information 404, and statistical information 406 regarding the searching member's activities.
  • a genetic sequence to be downloaded may appear in a Sequence Alignment Window 408 along with any overlapping sequences that are present in the searching member's library 410, 412 [Sequence 1 and Sequence 2 in figure]. These genetic sequences appear in a window along with a copy of the search string 414.
  • the search string 416 [represented by the black bar in the diagram] is displayed within the context of the entire sequence to show the member the direct area of overlap.
  • the similarity between the search string and the resultant sequence may be expressed with a separate value 418.
  • Sequence 2 (element number 412) represents an additional sequence in the member's library which overlaps with the downloaded sequence but not with any other sequence in the library. In this way, the member is able to build a larger contiguous genetic sequence (e.g., the member has sequence information spanning base pairs 200-1500, instead of just 200-1322 if Sequence 2 is absent).
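  • The contig-building effect described above can be sketched by merging overlapping base-pair ranges. Only the overall spans 200-1500 and 200-1322 come from the text; the individual ranges used in the usage note below are hypothetical, as is the function name `merge_spans`.

```python
def merge_spans(spans):
    """Merge overlapping (start, end) base-pair ranges into contiguous spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it if this one reaches further.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Assuming Sequence 1 covers 200-800, the downloaded sequence covers 700-1322, and Sequence 2 covers 1300-1500, `merge_spans` returns a single span of 200-1500; without Sequence 2 the span ends at 1322.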
  • the sequence overlap presentation may also be dependent on a tolerance defined by the searching member, such that individual files stored in a member's library may be compared to the biological information being considered for transfer, with each file defining biological information in the sharing member's library being compared to the information being considered, and an overlap value determined for each comparison between a file being considered for transfer and an individual file in the searching member's library.
  • After reviewing the sequence specifications and member specifications, and viewing the alignment of the sequence as it compares to the sequences in the member's library, the searching member will have the option of transferring a copy of the sequence information from the sharing member's CC to their own library by selecting a complete transfer button 420. If the searching member is not satisfied with the sequence information, the searching member may cancel the transfer by selecting a cancel transfer button 422.
  • a library maintenance function associated with client side software may be implemented to allow a member to add biological information files to the member's library.
  • This addition function can designate information as either shared or closed. Files that a member does not desire to share may be designated as closed to allow the member to designate files for comparison purposes during search and transfer functions.
  • the CSS 122 may also act as a firewall between members and the GSSS 102 on the Internet 104. Such a firewall is useful to prevent members from providing information for sharing which may be harmful to other members, or to the GSSS 102.
  • the illustrated embodiment includes a computer-virus scanning routine 148 (shown in Figure 1) which is executed when genetic sequence files are formatted for sharing. The scanning routine works simultaneously with the formatting routine, such that any genetic sequence that is reformatted to be shared is also virus-scanned. If the scanner 148 detects the potential presence of a computer virus in the file, the scanner 148 may mark the file as potentially infected, and report the condition to the GSSS 102.
  • references to a shared file which is marked as potentially infected will not be added to the database, thus preventing the possibly infected file from being unwisely transferred. Additional actions may also be initiated upon discovery of a potentially infected file, such as quarantining the member who attempted to share the file, and/or requiring all shared files in the member's library to be rescanned.
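  • The gating of catalog entries on the scan result can be sketched as follows. Everything here is an illustrative assumption: the scanner is passed in as a plain callable rather than a real antivirus engine, the "standardized format" is reduced to stripping whitespace and upper-casing, and the names `format_and_scan` and `share_file` are hypothetical.

```python
def format_and_scan(raw, scanner):
    """Format a sequence for sharing and scan the formatted result in one step."""
    formatted = "".join(raw.split()).upper()  # assumed stand-in for the standard format
    infected = bool(scanner(formatted))
    return formatted, infected


def share_file(catalog, quarantined, member_id, tag_name, raw, scanner):
    """Add a catalog reference only if the scan is clean.

    A potentially infected file is never cataloged, and the sharing member
    is added to the quarantine set.
    """
    _formatted, infected = format_and_scan(raw, scanner)
    if infected:
        quarantined.add(member_id)
        return False
    catalog.append((member_id, tag_name))
    return True
```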
  • a member authentication routine 150 may also be employed to ensure that copy transfer requests, when the requests are issued client-to-client, are valid requests from members of the network of members.
  • the member authentication check 150 functions by verifying that the received transfer request is from a member who received the address by conducting a valid search utilizing the GSSS 102.
  • Each time a member runs a search, the GSSS 102 generates the list of shared genetic sequences that meet the searcher's criteria. The member selects a link to a sharing member's client computer that houses the shared genetic sequence. By logging each link that the searching member actuates, the GSSS 102 creates a log of valid transfers created by the GSSS 102.
  • a transfer request made to a sharing member's CC 118 could be made without the use of the GSSS 102, and thus be an invalid transfer request.
  • the sharing member's CC 118 may query the GSSS 102 as to whether the received transfer order was generated through the GSSS 102. If the request was not made through the GSSS 102, the sharing member's CC 118 may be so notified, and instructed to refuse the transfer request.
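  • The log-based authentication check can be sketched as a set of logged selections that the sharing member's client consults before serving a file. The class `TransferLog`, its methods, and the "accept"/"refuse" return values are hypothetical; the original specifies only that the GSSS logs actuated links and answers queries about them.

```python
class TransferLog:
    """Host-side log of links that searching members actuated in search results."""

    def __init__(self):
        self._logged = set()

    def record_selection(self, searcher_id, sharer_id, reference_id):
        self._logged.add((searcher_id, sharer_id, reference_id))

    def is_valid(self, searcher_id, sharer_id, reference_id):
        return (searcher_id, sharer_id, reference_id) in self._logged


def handle_transfer_request(log, searcher_id, sharer_id, reference_id):
    """Runs on the sharing member's client: ask the host, then accept or refuse."""
    if log.is_valid(searcher_id, sharer_id, reference_id):
        return "accept"
    return "refuse"
```

A fuller design might expire log entries after a single use or after a time limit; the text does not say whether it does.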
  • An alternate embodiment of the authentication routine may rely on the use of certificates as a means of verifying that the searching member is a member of the genetic sequence sharing network.
  • Certificates are identification data blocks issued by certificate authorities, where the data blocks may be encrypted using asymmetric encryption based on private-key/public-key methods.
  • the client side software may either decrypt the certificate itself, or reflect the certificate to the GSSS 102 for the GSSS 102 to decrypt. If the certificate is not valid, the transfer may be refused.
  • the present system allows genetic information to be shared without requiring the rigor of scrutiny before the data is shared.
  • Present methods of publicly publishing genetic information require the data to be scrutinized before it is published, for academic and liability reasons. As such, a researcher must first format the data in a form acceptable to a reviewing body, answer questions regarding the derivation of the data, and then await a publication decision from a publishing board.
  • This distinction allows genetic information to be published without requiring a researcher who derived the data to expend resources to publish the information, thus allowing more genetic information to be made available for the same level of effort.
  • the lack of scrutiny of the genetic sequence being shared results in some limitations as to the value of the genetic sequence.
  • the first is a peer review reference, wherein comments regarding the accuracy or inaccuracy of a specific member's shared genetic sequences may be received, stored, and then indexed for review in association with all of the shared genetic sequences of that specific member.
  • the second method may be to provide a chromatograph image file from which a genetic sequence is derived along with the shared genetic sequence file itself during transfer.
  • the second method may be expanded to include additional reference files with the shared information, such as articles written regarding the biological information, chromatograph or other image files, files containing researcher notes regarding the shared information, or other files which may be of use to a searching member.
  • shared information and/or a shared file may hereinafter refer to either the biological information itself or the joined biological information and associated files.
  • a chromatograph is a graphical representation of raw sequencer results.
  • the chromatograph allows a researcher to compare a given genetic sequence against the sequence's underlying chromatograph to review the resultant sequence data, in effect double-checking the derived genetic sequence.
  • a copy of the chromatograph from which the sequence was derived may be attached to a transferable or transferred genetic sequence. Since it is difficult to falsify chromatographs, the attachment of the chromatograph to a sequence file provides a measure of authentication of the validity and accuracy of the genetic sequence.
  • the transmission of genetic information files from one member's computer to another member's computer preferably takes place between the two computers as a means of reducing traffic through the GSSS 102.
  • the client side software may include encryption and digital rights management technologies.
  • a variety of encryption technologies may be implemented; however, the system of the present illustration is based on a limited public-key encryption system such that the client side software may be distributed to researchers in foreign countries without breaching encryption export limitations. If the network of researchers is intended to be entirely within the United States, higher levels of encryption may be utilized.
  • the encryption algorithm encrypts data being transferred from a sharing member's computer, and another member's client side software decrypts the transferred data when received.
  • the client side software modules may use secure socket layer protocols, transport layer security protocols, and/or other forms of public-key encryption.
  • digital rights management techniques may be incorporated into the genetic sequence files, such that access to the genetic sequence, and/or the ability of a member to copy or further share a genetic sequence, is controlled by a member who shares the original genetic sequence.
  • the implementation of such technology allows a member to control access to a shared genetic code, even after the code has been transferred to a searching member. This allows private owners of genetic sequences, who desire to charge for access to the sequence, to limit a transferal of a genetic sequence to only the recipient of the sequence.
  • embedded icons can be inserted into shared biological information to determine whether a file has been modified since it was originally transferred, and also who the original source of the file was.
  • the transfers may be also tracked for billing purposes.
  • Transfer records documenting individual file transfers may also be associated with individual members, tracking both transfers to and transfers from, to allow further historical analysis of shared biological information.
  • the GSSS 102 may provide support for the transfer of shared genetic sequences from sharing member's CC's 118 when the CC's 118 are not connected to the Internet. Such a delayed transfer may be accomplished by the GSSS 102 presenting all files that meet a searching member's criteria, not just those belonging to a presently connected member.
  • the GSSS 102 may generate a file transfer request to the sharing member, with the GSSS 102 as the recipient of the file transfer, followed by transfer of the file from the GSSS 102 to the searching member when the searching member is connected to the GSSS 102.
  • Such an implementation requires the GSSS 102 to maintain available memory for the temporary storage of delayed transfer files.
  • transfer requests from sharing members that are connected to the GSSS 102 would occur CC-to-CC, as utilized in the embodiment described above.
  • a searching member could apply sequence comparison tools as described above, allowing a right of rejection of a file that differs from what the searching member believed himself to be requesting.
  • the above system may be implemented in a business activity which facilitates the sharing of genetic information.
  • Such an activity is premised on the providing 502 a GSSS 102 as described above.
  • the business activity may subscribe users 504 to the sharing service.
  • the subscriptions may be fee based or free.
  • Fee based subscriptions may also accommodate varying rates, with discounted rates provided for members who contribute shared genetic information, while members who only search for genetic information are required to pay a full rate.
  • the subscription may be a flat-rate subscription, or may implement different rates based on usage.
  • Usage-based subscription rates may be based on a fixed amount per search request. Additionally, free searches could be provided to a member in exchange for the member making a unit of genetic information available for sharing.
  • the usage-based subscription rate may also be combined with a fixed fee, such that upon payment of the fixed fee, a set number of searches are free to the member.
  • Such a hybrid system may also be implemented by varying the fixed portion of a hybrid fixed/usage fee inversely with a per-search rate. As such, a member who desires to make frequent searches may pay a high fixed fee, but a low per-search fee, resulting in lower overall costs.
  • client side software is distributed 506.
  • the client side software provides a member with the ability to format genetic information for sharing, and to provide information related to cataloging the shared genetic information.
  • the software distribution may either be by physical delivery of a storage medium, or by providing a download site from which a member may download the software.
  • the member may be instructed to install 508 the software on the client computer associated with the member.
  • the software distribution may also involve the distribution of a security code such as a password, certificate, or encryption key.
  • the security code may also enable the client side software. Such a security code may also be distributed from the client side software.
  • the member may be prompted to format 512 and provide cataloging data for information that the member has a willingness to share.
  • the client computer under the direction of the client side software, reports 514 the cataloging information and identity of the member to the GSSS 102.
  • the information sharing receiver receives 516 the information, and stores 518 the information in a database.
  • the GSSS 102 also may associate a link 520 to the member's CC 118 with the genetic information, such that later retrieval of the cataloged information by a searcher results in the GSSS 102 being able to identify the path to the shared data.
  • the GSSS 102 may associate a reference 522 to reputational information associated with the member to the cataloged information, such that members may consider the reputational information of the source of genetic information identified on a search before requesting a copy of the genetic information.
  • files may only be shared while a sharing member is connected to the GSSS 102.
  • the GSSS 102 indexes 524 the database to reflect the availability of the information for sharing.
  • the availability of information for sharing may be indicated by the setting of a flag associated with a member or a file.
  • the server is notified of the unavailability of the information by the log off procedure, and updates the database to remove files shared by the member who logged
  • each server in such a configuration contains a copy of the database for searching.
  • one server is designated as the master server, and is responsible for re-indexing the data-bases maintained by each server, and publishing 526 the re-indexed information to the other servers.
  • the system may maintain the catalog information in memory when a member logs off and only remove a reference signifying the availability of the shared files, such that a later log-on only requires entry of new shared files, and the re-setting of a flag indicating that a member is connected to the network.
  • the system may also send test communications to members to determine if they are still connected to the network in advance of file transfer requests, or may merely monitor transfer requests for failed transfer requests, and update the availability information in the database based on a communications failure.
  • An alternate embodiment of the system may rely on the use of standard Internet browsers for viewing generated display pages on the GSSS 102, obviating the need to distribute client side software to members who only intend to search.
  • the information sharing software may issue an electronic request to the client computer on which the information is shared for the client computer to transmit a copy of the genetic information directly to the requesting member, for example using FTP (file transfer protocol) or HTTP (hypertext transfer protocol).
  • the GSSS 102 acquires parameters 530 describing the genetic information sought by a member. This information is compared 532 to the catalog information in the database. A list of shared information which meets the search criteria is generated 534 for the searching member, and transmitted 536 to the searching member's computer.
  • the searching member desires to further examine shared information listed in the search results, as shown in Figure 3, the user may identify information for further examination by selecting the information on the search list. Selecting the information on the search list transfers 540 the searching member to the transfer graphical user interface, shown in Figure 3.
  • the GSSS 102 in conjunction with the client side software may also generate 542 a sequence comparison display as shown in Figure 4. The searching member may then decide to transfer the file, to examine a different file, or to conduct a new search. If it is determined 546 that the searching member desires to begin a new search, the process reverts to step 528. If it is determined that the member desires to view further information on different shared information, the process reverts to step 536.
  • a searching member desires to obtain a copy of shared information identified as meeting that member's search criteria
  • the searching member merely selects the address link for the desired shared information from the list of shared information meeting the member's search criteria.
  • a list of reputational information associated with a source of shared information is generated 542 and transmitted to the searching member.
  • the location of the shared information may be implemented as a blind link to the location, such that a member must use the link to identify the destination.
  • a GSSS 102 would allow a GSSS 102 to base usage fees on a per transfer basis, rather than on a per search or flat fee basis.
  • the use of blind links would also allow copy fees for a sharing member to be collected if required by a sharing member.
  • the actual transfer 546 of the file may be implemented by peer-to-peer sharing, wherein the file is transferred from the sharing member's computer to the searching member's computer directly, or by transferring the file to the GSSS 102, and from there to the searching member's computer.
  • a hybrid system may also be implemented, where shared information from a member who is presently connected may be transferred peer-to-peer, while delayed file requests (because the sharing member is not presently connected to the network) may be handled by uploading the file to the GSSS 102 when the sharing client is available, then transferred to the searching member's computer when the searching member is available.
  • the system may incorporate a member authentication procedure to confirm for a sharing member that the person requesting the genetic information is authorized to receive the material.
  • the sharing member's computer receives the request, and identifies the requestor.
  • the sharing member's computer transmits an authorization or member validation query, including the identity of the requestor, to the GSSS 102. If the GSSS 102 confirms that the requestor is a valid member, then the GSSS 102 transmits an approval or authorization to the sharing member's computer.
  • the inclusion of the authorization check provides a redundant means for the GSSS 102 to track the genetic information file transfer requests made by a searching member for fee calculation purposes.
  • the genetic information sharing system may be propagated between peer computers, wherein search routines are based on echoed queries from member to member of a pool of members who desire to share genetic information.
  • the peer-to-peer implementation relies only on software agents installed on members' computers.
  • the software agent is responsible for cataloging the information available to be shared on the member computer, and searching the cataloged information in accordance with a search request.
  • the software agent may store cataloged information prior to receiving a search request, or may store the catalog information in individual shared information files, and search the files upon a search request. Additionally, the software agent is capable of forwarding the search request to other computers known to be part of the network, and to have the software installed thereon.
  • When a member wants to search for genetic information, he or she generates a search request on his or her computer. Next, he or she determines the address of at least one other computer that is part of a network of computers associated with members who are involved in the sharing of genetic information. This determination may be accomplished by the member's computer connecting to a known Internet location to obtain the address of an associated computer that is presently connected to the Internet, or by reference to a previously known address or group of addresses.
  • the search request is then transmitted via the Internet to the at least one other computer address. Since the computer located at the address to which the transmission was sent is responsible for searching itself for shared information meeting the search criteria and forwarding the request to other computers associated with the sharing group, the initial transmission must be sent to a computer that is turned on and connected to the Internet. The initial transmission of the search request to multiple addresses will increase the likelihood of the message being transmitted to an active computer.
  • Each computer receiving the message searches itself for shared information meeting the search criteria. If the receiving computer determines that it has a file or files containing information which meets the search criteria, the receiving computer may generate a message to the search computer that the receiving computer has files meeting the search criteria, or alternately send the files meeting the search criteria.
  • the receiving computer forwards the search request to at least one other computer associated with the genetic information sharing association.
  • Search requests may be controlled in several ways.
  • a simple time-out counter may be included in the original search request. Each time a computer searches for shared information, the counter is incremented. Once a search request has been forwarded a predetermined number of times, the receiving computer will search its own files for shared genetic information that meets the search criteria, and then dispose of the message without forwarding it.
  • the search request may be forwarded from a receiving computer to only a single forwarding address, wherein a receiving computer that has shared genetic information which meets the search criteria transmits the shared material to the searching member's computer, then discards the search message.
  • a further embodiment of the system for sharing biological information may be implemented using a hybrid system that uses a server for storing addresses for members who have biological information available for sharing.
  • the hybrid system thus only stores member information in the server side database.
  • Search routines using a hybrid system can either rely on client side software to perform searches on shared information, or may use server side software to search libraries on members' computers to which the GSSS has been granted access authority.
  • Client side software for this implementation receives a search request from the GSSS, and then searches libraries contained on the member computer for information which meets the search criteria. If shared information meeting the criteria is found, the member computer reports the presence of the information to the GSSS, which then compiles a list of information meeting the criteria for the searching member.
  • the hybrid system can also use the GSSS to perform searches on information stored on individual members' computers. This system relies on access to the shared information on members' computers.
  • the GSSS uses this authority to access the shared information, and search the information for information meeting a set of search criteria. Such a search may be accomplished by transferring from the GSSS to the member computers a self-executing program, such as a Java applet or web spider, to shared libraries.
  • the executable program searches information stored in the library for information meeting search criteria, then reports to the GSSS when information meeting the search criteria is found. Once the self-executing program reports back to the GSSS regarding discovered information, or if no information meeting the criteria is identified, the self-executing program may delete itself.
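
The time-out counter described above for peer-to-peer search requests can be illustrated with a minimal sketch. It assumes an in-memory model in which each peer holds a small library and a list of neighbors; the names `Peer`, `search`, and `max_hops`, the substring match, and the `seen` set used to avoid revisiting a peer are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical member computer running the software agent."""
    name: str
    library: dict[str, str] = field(default_factory=dict)  # tag name -> sequence
    neighbors: list["Peer"] = field(default_factory=list)

    def handle(self, query, hops, hits, seen, max_hops):
        # Assumption: a peer that has already handled this request ignores it.
        if self.name in seen:
            return
        seen.add(self.name)

        # Each receiving computer searches its own shared library first.
        for tag, sequence in self.library.items():
            if query in sequence:
                hits.append((self.name, tag))

        # The counter is incremented each time a computer searches.
        hops += 1
        if hops >= max_hops:
            return  # dispose of the request without forwarding it

        for neighbor in self.neighbors:
            neighbor.handle(query, hops, hits, seen, max_hops)


def search(entry_peer, query, max_hops=3):
    """Send a request to the first receiving peer and collect (peer, tag) hits."""
    hits = []
    entry_peer.handle(query, 0, hits, set(), max_hops)
    return hits


# Four peers in a chain: a -> b -> c -> d
a = Peer("a", {"x": "ACGT"})
b = Peer("b", {"y": "TTACGTT"})
c = Peer("c", {"z": "ACGTACGT"})
d = Peer("d", {"w": "ACGT"})
a.neighbors, b.neighbors, c.neighbors = [b], [c], [d]

hits = search(a, "ACGT", max_hops=3)
```

With a limit of three, the request is searched by peers a, b, and c, and is discarded before it reaches d. A real deployment would carry the counter inside the network message rather than as a function argument.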

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention is a system and method for sharing biological information such as genetic sequences. The system (100) is based on providing a host computer (102) which maintains a database of biological information available for sharing by members (106) of a biological information sharing network. The host computer (102) maintains a database of information available for sharing. Sharing is accomplished by transferring biological information from one member's computer (118) to another's. Software is preferably provided to the members (106) to assist in the formatting of biological information to be shared. The software may also have a virus scanning capability to prevent infected information files from being shared. Provisions may also be supplied for encrypting information being transferred from one member to another, or for using digital rights management technology for limiting the ability of the receiving member to re-share the biological information.

Description

METHOD AND SYSTEM FOR SHARING BIOLOGICAL INFORMATION
Field of the Invention
The present invention pertains to the sharing of information related to biological research, and more particularly, to the sharing of genetic sequence information accumulated on individual computers through client computer to client computer file transfers.
Background of the Invention
Research into genetic structures has provided a large mass of biological information, including but not limited to genetic sequence information, protein/peptide sequences, antibody data, cell line, ribozyme sequences, anti-sense oligo sequences, transgenic data, chemical structure/database information, and molecular beacon data.
Much of the biological information is characterized by a sequence of values defining the characteristics of the biological information. Deoxyribonucleic acid (DNA) is a molecule that carries genetic information, encoded in a linear sequence of nucleotide bases. There are four such bases, generally abbreviated A, C, T and G, so that DNA is generally represented as a long sequence drawn on a four letter alphabet. Acting in concert with the environment, these DNA molecules (called the genome or genotype) determine the structure, function and behavior (called the phenotype) of all living things.
Knowledge of these sequences is useful for many purposes, including reducing the expense associated with the testing of new drugs, and efforts to understand diseases such as cancer and AIDS. This mass of knowledge, however, is distributed among many researchers and publications.
Several public databases, such as the dbGSS database provided by the National Center for Biotechnology Information, exist for publishing information related to genomics. These databases require an extensive review of biological information before the biological information may be posted to the database, resulting in delays before the biological information becomes available to other researchers. Also, the database host must be able to store all of the information submitted by the submitters of the biological information.
Other databases are run by private organizations, with the purpose of selling biological information for which the private organization has acquired proprietary rights. These private databases also require large storage capacities to enable the host computer to store all of the biological information referenced in the database.
Public and private databases are consistent in that each database includes only a limited amount of biological information, necessitated by the limited storage capacity of the database operator, the requirements for submitting information for the database, and proprietary interests in maintaining a database for information owned by the database host.
Recently, as a technique for sharing music files, programs have been developed which allow a host database to direct searchers to member hard drives to copy sought after music files. As such, the actual transfer of music files occurs between individual members of the network, commonly referred to as peer-to-peer sharing. The host database includes a description of material available on individual member's hard drives, and the address of the member (which can be a floating IP address), such that other members may search for specific music by querying the host database for storage locations of music meeting parameters identified by the searching member.
As an evolutionary step beyond the use of a host database for sharing music files, software has been developed which allows full peer-to-peer sharing of information stored on individual member hard drives, without requiring the intermediate step of searching a hosted database of characteristics identifying available material to identify the locations of the material. The peer-to-peer sharing software allows an individual member to generate a query which is passed to the computer of each member in the peer-to-peer network, where the query is compared to material shared on the member computers.
The use of sharing technology has several disadvantages. The sharing technologies create security risks for both the requesting party and the sharing party in a transaction. The requesting member is able to place data on the sharing member's hard drive during the transfer, resulting in the potential for computer viruses to be distributed through the sharing network. Such viruses can also be transferred from the sharing member's computer to the requesting member's computer.
The existing sharing technology is also limited in that a requesting member does not receive any quantification or qualification of the accuracy of the information shared. Whereas major biological information databases screen submissions for accuracy, information shared by members in a peer-to-peer sharing network, either with or without a centralized database, becomes available for sharing without undergoing peer review. This is both a benefit and a disadvantage, in that the reduced review requirement leads to less effort being required to share the data, while at the same time allowing inaccurate data to become published.
Summary of the Invention
The present invention is a system and method for sharing biological information over a network. The system includes members who are communicably connected over the network to a network host. At least one of the members has biological information stored on a computer that they are willing to share over the network. The network host has a database, which contains criteria identifying the biological information stored on members' computers. Associated with the criteria identifying the biological information stored on members' computers are references which identify where the information is stored. The network host provides a search tool for the members, so that the members may search the biological information shared by other members to find biological information of interest. Once a searching member has identified biological information of interest, a copy of the information may be transferred from the computer on which the information is stored onto the computer of the searching member.
The members are provided with software for installation on their computers. The software assists in formatting biological information which individual members are willing to share, as well as reports to the network host what biological information has been shared by each member.
The software further comprises features for enhancing the security of individual members associated with the network. The software in one embodiment includes a virus scanner which determines whether biological information made available for sharing is infected with a computer virus, and if infected reports the infection to the network host. In another embodiment, the software encrypts the biological information being transferred so that it may not be intercepted during transmission. In an alternate embodiment, digital rights management technologies are used to limit the use of the biological information by the searching member who receives the information.
The invention is also embodied in a method for exchanging biological information. The method is based on providing at least one host computer, and subscribing several members to form a biological information exchange network. Each member has a network device capable of storing biological information, and communicating with the host computer. A database is provided on the host computer. The database stores information describing biological information made available for sharing on the individual members' network devices and information regarding where shared biological information is stored.
The host computer receives search parameters from a member seeking biological information, and searches the criteria provided by the subscribing members describing biological information which is available for sharing. In one embodiment, the host computer also determines whether biological information which meets the search criteria is stored on a member's network device that is communicably connected to the host computer, and generates a list of biological information available for sharing for the searching member, wherein the list contains only references to biological information on network devices that are communicably connected to the host computer. In this embodiment, only member-to-member transfers of copies of biological information are enabled. In another embodiment, the host computer generates a list of all biological information which is shared, and allows delayed transfer of a copy of the biological information when both searching and sharing members are communicably connected to the network host. Alternately, the host computer receives a copy of the biological information when the sharing member is communicably connected to the host computer, and transfers the copy to the searching member when the searching member is communicably connected with the host computer.
The present invention is also embodied in a business which provides a biological information exchange. The biological information exchange charges a fee for membership in the biological information exchange. The fee may be a flat fee based on membership in the exchange network, or based on the number of files transferred by an individual searching member.
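The two listing behaviors described above (only connected members, or all shared information) can be sketched as a single filter. This is an illustrative sketch only: the function name `search_catalog`, the dictionary field names, and the exact-match comparison are assumptions, not details from the specification.

```python
def search_catalog(catalog, connected_ids, criteria, connected_only=True):
    """Return catalog entries that satisfy every search criterion.

    catalog        -- list of dicts describing shared biological information
    connected_ids  -- set of member IDs currently connected to the host
    criteria       -- dict of field name -> required value (exact match)
    connected_only -- if True, list only entries held by connected members
    """
    matches = []
    for entry in catalog:
        # Skip entries that fail any of the search parameters.
        if any(entry.get(key) != value for key, value in criteria.items()):
            continue
        # First embodiment: list only information on connected devices.
        if connected_only and entry["member_id"] not in connected_ids:
            continue
        matches.append(entry)
    return matches


# Hypothetical catalog data for illustration
catalog = [
    {"tag_name": "sample-1", "member_id": "m1", "organism": "mouse"},
    {"tag_name": "sample-2", "member_id": "m2", "organism": "mouse"},
    {"tag_name": "sample-3", "member_id": "m1", "organism": "yeast"},
]
connected = {"m1"}

live = search_catalog(catalog, connected, {"organism": "mouse"})
everything = search_catalog(catalog, connected, {"organism": "mouse"},
                            connected_only=False)
```

Passing `connected_only=False` corresponds to the embodiment that lists all shared information and relies on delayed transfer for members who are offline.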
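As a small worked example of the two fee bases named above, the following sketch computes a member's charge for a billing period. The plan names and the monetary amounts are hypothetical placeholders chosen for illustration; the specification does not state any rates.

```python
from decimal import Decimal


def period_fee(plan, transfers,
               flat_fee=Decimal("50.00"),      # hypothetical flat membership fee
               per_transfer=Decimal("2.50")):  # hypothetical per-file rate
    """Return the fee owed for one billing period under the given plan."""
    if transfers < 0:
        raise ValueError("transfers must be non-negative")
    if plan == "flat":
        # Flat fee based on membership, independent of usage.
        return flat_fee
    if plan == "per_transfer":
        # Fee based on the number of files transferred by the member.
        return per_transfer * transfers
    raise ValueError(f"unknown plan: {plan!r}")


flat_charge = period_fee("flat", transfers=10)
usage_charge = period_fee("per_transfer", transfers=4)
```

`Decimal` is used rather than floating point so that currency amounts are not subject to binary rounding error.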
Brief Description of the Drawings
Figure 1 shows a system diagram of a system embodying the present invention.
Figures 2A-2B show a database structure as used with the present embodiment.
Figures 2C-2D show an alternative database structure consistent with the present invention.
Figure 3 shows an illustrative embodiment of a client side software member interface, showing a search entry display.
Figure 4 shows an illustrative embodiment of a client side software member interface showing a file transfer display.
Figure 5 shows a process flowchart for providing a genetic information sharing system.
Detailed Description of the Invention
Referring to Figure 1, wherein like numerals refer to like elements, there is shown a diagram of a biological information sharing system 100 according to the present invention, implemented to show genetic sequence information being exchanged. The genetic sequence sharing server 102 (hereafter "GSSS") is connected to a network 104 such as the Internet or a local intranet via a communications connection. The GSSS 102 of the embodiment illustrated is connected to the Internet. Such connections are commonly known, and may include dial-up connections, cable or DSL interfaces, or a dedicated interface such as a T1 connection. The GSSS 102 acts as a repository or database of references to members 106 who have indicated a willingness to share genetic information, and also hosts server side software 108 (hereafter "SSS") which controls upkeep and access to the database 110.
The GSSS 102 is generally a computer which includes memory 112, a processor 114, and an interface 116 to a network. In the presently illustrated embodiment, the network is the Internet, allowing researchers to become members of the system without requiring significant investment in communications infrastructure. Although the GSSS 102 is described in the illustrated embodiment as a single server, a cluster of servers 102a, 102b, ... 102n may be used to provide adequate system performance if the number of members 106 so requires. In such a configuration, each server is a copy of the others, with one server being designated as the master server. The memory of the GSSS 102 stores sharing system software 108, described below. A database 110 of information regarding available genetic sequences is also connected to the GSSS 102, either resident in memory or housed on a separate machine. The database 110 may use a relational database for storing information related to individual shared genetic sequences, as described below. The database 110 may be a central database, with multiple sharing servers 102a, 102b, ... 102n able to access the database 110, or it may include mirrored copies of a database, with a copy stored on each server 102a, 102b, ... 102n.
Client computers 118 (hereafter "CC's") associated with members 106 who desire to share or search for shared genetic sequence information are also connected to the Internet 120. The Internet connection allows the client computers to communicate with the GSSS 102. The client computers 118 may be associated with an individual researcher, or with a research establishment 124. In the presently preferred embodiment, each client computer 118 includes client side software 122 (hereafter "CSS"). Client computers 118 used for sharing genetic sequences also may include memory 124 in which genetic sequence information to be shared is stored. A library 126 of such material may be created in the memory 124 to provide a boundary between shared and non-shared files on a member's computer. Such a boundary may be created by using a sub-directory or folder, or may involve the creation of a separate logical drive dedicated to shared information.
In the illustrated embodiment, the CSS 122 allows each CC 118 to either search for or share genetic information; however, different versions of the CSS 122 may be limited to searching only or sharing only, or may be capable of both searching and sharing. Each version of CSS 122 which is capable of sharing genetic information includes a formatting tool for converting genetic information on a member's computer into a standardized format, ensuring that other members will be able to access the data.
Database and Library Structures
As shown in Figures 2A and 2B, the server side database 110 includes catalog entries 202 for searching for shared biological information. Each unit of biological information shared by a member may be represented by a catalog entry 202a, 202b, ... 202n in the database on the GSSS. The catalog records may also include a flag indicating whether or not a sharing member is presently available on the network. The entries in the database are associated with member information records 204a, 204b, ... 204n.
The biological information 206 stored in a member's library may include a tag name 208 for the information, the biological information itself 210, a chromatograph file 212, research notes 214, journal articles 216 associated with the biological information, and other information 218 associated with the shared biological information. The information may include multiple files associated with the information, or may utilize a single information file which includes one or more of the above associated materials.
The catalog entries 202a, 202b, ... 202n stored in the GSSS database may include the tag name 206 selected by the sharing member to describe the biological information, a reference ID number 220, an ID field for identifying the sharing member 222, an accession number 224, the tissue type 226 from which the information was generated, the organism type 228 from which the biological information was derived, and a sequence 230 associated with the biological information. In addition to the catalog entries, the GSSS database 110 may also contain information associated with the sharing member. This information may include the member ID 222, a reference ID for the member 232, information concerning a member's affiliation 234 with a research organization, reputational information 236 associated with the sharing member, the member's transfer record 238, subscription information 240 associated with the member, and the member's Internet address 242 when the Internet is used as the network communicably connecting the member with the GSSS.
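The relationship between catalog entries and member records described above can be pictured with a small data model. The following is only a sketch: the class names, field names, and the helper `entries_for_member` are hypothetical, chosen to mirror the fields listed in the text, and are not part of the described system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemberRecord:
    # Hypothetical field names mirroring the member information in the text.
    member_id: str
    affiliation: str = ""
    reputation: float = 0.0
    internet_address: str = ""


@dataclass
class CatalogEntry:
    # Hypothetical field names mirroring the catalog fields in the text.
    tag_name: str
    reference_id: int
    member_id: str  # links this entry to a single MemberRecord
    accession_number: str = ""
    tissue_type: str = ""
    organism_type: str = ""
    sequence: str = ""


def entries_for_member(entries: List[CatalogEntry], member_id: str) -> List[CatalogEntry]:
    """Return every catalog entry shared by the given member."""
    return [entry for entry in entries if entry.member_id == member_id]
```

Because each entry carries only the member ID, one member record can serve any number of catalog entries without being duplicated.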
Member records may also be established for members who do not share any information. The member records are preferably separate records which are linked to catalog entries, allowing a single member record to be associated with multiple catalog entries, reducing the amount of memory required to store the members' records. Alternately, elements of the catalog information may be stored in individual members' libraries where associated shared biological information is stored, further reducing the amount of memory required for the GSSS.
The database 110 of the illustrated embodiment stores cataloging information of genetic sequences as the genetic sequences are shared. The database 110 also maintains a flag 242 associated with shared genetic sequences which indicates whether the shared sequence is presently available for file transfer. The determining factor as to whether a sequence is presently available for file transfer is the existence of an open connection between the GSSS 102 and the CC 118 on which the sequence is stored. When a member logs on to the GSSS 102, the GSSS 102 server updates the database 110 to show that genetic sequences shared by that member 106 are available for transfer. When the member 106 logs off, the GSSS 102 updates the database 110 to show that the genetic sequences shared by that member 106 are not presently available. This method of segregating shared genetic sequences supports the use of the system as a CC-to-CC transfer system only, since only files that are available to be transferred CC-to-CC get reported to a searching member in a search report.
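The log-on and log-off handling of the availability flag can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the class `AvailabilityIndex` and its method names are hypothetical, and a real database 110 would persist this state rather than hold it in Python sets and dictionaries.

```python
class AvailabilityIndex:
    """Tracks which shared sequences are presently available for transfer."""

    def __init__(self):
        self._owner_by_entry = {}  # reference ID -> sharing member ID
        self._connected = set()    # members with an open connection

    def add_entry(self, reference_id, member_id):
        self._owner_by_entry[reference_id] = member_id

    def log_on(self, member_id):
        # All entries shared by this member become available.
        self._connected.add(member_id)

    def log_off(self, member_id):
        # Entries stay cataloged but are no longer reported as available.
        self._connected.discard(member_id)

    def available_entries(self):
        return sorted(
            ref for ref, owner in self._owner_by_entry.items()
            if owner in self._connected
        )
```

Availability is derived from the member's connection state, so a single log-on or log-off updates every entry that member shares.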
Server Side Software
The SSS 108 has several functions. As shown in Figure 1, to accomplish these functions, the software includes several modules, with each module responsible for a given function. A sharing coordinator module 128 interacts with the individual modules to direct the individual modules to accomplish a required task. The individual modules include a search module 130, a module for receiving new catalog information 132 on newly shared files, indexing 134 the database to reflect file availability, subscribing new members 136, distributing copies of CSS 138, and member authentication modules 140. Other modules, as described below, may be implemented as required.
A principal function of the SSS 112 is to control a repository for references to shared genetic information stored on individual CC's. As a member 106 places genetic information into a format for sharing, the GSSS 102 is notified by a transmission from the CC 118 to the GSSS 102 of the catalog information for the newly shared files.
The GSSS 102 also maintains a module for administering member subscriptions 136, such as described below. The administration of subscriptions allows the GSSS 102 to maintain access control over the searching capabilities, and over requested transfers of genetic sequence information. The GSSS 102 may also include an authentication module 140, which allows the GSSS 102 to confirm the validity of a requested CC-to-CC transfer of genetic sequence information. The implementation of an authentication cycle also increases security regarding the accessing of genetic sequence data.
Client Side Software
As noted above, the client side software (CSS) 122 is responsible for allowing a member to interface with the GSSS 102 through a CC 118, and to facilitate the transfer of information to and from the member's CC 118. To accomplish this, the CSS 122 also includes an interface generating module 146.
Figure 3 shows a member graphical user interface display 300 for illustrative purposes. The interface has four primary function groups which a member may access through the illustrated interface. These are a chat 302 function, a set of library functions 304, a search function 306, and a data transfer function 308. Each function may be invoked by clicking on the tab representing each function.
The chat function 302 allows a member to enter into a conversation with another member over the Internet. The presently illustrated embodiment uses a typed chat function, wherein the members are able to enter messages using the keyboard of their associated client computer 118. The chat function may also be implemented by Internet telephony, or Internet videoconferencing.
The chat function allows researchers to correspond regarding shared genetic sequence information. For example, a researcher who has identified what he or she believes to be a relevant genetic sequence may have questions regarding how the sequence was derived. The researcher may contact the member who has shared the data to inquire how the information was derived. Since the database 110 already has implemented a signal regarding the availability of the sharing member, the availability flag may also be used to only present a sharing member's contact address when the sharing member is connected to the GSSS 102.
The library function 304 allows a member to set up a library 126 for storing genetic sequences which the member is willing to share, assists the member in creating formatted copies of the genetic sequences which are to be shared, assists the member in providing cataloging information regarding genetic sequences to be shared, and allows the member to remove a shared genetic sequence from the library, thus removing its availability for sharing.
The searching function 306 allows a member to define parameters for a genetic sequence which the member is searching for. The search may be for a specific sequence string, and may also be further refined based on other search criteria. Search criteria illustrated in Figure 2A include gene name 312, member code 314, accession number, organism, and tissue type. The member code may be used to search for shared genetic sequences shared by a specific researcher. Other search parameters may include, but are not limited to, dates when a genetic sequence was initially shared and researcher type information, such as whether the source of the shared file is a private or public organization, and whether the source of the genetic sequence places additional conditions on the use of the genetic sequence.
The presently preferred search method compares a sought after sequence to sequences associated with shared biological information. A searching member may specify a tolerance for the search, where the tolerance is based on the number of positions in a sequence which differ from the sought after sequence. The tolerance may be expressed as a percentage of positions which match the search criteria.
Searches may be performed using the GSSS database only, the GSSS database in conjunction with direct review of shared material, or direct review of shared material only. In the case of database only searches, the GSSS reviews a member's search parameters against the catalog information stored in the database, and returns hits to the member based on the limited information stored in the database. This may result in less than optimal search results, as additional characteristics of the shared information may not be apparent from the catalog information.
Direct review of shared information may utilize the transmission of the search parameters to sharing members' computers, such that the CSS installed on each machine may perform a comparison of the parameters to the material shared in the member's library. As shown in Figure 2B, the sequence and accession number may be stored in a member's library, reducing the amount of information which needs to be stored in the database 110. This method allows for richer searching, as more information than can be efficiently stored on the GSSS or database can be examined, using the distributed processing power of each sharing member's computer. Once a CC has compared shared biological information to the search parameters, the CC reports to the GSSS whether any material meeting the search parameters is stored on the computer, and if so, the identity of the shared information which meets the parameters. Alternately, the CC may report only positive results to reduce the amount of information transmitted to the GSSS. The search method may also be a combination of database and direct searching, utilizing the database to narrow the searched files or libraries, and thus compromising on the required data transmissions while still allowing the richer direct comparison results.
The search parameters for genetic sequences may include a known string of values, wherein a member seeks genetic sequences containing the string. In this case, a searching member may also tolerate a threshold level of matching, such as, for example, that 70% or 80% of the values in the string are matched in a shared file. This threshold value may be specified by a searching member, and used in conjunction with other search parameters to narrow or broaden a search.
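One simple way to picture the threshold match described above is an ungapped sliding comparison: the search string is laid against every position of a shared sequence and the fraction of matching positions is recorded. The sketch below assumes this position-by-position comparison; the text does not specify the comparison algorithm, and the function names are hypothetical.

```python
def best_match_fraction(query: str, sequence: str) -> float:
    """Highest fraction of query positions matched at any ungapped offset."""
    if not query or len(query) > len(sequence):
        return 0.0
    best = 0.0
    for start in range(len(sequence) - len(query) + 1):
        window = sequence[start:start + len(query)]
        matches = sum(1 for a, b in zip(query, window) if a == b)
        best = max(best, matches / len(query))
    return best


def meets_tolerance(query: str, sequence: str, threshold: float) -> bool:
    """True when at least `threshold` (e.g. 0.7) of the positions match."""
    return best_match_fraction(query, sequence) >= threshold
```

For instance, the query `ACGA` matches three of four positions against `TTACGTTT`, so it passes a 70% threshold but fails an 80% threshold. Insertions and deletions are not handled by this sketch; a gapped alignment method would be needed for that.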
The transfer function 308 is shown in Figure 4 as embodied in a graphical user interface between a member and the member's associated client computer. The transfer interface 400 may include, but is not limited to, a display of the catalog information 402 associated with the source of a shared file being transferred, a summary value for reputational information 404, and statistical information 406 regarding the searching member's activities.
A genetic sequence to be downloaded may appear in a Sequence Alignment Window 408 along with any overlapping sequences that are present in the searching member's library 410, 412 [Sequence 1 and Sequence 2 in figure]. These genetic sequences appear in a window along with a copy of the search string 414. The search string 416 [represented by the black bar in the diagram] is displayed within the context of the entire sequence to show the member the direct area of overlap. The similarity between the search string and the resultant sequence may be expressed with a separate value 418. Sequence 2 (element number 412) represents an additional sequence in the member's library which overlaps with the downloaded sequence but not with any other sequence in the library. In this way, the member is able to build a larger contiguous genetic sequence (e.g., the member has sequence information spanning base pairs 200-1500, instead of just 200-1322 if Sequence 2 is absent).
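The way overlapping sequences extend a member's contiguous coverage can be illustrated by merging base-pair spans. This is a sketch only: the function name `merge_spans` is hypothetical, and the individual span coordinates in the usage note are invented for illustration, chosen so that the totals agree with the 200-1322 and 200-1500 figures in the text.

```python
def merge_spans(spans):
    """Merge overlapping (start, end) base-pair spans into contiguous runs."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous run, so extend it if this span reaches further.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With hypothetical spans of (200, 900) for Sequence 1, (800, 1322) for the downloaded sequence, and (1300, 1500) for Sequence 2, `merge_spans` yields a single run of (200, 1500); without Sequence 2 the run ends at 1322.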
The sequence overlap presentation may also be dependent on a tolerance defined by the searching member, such that individual files stored in a member's library may be compared to the biological information being considered for transfer, with each file defining biological information in the sharing member's library being compared to the information being considered, and an overlap value determined for each comparison between a file being considered for transfer and an individual file in the searching member's library.
After reviewing the sequence specifications and member specifications, and viewing the alignment of the sequence as it compares to the sequences in the member's library, the searching member will have the option of transferring a copy of the sequence information from the sharing member's CC to their own library by selecting a complete transfer button 420. If the searching member is not satisfied with the sequence information, the searching member may decide to cancel the transfer by selecting a cancel transfer button 422.
A library maintenance function associated with client side software may be implemented to allow a member to add biological information files to the member's library. This addition function can designate information as either shared or closed. Files that a member does not desire to share may be designated as closed to allow the member to designate files for comparison purposes during search and transfer functions.
In addition to the express functions provided on the interface, the CSS 122 may also act as a firewall between members and the GSSS 102 on the Internet 104. Such a firewall is useful to prevent members from providing information for sharing which may be harmful to other members, or to the GSSS 102. The illustrated embodiment includes a computer-virus scanning routine 148 (shown in Figure 1) which is executed when genetic sequence files are formatted for sharing. The scanning routine works simultaneously with the formatting routine, such that any genetic sequence that is reformatted to be shared is also virus-scanned. If the scanner 148 detects the potential presence of a computer virus in the file, the scanner 148 may mark the file as potentially infected, and report the condition to the GSSS 102. References to a shared file which is marked as potentially infected will not be added to the database, thus preventing the possibly infected file from being unwisely transferred. Additional actions may also be initiated upon discovery of a potentially infected file, such as quarantining the member who attempted to share the file, and/or requiring all shared files in the member's library to be rescanned.
Member Authentication
A member authentication routine 150 may also be employed to ensure that copy transfer requests, when the requests are issued client-to-client, are valid requests from members of the network of members. The member authentication check 150 functions by verifying that the received transfer request is from a member who received the address by conducting a valid search utilizing the GSSS 102. Each time a member runs a search, the GSSS 102 generates the list of shared genetic sequences that meet the searcher's criteria. The member selects a link to a sharing member's client computer that houses the shared genetic sequence. By logging each link that the searching member actuates, the GSSS 102 creates a log of valid transfers created by the GSSS 102.
A transfer request made to a sharing member's CC 118 could be made without the use of the GSSS 102, and thus be an invalid transfer request. In order to limit the potential for such invalid transfer requests, the sharing member's CC 118 may query the GSSS 102 as to whether the received transfer order was generated through the GSSS 102. If the request was not made through the GSSS 102, the sharing member's CC 118 may be so notified, and instructed to refuse the transfer request.
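The log-based validity check can be sketched as a lookup keyed on the searching member and the requested file. This is an illustrative sketch under stated assumptions: `TransferLog`, `record_link_selected`, and `handle_transfer_request` are hypothetical names, and the network query from the CC to the GSSS is reduced to a local method call.

```python
class TransferLog:
    """Server-side log of links that searching members actuated."""

    def __init__(self):
        self._valid = set()

    def record_link_selected(self, searcher_id, reference_id):
        # Called when a member selects a link from a search result list.
        self._valid.add((searcher_id, reference_id))

    def is_valid_request(self, searcher_id, reference_id):
        return (searcher_id, reference_id) in self._valid


def handle_transfer_request(log, searcher_id, reference_id):
    """Sharing client's decision after querying the server-side log."""
    if log.is_valid_request(searcher_id, reference_id):
        return "send"
    return "refuse"
```

A request that never passed through a search on the server has no log entry, so the sharing client refuses it.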
An alternate embodiment of the authentication routine may rely on the use of certificates as a means of verifying that the searching member is a member of the genetic sequence sharing network. Certificates are identification data blocks issued by certificate authorities, where the data blocks may be encrypted using asymmetrical encryption based on private-key, public key methods. The client side software may either decrypt the certificate itself, or reflect the certificate to the GSSS 102 for the GSSS 102 to decrypt. If the certificate is not valid, the transfer may be refused.
Peer Review and Chromatogram Options
The present system allows genetic information to be shared without requiring the rigor of scrutiny before the data is shared. Present methods of publicly publishing genetic information require the data to be scrutinized before it is published, for academic and liability reasons. As such, a researcher must first format the data in a form acceptable to a reviewing body, answer questions regarding the derivation of the data, and then await a publication decision from a publishing board. This distinction allows genetic information to be published without requiring a researcher who derived the data to expend resources to publish the information, thus allowing more genetic information to be made available for the same level of effort. However, the lack of scrutiny of the genetic sequence being shared results in some limitations as to the value of the genetic sequence.
In order to provide an indicator of the accuracy of the genetic sequence, two options may be integrated with the genetic sequence transferring system. The first is a peer review reference, wherein comments regarding the accuracy or inaccuracy of a specific member's shared genetic sequences may be received, stored, and then indexed for review in association with all of the shared genetic sequences of that specific member. The second method may be to provide a chromatograph image file from which a genetic sequence is derived along with the shared genetic sequence file itself during transfer. The second method may be expanded to include additional reference files with the shared information, such as articles written regarding the biological information, chromatograph or other image files, files containing researcher notes regarding the shared information, or other files which may be of use to a searching member. For ease of description, shared information and/or a shared file may hereinafter refer to either the biological information itself or the joined biological information and associated files.
A chromatograph is a graphical representation of raw sequencer results. The chromatograph allows a researcher to compare a given genetic sequence against the sequence's underlying chromatograph to review the resultant sequence data, in effect double-checking the derived genetic sequence. As a second means for providing an indication of the reliability of a shared genetic sequence, a copy of the chromatograph from which the sequence was derived may be attached to a transferable or transferred genetic sequence. Since it is difficult to falsify chromatographs, the attachment of the chromatograph to a sequence file provides a measure of authentication of the validity and accuracy of the genetic sequence.
Data Transmission
The transmission of genetic information files from one member's computer to another member's computer preferably takes place between the two computers as a means of reducing traffic through the GSSS 102. By utilizing direct transfers between client computers, only search criteria, search results, and authentication information must be transmitted by, to, or through the GSSS 102, thus reducing the bandwidth required to support the network of members.
Since some degree of confidentiality or proprietary control may need to be exercised over shared genetic sequences, such as in the case where the sequence is owned by a private company, the client side software may include encryption and digital rights management technologies.
A variety of encryption technologies may be implemented; however, the system of the present illustration is based on a limited public-key encryption system such that the client side software may be distributed to researchers in foreign countries without breaching encryption export limitations. If the network of researchers is intended to be entirely within the United States, higher levels of encryption may be utilized.
The encryption algorithm encrypts data being transferred from a sharing member's computer, and another member's client side software decrypts the transferred data when received. In order to accomplish this, the client side software modules may use secure socket layer protocols, transport layer security protocols, and/or other forms of public-key encryption.
Also, digital rights management techniques may be incorporated into the genetic sequence files, such that access to the genetic sequence, and/or the ability of a member to copy or further share a genetic sequence, is controlled by a member who shares the original genetic sequence. The implementation of such technology allows a member to control access to a shared genetic code, even after the code has been transferred to a searching member. This allows private owners of genetic sequences, who desire to charge for access to the sequence, to limit a transferal of a genetic sequence to only the recipient of the sequence. Also, embedded icons can be inserted into shared biological information to determine whether a file has been modified since it was originally transferred, and also who the original source of the file was. Since either the GSSS 102, or the sharing member's computer, may track the transferal of genetic sequence files, the transfers may be also tracked for billing purposes. Transfer records documenting individual file transfers may also be associated with individual members, tracking both transfers to and transfers from, to allow further historical analysis of shared biological information.
Implementing the tracking service into the GSSS 102 allows small researchers who develop genetic sequence information to post such information without first having to generate tracking and billing infrastructure to support a pay-per-share posting of the genetic sequence.
Delayed Transmission
In an alternate embodiment of the present invention, the GSSS 102 may provide support for the transfer of shared genetic sequences from sharing members' CC's 118 when the CC's 118 are not connected to the Internet. Such a delayed transfer may be accomplished by the GSSS 102 presenting all files that meet a searching member's criteria, not just those belonging to a presently connected member. When a searching member requests transfer of a shared genetic sequence from a sharing member that is not presently connected, the GSSS 102 may generate a file transfer request to the sharing member, with the GSSS 102 as the recipient of the file transfer, followed by transfer of the file from the GSSS 102 to the searching member when the searching member is connected to the GSSS 102. Such an implementation requires the GSSS 102 to maintain available memory for the temporary storage of delayed transfer files. In the described embodiment, transfer requests from sharing members that are connected to the GSSS 102 would occur CC-to-CC, as utilized in the embodiment described above. A searching member could apply sequence comparison tools as described above, allowing a right of rejection of a file that differs from what the searching member believed himself to be requesting.
Method for Providing a Genetic Information Sharing Service
The above system may be implemented in a business activity which facilitates the sharing of genetic information. Such an activity is premised on providing 502 a GSSS 102 as described above. With the server in operation, the business activity may subscribe users 504 to the sharing service. The subscriptions may be fee based or free. Fee based subscriptions may also accommodate varying rates, with discounted rates provided for members who contribute shared genetic information, while members who only search for genetic information are required to pay a full rate. In the case of a fee-based system, the subscription may be a flat-rate subscription, or may implement different rates based on usage.
Usage-based subscription rates may be based on a fixed amount per search request. Additionally, free searches could be provided to a member in exchange for the member making a unit of genetic information available for sharing. The usage-based subscription rate may also be combined with a fixed fee, such that upon payment of the fixed fee, a set number of searches are free to the member. Such a hybrid system may also be implemented by varying the fixed portion of a hybrid fixed/usage fee inversely with a per-search rate. As such, a member who desires to make frequent searches may pay a high fixed fee, but a low per-search fee, resulting in lower overall costs.
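The hybrid fixed/usage fee can be worked through with a short calculation. The sketch below is illustrative only: the function name `hybrid_fee` and its parameters are hypothetical, and amounts are held as integer cents to avoid rounding issues.

```python
def hybrid_fee(fixed_fee, included_searches, per_search_rate, searches_made):
    """Total charge: the fixed fee plus a per-search charge beyond the free allotment."""
    billable = max(0, searches_made - included_searches)
    return fixed_fee + billable * per_search_rate
```

As an assumed example, a plan with a 10000-cent fixed fee and a 50-cent per-search rate costs 15000 cents for 100 searches, while a plan with a 2000-cent fixed fee and a 300-cent per-search rate costs 32000 cents for the same usage, which shows how a frequent searcher can come out ahead on the high fixed fee.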
Once members have been subscribed to the system, client side software is distributed 506. The client side software provides a member with the ability to format genetic information for sharing, and to provide information related to cataloging the shared genetic information. The software distribution may either be by physical delivery of a storage medium, or by providing a download site from which a member may download the software. Once the member has the copy of the client side software, the member may be instructed to install 508 the software on the client computer associated with the member. The software distribution may also involve the distribution of a security code such as a password, certificate, or encryption key. The security code may also enable the client side software. Such a security code may also be distributed from the client side software.
Once a member has installed the software on the client computer associated with the member, the member may be prompted to format 512 and provide cataloging data for information that the member has a willingness to share. Once the data is formatted and the cataloging information has been provided, the client computer, under the direction of the client side software, reports 514 the cataloging information and identity of the member to the GSSS 102.
The information sharing receiver receives 516 the information, and stores 518 the information in a database. The GSSS 102 also may associate a link 520 to the member's CC 118 with the genetic information, such that later retrieval of the cataloged information by a searcher results in the GSSS 102 being able to identify the path to the shared data. Also, the GSSS 102 may associate a reference 522 to reputational information associated with the member to the cataloged information, such that members may consider the reputational information of the source of genetic information identified on a search before requesting a copy of the genetic information.
In the presently described embodiment, files may only be shared while a sharing member is connected to the GSSS 102. When the GSSS 102 receives cataloging information from a sharing member, the GSSS 102 indexes 524 the database to reflect the availability of the information for sharing. The availability of information for sharing may be indicated by the setting of a flag associated with a member or a file. When a member logs off, the server is notified of the unavailability of the information by the log off procedure, and updates the database to remove files shared by the member who logged off.
Also in the presently described embodiment, it is envisioned that a plurality of servers may be employed to meet member demand. Each server in such a configuration contains a copy of the database for searching. In this configuration, one server is designated as the master server, and is responsible for re-indexing the data-bases maintained by each server, and publishing 526 the re-indexed information to the other servers.
Also, the system may maintain the catalog information in memory when a member logs off and only remove a reference signifying the availability of the shared files, such that a later log-on only requires entry of new shared files, and the re-setting of a flag indicating that a member is connected to the network. The system may also send test communications to members to determine if they are still connected to the network in advance of file transfer requests, or may merely monitor transfer requests for failed transfer requests, and update the availability information in the database based on a communications failure.
An alternate embodiment of the system may rely on the use of standard Internet browsers for viewing generated display pages on the GSSS 102, obviating the need to distribute client side software to members who only intend to search. Once a member has conducted a search, and identified desired genetic information, the information sharing software may issue an electronic request to the client computer on which the information is shared for the client computer to transmit a copy of the genetic information directly to the requesting member, for example using FTP (file transfer protocol) or HTTP (hypertext transfer protocol).
Once the system has received 528 a search request from a member, the GSSS 102 acquires parameters 530 describing the genetic information sought by a member. This information is compared 532 to the catalog information in the database. A list of shared information which meets the search criteria is generated 534 for the searching member, and transmitted 536 to the searching member's computer.
If it is determined 538 that the searching member desires to further examine shared information listed in the search results, as shown in Figure 3, the user may identify information for further examination by selecting the information on the search list. Selecting the information on the search list transfers 540 the searching member to the transfer graphical user interface, shown in Figure 3. The GSSS 102 in conjunction with the client side software may also generate 542 a sequence comparison display as shown in Figure 4. The searching member may then decide to transfer the file, to examine a different file, or to conduct a new search. If it is determined 546 that the searching member desires to begin a new search, the process reverts to step 528. If it is determined that the member desires to view further information on different shared information, the process reverts to step 536.
If it is determined 544 that a searching member desires to obtain a copy of shared information identified as meeting that member's search criteria, the searching member merely selects the address link for the desired shared information from the list of shared information meeting the member's search criteria. Once it has been determined 540 that a searching member desires reputational information, a list of reputational information associated with a source of shared information is generated 542 and transmitted to the searching member.
In a usage based system, the location of the shared information may be implemented as a blind link to the location, such that a member must use the link to identify the destination. Such a system would allow a GSSS 102 to base usage fees on a per transfer basis, rather than on a per search or flat fee basis. The use of blind links would also allow copy fees for a sharing member to be collected if required by a sharing member.
The actual transfer 546 of the file may be implemented by peer-to-peer sharing, wherein the file is transferred from the sharing member's computer to the searching member's computer directly, or by transferring the file to the GSSS 102, and from there to the searching member's computer. Alternately, a hybrid system may also be implemented, where shared information from a member who is presently connected may be transferred peer-to-peer, while delayed file requests (because the sharing member is not presently connected to the net) may be handled by uploading the file to the GSSS 102 when the sharing client is available, then transferred to the searching member's computer when the searching member is available.
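The hybrid transfer logic described above reduces to a decision on which parties are currently connected. The following sketch assumes the GSSS knows the connection state of both members; the function name and the route labels are illustrative and do not appear in the specification.

```python
def choose_transfer_route(sharer_online, searcher_online):
    """Pick a transfer route from the connection state of the two members."""
    if sharer_online and searcher_online:
        # Both parties are reachable: transfer directly between them.
        return "peer-to-peer"
    if sharer_online:
        # Searcher is away: stage the file on the host for later delivery.
        return "upload-to-host"
    # Sharer is away: hold the request until the sharer reconnects.
    return "queue-request"
```

A queued request would be re-evaluated with the same function once the sharing member reconnects.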
When peer-to-peer transfers are requested, the system may incorporate a member authentication procedure to confirm for a sharing member that the person requesting the genetic information is authorized to receive the material. The sharing member's computer receives the request, and identifies the requestor. The sharing member's computer then transmits an authorization or member validation query, including the identity of the requestor, to the GSSS 102. If the GSSS 102 confirms that the requestor is a valid member, then the GSSS 102 transmits an approval or authorization to the sharing member's computer. The inclusion of the authorization check provides a redundant means for the GSSS 102 to track the genetic information file transfer requests made by a searching member for fee calculation purposes.
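The authentication exchange can be sketched as two cooperating pieces: a host that validates the requestor and logs the query, and a sharing-side handler that releases the file only on approval. The names Host, validate, and handle_transfer_request are assumptions for illustration, and the in-memory call stands in for what would be a network round trip.

```python
class Host:
    """Hypothetical stand-in for the GSSS membership check."""

    def __init__(self, members):
        self.members = set(members)
        self.request_log = []  # (requestor, file id, approved) for fee tracking

    def validate(self, requestor_id, file_id):
        approved = requestor_id in self.members
        self.request_log.append((requestor_id, file_id, approved))
        return approved


def handle_transfer_request(host, requestor_id, file_id, library):
    """Sharing member's side: release the file only if the host approves."""
    if not host.validate(requestor_id, file_id):
        return None
    return library.get(file_id)
```

Every validation query leaves an entry in request_log, which is what would let the host track transfer requests for fee calculation.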
Server-less Genetic Information Sharing
In an alternate embodiment, the genetic information sharing system may be propagated between peer computers, wherein search routines are based on echoed queries from member to member of a pool of members who desire to share genetic information. As opposed to the client server implementation described above, the peer-to-peer implementation relies only on software agents installed on members' computers.
In addition to the above-described function of formatting genetic information to be shared, the software agent is responsible for cataloging the information available to be shared on the member computer, and searching the cataloged information in accordance with a search request. The software agent may store cataloged information prior to receiving a search request, or may store the catalog information in individual shared information files, and search the files upon a search request. Additionally, the software agent is capable of forwarding the search request to other computers known to be part of the network, and to have the software installed thereon.
When a member wants to search for genetic information, he or she generates a search request on his or her computer. Next, he or she determines the address of at least one other computer that is part of a network of computers associated with members who are involved in the sharing of genetic information. This determination may be accomplished by the member's computer connecting to a known Internet location to obtain the address of an associated computer that is presently connected to the Internet, or by reference to a previously known address or group of addresses.
The search request is then transmitted via the Internet to the at least one other computer address. Since the computer located at the address to which the transmission was sent is responsible for searching itself for shared information meeting the search criteria and forwarding the request to other computers associated with the sharing group, the initial transmission must be sent to a computer that is turned on and connected to the Internet. The initial transmission of the search request to multiple addresses will increase the likelihood of the message being transmitted to an active computer.
Each computer receiving the message searches itself for shared information meeting the search criteria. If the receiving computer determines that it has a file or files containing information which meets the search criteria, the receiving computer may generate a message to the search computer that the receiving computer has files meeting the search criteria, or alternately send the files meeting the search criteria.
If the receiving computer does not have shared information that meets the search criteria, the receiving computer forwards the search request to at least one other computer associated with the genetic information sharing association.
Search requests may be controlled in several ways. A simple time-out counter may be included in the original search request. Each time a computer searches for shared information, the counter is incremented. Once a search request has been forwarded a predetermined number of times, the receiving computer will search its own files for shared genetic information that meets the search criteria, and then dispose of the message without forwarding it.
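The counter-based control can be sketched with an in-memory network of peers. The Peer class, the request dictionary keys, and the choice to forward to every known neighbor are assumptions made for illustration; the specification fixes only that a request is searched locally and is no longer forwarded once the predetermined count is reached.

```python
class Peer:
    """Hypothetical software agent holding shared files and neighbor addresses."""

    def __init__(self, name, files):
        self.name = name
        self.files = files      # file name -> sequence string
        self.neighbors = []     # other Peer objects this agent knows about

    def handle(self, request, hits):
        # Every receiving computer searches its own files first.
        for file_name, sequence in self.files.items():
            if request["fragment"] in sequence:
                hits.append((self.name, file_name))
        # Once the forwarding limit is reached, dispose of the message.
        if request["hops"] >= request["max_hops"]:
            return
        # Otherwise forward a copy with the counter incremented.
        forwarded = dict(request, hops=request["hops"] + 1)
        for neighbor in self.neighbors:
            neighbor.handle(forwarded, hits)
```

With peers arranged in a chain, a request with max_hops of 2 is searched by the first three computers and never reaches the fourth, which bounds the traffic one search can generate.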
Alternately, the search request may be forwarded from a receiving computer to only a single forwarding address, wherein a receiving computer that has shared genetic information which meets the search criteria transmits the shared material to the searching member's computer, then discards the search message.
A further embodiment of the system for sharing biological information may be implemented using a hybrid system that uses a server for storing addresses for members who have biological information available for sharing. The hybrid system thus only stores member information in the server side database.
Search routines using a hybrid system can either rely on client side software to perform searches on shared information, or may use server side software to search libraries on members' computers to which the GSSS has been granted access authority. Client side software for this implementation receives a search request from the GSSS, and then searches libraries contained on the member computer for information which meets the search criteria. If shared information meeting the criteria is found, the member computer reports the presence of the information to the GSSS, which then compiles a list of information meeting the criteria for the searching member.
The hybrid system can also use the GSSS to perform searches on information stored on individual members' computers. This system relies on access to the shared information on members' computers. The GSSS uses this authority to access the shared information, and search the information for information meeting a set of search criteria. Such a search may be accomplished by transferring from the GSSS to the member computers a self-executing program, such as a Java applet or web spider, to shared libraries. The executable program searches information stored in the library for information meeting search criteria, then reports to the GSSS when information meeting the search criteria is found. Once the self-executing program reports back to the GSSS regarding discovered information, or if no information meeting the criteria is identified, the self-executing program may delete itself.
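The client-side variant of the hybrid search can be sketched as follows, assuming the GSSS holds only member addresses and each member's agent searches its own library. The function names and the dictionary standing in for the network of member computers are hypothetical, and substring matching again stands in for the real comparison.

```python
def agent_search(library, fragment):
    """Client-side agent: names of local files whose sequence contains the fragment."""
    return sorted(name for name, sequence in library.items() if fragment in sequence)


def host_search(member_libraries, fragment):
    """Host side: send the criteria to each member's agent and compile one list."""
    results = []
    for member_id in sorted(member_libraries):
        for file_name in agent_search(member_libraries[member_id], fragment):
            results.append((member_id, file_name))
    return results
```

The host never stores the sequences themselves in this arrangement; it only collects the reports that each agent returns.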
From the above, it is evident that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes of the invention. Accordingly, reference should be made to the appended claims, rather than the foregoing specification, as indicating the scope of the invention.

Claims

What is claimed is:
1. A system for sharing biological information utilizing a network, comprising: a plurality of subscribing members, said subscribing members each having a network device for accessing a network, wherein at least one subscribing member has biological information for sharing, said biological information stored in memory on the subscribing member's network device;
a network host, said network host having a database,
said database containing records identifying biological information stored on member's computers, each of said records containing at least one descriptive element selected from a group of descriptive elements consisting of sequence data, a tissue source descriptor, or an organism type descriptor, and information describing an identity of a network device of a subscribing member wherein biological information associated with said record is stored;
wherein the plurality of subscribing members and the network host are communicably interconnected by a network.
2. A system for sharing biological information according to claim 1, wherein the subscribing member network devices are computers, and further include client side software, said client side software for assisting a subscribing member in formatting biological information to be shared into an acceptable format for sharing.
3. A system for sharing biological information according to claim 1, wherein the subscribing member devices are computers, and further include client side software, said client side software system for sharing biological information for computer virus infection.
4. A system for sharing biological information according to claim 1, wherein network devices are computers, and further include client side software, for encrypting biological information for sharing.
5. A system for sharing biological information according to claim 1, wherein the subscribing member network devices are computers, and further include client side software, said client side software for employing digital rights management technology to limit re-sharing of a shared biological information file.
6. A system for sharing biological information according to claim 1, wherein the biological information stored on a subscribing member's network device is a file related to a genetic sequence, said sequence comprising a plurality of position values associated with sequence values.
7. A system for sharing biological information according to claim 6, wherein the client side software further is capable of associating an image file with shared biological information, and sharing the associated image file with the biological information.
8. A system according to claim 1, wherein the network host further includes biological information sharing software, said software for receiving information from a subscribing member identifying shared biological information on said subscribing member's network device, said software further allowing information stored in said database to be searched relative to criteria provided by a subscribing member.
9. A system for sharing biological information according to claim 8, wherein the biological information sharing software further includes functionality for receiving reputational information associated with a member who is sharing information and associating the received reputational information with biological information shared by that member.
10. A system for sharing biological information according to claim 8, wherein the biological information sharing software further includes functionality for receiving a request from a sharing member to authenticate a request to share biological information from another member of the network.
11. A system for sharing biological information according to claim 8, wherein the biological information sharing software further is capable of recording transfers requested by individual subscribing members.
12. A system for sharing biological information according to claim 1, wherein the network that communicably interconnects the subscribing members and the network host is the Internet.
13. A system for sharing biological information according to claim 12 wherein the network host includes a plurality of server computers, each server computer containing biological information sharing software.
14. A system for sharing biological information, the system comprising: a plurality of subscribing members;
a means for disseminating information regarding the location of stored biological information, said means for disseminating information capable of receiving information from the plurality of subscribers identifying where biological information is stored, providing a database, said database including records describing stored biological information, said descriptions containing at least one descriptive element selected from a group of descriptive elements consisting of sequence data, a tissue source descriptor, or an organism type descriptor, said records further referring subscribing members to the storage location;
at least one means for said plurality of subscribers to communicably connect to the means for disseminating information, wherein said at least one means for subscribers to communicably connect to the means for disseminating information includes a memory for storing biological information.
15. A system according to claim 14, wherein the means for disseminating information further is capable of receiving search criteria from a member, and comparing said search criteria to the information received from the plurality of members to determine if biological information stored on a member's network device meets the search criteria.
16. A system according to claim 15, wherein the means for disseminating information further is capable of generating a list of biological information stored on network devices, said list including only biological information that meets search criteria.
17. A system according to claim 14, wherein the at least one means for said plurality of subscribers to communicably connect to the means for disseminating information further is capable of receiving from a subscriber a set of criteria, said criteria defining biological information sought by the member.
18. A system according to claim 17, wherein the criteria includes at least one sequence value.
19. A system according to claim 17, wherein the parameters include an accuracy threshold, said accuracy threshold defining a number of sequence values which can differ from a specified set of sequence values used as a criteria.
20. A method for exchanging biological information, comprising the steps of: providing a host computer, said host computer being connected to a network;
subscribing a plurality of members, said members each having a network device capable of accessing the host computer via a network;
providing a database on said host computer, said database for storing descriptive elements describing biological information stored on member's network devices and for storing locations associated with the biological information stored on member's network devices, said descriptive elements containing at least one descriptive element selected from a group of descriptive elements consisting of sequence data, a tissue source descriptor, or an organism type descriptor;
receiving via the network descriptive elements from members describing biological information stored on a member's network device;
storing the received descriptive elements in the database;
receiving from a member a request for biological information, said request including criteria describing biological information sought by the requesting member, said criteria including at least one sequence value; comparing the request criteria with descriptive elements stored in the database;
informing the requesting member of the stored location of biological information when criteria stored in the database meets the request criteria.
21. A method for exchanging biological information according to claim 20, further comprising the step of providing the plurality of members with a computer program, said computer program for installation on the member's network device, said computer program for assisting the member in sharing biological information.
22. A method for exchanging biological information according to claim 20, further comprising the step of indexing the database to indicate when individual members are connected to the network such that biological information may be transferred from one member's network device to another's.
23. A method for exchanging biological information according to claim 22, further comprising the steps of receiving from a searching member a request for the host computer to determine whether biological information described in the database meets search criteria provided by the searching member; determining whether biological information described in the database meets the search criteria provided by the searching member; and informing the searching member of the stored descriptive elements describing biological information which meets the search criteria.
24. A method for exchanging biological information according to claim 23, wherein the step of informing the searching member of the presence of the biological information which meets the search criteria includes generating a list of individual biological information references, wherein each individual biological information reference describes a single unit of biological information stored on a member's network device.
25. A method for exchanging biological information according to claim 24, further comprising the steps of: receiving from a searching member an indication that the searching member would like to receive a copy of a unit of biological information identified in the list;
determining whether the network device on which the desired biological information is stored is communicably connected to the network host;
when the network device on which the desired biological information is stored is communicably connected to the network host, transmitting to the network device on which the desired biological information is stored a request for a copy of the indicated unit of biological information to be transferred to the searching member.
26. A method for sharing biological information according to claim 25, further comprising the steps of: when the network device on which the desired biological information is stored is not communicably connected to the network host, storing a request to transfer a copy of the desired biological information;
determining when the network device on which the desired biological information is stored becomes communicably connected to the network host; when it is determined that the network device on which the desired biological information is stored is communicably connected to the network host, determining whether the searching member is communicably connected to the network host;
when it is determined that both the network device on which the desired biological information is stored and the searching member are communicably connected to the network host, transmitting to the network device on which the desired biological information is stored the stored request to transfer a copy of the desired biological information to the searching member.
27. A method for sharing biological information according to claim 26, further comprising the steps of: when it is determined that the network device on which the desired biological information is stored is communicably connected to the network host, determining whether the searching member is communicably connected to the network host;
when it is determined that the network device on which the desired biological information is stored is communicably connected and that the searching member is not communicably connected to the network host, transmitting to the network device on which the desired biological information is stored the stored request to transfer a copy of the desired biological information to the network host;
storing the desired biological information on the network host; and transmitting the desired biological information from the network host to the searching member when the searching member is next communicably connected to the network host.
28. A method for exchanging biological information according to claim 22, further comprising the steps of receiving from a searching member a request for the host computer to determine whether biological information described in the database meets search criteria defined by the searching member; transmitting the search criteria to member's computers on which biological information is stored, receiving comparison results from member's computer's identifying shared biological information which meets the search criteria provided by the searching member; and informing the searching member of biological information having descriptive elements which meet the search criteria.
29. A computer-readable medium tangibly embodying instructions which, when executed by a computer, implement a process comprising the steps of: prompting a member of a computer executing the computer-readable medium to identify biological information that the member is willing to share;
prompting a member to define criteria describing biological information that the member is willing to share;
formatting biological information that the member is willing to share into a standardized format;
storing the formatted biological information in a known location on the computer executing the computer-readable medium; transmitting to another computer the criteria describing the shared biological information and the location on the computer where the formatted biological information is stored.
30. A computer-readable medium according to claim 29, wherein said instructions when executed by a computer implement a process further comprising the step of scanning biological information to be formatted to detect if the biological information may be infected with a computer virus.
31. A computer-readable medium according to claim 30, wherein said instructions when executed by a computer implement a process further comprising the step of quarantining biological information to be shared when a computer virus is detected.
32. A computer-readable medium according to claim 29, wherein the another computer is a network host, and wherein the instructions implement a process further comprising the step of requesting an authentication from the network host when a request to share biological information is received from another computer.
33. A computer readable medium according to claim 29, wherein said instructions when executed by a computer implement a process further comprising the step of transmitting to another member a copy of shared biological information when the another member requests a copy of the shared biological information.
34. A computer-readable medium according to claim 33, wherein said instructions when executed by a computer implement a process further comprising the step of transmitting a chromatograph associated with shared biological information when said shared biological information is transmitted to the another member.
35. A computer-readable medium according to claim 33, wherein said instructions when executed by a computer implement a process further comprising the step of encrypting the copy of shared biological information before transmitting the copy to the another member.
36. A computer-readable medium according to claim 29, wherein the another computer is a network host, and wherein said instructions when executed by a computer implement a process further comprising the steps of: prompting a member to provide search criteria when the member desires to search for shared biological information;
transmitting the search criteria to the network host;
receiving from the network computer references to locations where shared biological information meeting the provided search criteria is stored.
37. A computer-readable medium according to claim 36, wherein said instructions implement a process further comprising the step of transmitting a request to a location where biological information meeting the provided search criteria is stored, said request requesting that the recipient of the request transmit to the sender of the request a copy of the shared biological information which meets search parameters.
38. A computer-readable medium according to claim 37, wherein said instructions implement a process further comprising the step of receiving biological information which meets search criteria.
39. A computer-readable medium according to claim 38, wherein said instructions implement a process further comprising the step of decrypting received biological information when the received biological information has been encrypted by a sender.
40. A computer-readable medium according to claim 38, wherein said search criteria comprise a genetic sequence, wherein said instructions implement a process further comprising the steps of: determining whether biological information stored on the searching member's computer has a genetic sequence matching the genetic sequence searched for;
when biological information on the searching member's computer has a genetic sequence matching the genetic sequence searched for, displaying the received biological information in a format allowing a member to compare the position of the genetic sequence of the biological information stored on the computer to the biological information received; and
when the member of the computer determines, after comparing the positioning of the genetic sequence in the received biological information to the position of the genetic sequence of biological information stored on the searching member's computer, that the member does not desire to retain the received biological information, discarding the biological information and informing the network host that the biological information has not been retained.
41. A business method for providing a biological information exchange, comprising the steps of: providing at least one host computer, said at least one host computer being connected to a network; subscribing a plurality of members, said members each having a computer capable of accessing the host computer via the Internet;
providing a database on said at least one host computer, said database for storing criteria describing biological information stored on member's computers and for storing locations associated with the biological information stored on member's computers, said criteria including at least one descriptive element selected from a group of descriptive elements consisting of sequence data, a tissue source descriptor, an organism type descriptor, or an accession number;
receiving via the Internet descriptive elements information from members describing biological information stored on a member's network device;
storing the received descriptive elements in the database;
receiving from a member via the Internet a request for biological information, said request including criteria describing biological information sought by the requesting member;
comparing the request criteria with descriptive elements stored in the database;
informing the requesting member of the presence of stored biological information stored on another member's computer when descriptive elements stored in the database meets the request criteria.
42. A business method for providing a biological information exchange according to claim 41, further comprising assessing each member a subscription fee for subscribing to the biological information exchange.
43. A business method for providing a biological information exchange according to claim 41, further comprising the steps of providing the plurality of members with a computer program for a fee, said computer program for installation on the member's network device, and assisting the members in sharing biological information.
44. A business method for providing a biological information exchange according to claim 41, further comprising the steps of providing the plurality of members with a computer program, said computer program for installation on the member's network device, and assisting the members in sharing biological information, and indexing the database to indicate when individual members are connected to the network such that biological information may be transferred from one member's network device to another's.
45. A process for exchanging biological information according to claim 44, further comprising the steps of receiving from a searching member a request for the host computer to determine whether biological information described in the database meets search criteria defined by the searching member; determining for a fee whether biological information described in the database meets the search criteria defined by the searching member; and informing the searching member of biological information having descriptive elements which meet the search criteria.
46. A process for exchanging biological information according to claim 44, further comprising the steps of receiving from a searching member a request for the host computer to determine whether biological information described in the database meets search criteria defined by the searching member; determining whether biological information described in the database meets the search criteria defined by the searching member; and informing the searching member of the stored criteria describing biological information which meets the search criteria, wherein the step of informing the searching member of the presence of the biological information which meets the search criteria includes generating a list of individual biological information references, wherein each individual biological information reference describes a single unit of biological information stored on a member's network device.
47. A process for exchanging biological information according to claim 45, further comprising the steps of: receiving from a searching member an indication that the searching member would like to receive a copy of a unit of biological information identified in the list;
charging the searching member a fee associated with the request to receive a copy of a unit of biological information identified in the list;
determining whether the network device on which the desired biological information is stored is communicably connected to the network host;
when the network device on which the desired biological information is stored is communicably connected to the network host, transmitting to the computer on which the desired biological information is stored a request for a copy of the indicated unit of biological information to be transferred to the searching member.
48. A process for sharing biological information according to claim 47, further comprising the steps of: when the network device on which the desired biological information is stored is not communicably connected to the network host, storing a request to transfer a copy of the desired biological information;
determining when the network device on which the desired biological information is stored becomes communicably connected to the network host;
when it is determined that the network device on which the desired biological information is stored is communicably connected to the network host, determining whether the searching member is communicably connected to the network host;
when it is determined that both the network device on which the desired biological information is stored and the searching member are communicably connected to the network host, transmitting to the network device on which the desired biological information is stored the stored request to transfer a copy of the desired biological information to the searching member.
49. A process for sharing biological information according to claim 48, further comprising the steps of: when it is determined that the network device on which the desired biological information is stored is communicably connected to the network host, determining whether the searching member is communicably connected to the network host;
when it is determined that the network device on which the desired biological information is stored is communicably connected and that the searching member is not communicably connected to the network host, transmitting to the network device on which the desired biological information is stored the stored request to transfer a copy of the desired biological information to the network host;
storing the desired biological information on the network host; and
transmitting the desired biological information from the network host to the searching member when the searching member is next communicably connected to the network host.
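The routing behaviour recited in claims 47 to 49 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the class `NetworkHost`, its methods, and the message labels (`send_to_member`, `send_to_host`, `deliver`) are hypothetical names chosen for illustration, and the actual data transfer is reduced to appending a tuple to a log. As a further simplification, the host's copy is recorded at the moment the request is sent rather than when the data arrives.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkHost:
    """Hypothetical host that forwards or stores copy requests."""
    connected: set = field(default_factory=set)   # ids currently connected to the host
    pending: list = field(default_factory=list)   # stored (holder, searcher, unit) requests
    cache: dict = field(default_factory=dict)     # searcher -> units held on the host
    log: list = field(default_factory=list)       # messages the host has sent

    def request_copy(self, holder, searcher, unit):
        # Claim 47: holder connected, so forward the request at once.
        if holder in self.connected:
            self.log.append((holder, "send_to_member", searcher, unit))
        else:
            # Claim 48: holder not connected, so store the request.
            self.pending.append((holder, searcher, unit))

    def connect(self, node):
        self.connected.add(node)
        still_pending = []
        for holder, searcher, unit in self.pending:
            if holder not in self.connected:
                still_pending.append((holder, searcher, unit))
            elif searcher in self.connected:
                # Claim 48: both connected, transfer directly to the searcher.
                self.log.append((holder, "send_to_member", searcher, unit))
            else:
                # Claim 49: searcher absent, transfer to the host and keep a copy.
                self.log.append((holder, "send_to_host", searcher, unit))
                self.cache.setdefault(searcher, []).append(unit)
        self.pending = still_pending
        # Claim 49: deliver held copies when the searcher next connects.
        for unit in self.cache.pop(node, []):
            self.log.append(("host", "deliver", node, unit))

    def disconnect(self, node):
        self.connected.discard(node)
```

In this sketch, a request made while the holding device is offline stays in `pending` until `connect` is called for that device; whether the copy then goes straight to the searching member or is held on the host depends only on whether the searching member is connected at that moment.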
50. A process for exchanging biological information according to claim 46, further comprising the steps of: receiving from a searching member an indication that the searching member would like to receive a copy of a unit of biological information identified in the list;
charging the searching member a fee associated with the request to receive a copy of a unit of biological information identified in the list, wherein the fee is dependent upon the number of requests to receive a copy of a unit of biological information made by the searching member;
determining whether the network device on which the desired biological information is stored is communicably connected to the network host;
when the network device on which the desired biological information is stored is communicably connected to the network host, transmitting to the computer on which the desired biological information is stored a request for a copy of the indicated unit of biological information to be transferred to the searching member.
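Claim 50 states only that the fee depends on the number of requests made by the searching member; it does not give a schedule. The Python sketch below shows one possible reading, a two-tier volume discount. The function name `request_fee_cents`, the threshold, and both amounts are assumptions made purely for illustration and do not come from the claims.

```python
def request_fee_cents(prior_requests: int,
                      base_cents: int = 1000,
                      discounted_cents: int = 500,
                      threshold: int = 5) -> int:
    """Return a hypothetical fee, in cents, for the member's next request.

    prior_requests is the number of copy requests the member has already
    made. All default amounts are illustrative assumptions.
    """
    if prior_requests < 0:
        raise ValueError("prior_requests must be non-negative")
    # Below the threshold the base fee applies; from the threshold on,
    # the discounted fee applies.
    if prior_requests < threshold:
        return base_cents
    return discounted_cents
```

Any other monotone or tiered function of the request count would satisfy the claim language equally well; amounts are kept in integer cents here only to avoid floating-point rounding.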
51. A process for sharing biological information according to claim 50, further comprising the steps of: when the network device on which the desired biological information is stored is not communicably connected to the network host, storing a request to transfer a copy of the desired biological information;
determining when the network device on which the desired biological information is stored becomes communicably connected to the network host;
when it is determined that the network device on which the desired biological information is stored is communicably connected to the network host, determining whether the searching member is communicably connected to the network host;
when it is determined that both the network device on which the desired biological information is stored and the searching member are communicably connected to the network host, transmitting to the network device on which the desired biological information is stored the stored request to transfer a copy of the desired biological information to the searching member.
52. A process for sharing biological information according to claim 50, further comprising the steps of: when it is determined that the network device on which the desired biological information is stored is communicably connected to the network host, determining whether the searching member is communicably connected to the network host;
when it is determined that the network device on which the desired biological information is stored is communicably connected and that the searching member is not communicably connected to the network host, transmitting to the network device on which the desired biological information is stored the stored request to transfer a copy of the desired biological information to the network host;
storing the desired biological information on the network host; and
transmitting the desired biological information from the network host to the searching member when the searching member is next communicably connected to the network host.
PCT/US2001/025956 2000-08-22 2001-08-20 Method and system for sharing biological information WO2002017190A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001290545A AU2001290545A1 (en) 2000-08-22 2001-08-20 Method and system for sharing biological information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US64364300A 2000-08-22 2000-08-22
US09/643,643 2000-08-22

Publications (2)

Publication Number Publication Date
WO2002017190A1 WO2002017190A1 (en) 2002-02-28
WO2002017190A9 WO2002017190A9 (en) 2003-03-27

Family

ID=24581696

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/025956 WO2002017190A1 (en) 2000-08-22 2001-08-20 Method and system for sharing biological information

Country Status (2)

Country Link
AU (1) AU2001290545A1 (en)
WO (1) WO2002017190A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2468144A1 (en) 2001-11-22 2003-06-12 Hitachi, Ltd. Information processing system using information on base sequence
JP3677258B2 (en) 2002-07-15 2005-07-27 株式会社日立製作所 Information processing system using base sequence related information
US20050216314A1 (en) * 2004-03-26 2005-09-29 Andrew Secor System supporting exchange of medical data and images between different executable applications
WO2019243969A1 (en) 2018-06-19 2019-12-26 Ancestry.Com Dna, Llc Filtering genetic networks to discover populations of interest
US12050629B1 (en) 2019-08-02 2024-07-30 Ancestry.Com Dna, Llc Determining data inheritance of data segments
CA3165254A1 (en) 2019-12-20 2021-06-24 Ancestry.Com Dna, Llc Linking individual datasets to a database

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970500A (en) * 1996-12-12 1999-10-19 Incyte Pharmaceuticals, Inc. Database and system for determining, storing and displaying gene locus information
US5966712A (en) * 1996-12-12 1999-10-12 Incyte Pharmaceuticals, Inc. Database and system for storing, comparing and displaying genomic information
US6125383A (en) * 1997-06-11 2000-09-26 Netgenics Corp. Research system using multi-platform object oriented program language for providing objects at runtime for creating and manipulating biological or chemical data
JP2001515234A (en) * 1997-07-25 2001-09-18 アフィメトリックス インコーポレイテッド System for providing a polymorphism database

Also Published As

Publication number Publication date
AU2001290545A1 (en) 2002-03-04
WO2002017190A1 (en) 2002-02-28

Similar Documents

Publication Publication Date Title
US7953699B2 (en) System for the processing of information between remotely located healthcare entities
US11588802B2 (en) Resource transfer setup and verification
CN109643429B (en) Federated system and method for sharing medical data
US7849306B2 (en) Relay method of encryption communication, gateway server, and program and program memory medium of encryption communication
EP2193470B1 (en) Method and apparatus for simultaneous viewing of two isolated data sources
US8037202B2 (en) Presence detection using mobile agents in peer-to-peer networks
US6532459B1 (en) System for finding, identifying, tracking, and correcting personal information in diverse databases
US8108455B2 (en) Mobile agents in peer-to-peer networks
US20030055824A1 (en) Distributed personalized genetic safe
US7438233B2 (en) Blinded electronic medical records
JP2002501250A (en) Protected database management system for sensitive records
WO2007120861A2 (en) Secure digital couriering system and method
WO2020120933A1 (en) Proof-of-work for blockchain applications
WO2001069446A1 (en) System and method for interacting with legacy healthcare database systems
CA2447963A1 (en) System and method for life sciences discovery, design and development
KR101232379B1 (en) Method and system for managing electronic personal healthrecords
US20030233258A1 (en) Methods and systems for tracking and accounting for the disclosure of record information
WO2002017190A9 (en) Method and system for sharing biological information
CN114911795A (en) Medical data processing method and application
CN1601954A (en) Moving principals across security boundaries without service interruption
JP2003016286A (en) Method, server and program for providing digital contents
JP2000250832A (en) Distributed directory management system
JP2002099773A (en) Information rating, authenticating and mediating system using genetic information database
US20070056044A1 (en) Matching entitlement information for multiple sources
CN114287001A (en) Restricted full privacy conjunctive database queries for protecting user privacy and identity

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
COP Corrected version of pamphlet

Free format text: PAGES 6, 9, 13 AND 14, DESCRIPTION, REPLACED BY NEW PAGES 6, 9, 13 AND 14; AFTER RECTIFICATION OF OBVIOUS ERRORS AS AUTHORIZED BY THE INTERNATIONAL SEARCHING AUTHORITY; PAGES 1/8-8/8, DRAWINGS, REPLACED BY NEW PAGES 1/10-10/10

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: COMMUNICATION UNDER RULE 69 EPC ( EPO FORM 1205A DATED 11/07/03 )

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP