WO2024108592A1 - Omics data processing method, apparatus, and computer device - Google Patents

Omics data processing method, apparatus, and computer device

Info

Publication number
WO2024108592A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
processing
omics
identifier
preset
Prior art date
Application number
PCT/CN2022/134484
Other languages
English (en)
French (fr)
Inventor
谢尚波
肖贡
罗小舟
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院
Priority to PCT/CN2022/134484
Publication of WO2024108592A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs

Definitions

  • the present invention relates to the field of biological data processing technology, and in particular to a method, device and computer equipment for omics data processing.
  • the embodiments of this specification provide a method, apparatus, computer equipment and storage medium for processing omics data, which automatically determine the target processing model based on the instrument identification and processing identification after determining the omics data to be processed, the processing identification and the instrument identification, and then process the omics data to be processed according to the target processing model, thereby improving the degree of automation in the omics data management process and reducing resource waste.
  • the embodiments of this specification provide a method for processing omics data, comprising:
  • determining the omics data to be processed according to the omics data identifier included in the received omics data processing request, wherein the omics data processing request also includes a processing identifier;
  • the omics data to be processed is processed to obtain data processing result information.
  • the method further includes:
  • the omics data to be processed is stored in a target database.
  • the processing identifier includes a specified data category;
  • the determining of the target processing model according to the instrument identifier and the processing identifier further includes:
  • a first preset processing model corresponding to the first preset data category is determined as the target processing model.
  • when the first preset data category is inconsistent with the designated data category, extracting characteristic information of the omics data to be processed;
  • a third preset processing model corresponding to the received update data category is used as the target processing model.
  • the to-be-processed omics data is processed based on the target processing model to obtain data processing result information, further comprising:
  • the executable processing script is run to obtain the data processing result information.
  • the method further includes:
  • an omics data processing device including:
  • a first determining unit configured to determine the omics data to be processed according to the omics data identifier included in the received omics data processing request, wherein the omics data processing request further includes a processing identifier;
  • a second determining unit configured to determine an instrument identifier corresponding to an instrument that collects the omics data to be processed
  • a third determining unit is used to determine a target processing model according to the instrument identifier and the processing identifier.
  • a processing unit is used to process the to-be-processed omics data based on the target processing model to obtain data processing result information.
  • in addition to the processing unit, the device further includes:
  • a fourth determining unit configured to determine, according to the received sharing request for the data processing result information, a user identifier included in the sharing request
  • An acquisition unit configured to acquire a preset sharing script according to the sharing request
  • an updating unit configured to update the preset sharing script using the data processing result information and the user identifier to obtain an executable sharing script
  • the running unit is used to run the executable sharing script to obtain a target address link.
  • an embodiment of the present specification further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above method when executing the computer program.
  • an embodiment of the present specification further provides a computer-readable storage medium on which computer instructions are stored, and the computer instructions implement the above method when executed by a processor.
  • the omics data to be processed is determined; the instrument identifier corresponding to the omics data to be processed is determined; the corresponding data category is determined with the instrument identifier and the processing identifier as indexes to determine the corresponding target processing model. Then, based on the target processing model, the omics data to be processed is processed to obtain data processing result information. Thereby, the data category corresponding to the omics data to be processed is automatically determined, and then the corresponding target processing model is automatically determined to complete the processing of the omics data to be processed. As a result, the degree of automation in the omics data management process is improved and resource waste is reduced.
  • FIG1 is a schematic diagram of an implementation system of an omics data processing method according to an embodiment of this specification
  • FIG2 is a flow chart of an omics data processing method according to an embodiment of the present specification
  • FIG3A is a flow chart of an omics data processing method according to another embodiment of the present specification.
  • FIG3B is a flow chart of an omics data processing method according to another embodiment of the present specification.
  • FIG4 is a flow chart of an omics data processing method according to another embodiment of the present specification.
  • FIG5A is a schematic diagram of a method for storing omics data according to an embodiment of the present specification
  • FIG5B is a schematic diagram showing a method for sharing data processing result information according to an embodiment of this specification.
  • FIG6A is a schematic diagram showing the structure of an omics data processing device according to an embodiment of the present specification.
  • FIG6B is a schematic diagram showing the structure of an omics data processing device according to another embodiment of the present specification.
  • FIG6C is a schematic diagram showing the structure of an omics data processing device according to another embodiment of the present specification.
  • FIG. 7 is a schematic diagram of the structure of a computer device according to an embodiment of the present specification.
  • FIG1 is a schematic diagram of an implementation system of an omics data processing method according to an embodiment of the present specification, which may include: a user terminal 101 and a server 102, wherein the user terminal 101 and the server 102 communicate with each other through a network, and the network may include a local area network (LAN), a wide area network (WAN), the Internet or a combination thereof, and is connected to a website, a user device (such as a computing device) and a back-end system.
  • after receiving an omics data processing request sent by a user through the user terminal 101, the server 102 determines the omics data to be processed based on the omics data identifier included in the omics data processing request; determines the instrument identifier corresponding to the omics data to be processed; determines the target processing model based on the instrument identifier and the processing identifier included in the omics data processing request; and then uses the target processing model to process the omics data to be processed, obtains data processing result information, and sends the data processing result information to the user terminal 101.
  • when the server 102 receives a sharing request for data processing result information sent by the user terminal 101, it determines the user ID included in the sharing request; obtains a preset sharing script according to the sharing request; updates the preset sharing script using the data processing result information and the user ID to obtain an executable sharing script; runs the executable sharing script to obtain a target address link; and sends the target address link to the user terminal 101, so that the user can share it with other user terminals through the user terminal 101. Furthermore, when the server 102 receives an omics data storage request sent by the user terminal 101, it can also store the omics data to be processed.
  • the server 102 may be a node of a cloud computing system (not shown), or each server 102 may be a separate cloud computing system including a plurality of computers interconnected by a network and operating as a distributed processing system.
  • the user terminal 101 may include, but is not limited to, electronic devices such as smartphones, acquisition devices, desktop computers, tablet computers, laptop computers, smart speakers, digital assistants, augmented reality (AR)/virtual reality (VR) devices, and smart wearable devices.
  • the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, and Windows.
  • FIG. 1 is only an application environment provided by this specification. In actual applications, multiple user terminals 101 may also be included, and this specification does not impose any limitation thereto.
  • FIG2 is a flowchart of a method for processing omics data in an embodiment of this specification.
  • the omics data processing process is described in this figure, but more or fewer operation steps may be included based on conventional or non-creative labor.
  • the order of steps listed in the embodiment is only one of many possible execution orders and does not represent the only execution order.
  • when the system or device product is executed in practice, the steps can be executed in the order of the method shown in the embodiment or the accompanying drawings, or in parallel.
  • the method may include:
  • the omics data to be processed is determined; the instrument identifier corresponding to the omics data to be processed is determined; the corresponding data category is determined with the instrument identifier and the processing identifier as indexes to determine the corresponding target processing model. Then, based on the target processing model, the omics data to be processed is processed to obtain data processing result information. Thereby, the data category corresponding to the omics data to be processed is automatically determined, and then the corresponding target processing model is automatically determined to complete the processing of the omics data to be processed. As a result, the degree of automation in the omics data management process is improved and resource waste is reduced.
  • when a user wants to analyze and process the omics data to be processed obtained through an experiment, the user terminal sends a processing identifier and an omics data identifier corresponding to the omics data to be processed to a server.
  • the omics data identifier represents a unique identifier that can be indexed to the omics data to be processed.
  • the processing identifier represents a unique identifier that can be indexed to a processing requirement.
  • for example, when the omics data to be processed is to be sorted to obtain corresponding sequence data, the processing identifier can be a unique identifier that can be indexed to the processing requirement "determine sequence data".
  • the omics data to be processed is determined from the database according to the omics data identifier.
  • the user terminal can also send the omics data to be processed and the processing identifier to the server.
  • the instrument identifier is determined based on the omics data to be processed or the omics data identifier.
  • the instrument identifier represents a unique identifier corresponding to the instrument that collects the omics data to be processed.
  • the instrument identifier is information input by the user when storing the omics data to be processed through the user terminal.
  • the server associates the instrument identifier with the omics data to be processed or associates it with the omics data identifier corresponding to the omics data to be processed. Therefore, after determining the omics data to be processed and the omics data identifier, the corresponding instrument identifier is determined based on at least one of the omics data to be processed and the omics data identifier.
  • the omics data processing request sent to the server includes not only the omics data to be processed and the processing identifier, but also the instrument identifier.
  • At least one preset processing model is pre-associated with each preset instrument identifier.
  • each preset processing model can be associated with a preset processing identifier in addition to being associated with a preset instrument identifier. That is, a preset instrument identifier and a preset processing identifier are associated with a preset processing model.
  • the preset processing model is a model for processing omics data, for example, a normalization model, a standardization model, a univariate analysis model, or a principal component analysis model. It should be noted that multiple preset instrument identifiers and a preset processing identifier can also be associated with a single preset processing model.
  • for example, the two preset instrument identifiers corresponding to two groups of omics data to be processed and the preset processing identifier corresponding to "determine the expression level of amino acids" are associated with the corresponding preset processing model for determining the expression level of amino acids.
  • a historical omics processing data set can be obtained, and for each preset instrument identifier and each preset processing identifier, multiple target historical omics processing data including the preset instrument identifier and the preset processing identifier can be determined from the historical omics processing data set, and processing models can be extracted for the multiple target historical omics processing data to determine the historical processing model corresponding to each target historical omics processing data; the number of times each historical processing model is adopted is determined, and the historical processing model corresponding to the maximum number of times adopted is used as the preset processing model associated with the preset instrument identifier and the preset processing identifier.
  • a target preset processing model associated with the instrument identification and the processing identification is determined from the plurality of preset processing models, and the target preset processing model is used as the target processing model.
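  • the association between identifiers and preset processing models described above can be pictured as a keyed lookup. The following is a minimal sketch, assuming an in-memory dictionary; the registry name, the identifier strings, and the model names are hypothetical and do not come from the specification.

```python
from typing import Dict, Tuple

# Hypothetical registry: (preset instrument identifier, preset processing identifier)
# -> name of the associated preset processing model. All entries are illustrative.
PRESET_MODELS: Dict[Tuple[str, str], str] = {
    ("instrument-ms-01", "normalize"): "normalization_model",
    ("instrument-ms-01", "pca"): "principal_component_analysis_model",
    ("instrument-seq-02", "normalize"): "standardization_model",
}


def determine_target_model(instrument_id: str, processing_id: str) -> str:
    """Return the preset processing model associated with both identifiers."""
    try:
        return PRESET_MODELS[(instrument_id, processing_id)]
    except KeyError:
        # No association exists for this identifier pair.
        raise LookupError(
            f"no preset processing model for {instrument_id!r} / {processing_id!r}"
        ) from None


target_model = determine_target_model("instrument-ms-01", "pca")
```

  • using the identifier pair as the key mirrors the statement that a preset instrument identifier and a preset processing identifier together are associated with one preset processing model.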
  • the processing script corresponding to the target processing model is used to process the omics data to be processed, and data processing result information corresponding to the omics data to be processed is obtained, and the data processing result information is sent to the user terminal for visual display by the user terminal.
  • after determining the target processing model based on the instrument identifier and the processing identifier, the method can, for example, also include: determining the omics data format of the omics data to be processed; determining the configuration data format that the target processing model can process; judging whether the omics data format is consistent with the configuration data format; and, when the two formats are consistent, processing the omics data to be processed based on the target processing model to obtain data processing result information.
  • when the omics data format is inconsistent with the configuration data format, an updated processing model is determined from the backup processing models associated with the instrument identifier and the processing identifier, and the updated processing model is used as the target processing model to process the omics data to be processed and obtain data processing result information.
  • when the preset processing model is determined based on the historical omics processing data set, multiple historical processing models are determined, and the historical processing model with the maximum number of times adopted is used as the preset processing model.
  • the historical processing models corresponding to other adopted times are sorted in the order of the number of times adopted as backup processing models and associated with the preset instrument identifier and the preset processing identifier. For example, the number of times historical processing model A is adopted is 95, the number of times historical processing model B is adopted is 760, and the number of times historical processing model C is adopted is 46. Then historical processing model B is used as the preset processing model, historical processing model A is used as the first backup processing model, and historical processing model C is used as the second backup processing model.
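  • the ranking by number of times adopted can be sketched with a frequency count. This is an illustrative sketch only: the function name is hypothetical, and the input is assumed to be a flat list with one entry per historical use of a model.

```python
from collections import Counter
from typing import List, Tuple


def rank_historical_models(adopted_models: List[str]) -> Tuple[str, List[str]]:
    """Split historical models into the preset model and ordered backup models.

    `adopted_models` holds one entry per historical use of a processing model.
    """
    if not adopted_models:
        raise ValueError("no historical omics processing data available")
    counts = Counter(adopted_models)
    # most_common() sorts by count, highest first.
    ranked = [model for model, _ in counts.most_common()]
    return ranked[0], ranked[1:]


# Adoption counts taken from the example in the text: A=95, B=760, C=46.
history = ["A"] * 95 + ["B"] * 760 + ["C"] * 46
preset_model, backup_models = rank_historical_models(history)
```

  • with these counts, model B becomes the preset processing model, model A the first backup processing model, and model C the second backup processing model.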
  • determining the updated processing model from the backup processing models associated with the instrument identifier and the processing identifier may be, for example, using the first backup processing model as the updated processing model.
  • the data format that can be processed by the determined target processing model is consistent with the data format of the omics data to be processed, thereby further improving the degree of automation in the omics data management process and reducing resource waste.
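  • the format consistency check and the fallback to a backup processing model can be sketched as follows. The format labels and the mapping are hypothetical. The text only gives the first backup processing model as an example of the updated processing model; walking further down the backup list until a format matches is an assumption of this sketch.

```python
from typing import Dict, List, Optional

# Hypothetical mapping: processing model -> configuration data format it can process.
MODEL_FORMATS: Dict[str, str] = {"B": "mzML", "A": "csv", "C": "fastq"}


def select_model_for_format(
    data_format: str,
    preset_model: str,
    backup_models: List[str],
    model_formats: Dict[str, str],
) -> Optional[str]:
    """Return the first model, preset first and then backups in order,
    whose configuration data format matches the omics data format."""
    for model in [preset_model, *backup_models]:
        if model_formats.get(model) == data_format:
            return model
    return None  # no associated model can process this format


chosen = select_model_for_format("csv", "B", ["A", "C"], MODEL_FORMATS)
```

  • here the preset model B cannot process the "csv" data, so the first backup model A is selected as the updated processing model.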
  • the method also includes determining an extended processing model script, an extended instrument identifier, an extended processing identifier, an extended data category, and an extended processing model according to a received extension request; obtaining a preset construction extension script according to the extension request; updating the preset construction extension script using the extended processing model script, the extended instrument identifier, the extended processing identifier, the extended data category, and the extended processing model to obtain an executable construction extension script; and running the executable construction extension script to associate and store the extended processing model script, the extended instrument identifier, the extended processing identifier, the extended data category, and the extended processing model for processing the omics data to be processed.
  • the user can expand the items that can be processed stored on the server so that the items can be directly called the next time the processing is performed.
  • a corresponding template script for realizing the extended function is configured in advance for the extended request.
  • the template script lacks an extended processing model script, an extended instrument identifier, an extended processing identifier, an extended data category, and an extended processing model. If the extended processing model script, the extended instrument identifier, the extended processing identifier, the extended data category, and the extended processing model are filled into the template script, an executable program is obtained.
  • the extended processing model script is, for example, a template script corresponding to the extended processing model.
  • the extended instrument identifier is the identifier of the instrument that collects the extended omics data to be processed.
  • the extended processing identifier is the identifier for performing corresponding processing on the extended omics data to be processed.
  • the extended data category is the data category associated with the extended instrument identifier.
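  • the outcome of an extension request, namely the five extended items stored in association with one another, can be sketched as a small registry. This sketch covers only the association-and-storage result and omits the template-script step; the class names, the key choice, and all field values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Extension:
    """One extension record; the five fields follow the items named in the text."""
    model_script: str    # extended processing model script
    instrument_id: str   # extended instrument identifier
    processing_id: str   # extended processing identifier
    data_category: str   # extended data category
    model_name: str      # extended processing model


class ExtensionRegistry:
    """Stores extensions keyed by (instrument identifier, processing identifier)."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], Extension] = {}

    def register(self, ext: Extension) -> None:
        key = (ext.instrument_id, ext.processing_id)
        if key in self._by_key:
            raise ValueError(f"extension already registered for {key}")
        self._by_key[key] = ext

    def lookup(self, instrument_id: str, processing_id: str) -> Extension:
        return self._by_key[(instrument_id, processing_id)]


registry = ExtensionRegistry()
registry.register(
    Extension("scripts/new_model.py", "instrument-x", "quantify", "metabolome", "new_model")
)
found = registry.lookup("instrument-x", "quantify")
```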
  • FIG3A is a flowchart of an omics data processing method according to another embodiment of the present specification.
  • an omics data processing process is described, but more or fewer operation steps may be included based on conventional or non-creative labor.
  • the method may include:
  • the processing identification that the user needs to input through the user terminal also includes a specified data category, which represents the data category corresponding to the omics data to be processed input by the user through the user terminal. Then, based on the specified data category and the preset data category corresponding to the instrument identification, a more accurate target processing model is determined for omics data processing.
  • a preset data category corresponding to each preset instrument identifier is pre-associated.
  • the preset data category is a category at the omics level, such as genome, transcriptome, proteome, and metabolome.
  • the specified data category is also a category representing the omics level, such as genome, transcriptome, proteome, and metabolome.
  • the associated first preset data category is determined from multiple preset data categories.
  • the first preset data category is matched for consistency with the designated data category included in the processing identifier to determine a matching value.
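  • when the matching value satisfies the preset condition, it is determined that the first preset data category is consistent with the designated data category.
  • when the matching value does not meet the preset condition, it is determined that the first preset data category is inconsistent with the designated data category.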
  • the consistency matching for the first preset data category and the designated data category included in the processing identifier can be performed by using a text similarity processing model to determine the similarity between the first preset data category and the designated data category, and using the similarity as the matching value.
  • the text similarity processing model can be, for example, any model that can determine the similarity between two words or sentences.
  • the preset condition can be, for example, that the matching value is greater than or equal to a preset threshold. When the matching value is greater than or equal to the preset threshold, it is determined that the matching value meets the preset condition; otherwise, it is determined that the matching value does not meet the preset condition.
  • the preset threshold can be, for example, 0.99.
  • S3313 is executed. Specifically, a first preset processing model matching the first preset data category is determined from a plurality of preset processing models, and the first preset processing model is used as the target processing model.
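  • the consistency matching can be sketched with any text similarity measure. The sketch below assumes Python's standard-library `difflib.SequenceMatcher` as a stand-in for the text similarity processing model, which the specification does not name, and uses the example threshold of 0.99. The function names and the case normalization are assumptions.

```python
from difflib import SequenceMatcher

PRESET_THRESHOLD = 0.99  # example threshold given in the text


def matching_value(first_preset_category: str, designated_category: str) -> float:
    """Similarity in [0, 1] between the two category names, ignoring case
    and surrounding whitespace (the normalization is an assumption)."""
    a = first_preset_category.strip().lower()
    b = designated_category.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def is_consistent(first_preset_category: str, designated_category: str) -> bool:
    """Preset condition: matching value greater than or equal to the threshold."""
    return matching_value(first_preset_category, designated_category) >= PRESET_THRESHOLD


same = is_consistent("Proteome", "proteome")
different = is_consistent("proteome", "metabolome")
```

  • with a threshold this close to 1, only near-identical category names count as consistent, so "proteome" and "metabolome" are treated as inconsistent.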
  • FIG3B is a flowchart of an omics data processing method according to another embodiment of the present specification.
  • an omics data processing process is described, but more or fewer operation steps may be included based on conventional or non-creative labor.
  • the method may include:
  • S3314 to S3316 are executed.
  • S3324 to S3325 can also be executed.
  • a data category confirmation request is sent to the user terminal that sends the omics data processing request, so that the user can select or fill in the updated data category through the user terminal.
  • the data category confirmation request may, for example, include the first preset data category, the specified data category, and others. It should be noted that when the user selects "Other" through the user terminal, a control that allows the user to enter information through the user terminal is used to display an input text box for the user to enter the updated data category.
  • when the user sees the first preset data category, the designated data category, and "Other" displayed on the user terminal, and believes that one of the first preset data category and the designated data category accurately corresponds to the omics data to be processed, the user selects that data category and sends it to the server as the updated data category through the user terminal. If the user believes that neither the first preset data category nor the designated data category accurately corresponds to the omics data to be processed, the user selects "Other" and enters the corresponding updated data category into the user terminal to send to the server.
  • after receiving the updated data category, the server determines a third preset processing model that matches the updated data category from a plurality of preset processing models, and uses the third preset processing model as the target processing model.
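  • the confirmation step can be sketched as building the option list, resolving the user's choice into an updated data category, and looking up the matching model. The mapping, the model names, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

OTHER = "Other"

# Hypothetical mapping from data category to preset processing model name.
CATEGORY_MODELS: Dict[str, str] = {
    "genome": "genome_model",
    "transcriptome": "transcriptome_model",
    "proteome": "proteome_model",
    "metabolome": "metabolome_model",
}


def build_confirmation_options(first_preset: str, specified: str) -> List[str]:
    """Options carried by the data category confirmation request."""
    return [first_preset, specified, OTHER]


def resolve_updated_category(selection: str, free_text: Optional[str] = None) -> str:
    """Turn the user's selection into the updated data category."""
    if selection == OTHER:
        if not free_text or not free_text.strip():
            raise ValueError("an updated data category must be entered for 'Other'")
        return free_text.strip().lower()
    return selection.lower()


def determine_third_model(updated_category: str) -> str:
    """Look up the preset processing model matching the updated data category."""
    return CATEGORY_MODELS[updated_category]


options = build_confirmation_options("proteome", "metabolome")
updated = resolve_updated_category(OTHER, " Transcriptome ")
target_model = determine_third_model(updated)
```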
  • FIG4 is a flowchart of an omics data processing method according to another embodiment of the present specification.
  • an omics data processing process is described, but more or fewer operation steps may be included based on conventional or non-creative labor.
  • the method may include:
  • a corresponding processing script is configured in advance for each preset processing model for calling. After determining the target processing model to be used, there is no need for personnel to rewrite the corresponding script. Thus, the automation level of the omics experimental data processing process is improved and the waste of resources is reduced.
  • a corresponding processing script is configured in advance for each preset processing model.
  • the processing script is a template program that can be used to implement corresponding processing for target data.
  • the template program lacks target data to be processed. If the target data is filled into the template program, a program that can be run is obtained.
  • a preset processing script associated with the target processing model is determined from a plurality of processing scripts based on the target processing model.
  • the omics data to be processed is filled into a predetermined preset processing script to obtain an executable processing script, and then the executable processing script is run to obtain data processing result information, and the data processing result information is sent to a user terminal for visual display by the user terminal.
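  • the template mechanism can be sketched with `string.Template`: the preset processing script is a program missing only its target data, and filling in the data yields an executable processing script. The script content (a simple sum normalization) and all names are hypothetical, and running generated code with `exec` is shown for illustration only; it is unsafe for untrusted input.

```python
from string import Template
from typing import List, Sequence

# Hypothetical preset processing script: a template missing only the target data.
PRESET_SCRIPT = Template(
    "values = $data\n"
    "total = sum(values)\n"
    "result = [v / total for v in values]\n"
)


def build_executable_script(template: Template, omics_data: Sequence[float]) -> str:
    """Fill the omics data to be processed into the preset processing script."""
    return template.substitute(data=repr(list(omics_data)))


def run_script(script: str) -> List[float]:
    """Run the executable processing script and return its `result` variable.
    exec() is for illustration only; do not use it on untrusted scripts."""
    namespace: dict = {}
    exec(script, namespace)
    return namespace["result"]


script = build_executable_script(PRESET_SCRIPT, [2.0, 6.0, 2.0])
result = run_script(script)
```

  • because the script for each preset processing model is prepared in advance, only the data changes between runs, which is the point the text makes about not rewriting scripts by hand.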
  • FIG5A is a schematic diagram of a method for storing omics data in an embodiment of the present specification.
  • an omics data storage process is described, but more or fewer operation steps may be included based on conventional or non-creative labor.
  • the method may include:
  • when the user only wants to store the unprocessed omics data obtained through the experiment for subsequent reference, the user can interact with the server through the user terminal to store the unprocessed omics data, thereby improving the automation level of the omics experiment data storage process and reducing the waste of resources.
  • the omics data is sent to the server as the omics data to be processed included in the omics data storage request.
  • when the server receives the omics data to be processed, it calls the data identification determination script to process the omics data to be processed and obtains the omics data identifier corresponding to the omics data to be processed.
  • the target database is determined.
  • the method can also include associating the omics data identifier with the omics data to be processed and the database address information, so that the user can later extract the omics data to be processed.
  • the database address information is the address information corresponding to the storage space in the target database storing the omics data to be processed.
  • the omics data identifier is sent to the server as an extraction request through the user terminal.
  • the server determines the corresponding database address information according to the omics data identifier included in the extraction request, calls the extraction script to obtain the omics data to be processed from the target database based on the database address information, and sends the omics data to be processed to the user terminal for visual display by the user terminal.
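  • the storage and extraction flow can be sketched as two mappings: identifier to database address, and address to stored data. The specification does not say how the data identification determination script derives the identifier; using a truncated SHA-256 digest is an assumption of this sketch, as are the class name and the in-memory "database".

```python
import hashlib
from typing import Dict


class OmicsStore:
    """In-memory stand-in for the target database and the identifier association."""

    def __init__(self) -> None:
        self._database: Dict[str, bytes] = {}   # database address -> stored data
        self._addresses: Dict[str, str] = {}    # omics data identifier -> address

    def store(self, data: bytes) -> str:
        """Derive an identifier, store the data, and record its address."""
        # Assumption: the identifier is a truncated content hash.
        identifier = hashlib.sha256(data).hexdigest()[:16]
        if identifier not in self._addresses:
            address = f"slot-{len(self._database)}"
            self._database[address] = data
            self._addresses[identifier] = address
        return identifier

    def extract(self, identifier: str) -> bytes:
        """Resolve the identifier to its address and fetch the stored data."""
        address = self._addresses[identifier]
        return self._database[address]


store = OmicsStore()
raw = b"sample,intensity\ns1,10.5\n"
data_id = store.store(raw)
fetched = store.extract(data_id)
```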
  • FIG5B is a schematic diagram of a method for sharing data processing result information according to an embodiment of the present specification.
  • a data processing result information sharing process is described, but more or fewer operation steps may be included based on conventional or non-creative labor.
  • the method may include:
  • the user can interact with the server through the user terminal to realize the sharing of the omics data to be processed, thereby improving the automation level of the omics experiment data sharing process and reducing the waste of resources.
  • a sharing request for the data processing result information is sent to the server.
  • the sharing request includes a user identifier corresponding to the user to be shared with and a sharing identifier representing the data processing result information.
  • after receiving the sharing request, the server determines the data processing result information and the user identifier according to the sharing request.
  • a corresponding template script for implementing the sharing function is configured in advance for a sharing request. After receiving a sharing request, the template script associated with the sharing request is obtained and used as a preset sharing script. The template script lacks the content information and user ID to be shared. If the content information and user ID are filled into the template script, a program that can be run is obtained.
  • the data processing result information and the user identifier are respectively filled into the determined preset sharing script to update the preset sharing script and obtain an executable sharing script.
  • the executable sharing script is run to obtain the target address link, and the target address link is sent to the user terminal for visual display by the user terminal.
  • the user can share the target address link displayed by the user terminal to the corresponding user through the user terminal.
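  • the sharing step can be sketched in the same template style: the preset sharing script lacks the content to share and the user identifier, and filling both in yields an executable sharing script whose output is the target address link. The URL, the use of a result identifier in place of the full result information, and all names are assumptions; `exec` is used for illustration only.

```python
from string import Template

# Hypothetical preset sharing script; the host name is a placeholder.
PRESET_SHARING_SCRIPT = Template(
    "from urllib.parse import urlencode\n"
    "query = urlencode({'result': $result_id, 'user': $user_id})\n"
    "link = 'https://example.invalid/share?' + query\n"
)


def build_sharing_script(result_id: str, user_id: str) -> str:
    """Fill the result reference and user identifier into the preset sharing script."""
    return PRESET_SHARING_SCRIPT.substitute(
        result_id=repr(result_id), user_id=repr(user_id)
    )


def run_sharing_script(script: str) -> str:
    """Run the executable sharing script and return the target address link.
    exec() is for illustration only; do not use it on untrusted scripts."""
    namespace: dict = {}
    exec(script, namespace)
    return namespace["link"]


link = run_sharing_script(build_sharing_script("res-001", "user-42"))
```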
  • FIG6A is a schematic diagram of a structure of an omics data processing device according to an embodiment of the present specification. As shown in FIG6A, the device includes:
  • a first determining unit 610 is configured to determine the omics data to be processed according to the omics data identifier included in the received omics data processing request, wherein the omics data processing request further includes the processing identifier;
  • a second determining unit 620 is used to determine an instrument identifier corresponding to an instrument that collects the omics data to be processed;
  • a third determining unit 630 is used to determine a target processing model according to the instrument identification and the processing identification.
  • the processing unit 640 is used to process the omics data to be processed based on the target processing model to obtain data processing result information.
  • the implementation of the above device can refer to the implementation of the above method, and the repeated parts will not be repeated.
  • FIG. 6B is a schematic structural diagram of an omics data processing device according to another embodiment of the present specification. As shown in FIG. 6B, it includes:
  • a fourth determining unit 650 is configured to determine, according to the received sharing request for the data processing result information, a user identifier included in the sharing request;
  • the acquisition unit 660 is used to acquire a preset sharing script according to the sharing request;
  • an updating unit 670 is used to update the preset sharing script using the data processing result information and the user identifier to obtain an executable sharing script; and
  • the running unit 680 is used to run the executable sharing script to obtain a target address link.
  • the implementation of the above device can refer to the implementation of the above method, and the repeated parts will not be repeated.
  • FIG. 6C is a schematic structural diagram of an omics data processing device according to an embodiment of the present specification. As shown in FIG. 6C, it includes:
  • a fifth determining unit 6010 is configured to determine an omics data identifier according to the omics data to be processed included in the received omics data storage request;
  • the storage unit 6020 is used to store the to-be-processed omics data into a target database based on the omics data identifier.
  • the implementation of the above device can refer to the implementation of the above method, and the repeated parts will not be repeated.
  • the apparatus in this specification may be the computer device of this embodiment, which executes the method of this specification.
  • the computer device 702 may include one or more processing devices 704, such as one or more central processing units (CPUs), and each processing unit may implement one or more hardware threads.
  • the computer device 702 may also include any storage resource 706, which is used to store any kind of information such as code, settings, data, etc.
  • the storage resource 706 may include any one or more combinations of the following: any type of RAM, any type of ROM, flash memory device, hard disk, optical disk, etc. More generally, any storage resource can use any technology to store information.
  • any storage resource can provide volatile or non-volatile retention of information.
  • any storage resource can represent a fixed or removable component of the computer device 702.
  • when the processing device 704 executes associated instructions stored in any storage resource or combination of storage resources, the computer device 702 can perform any operation of the associated instructions.
  • the computer device 702 also includes one or more drive mechanisms 708 for interacting with any storage resources, such as a hard disk drive mechanism, an optical disk drive mechanism, and the like.
  • the computer device 702 may also include an input/output module 710 (I/O) for receiving various inputs (via input devices 712) and for providing various outputs (via output devices 714).
  • a specific output mechanism may include a presentation device 716 and an associated graphical user interface (GUI) 718.
  • the input/output module 710 (I/O), the input device 712, and the output device 714 may not be included, and the computer device 702 may be used as a computer device in a network.
  • the computer device 702 may also include one or more network interfaces 720 for exchanging data with other devices via one or more communication links 722.
  • One or more communication buses 724 couple the components described above together.
  • the communication link 722 may be implemented in any manner, for example, through a local area network, a wide area network (e.g., the Internet), a point-to-point connection, etc., or any combination thereof.
  • the communication link 722 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc. governed by any protocol or combination of protocols.
  • the embodiments of the present specification also provide a computer-readable storage medium, which stores a computer program.
  • when the computer program is executed by a processor, the above method is implemented.
  • the embodiments of this specification also provide a computer program product, which includes a computer program that, when executed by a processor, implements the above method.
  • this specification may be provided as methods, systems, or computer program products. Therefore, this specification may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment in combination with software and hardware. Moreover, this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

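This specification relates to the technical field of biological data processing, and in particular to an omics data processing method, apparatus, and computer device. The omics data processing method includes: determining omics data to be processed according to an omics data identifier included in a received omics data processing request, the omics data processing request further including a processing identifier; determining an instrument identifier corresponding to the instrument that collected the omics data to be processed; determining a target processing model according to the instrument identifier and the processing identifier; and processing the omics data to be processed based on the target processing model to obtain data processing result information. With the embodiments of this specification, once the omics data to be processed, the processing identifier, and the instrument identifier have been determined, the target processing model is determined automatically from the instrument identifier and the processing identifier, and the omics data to be processed is then processed according to that model, which raises the degree of automation in omics data management and reduces wasted resources.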

Description

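Omics data processing method, apparatus, and computer device
Technical Field
This specification relates to the technical field of biological data processing, and in particular to an omics data processing method, apparatus, and computer device.
Background
At present, in the biological field, managing omics experimental data obtained from experiments requires users to write their own scripts for storage or for data processing. Moreover, data obtained from different experiments belongs to different data categories, and data of different categories must be processed with different processing models. Therefore, when selecting and writing a data processing script, the data category must also be determined manually and the script then written to suit that category. As a result, the degree of automation in omics experimental data management is low, which places high demands on data management personnel and wastes resources.
How to raise the degree of automation in omics experimental data management so as to reduce wasted resources is a problem in the prior art that urgently needs to be solved.
Summary of the Invention
To solve the problems in the prior art, the embodiments of this specification provide an omics data processing method, apparatus, computer device, and storage medium. Once the omics data to be processed, the processing identifier, and the instrument identifier have been determined, the target processing model is determined automatically from the instrument identifier and the processing identifier, and the omics data to be processed is then processed according to that model, which raises the degree of automation in omics data management and reduces wasted resources.
To solve the above technical problem, the specific technical solutions of this specification are as follows: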
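In one aspect, the embodiments of this specification provide an omics data processing method, including:
determining omics data to be processed according to an omics data identifier included in a received omics data processing request, the omics data processing request further including a processing identifier;
determining an instrument identifier corresponding to the instrument that collected the omics data to be processed;
determining a target processing model according to the instrument identifier and the processing identifier; and
processing the omics data to be processed based on the target processing model to obtain data processing result information.
Further, before determining the omics data to be processed according to the omics data identifier included in the received omics data processing request, the method further includes:
determining the omics data identifier according to the omics data to be processed included in a received omics data storage request; and
storing the omics data to be processed in a target database based on the omics data identifier.
Further, the processing identifier includes a specified data category, and determining the target processing model according to the instrument identifier and the processing identifier further includes:
determining a first preset data category associated with the instrument identifier;
judging whether the first preset data category is consistent with the specified data category; and
when the first preset data category is determined to be consistent with the specified data category, determining a first preset processing model corresponding to the first preset data category as the target processing model.
Further, the method also includes: when the first preset data category is determined to be inconsistent with the specified data category, extracting feature information of the omics data to be processed;
determining a second preset data category from multiple preset data categories based on the feature information; and
taking a second preset processing model corresponding to the second preset data category as the target processing model;
or,
when the first preset data category is determined to be inconsistent with the specified data category, sending a data category confirmation request; and
taking a third preset processing model corresponding to a received updated data category as the target processing model.
Further, processing the omics data to be processed based on the target processing model to obtain the data processing result information further includes:
obtaining a preset processing script corresponding to the target processing model;
updating the preset processing script with the omics data to be processed to obtain a runnable processing script; and
running the runnable processing script to obtain the data processing result information.
Further, after processing the omics data to be processed based on the target processing model to obtain the data processing result information, the method further includes:
determining, according to a received sharing request for the data processing result information, a user identifier included in the sharing request;
obtaining a preset sharing script according to the sharing request;
updating the preset sharing script with the data processing result information and the user identifier to obtain a runnable sharing script; and
running the runnable sharing script to obtain a target address link.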
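In another aspect, the embodiments of this specification also provide an omics data processing apparatus, including:
a first determining unit, configured to determine omics data to be processed according to an omics data identifier included in a received omics data processing request, the omics data processing request further including a processing identifier;
a second determining unit, configured to determine an instrument identifier corresponding to the instrument that collected the omics data to be processed;
a third determining unit, configured to determine a target processing model according to the instrument identifier and the processing identifier; and
a processing unit, configured to process the omics data to be processed based on the target processing model to obtain data processing result information.
Further, after the processing unit, the apparatus further includes:
a fourth determining unit, configured to determine, according to a received sharing request for the data processing result information, a user identifier included in the sharing request;
an obtaining unit, configured to obtain a preset sharing script according to the sharing request;
an updating unit, configured to update the preset sharing script with the data processing result information and the user identifier to obtain a runnable sharing script; and
a running unit, configured to run the runnable sharing script to obtain a target address link.
In another aspect, the embodiments of this specification also provide a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the above method when executing the computer program.
In another aspect, the embodiments of this specification also provide a computer-readable storage medium storing computer instructions that implement the above method when executed by a processor.
With the embodiments of this specification, the omics data to be processed is determined based on the omics data identifier included in the received omics data processing request; the instrument identifier corresponding to the omics data to be processed is determined; and the instrument identifier and the processing identifier are used as an index to determine the corresponding data category and hence the corresponding target processing model. The omics data to be processed is then processed based on that target processing model to obtain data processing result information. The data category of the omics data to be processed, and in turn the target processing model, are thus determined automatically, and the processing of the omics data is completed. This raises the degree of automation in omics data management and reduces wasted resources.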
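Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a system implementing an omics data processing method according to an embodiment of this specification;
FIG. 2 is a flowchart of an omics data processing method according to an embodiment of this specification;
FIG. 3A is a flowchart of an omics data processing method according to another embodiment of this specification;
FIG. 3B is a flowchart of an omics data processing method according to another embodiment of this specification;
FIG. 4 is a flowchart of an omics data processing method according to another embodiment of this specification;
FIG. 5A is a schematic diagram of the principle of an omics data storage method according to an embodiment of this specification;
FIG. 5B is a schematic diagram of a method for sharing data processing result information according to an embodiment of this specification;
FIG. 6A is a schematic structural diagram of an omics data processing apparatus according to an embodiment of this specification;
FIG. 6B is a schematic structural diagram of an omics data processing apparatus according to another embodiment of this specification;
FIG. 6C is a schematic structural diagram of an omics data processing apparatus according to another embodiment of this specification;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of this specification.
[Description of Reference Numerals]
101, user terminal; 102, server; 610, first determining unit; 620, second determining unit; 630, third determining unit; 640, processing unit; 650, fourth determining unit; 660, obtaining unit; 670, updating unit; 680, running unit; 6010, fifth determining unit; 6020, storage unit; 702, computer device; 704, processing device; 706, storage resource; 708, drive mechanism; 710, input/output module; 712, input device; 714, output device; 716, presentation device; 718, graphical user interface; 720, network interface; 722, communication link; 724, communication bus.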
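Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of this application described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion; for example, a process, method, apparatus, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
In the technical solutions of this specification, the collection, storage, use, processing, transmission, provision, disclosure, and application of the omics experimental data involved all comply with relevant laws and regulations, are subject to necessary confidentiality measures, and do not violate public order and good morals.
FIG. 1 is a schematic diagram of a system implementing an omics data processing method according to an embodiment of this specification. The system may include a user terminal 101 and a server 102, which communicate over a network. The network may include a local area network (LAN), a wide area network (WAN), the Internet, or a combination of these, and connects to websites, user devices (for example, computing devices), and back-end systems. After receiving an omics data processing request sent by a user through the user terminal 101, the server 102 determines the omics data to be processed based on the omics data identifier included in the request; determines the instrument identifier corresponding to the omics data to be processed; determines the target processing model according to the instrument identifier and the processing identifier included in the request; and then processes the omics data to be processed with the target processing model to obtain data processing result information, which it sends to the user terminal 101. In addition, on receiving a sharing request for the data processing result information from the user terminal 101, the server 102 determines the user identifier included in the sharing request; obtains a preset sharing script according to the sharing request; updates the preset sharing script with the data processing result information and the user identifier to obtain a runnable sharing script; and runs the runnable sharing script to obtain a target address link, which it sends to the user terminal 101 so that the user can share it with other user terminals. Furthermore, on receiving an omics data storage request from the user terminal 101, the server 102 can also store the omics data to be processed.
Optionally, the server 102 may be a node of a cloud computing system (not shown in the figure), or each server 102 may be a separate cloud computing system comprising multiple computers interconnected by a network and working as a distributed processing system.
In an optional embodiment, the user terminal 101 may include electronic devices such as, but not limited to, smartphones, acquisition devices, desktop computers, tablet computers, notebook computers, smart speakers, digital assistants, augmented reality (AR) / virtual reality (VR) devices, and smart wearable devices. Optionally, the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, and Windows.
In addition, it should be noted that FIG. 1 shows only one application environment provided by this specification; in practice, multiple user terminals 101 may be included, and this specification does not limit this.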
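FIG. 2 is a flowchart of an omics data processing method according to an embodiment of this specification. The figure describes an omics data processing procedure, but more or fewer operation steps may be included on the basis of routine or non-creative work. The order of steps listed in the embodiment is only one of many possible execution orders and is not the only one. When an actual system or apparatus product executes the method, the steps may be executed sequentially or in parallel according to the method shown in the embodiment or the drawings. Specifically, as shown in FIG. 2, the method may include:
S210, determining omics data to be processed according to an omics data identifier included in a received omics data processing request, the omics data processing request further including a processing identifier;
S220, determining an instrument identifier corresponding to the instrument that collected the omics data to be processed;
S230, determining a target processing model according to the instrument identifier and the processing identifier;
S240, processing the omics data to be processed based on the target processing model to obtain data processing result information.
With the embodiments of this specification, the omics data to be processed is determined based on the omics data identifier included in the received omics data processing request; the instrument identifier corresponding to the omics data to be processed is determined; and the instrument identifier and the processing identifier are used as an index to determine the corresponding data category and hence the corresponding target processing model. The omics data to be processed is then processed based on that target processing model to obtain data processing result information. The data category of the omics data to be processed, and in turn the target processing model, are thus determined automatically, and the processing of the omics data is completed. This raises the degree of automation in omics data management and reduces wasted resources.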
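According to one embodiment of this specification, when a user wants to analyze and process omics data obtained from an experiment, the user sends, through the user terminal, a request that includes the processing identifier and the omics data identifier of the omics data to be processed to the server. The omics data identifier is a unique identifier that indexes the omics data to be processed. The processing identifier is a unique identifier that indexes a processing requirement. For example, if the omics data to be processed is to be ordered to obtain the corresponding sequence data, the processing identifier may be the unique identifier that indexes the processing requirement "determine sequence data".
After the omics data identifier is received, the omics data to be processed is determined from the database according to that identifier.
Note that when a user wants to analyze and process omics data obtained from an experiment, the user may also send the omics data to be processed itself, together with the processing identifier, to the server through the user terminal.
When the omics data to be processed is determined from the database, the instrument identifier is determined based on the omics data to be processed or on the omics data identifier. The instrument identifier is the unique identifier of the instrument that collected the omics data to be processed. It is information entered by the user when storing the omics data to be processed through the user terminal. When storing the data, the server stores the instrument identifier in association with the omics data to be processed, or in association with the corresponding omics data identifier. Thus, once the omics data to be processed and the omics data identifier have been determined, the corresponding instrument identifier is determined based on at least one of them.
If the user has not stored the omics data to be processed in the database in advance through the user terminal, the omics data processing request sent to the server includes the instrument identifier in addition to the omics data to be processed and the processing identifier.
At least one preset processing model is associated in advance with each preset instrument identifier. In addition to being associated with a preset instrument identifier, each preset processing model may also be associated with a preset processing identifier. That is, one preset instrument identifier and one preset processing identifier are associated with one preset processing model. A preset processing model is a model used to process omics data, for example a normalization model, a standardization model, a univariate analysis model, or a principal component analysis model. Note that several preset instrument identifiers together with one preset processing identifier may also be associated with one preset processing model. For example, when two sets of omics data to be processed are processed to obtain the expression level of an amino acid, the two preset instrument identifiers corresponding to the two data sets and the preset processing identifier corresponding to "determine the expression level of the amino acid" are associated with the preset processing model that determines the amino acid expression level. In this way, multiple sets of omics data to be processed can be processed to obtain a single piece of data processing result information.
When the preset processing model is determined in advance for each preset instrument identifier and each preset processing identifier, a historical omics processing data set may, for example, be obtained. For each preset instrument identifier and each preset processing identifier, multiple target historical omics processing records that include that preset instrument identifier and preset processing identifier are selected from the set; processing models are extracted from these records to determine the historical processing model corresponding to each record; the number of times each historical processing model was used is determined; and the historical processing model used most often is taken as the preset processing model associated with that preset instrument identifier and preset processing identifier.
After the instrument identifier and the processing identifier have been determined, the target preset processing model associated with them is determined from the multiple preset processing models and taken as the target processing model.
After the target processing model has been determined, the processing script corresponding to the target processing model is used to process the omics data to be processed, yielding the corresponding data processing result information, which is sent to the user terminal for visual display.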
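The association between (instrument identifier, processing identifier) pairs and preset processing models, and the "most frequently used historical model" rule, can be sketched as follows. This is a minimal illustration only: the record layout, the identifiers such as `ms-01`, and the function names are assumptions made for the example, not part of the specification.

```python
from collections import Counter


def build_preset_models(history):
    """Map each (instrument_id, processing_id) pair to its most-used model.

    `history` is an iterable of (instrument_id, processing_id, model_name)
    records; this record layout is a hypothetical simplification.
    """
    counts = {}
    for instrument_id, processing_id, model_name in history:
        key = (instrument_id, processing_id)
        counts.setdefault(key, Counter())[model_name] += 1
    # Keep, for every key, the historical model that was used most often.
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


def select_target_model(presets, instrument_id, processing_id):
    """Look up the target processing model; None if nothing is associated."""
    return presets.get((instrument_id, processing_id))


# Hypothetical historical processing records.
history = [
    ("ms-01", "normalize", "normalization"),
    ("ms-01", "normalize", "normalization"),
    ("ms-01", "normalize", "standardization"),
    ("seq-02", "pca", "principal_component_analysis"),
]
presets = build_preset_models(history)
target = select_target_model(presets, "ms-01", "normalize")
```

A plain dictionary keyed by the identifier pair keeps the lookup constant-time; a key built from several instrument identifiers (for example a `frozenset`) would be one way to cover the multi-instrument case mentioned above.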
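According to another embodiment of this specification, after the target processing model is determined according to the instrument identifier and the processing identifier, the method may further include, for example: determining the omics data format of the omics data to be processed; determining the configured data format that the target processing model can process; judging whether the omics data format is consistent with the configured data format; and, when the two formats are determined to be consistent, processing the omics data to be processed based on the target processing model to obtain the data processing result information.
When the omics data format is determined to be inconsistent with the configured data format, an updated processing model is determined from the backup processing models associated with the instrument identifier and the processing identifier, and the updated processing model is taken as the target processing model, so that the omics data to be processed is processed based on that target processing model to obtain the data processing result information.
When the preset processing model is determined from the historical omics processing data set, multiple historical processing models are identified, and the one used most often is taken as the preset processing model. The remaining historical processing models are sorted by the number of times they were used and associated, as backup processing models, with the preset instrument identifier and the preset processing identifier. For example, suppose historical processing model A was used 95 times, historical processing model B 760 times, and historical processing model C 46 times. Then historical processing model B is taken as the preset processing model, historical processing model A as the first backup processing model, and historical processing model C as the second backup processing model.
Specifically, determining the updated processing model from the backup processing models associated with the instrument identifier and the processing identifier may, for example, consist of taking the first backup processing model as the updated processing model.
This ensures that the data format the determined target processing model can process is consistent with the data format of the omics data to be processed, which further raises the degree of automation in omics data management and reduces wasted resources.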
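The ranking of backup models and the format check can be sketched with the usage counts from the example above (A = 95, B = 760, C = 46). The format names and the `supported` mapping are hypothetical. Note also that the specification only says the first backup model may be taken as the updated model; walking further down the backup list until a format matches is an assumption of this sketch.

```python
def rank_models(usage_counts):
    """Return (preset_model, backup_models) ordered by descending usage."""
    ranked = sorted(usage_counts, key=usage_counts.get, reverse=True)
    return ranked[0], ranked[1:]


def choose_by_format(preset, backups, supported_formats, data_format):
    """Pick the first model, preset first, that can handle `data_format`."""
    for model in [preset, *backups]:
        if data_format in supported_formats.get(model, ()):
            return model
    return None  # no associated model accepts this format


usage = {"A": 95, "B": 760, "C": 46}
preset, backups = rank_models(usage)

# Hypothetical formats each model is configured to process.
supported = {"B": {"mzML"}, "A": {"csv"}, "C": {"csv", "mzML"}}
chosen = choose_by_format(preset, backups, supported, "csv")
```

Here model B is the preset, and because B is not configured for the `csv` format the first backup, A, is selected.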
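According to another embodiment of this specification, the method further includes: determining, according to a received extension request, an extension processing model script, an extension instrument identifier, an extension processing identifier, an extension data category, and an extension processing model; obtaining a preset extension-building script according to the extension request; updating the preset extension-building script with the extension processing model script, extension instrument identifier, extension processing identifier, extension data category, and extension processing model to obtain a runnable extension-building script; and running the runnable extension-building script to store the extension processing model script, extension instrument identifier, extension processing identifier, extension data category, and extension processing model in association with one another, for use in processing omics data.
When a user wants to process omics data and finds that the server cannot perform the required processing, the set of processing tasks the server supports can be extended, so that the new task can be invoked directly the next time it is needed.
A template script implementing the extension function is configured in advance for extension requests. The template script lacks the extension processing model script, extension instrument identifier, extension processing identifier, extension data category, and extension processing model; once these are filled into the template script, a runnable program is obtained.
The extension processing model script is, for example, the template script corresponding to the extension processing model. The extension instrument identifier is the identifier of the instrument that collects the extension omics data to be processed. The extension processing identifier identifies the processing to be applied to that extension omics data. The extension data category is the data category associated with the extension instrument identifier. Users can thus add processing models themselves, widening the range of processing the server can perform.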
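The associated storage of the five extension items can be pictured as a small registry. The sketch below is an assumption-laden illustration: the `Extension` and `ModelRegistry` names and the in-memory dictionary are hypothetical, and the specification itself performs the registration by filling and running a template script rather than by calling a class method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extension:
    """The five items carried by an extension request (hypothetical layout)."""
    instrument_id: str
    processing_id: str
    data_category: str
    model_name: str
    model_script: str


class ModelRegistry:
    """Stores extensions keyed by (instrument_id, processing_id)."""

    def __init__(self):
        self._by_key = {}

    def register(self, extension):
        key = (extension.instrument_id, extension.processing_id)
        self._by_key[key] = extension

    def lookup(self, instrument_id, processing_id):
        return self._by_key.get((instrument_id, processing_id))


registry = ModelRegistry()
registry.register(
    Extension(
        instrument_id="nmr-03",
        processing_id="baseline",
        data_category="metabolome",
        model_name="baseline_correction",
        model_script="result = [v - min(values) for v in values]",
    )
)
found = registry.lookup("nmr-03", "baseline")
```

Once registered, the extension is found by the same identifier pair used for the preset models, so later requests can reuse it without further user input.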
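FIG. 3A is a flowchart of an omics data processing method according to another embodiment of this specification. The figure describes an omics data processing procedure, but more or fewer operation steps may be included on the basis of routine or non-creative work. Specifically, as shown in FIG. 3A, the method may include:
S3311, determining a first preset data category associated with the instrument identifier;
S3312, judging whether the first preset data category is consistent with the specified data category;
S3313, when the first preset data category is determined to be consistent with the specified data category, determining the first preset processing model corresponding to the first preset data category as the target processing model;
S3314, when the first preset data category is determined to be inconsistent with the specified data category, extracting feature information of the omics data to be processed;
S3315, determining a second preset data category from multiple preset data categories based on the feature information;
S3316, taking the second preset processing model corresponding to the second preset data category as the target processing model.
With the embodiments of this specification, because a user may enter an incorrect instrument identifier, the processing identifier entered by the user through the user terminal also includes a specified data category in order to improve the accuracy of the determined target processing model. The specified data category is the data category that the user enters through the user terminal for the omics data to be processed. A more accurate target processing model is then determined from the specified data category and the preset data category associated with the instrument identifier, for use in omics data processing.
According to another embodiment of this specification, a corresponding preset data category is associated in advance with each preset instrument identifier. The preset data category is a category at the omics level, for example genome, transcriptome, proteome, or metabolome. The specified data category is likewise a category at the omics level, for example genome, transcriptome, proteome, or metabolome.
Based on the determined instrument identifier, the associated first preset data category is determined from the multiple preset data categories. The first preset data category is matched for consistency against the specified data category included in the processing identifier, yielding a matching value. If the matching value satisfies a preset condition, the first preset data category is determined to be consistent with the specified data category; if it does not, the two are determined to be inconsistent. Specifically, the consistency matching may use a text similarity model to determine the similarity between the first preset data category and the specified data category, with that similarity taken as the matching value. The text similarity model may be any model that can determine how similar two words or sentences are. The preset condition may be, for example, that the matching value is greater than or equal to a preset threshold: if so, the matching value satisfies the preset condition; otherwise it does not. The preset threshold may be, for example, 0.99.
When the first preset data category is determined to be consistent with the specified data category, S3313 is executed. Specifically, the first preset processing model matching the first preset data category is determined from the multiple preset processing models and taken as the target processing model.
When the first preset data category is determined to be inconsistent with the specified data category, S3314 to S3316 are executed. Specifically, feature information of the omics data to be processed is extracted, and a second preset data category is determined from the multiple preset data categories based on that feature information. Any model that can classify the omics data to be processed based on feature information may be used for this step, for example a support vector machine model, a trained neural network model, or a random forest model. After the second preset data category is determined, the second preset processing model matching it is determined from the multiple preset processing models and taken as the target processing model for processing the omics data to be processed.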
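Steps S3311 to S3316 can be sketched as below. Two stand-ins are assumptions of this sketch and are not named in the specification: `difflib.SequenceMatcher` plays the role of the text similarity model, and a toy rule-based function plays the role of the trained classifier (the specification suggests, for example, a support vector machine, neural network, or random forest). The lookup tables and identifiers are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical association tables.
PRESET_CATEGORY_BY_INSTRUMENT = {"ms-01": "proteome", "seq-02": "genome"}
PRESET_MODEL_BY_CATEGORY = {"proteome": "proteome_model", "genome": "genome_model"}
THRESHOLD = 0.99  # example threshold given in the description


def categories_match(first, specified, threshold=THRESHOLD):
    """S3312: compare categories with a stand-in text similarity measure."""
    score = SequenceMatcher(None, first.lower(), specified.lower()).ratio()
    return score >= threshold


def toy_classifier(data):
    """Stand-in for S3314-S3315: nucleotide-only strings count as genome."""
    return "genome" if set(data) <= set("ACGT") else "proteome"


def determine_target_model(instrument_id, specified_category, data, classify):
    first = PRESET_CATEGORY_BY_INSTRUMENT[instrument_id]       # S3311
    if categories_match(first, specified_category):            # S3312
        return PRESET_MODEL_BY_CATEGORY[first]                 # S3313
    second = classify(data)                                    # S3314-S3315
    return PRESET_MODEL_BY_CATEGORY[second]                    # S3316


consistent = determine_target_model("ms-01", "proteome", "MKVLA", toy_classifier)
fallback = determine_target_model("ms-01", "genome", "ACGTAC", toy_classifier)
```

With a threshold of 0.99 the comparison is close to an exact match, so the classifier fallback is taken whenever the user's category and the instrument's category differ in more than trivial ways.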
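FIG. 3B is a flowchart of an omics data processing method according to another embodiment of this specification. The figure describes an omics data processing procedure, but more or fewer operation steps may be included on the basis of routine or non-creative work. Specifically, as shown in FIG. 3B, the method may include:
S3324, sending a data category confirmation request;
S3325, taking the third preset processing model corresponding to the received updated data category as the target processing model.
According to another embodiment of this specification, in FIG. 3A, S3314 to S3316 are executed when the first preset data category is determined to be inconsistent with the specified data category. Alternatively, S3324 and S3325 may be executed in that case. Specifically, a data category confirmation request is sent to the user terminal that sent the omics data processing request, so that the user can select or enter an updated data category through the user terminal. The data category confirmation request may include, for example, the first preset data category, the specified data category, and "other". Note that when the user selects "other" through the user terminal, an input control is used to display a text box in which the user can enter the updated data category.
When the user sees the first preset data category, the specified data category, and "other" displayed on the user terminal, and considers that one of the first two is the correct data category for the omics data to be processed, the user selects that category, and the user terminal sends it to the server as the updated data category. If the user considers that neither is correct, the user selects "other" and enters the updated data category on the user terminal, which sends it to the server.
After receiving the updated data category, the server determines, from the multiple preset processing models, the third preset processing model matching the updated data category and takes it as the target processing model.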
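The confirmation exchange in S3324 and S3325 can be sketched as a pair of small functions, one building the request on the server side and one resolving the user's answer. The dictionary layout and function names are hypothetical and stand in for whatever message format the user terminal and server actually exchange.

```python
def build_confirmation_request(first_category, specified_category):
    """S3324: offer both candidate categories plus a free-text option."""
    return {"options": [first_category, specified_category, "other"]}


def resolve_updated_category(request, choice, free_text=None):
    """Turn the user's reply into the updated data category."""
    if choice not in request["options"]:
        raise ValueError(f"unknown choice: {choice!r}")
    if choice == "other":
        if not free_text:
            raise ValueError("free text is required when choosing 'other'")
        return free_text
    return choice


request = build_confirmation_request("proteome", "genome")
picked = resolve_updated_category(request, "genome")
typed = resolve_updated_category(request, "other", free_text="metabolome")
```

The returned category is then used to look up the third preset processing model in the same way as any other preset data category.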
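FIG. 4 is a flowchart of an omics data processing method according to another embodiment of this specification. The figure describes an omics data processing procedure, but more or fewer operation steps may be included on the basis of routine or non-creative work. Specifically, as shown in FIG. 4, the method may include:
S441, obtaining the preset processing script corresponding to the target processing model;
S442, updating the preset processing script with the omics data to be processed to obtain a runnable processing script;
S443, running the runnable processing script to obtain the data processing result information.
With the embodiments of this specification, a corresponding processing script is configured in advance for each preset processing model and is available to be called. Once the target processing model to be used has been determined, no one needs to write the corresponding script again. This raises the degree of automation in omics experimental data processing and reduces wasted resources.
According to another embodiment of this specification, a corresponding processing script is configured in advance for each preset processing model. The processing script is a template program that implements the corresponding processing of target data. The template program lacks the target data to be processed; once the target data is filled into the template program, a runnable program is obtained.
After the target processing model is determined, the preset processing script associated with it is determined from the multiple processing scripts.
The omics data to be processed is filled into the determined preset processing script to obtain a runnable processing script. The runnable processing script is then run to obtain the data processing result information, which is sent to the user terminal for visual display.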
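Steps S441 to S443 amount to filling a placeholder in a template program and executing the result. The sketch below uses `string.Template` for the placeholder and `exec` to run the filled-in text; both choices, the min-max normalization body, and the `$data` placeholder name are assumptions of the example. Executing generated code with `exec` is only acceptable here because the template and data are fully controlled by the sketch.

```python
import json
from string import Template

# Hypothetical preset processing script for a min-max normalization model.
# The $data placeholder is the "missing target data" of the template.
PRESET_SCRIPT = Template(
    "values = $data\n"
    "low, high = min(values), max(values)\n"
    "result = [(v - low) / (high - low) for v in values]\n"
)


def make_runnable_script(preset_script, data):
    """S442: fill the data into the template to get a runnable script."""
    return preset_script.substitute(data=json.dumps(data))


def run_script(script_text):
    """S443: run the script and return the value it binds to `result`."""
    namespace = {}
    exec(script_text, namespace)
    return namespace["result"]


runnable = make_runnable_script(PRESET_SCRIPT, [2.0, 4.0, 6.0])
result = run_script(runnable)
```

Serializing the data with `json.dumps` keeps the inserted text a valid Python list literal for plain numeric lists; richer data would more likely be passed by file path or identifier than embedded in the script.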
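FIG. 5A is a schematic diagram of the principle of an omics data storage method according to an embodiment of this specification. The figure describes an omics data storage procedure, but more or fewer operation steps may be included on the basis of routine or non-creative work. Specifically, as shown in FIG. 5A, the method may include:
S5010, determining the omics data identifier according to the omics data to be processed included in a received omics data storage request;
S5020, storing the omics data to be processed in a target database based on the omics data identifier.
With the embodiments of this specification, when a user only wants to store omics data obtained from an experiment for later reference, the user can interact with the server through the user terminal to store the omics data to be processed. This raises the degree of automation in omics experimental data storage and reduces wasted resources.
According to another embodiment of this specification, when a user wants to store obtained omics data in the server's database, the omics data is sent to the server as the omics data to be processed included in an omics data storage request. On receiving the omics data to be processed, the server calls an identifier determination script to process it and obtain the corresponding omics data identifier. The target database is determined based on the omics data identifier, and a storage script is called to store the omics data to be processed in the target database. The method may also include, for example, associating the omics data identifier with the omics data to be processed and with database address information, so that the user can later retrieve the data. The database address information is the address information of the storage space in the target database where the omics data to be processed is stored.
According to another embodiment of this specification, when a user wants to retrieve stored omics data, the user sends the omics data identifier to the server as a retrieval request through the user terminal. After receiving the retrieval request, the server determines the corresponding database address information according to the omics data identifier included in the request and calls a retrieval script to obtain the omics data to be processed from the target database based on that address information. The data is then sent to the user terminal for visual display.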
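The storage and retrieval flow can be sketched with an in-memory SQLite database. Several choices here are assumptions rather than details from the specification: the identifier is derived from a SHA-256 digest of the data, the table name and columns are invented, and the primary key stands in for the "database address information". The instrument identifier is stored alongside the data, following the association described earlier.

```python
import hashlib
import sqlite3


def determine_identifier(data: bytes) -> str:
    """S5010 stand-in: derive an identifier from the data content."""
    return hashlib.sha256(data).hexdigest()[:16]


def store(conn, data: bytes, instrument_id: str) -> str:
    """S5020: store the data, keyed by its identifier."""
    data_id = determine_identifier(data)
    conn.execute(
        "INSERT OR REPLACE INTO omics (data_id, instrument_id, payload) "
        "VALUES (?, ?, ?)",
        (data_id, instrument_id, data),
    )
    conn.commit()
    return data_id


def fetch(conn, data_id: str):
    """Retrieval: return (instrument_id, payload), or None if unknown."""
    return conn.execute(
        "SELECT instrument_id, payload FROM omics WHERE data_id = ?",
        (data_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE omics (data_id TEXT PRIMARY KEY, instrument_id TEXT, payload BLOB)"
)
data_id = store(conn, b"ACGTACGT", "seq-02")
instrument_id, payload = fetch(conn, data_id)
```

A content-derived identifier makes repeated uploads of the same data idempotent; a system that needs distinct records for identical data would use a generated identifier instead.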
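FIG. 5B is a schematic diagram of a method for sharing data processing result information according to an embodiment of this specification. The figure describes a procedure for sharing data processing result information, but more or fewer operation steps may be included on the basis of routine or non-creative work. Specifically, as shown in FIG. 5B, the method may include:
S550, determining, according to a received sharing request for the data processing result information, the user identifier included in the sharing request;
S560, obtaining a preset sharing script according to the sharing request;
S570, updating the preset sharing script with the data processing result information and the user identifier to obtain a runnable sharing script;
S580, running the runnable sharing script to obtain a target address link.
With the embodiments of this specification, when a user who has seen the data processing result information wants to share it with other users, the user can interact with the server through the user terminal to share the data processing result information. This raises the degree of automation in sharing omics experimental data and reduces wasted resources.
According to another embodiment of this specification, when a user wants to show the data processing result information displayed on the user terminal to other users, a sharing request for the data processing result information is sent to the server. The sharing request includes the user identifier of the user to share with and a sharing identifier that represents the data processing result information.
After receiving the sharing request, the server determines the data processing result information and the user identifier according to the sharing request.
A template script implementing the sharing function is configured in advance for sharing requests. After a sharing request is received, the template script associated with the sharing request is obtained and taken as the preset sharing script. The template script lacks the content information to be shared and the user identifier; once these are filled into the template script, a runnable program is obtained.
The data processing result information and the user identifier are each filled into the determined preset sharing script to update it and obtain a runnable sharing script.
The runnable sharing script is then run to obtain the target address link, which is sent to the user terminal for visual display. The user can then share the target address link displayed on the user terminal with the corresponding user through the user terminal.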
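Steps S550 to S580 follow the same fill-and-run pattern as the processing script. In the sketch below, the link format, the `example.invalid` host, the `$result_id` and `$user_id` placeholders, and the use of `exec` are all assumptions made for illustration; a real deployment would also need access control on the generated link, which the sketch leaves out.

```python
from string import Template

# Hypothetical preset sharing script; $result_id and $user_id are the
# content information and user identifier missing from the template.
PRESET_SHARING_SCRIPT = Template(
    "from urllib.parse import urlencode\n"
    "query = urlencode({'result': $result_id, 'user': $user_id})\n"
    "link = 'https://example.invalid/share?' + query\n"
)


def make_runnable_sharing_script(result_id, user_id):
    """S570: fill the result reference and user identifier into the template."""
    return PRESET_SHARING_SCRIPT.substitute(
        result_id=repr(result_id), user_id=repr(user_id)
    )


def run_sharing_script(script_text):
    """S580: run the script and return the target address link."""
    namespace = {}
    exec(script_text, namespace)
    return namespace["link"]


script = make_runnable_sharing_script("res-42", "user-7")
link = run_sharing_script(script)
```

Passing the values through `repr` keeps them quoted as Python string literals inside the generated script, and `urlencode` escapes them for use in the query string.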
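FIG. 6A is a schematic structural diagram of an omics data processing apparatus according to an embodiment of this specification. As shown in FIG. 6A, the apparatus includes:
a first determining unit 610, configured to determine omics data to be processed according to an omics data identifier included in a received omics data processing request, the omics data processing request further including a processing identifier;
a second determining unit 620, configured to determine an instrument identifier corresponding to the instrument that collected the omics data to be processed;
a third determining unit 630, configured to determine a target processing model according to the instrument identifier and the processing identifier; and
a processing unit 640, configured to process the omics data to be processed based on the target processing model to obtain data processing result information.
Because the apparatus solves the problem on the same principle as the method above, its implementation can follow that of the method, and the common points are not repeated here.
FIG. 6B is a schematic structural diagram of an omics data processing apparatus according to another embodiment of this specification. As shown in FIG. 6B, the apparatus includes:
a fourth determining unit 650, configured to determine, according to a received sharing request for the data processing result information, a user identifier included in the sharing request;
an obtaining unit 660, configured to obtain a preset sharing script according to the sharing request;
an updating unit 670, configured to update the preset sharing script with the data processing result information and the user identifier to obtain a runnable sharing script; and
a running unit 680, configured to run the runnable sharing script to obtain a target address link.
Because the apparatus solves the problem on the same principle as the method above, its implementation can follow that of the method, and the common points are not repeated here.
FIG. 6C is a schematic structural diagram of an omics data processing apparatus according to an embodiment of this specification. As shown in FIG. 6C, the apparatus includes:
a fifth determining unit 6010, configured to determine an omics data identifier according to the omics data to be processed included in a received omics data storage request; and
a storage unit 6020, configured to store the omics data to be processed in a target database based on the omics data identifier.
Because the apparatus solves the problem on the same principle as the method above, its implementation can follow that of the method, and the common points are not repeated here.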
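FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of this specification. The apparatus in this specification may be the computer device of this embodiment, which executes the method of this specification. The computer device 702 may include one or more processing devices 704, such as one or more central processing units (CPUs), each of which may implement one or more hardware threads. The computer device 702 may also include any storage resource 706 for storing any kind of information, such as code, settings, and data. As non-limiting examples, the storage resource 706 may include any one or a combination of the following: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, and so on. More generally, any storage resource may use any technology to store information. Further, any storage resource may provide volatile or non-volatile retention of information. Further, any storage resource may represent a fixed or removable component of the computer device 702. In one case, when the processing device 704 executes associated instructions stored in any storage resource or combination of storage resources, the computer device 702 can perform any operation of the associated instructions. The computer device 702 also includes one or more drive mechanisms 708 for interacting with any storage resource, such as a hard disk drive mechanism or an optical disk drive mechanism.
The computer device 702 may also include an input/output (I/O) module 710 for receiving various inputs (via input devices 712) and providing various outputs (via output devices 714). One particular output mechanism may include a presentation device 716 and an associated graphical user interface (GUI) 718. In other embodiments, the input/output (I/O) module 710, the input devices 712, and the output devices 714 may be omitted, with the device serving only as a computer device in a network. The computer device 702 may also include one or more network interfaces 720 for exchanging data with other devices via one or more communication links 722. One or more communication buses 724 couple the components described above together.
The communication link 722 may be implemented in any manner, for example through a local area network, a wide area network (for example, the Internet), a point-to-point connection, or any combination of these. The communication link 722 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, and so on, governed by any protocol or combination of protocols.
The embodiments of this specification also provide a computer-readable storage medium storing a computer program that implements the above method when executed by a processor.
The embodiments of this specification also provide a computer program product including a computer program that implements the above method when executed by a processor.
Those skilled in the art should understand that the embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The specific embodiments above further describe the purpose, technical solutions, and beneficial effects of this specification in detail. It should be understood that the above are only specific embodiments of this specification and do not limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within its scope of protection.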

Claims (10)

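  1. An omics data processing method, characterized by comprising:
    determining omics data to be processed according to an omics data identifier included in a received omics data processing request, the omics data processing request further including a processing identifier;
    determining an instrument identifier corresponding to the instrument that collected the omics data to be processed;
    determining a target processing model according to the instrument identifier and the processing identifier; and
    processing the omics data to be processed based on the target processing model to obtain data processing result information.
  2. The method according to claim 1, characterized in that, before determining the omics data to be processed according to the omics data identifier included in the received omics data processing request, the method further comprises:
    determining the omics data identifier according to the omics data to be processed included in a received omics data storage request; and
    storing the omics data to be processed in a target database based on the omics data identifier.
  3. The method according to claim 1, characterized in that the processing identifier includes a specified data category, and determining the target processing model according to the instrument identifier and the processing identifier comprises:
    determining a first preset data category associated with the instrument identifier;
    judging whether the first preset data category is consistent with the specified data category; and
    when the first preset data category is determined to be consistent with the specified data category, determining a first preset processing model corresponding to the first preset data category as the target processing model.
  4. The method according to claim 3, characterized by further comprising:
    when the first preset data category is determined to be inconsistent with the specified data category, extracting feature information of the omics data to be processed;
    determining a second preset data category from multiple preset data categories based on the feature information; and
    taking a second preset processing model corresponding to the second preset data category as the target processing model;
    or,
    when the first preset data category is determined to be inconsistent with the specified data category, sending a data category confirmation request; and
    taking a third preset processing model corresponding to a received updated data category as the target processing model.
  5. The method according to claim 1, characterized in that processing the omics data to be processed based on the target processing model to obtain the data processing result information comprises:
    obtaining a preset processing script corresponding to the target processing model;
    updating the preset processing script with the omics data to be processed to obtain a runnable processing script; and
    running the runnable processing script to obtain the data processing result information.
  6. The method according to claim 1, characterized in that, after processing the omics data to be processed based on the target processing model to obtain the data processing result information, the method further comprises:
    determining, according to a received sharing request for the data processing result information, a user identifier included in the sharing request;
    obtaining a preset sharing script according to the sharing request;
    updating the preset sharing script with the data processing result information and the user identifier to obtain a runnable sharing script; and
    running the runnable sharing script to obtain a target address link.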
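  7. An omics data processing apparatus, characterized by comprising:
    a first determining unit, configured to determine omics data to be processed according to an omics data identifier included in a received omics data processing request, the omics data processing request further including a processing identifier;
    a second determining unit, configured to determine an instrument identifier corresponding to the instrument that collected the omics data to be processed;
    a third determining unit, configured to determine a target processing model according to the instrument identifier and the processing identifier; and
    a processing unit, configured to process the omics data to be processed based on the target processing model to obtain data processing result information.
  8. The apparatus according to claim 7, characterized in that, after the processing unit, the apparatus further comprises:
    a fourth determining unit, configured to determine, according to a received sharing request for the data processing result information, a user identifier included in the sharing request;
    an obtaining unit, configured to obtain a preset sharing script according to the sharing request;
    an updating unit, configured to update the preset sharing script with the data processing result information and the user identifier to obtain a runnable sharing script; and
    a running unit, configured to run the runnable sharing script to obtain a target address link.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that the processor implements the method of any one of claims 1 to 6 when executing the computer program.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program performs the method of any one of claims 1 to 6 when run by a processor.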
PCT/CN2022/134484 2022-11-25 2022-11-25 一种组学数据处理方法、装置及计算机设备 WO2024108592A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/134484 WO2024108592A1 (zh) 2022-11-25 2022-11-25 一种组学数据处理方法、装置及计算机设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/134484 WO2024108592A1 (zh) 2022-11-25 2022-11-25 一种组学数据处理方法、装置及计算机设备

Publications (1)

Publication Number Publication Date
WO2024108592A1 true WO2024108592A1 (zh) 2024-05-30

Family

ID=91195058

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134484 WO2024108592A1 (zh) 2022-11-25 2022-11-25 一种组学数据处理方法、装置及计算机设备

Country Status (1)

Country Link
WO (1) WO2024108592A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111897828A (zh) * 2020-07-31 2020-11-06 广州视源电子科技股份有限公司 数据批处理实现方法、装置、设备及存储介质
CN113190295A (zh) * 2021-04-28 2021-07-30 维沃移动通信(深圳)有限公司 信息处理方法、处理装置和电子设备
CN113889181A (zh) * 2020-07-02 2022-01-04 华为技术有限公司 医学事件的分析方法及装置、计算机设备、存储介质
US20220262466A1 (en) * 2019-07-26 2022-08-18 Sartorius Stedim Data Analytics Ab Storing data from a process to produce a chemical, pharmaceutical, biopharmaceutical and/or biological product
CN115359846A (zh) * 2022-09-08 2022-11-18 上海氨探生物科技有限公司 一种组学数据的批次矫正方法、装置、存储介质及电子设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220262466A1 (en) * 2019-07-26 2022-08-18 Sartorius Stedim Data Analytics Ab Storing data from a process to produce a chemical, pharmaceutical, biopharmaceutical and/or biological product
CN113889181A (zh) * 2020-07-02 2022-01-04 华为技术有限公司 医学事件的分析方法及装置、计算机设备、存储介质
CN111897828A (zh) * 2020-07-31 2020-11-06 广州视源电子科技股份有限公司 数据批处理实现方法、装置、设备及存储介质
CN113190295A (zh) * 2021-04-28 2021-07-30 维沃移动通信(深圳)有限公司 信息处理方法、处理装置和电子设备
CN115359846A (zh) * 2022-09-08 2022-11-18 上海氨探生物科技有限公司 一种组学数据的批次矫正方法、装置、存储介质及电子设备

Similar Documents

Publication Publication Date Title
US11392775B2 (en) Semantic recognition method, electronic device, and computer-readable storage medium
WO2019140828A1 (zh) 电子装置、分布式系统日志查询方法及存储介质
CN107844634B (zh) 多元通用模型平台建模方法、电子设备及计算机可读存储介质
WO2021184571A1 (zh) 动态表单生成方法、装置、计算机设备和存储介质
CN108388515B (zh) 测试数据生成方法、装置、设备以及计算机可读存储介质
US8869111B2 (en) Method and system for generating test cases for a software application
US20200286100A1 (en) Payment complaint method, device, server and readable storage medium
CN108415998B (zh) 应用依赖关系更新方法、终端、设备及存储介质
CN110765195A (zh) 一种数据解析方法、装置、存储介质及电子设备
WO2021022714A1 (zh) 跨区块链节点的消息处理方法及装置、设备、介质
WO2020119064A1 (zh) 互联网信息链式存储方法、装置、计算机设备及存储介质
CN111177113A (zh) 数据迁移方法、装置、计算机设备和存储介质
CN112988997A (zh) 智能客服的应答方法、系统、计算机设备及存储介质
CN111435367A (zh) 知识图谱的构建方法、系统、设备及存储介质
CN115794437A (zh) 微服务的调用方法、装置、计算机设备及存储介质
CN110018845B (zh) 元数据版本对比方法及装置
EP3901761A2 (en) Method, apparatus, and electronic device for processing visual data of deep model
CN112883088B (zh) 一种数据处理方法、装置、设备及存储介质
CN112559526A (zh) 数据表导出方法、装置、计算机设备及存储介质
WO2019071907A1 (zh) 基于操作页面识别帮助信息的方法及应用服务器
WO2019080419A1 (zh) 标准知识库的构建方法、电子装置及存储介质
WO2024108592A1 (zh) 一种组学数据处理方法、装置及计算机设备
CN110188106B (zh) 一种数据管理方法和装置
CN115470426B (zh) 浏览器内核确定方法、装置、计算机设备和存储介质
CN110727565B (zh) 一种网络设备平台信息收集方法及系统