CN113611424A - Method and device for knowledge mining of coronavirus associated data based on strain angle

Publication number
CN113611424A
Authority
CN
China
Prior art keywords
data
coronavirus
node
strain
query
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110726433.5A
Other languages
Chinese (zh)
Inventor
孙清岚
范国梅
史文聿
吴林寰
马俊才
张幸姣
孙秀强
林思汝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Microbiology of CAS
Original Assignee
Institute of Microbiology of CAS
Application filed by Institute of Microbiology of CAS
Priority to CN202110726433.5A
Publication of CN113611424A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method, apparatus, electronic device, and medium for knowledge mining of coronavirus associated data based on strain angle. One specific implementation of the method comprises: acquiring first query data and second query data; searching, based on a preset graph structure, for a communication path between a first node corresponding to the first query data and a second node corresponding to the second query data, wherein the nodes in the graph structure comprise coronavirus names, coronavirus associated data and coronavirus strain data, the edges in the graph structure have preset weights, each weight represents the step length required to establish an association between the two corresponding nodes, and the communication path passes through at least one node of the coronavirus strain data type; and displaying the communication path. The method can provide the association relationships among the associated data while outputting the relevant strain information, which facilitates knowledge mining of coronavirus associated data from the perspective of the strain.

Description

Method and device for knowledge mining of coronavirus associated data based on strain angle
Technical Field
The disclosure relates to the technical field of microorganisms, in particular to a method, a device, electronic equipment and a medium for knowledge mining of coronavirus associated data based on strain angles.
Background
Since the discovery of coronaviruses, scientific research on their viral structure, pathogenesis, infection, molecular biology, genome sequencing, etc. has resulted in a variety of relevant data.
Because the volume of coronavirus associated data is large, the association relationships among the data need to be mined. At present, research on a coronavirus typically begins with genome sequencing: whole genome sequence alignment analysis identifies whether the virus is a new species, and the encoded genes and functional proteins of the virus are then studied and interpreted in depth. Analyses of the three-dimensional crystal structures of proteins, scientific research articles, patents, antibody drugs and the like follow later. Almost all of this research centers on strains. How to mine the association relationships among the data around strains has therefore become a problem to be solved.
Disclosure of Invention
The disclosure provides a method and a device for knowledge mining of coronavirus associated data based on strain angles and electronic equipment.
In a first aspect, the present disclosure provides a method of knowledge mining of coronavirus associated data based on strain angle, comprising:
acquiring first query data and second query data;
searching communication paths between a first node corresponding to the first query data and a second node corresponding to the second query data based on a preset graph structure, wherein the nodes in the graph structure comprise coronavirus names, coronavirus associated data and coronavirus strain data, edges in the graph structure are provided with preset weights, the weights represent step lengths required by establishing association between the corresponding two nodes, and the communication paths pass through at least one node of a coronavirus strain data type;
and displaying the communication path.
In some optional embodiments, the searching, based on a preset graph structure, for a communication path between a first node corresponding to the first query data and a second node corresponding to the second query data includes:
searching the graph structure, based on a mixed integer programming heuristic algorithm, for the communication paths that pass through at least one node of the coronavirus strain data type.
In some alternative embodiments, the coronavirus associated data comprises at least one of coronavirus nucleic acid data, coronavirus protein data, coronavirus crystal structure data, coronavirus antibody data, coronavirus literature data, and coronavirus patent data.
In some alternative embodiments, the weight of an edge in the graph structure is determined by:
determining whether two nodes corresponding to the edge are directly related;
in response to determining yes, determining a weight of the edge to be 1;
and in response to determining no, determining the number N of intermediaries between the two nodes corresponding to the edge, and determining the weight of the edge to be N + 1.
In some optional embodiments, the acquiring of the first query data and the second query data includes:
acquiring first query data;
outputting at least one candidate datum, wherein the candidate datum is associated with the first query datum;
and determining the candidate data selected by the user as the second query data in response to the selection operation of the candidate data by the user.
In some alternative embodiments, the node of the coronavirus strain data type through which the communication path passes is a transit node.
In a second aspect, the present disclosure provides a device for knowledge mining of coronavirus associated data based on strain angle, comprising:
the acquisition module is used for acquiring first query data and second query data;
a searching module, configured to search, based on a preset graph structure, communication paths between a first node corresponding to the first query data and a second node corresponding to the second query data, where the nodes in the graph structure include a coronavirus name and coronavirus associated data, an edge in the graph structure has a preset weight, the weight represents a step size required for establishing association between two corresponding nodes, and the communication paths pass through nodes of at least one coronavirus strain data type;
and the display module is used for displaying the communication path.
In some optional embodiments, the searching module is further configured to:
search the graph structure, based on a mixed integer programming heuristic algorithm, for the communication paths that pass through at least one node of the coronavirus strain data type.
In some alternative embodiments, the coronavirus associated data comprises at least one of coronavirus nucleic acid data, coronavirus protein data, coronavirus crystal structure data, coronavirus antibody data, coronavirus literature data, and coronavirus patent data.
In some alternative embodiments, the weight of an edge in the graph structure is determined by:
determining whether two nodes corresponding to the edge are directly related;
in response to determining yes, determining a weight of the edge to be 1; and in response to determining no, determining the number N of intermediaries between the two nodes corresponding to the edge, and determining the weight of the edge to be N + 1.
In some optional embodiments, the acquisition module is further configured to:
acquiring first query data;
outputting at least one candidate datum, wherein the candidate datum is associated with the first query datum;
and determining the candidate data selected by the user as the second query data in response to the selection operation of the candidate data by the user.
In some alternative embodiments, the node of the coronavirus strain data type through which the communication path passes is a transit node.
In a third aspect, the present disclosure provides an electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method as described in any embodiment of the first aspect of the disclosure.
In a fourth aspect, the present disclosure also provides a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method as described in any of the embodiments of the first aspect of the present disclosure.
In the method and the device for knowledge mining of coronavirus associated data based on the strain angle in this embodiment, a communication path that passes through at least one node of the coronavirus strain data type is searched for, based on a preset graph structure, between a first node corresponding to first query data and a second node corresponding to second query data, and the communication path is displayed. The association relationships among the associated data can thus be provided while the related strain information is output, which facilitates knowledge mining of coronavirus associated data from the strain angle and supports related scientific research work.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2A is a flow diagram of one embodiment of a method of knowledge mining of coronavirus associated data based on strain angle according to the present disclosure;
FIG. 2B is a schematic diagram of one example of a method of knowledge mining of coronavirus associated data based on strain angle according to the present disclosure;
FIG. 3 is a schematic structural diagram of one embodiment of a knowledge mining device for coronavirus associated data based on strain angle according to the present disclosure;
FIG. 4 is a schematic block diagram of a computer system suitable for use in implementing the electronic device of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the presently disclosed strain-angle-based method of knowledge mining of coronavirus associated data or strain-angle-based device of knowledge mining of coronavirus associated data may be applied.
As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide a communication link between terminal device 101 and server 103. Network 102 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.
A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. The terminal device 101 may be installed with various communication client applications, such as a coronavirus information data recording application, a coronavirus information data processing application, a web browser application, and the like.
The terminal device 101 may be hardware or software. When the terminal device 101 is hardware, it may be any of various electronic devices that have a display screen and support information input (e.g., text input and/or voice input), including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When the terminal device 101 is software, it can be installed in the electronic devices listed above, and may be implemented as a plurality of software programs or software modules (e.g., to provide a knowledge mining service for coronavirus associated data based on strain angle) or as a single software program or software module. No particular limitation is imposed here.
The server 103 may be a server that provides various services, such as a background server that processes the coronavirus information query requests sent by the terminal device 101. The background server can process a received coronavirus information query request and feed the result (such as communication path data) back to the terminal device 101.
In some cases, the method for knowledge mining of coronavirus associated data based on strain angle provided by the present disclosure may be performed by the terminal device 101 and the server 103 together, for example, the steps of "obtaining the first query data and the second query data" and "searching for a communication path between a first node corresponding to the first query data and a second node corresponding to the second query data based on a preset graph structure" may be performed by the server 103, and the step of "displaying the communication path" may be performed by the terminal device 101. The present disclosure is not limited thereto. Accordingly, the knowledge mining device for coronavirus associated data based on strain angle may also be respectively provided in the terminal device 101 and the server 103.
In some cases, the knowledge mining method for coronavirus associated data based on strain angle provided by the present disclosure may be performed by the server 103, and accordingly, the knowledge mining device for coronavirus associated data based on strain angle may also be disposed in the server 103, and in this case, the system architecture 100 may also not include the terminal device 101.
In some cases, the method for mining the knowledge of coronavirus associated data based on strain angle provided by the present disclosure may be performed by the terminal device 101, and accordingly, the apparatus for mining the knowledge of coronavirus associated data based on strain angle may also be disposed in the terminal device 101, and in this case, the system architecture 100 may not include the server 103.
The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 103 is software, it may be implemented as a plurality of software programs or software modules (for example, to provide a knowledge mining service for coronavirus associated data based on strain angle), or as a single software program or software module. No particular limitation is imposed here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2A, a flow 200 of one embodiment of a method of knowledge mining of coronavirus associated data based on strain angle according to the present disclosure is shown. The method comprises the following steps:
Step 201, acquiring first query data and second query data.
In one example, the method is implemented by a terminal device, and the terminal device can receive first query data and second query data input by a user.
In one example, the method is implemented by a terminal device and a server together. In this case, the user may input the first query data and the second query data at the terminal device, which transmits them to the server, so that the server acquires the first query data and the second query data.
The first query data and the second query data may be coronavirus names, coronavirus nucleic acid data, coronavirus protein data, coronavirus crystal structure data, coronavirus antibody data, coronavirus literature data, coronavirus patent data, or the like.
In one example, after the first query data is acquired, at least one candidate datum may be output for selection by the user, wherein the candidate datum is associated with the first query data. Then, in response to the user's selection operation, the candidate datum selected by the user may be determined as the second query data. For example, assume that the first query data input by the user is "virus A", and that the data associated with "virus A" includes "nucleic acid A", "gene A", "antibody A", and so on. After the user enters "virus A", the associated "nucleic acid A", "gene A", and "antibody A" may be presented to the user as candidate data. If the user selects "gene A" from the candidate data, "gene A" may be determined as the second query data.
This approach makes querying more convenient for the user.
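The candidate-selection step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `ASSOCIATIONS` table, the function names, and the use of a list index to model the user's selection are all assumptions made for the example.

```python
# Hypothetical association table: first query term -> associated data items.
# In a real system this would come from the preset graph structure.
ASSOCIATIONS = {
    "virus A": ["nucleic acid A", "gene A", "antibody A"],
}


def candidate_data(first_query, associations=ASSOCIATIONS):
    """Return the data items associated with the first query (possibly empty)."""
    return list(associations.get(first_query, []))


def select_second_query(candidates, selected_index):
    """Treat the candidate the user picked as the second query data."""
    if not 0 <= selected_index < len(candidates):
        raise IndexError("selection is outside the candidate list")
    return candidates[selected_index]


candidates = candidate_data("virus A")
# The user picks the second item in the list shown to them.
second_query = select_second_query(candidates, 1)
```

A query term with no recorded associations simply yields an empty candidate list, in which case the user would have to enter the second query data directly.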
In one example, the first query data and the second query data input by the user may be non-standard terms, in which case they may be normalized before query matching. For example, a coronavirus name may be converted into a standardized virus name using a coronavirus name standard thesaurus. The thesaurus records standard terms together with their corresponding non-standard terms: a standard term can be the scientific name of a species, and the corresponding non-standard terms can be previously used names, common spellings, misspellings, spellings based on gene names, and the like.
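The normalization step can be sketched as a dictionary lookup. The thesaurus entries below are placeholders rather than real virus names, and the matching rule (case-insensitive, whitespace-collapsed exact match) is an assumption; the source does not say how non-standard terms are matched.

```python
# Hypothetical thesaurus: standard term -> known non-standard variants.
THESAURUS = {
    "virus A": ["Virus-A", "virusA", "virsu A"],
}


def build_lookup(thesaurus):
    """Map every standard and non-standard spelling to its standard term."""
    lookup = {}
    for standard, variants in thesaurus.items():
        lookup[standard.casefold()] = standard
        for variant in variants:
            lookup[variant.casefold()] = standard
    return lookup


def normalize(term, lookup):
    """Return the standard term for a query, or the query unchanged if unknown."""
    key = " ".join(term.split()).casefold()  # collapse whitespace, ignore case
    return lookup.get(key, term)


lookup = build_lookup(THESAURUS)
normalized = normalize("  VIRUS-a ", lookup)
```

Unknown terms are returned unchanged here so that the later node lookup can decide how to handle them.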
Step 202, based on a preset graph structure, searching a communication path between a first node corresponding to the first query data and a second node corresponding to the second query data, where the nodes in the graph structure include a coronavirus name and coronavirus associated data, edges in the graph structure have a preset weight, the weight represents a step length required by establishing association between two corresponding nodes, and the communication path passes through at least one node of a coronavirus strain data type.
In one example, the coronavirus associated data may include at least one of coronavirus nucleic acid data, coronavirus protein data, coronavirus crystal structure data, coronavirus antibody data, coronavirus literature data, and coronavirus patent data. Coronavirus nucleic acid data is, for example, the gene sequence of a coronavirus. Coronavirus protein data is, for example, the protein sequence of a coronavirus. Coronavirus crystal structure data is, for example, the three-dimensional crystal structure data of a coronavirus protein. Coronavirus antibody data is, for example, coronavirus-related antibody information. Coronavirus literature data is, for example, scientific research literature that describes information related to coronaviruses. Coronavirus patent data is, for example, patent data that describes information related to coronaviruses.
In one example, the weights of the edges in the graph structure are determined as follows. First, it is determined whether the two nodes corresponding to an edge are directly associated. If the two nodes are directly associated, the weight of the edge is determined to be 1. If the two nodes are not directly associated, the number N of intermediaries between the two nodes is determined, and the weight of the edge is determined to be N + 1. For example, if node A is a nucleic acid and node B is a gene that the nucleic acid has, then node A and node B are directly associated, and the step length required for the association is 1. Accordingly, the weight of the edge AB between node A and node B is 1. For another example, if node A is a nucleic acid and node C is a document that directly records the sequence number of the nucleic acid, node A and node C are directly associated, and the step length required for the association is 1. Accordingly, the weight of the edge AC between node A and node C is 1. For another example, if node A is a nucleic acid and node D is a document that does not directly describe the nucleic acid but describes a gene that the nucleic acid has, node A and node D are not directly associated: one intermediary (the gene that the nucleic acid includes) lies between them, so the step length required for the association is 2. Accordingly, the weight of the edge AD between node A and node D is 2.
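The weighting rule above can be written as a small function. The function name and its two parameters are illustrative assumptions; only the rule itself (weight 1 for a direct association, N + 1 for N intermediaries) comes from the text.

```python
def edge_weight(directly_associated, intermediaries=0):
    """Weight of an edge: 1 if the nodes are directly associated, else N + 1."""
    if directly_associated:
        return 1
    if intermediaries < 1:
        # An indirect association needs at least one intermediary.
        raise ValueError("indirect association requires at least one intermediary")
    return intermediaries + 1


weight_ab = edge_weight(True)       # nucleic acid A and the gene it has
weight_ac = edge_weight(True)       # nucleic acid A and a document recording its sequence number
weight_ad = edge_weight(False, 1)   # nucleic acid A and a document describing only its gene
```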
In one example, the communication paths that pass through at least one node of the coronavirus strain data type may be searched for in the graph structure based on a MIP (Mixed Integer Programming) heuristic algorithm. Heuristic algorithms are used to solve optimization problems, which are generally divided into continuous optimization problems and discrete optimization problems. A discrete optimization problem, also called a combinatorial optimization problem, seeks a set of discrete variable values that maximizes or minimizes a given objective function, and is an important branch of operations research. The search space of a combinatorial optimization problem grows exponentially with the problem size (the so-called "combinatorial explosion"), so an exact algorithm can rarely find the optimal solution within a limited time, whereas a heuristic algorithm can find an approximately optimal solution within an acceptable computation time. A heuristic algorithm cannot guarantee that its solution is optimal, and in some cases cannot even state how close the solution is to the optimum, but in practice heuristic algorithms often perform very well and are an indispensable tool for solving NP-hard problems.
In one example, let the graph structure be $G = \{V, E, C\}$, where $V = \{v_1, v_2, v_3, \ldots, v_n\}$ is the set of all nodes, $E = \{e_1, e_2, \ldots, e_m\}$ is the set of edges, and $C = \{c_{ij} \mid i, j \in V\}$ is the set of non-negative weights of the corresponding edges. The length $L$ of the shortest must-pass path from $v_i$ to $v_j$ can be expressed as $L = \min \sum c_{ij} x_{ij}$, where $x_{ij} = 1$ if a path exists from node $i$ to node $j$ and $x_{ij} = 0$ otherwise, with $i \neq j$ and $v_i, v_j \in V$.
Here, the length of the communication path is the sum of the weights of the respective sides in the communication path.
It should be noted that, when searching for a communication path passing through a coronavirus strain data type node in a graph structure based on the MIP heuristic algorithm, in addition to recording the shortest path, each path in the searching process is also recorded, so that all paths passing through the coronavirus strain data type node between the first node and the second node are obtained.
In one example, for the node graph shown in fig. 2B, assuming that the first query data is virus A and the second query data is gene A, the method above yields the communication paths from virus A to gene A that pass through a node of the coronavirus strain data type: "virus A-strain A-gene A", "virus A-strain A-document A-gene A", and "virus A-document A-strain A-gene A".
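The path search in this example can be sketched as below. Two things are assumptions: the edge set and unit weights, chosen only so that the result matches the three paths listed (the actual graph of fig. 2B is not reproduced in the text), and the search technique, which here is an exhaustive depth-first enumeration of simple paths swapped in for the MIP heuristic. Exhaustive enumeration is exponential in the worst case and is only suitable for small graphs.

```python
# Assumed undirected edges with weights; not taken from fig. 2B.
EDGES = {
    ("virus A", "strain A"): 1,
    ("strain A", "gene A"): 1,
    ("strain A", "document A"): 1,
    ("document A", "gene A"): 1,
    ("virus A", "document A"): 1,
}
# Assumed node types; only the "strain" label matters for the filter.
NODE_TYPE = {
    "virus A": "name",
    "strain A": "strain",
    "gene A": "associated",
    "document A": "associated",
}


def build_adjacency(edges):
    """Turn the undirected edge table into a neighbor -> weight mapping."""
    adjacency = {}
    for (u, v), weight in edges.items():
        adjacency.setdefault(u, {})[v] = weight
        adjacency.setdefault(v, {})[u] = weight
    return adjacency


def strain_paths(adjacency, node_type, start, goal):
    """List (length, path) for every simple path with a strain transit node."""
    results = []

    def dfs(node, path, length):
        if node == goal:
            # Keep the path only if a transit node is of the strain type.
            if any(node_type[n] == "strain" for n in path[1:-1]):
                results.append((length, list(path)))
            return
        for neighbor, weight in adjacency[node].items():
            if neighbor not in path:  # simple paths only, no revisits
                path.append(neighbor)
                dfs(neighbor, path, length + weight)
                path.pop()

    dfs(start, [start], 0)
    results.sort()  # shortest first; ties broken by node names
    return results


paths = strain_paths(build_adjacency(EDGES), NODE_TYPE, "virus A", "gene A")
shortest_length, shortest_path = paths[0]
```

The path "virus A-document A-gene A" exists in this assumed graph but is filtered out because it contains no strain node. Every qualifying path is recorded, and the first entry after sorting is the shortest one.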
Step 203, displaying the communication path.
In the example shown in fig. 2B, the communication paths from virus A to gene A that pass through a node of the coronavirus strain data type include "virus A-strain A-gene A", "virus A-strain A-document A-gene A", and "virus A-document A-strain A-gene A". These communication paths can be displayed in a number of ways. For example, a path may be represented as "virus A-strain A-gene A" or as {virus A, strain A, gene A}; alternatively, the start and end points of the path may be omitted so that only transit nodes such as "strain A" are displayed. Here, a transit node is any node in the communication path other than the start node and the end node.
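The three display forms described above can be sketched as a formatting helper. The function name and the `style` values are hypothetical labels chosen for the example.

```python
def format_path(path, style="chain"):
    """Render a path as a chain, a brace-delimited list, or transit nodes only."""
    if style == "chain":
        return "-".join(path)
    if style == "set":
        return "{" + ", ".join(path) + "}"
    if style == "transit":
        # Drop the start and end nodes; keep only the nodes in between.
        return ", ".join(path[1:-1])
    raise ValueError(f"unknown style: {style}")


path = ["virus A", "strain A", "gene A"]
as_chain = format_path(path)
as_set = format_path(path, "set")
as_transit = format_path(path, "transit")
```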
In the method for knowledge mining of coronavirus associated data based on the strain angle in this embodiment, a communication path that passes through at least one node of the coronavirus strain data type is searched for, based on a preset graph structure, between a first node corresponding to first query data and a second node corresponding to second query data, and the communication path is displayed. The association relationships among the associated data can thus be provided while the related strain information is output, which facilitates knowledge mining of coronavirus associated data from the strain angle and supports related scientific research work.
With further reference to fig. 3, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a device for knowledge mining of coronavirus associated data based on strain angle, which corresponds to the embodiment of the method shown in fig. 2A, and which can be applied in various electronic devices.
As shown in fig. 3, the knowledge mining device 300 for coronavirus associated data based on strain angle according to the present embodiment may include an acquisition module 301, a searching module 302 and a display module 303. The acquisition module 301 may be configured to acquire first query data and second query data. The searching module 302 may be configured to search, based on a preset graph structure, for communication paths between a first node corresponding to the first query data and a second node corresponding to the second query data, where the nodes in the graph structure include a coronavirus name and coronavirus associated data, an edge in the graph structure has a preset weight, the weight indicates the step length required for establishing an association between the two corresponding nodes, and the communication paths pass through at least one node of the coronavirus strain data type. The display module 303 may be configured to display the communication path.
In this embodiment, for the specific processing of the acquisition module 301, the searching module 302, and the display module 303 of the knowledge mining device 300 and the technical effects thereof, reference may be made to the descriptions of step 201, step 202, and step 203 in the embodiment corresponding to fig. 2A, respectively; they are not repeated here.
In some optional embodiments, the searching module 302 may be further configured to search the graph structure, based on a mixed integer programming heuristic algorithm, for communication paths that pass through at least one node of the coronavirus strain data type.
In some alternative embodiments, the coronavirus associated data comprises at least one of coronavirus nucleic acid data, coronavirus protein data, coronavirus crystal structure data, coronavirus antibody data, coronavirus literature data, and coronavirus patent data.
In some alternative embodiments, the weight of an edge in the graph structure is determined by: determining whether the two nodes corresponding to the edge are directly related; in response to determining that they are, setting the weight of the edge to 1; and in response to determining that they are not, determining the number N of intermediaries between the two nodes corresponding to the edge and setting the weight of the edge to N + 1.
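The weighting rule above can be written as a small function. This is a minimal sketch; the function name and parameters are hypothetical, and the requirement that N be at least 1 for indirectly related nodes is an assumption of the example rather than something the disclosure states.

```python
def edge_weight(directly_related: bool, intermediaries: int = 0) -> int:
    """Weight of an edge: 1 if directly related, otherwise N + 1.

    `intermediaries` is N, the number of intermediaries between the two
    nodes; it is ignored when the nodes are directly related.
    """
    if directly_related:
        return 1
    if intermediaries < 1:
        # Assumption of this sketch: an indirect relation needs N >= 1.
        raise ValueError("indirectly related nodes need at least one intermediary")
    return intermediaries + 1


direct = edge_weight(True)
via_two = edge_weight(False, intermediaries=2)
```

For instance, two nodes linked through two intermediaries receive a weight of 3, so the weight equals the number of steps needed to get from one node to the other.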
In some optional embodiments, the acquisition module 301 may be further configured to: acquire the first query data; output at least one candidate datum, where the candidate datum is associated with the first query data; and, in response to a user's selection of a candidate datum, determine the selected candidate datum as the second query data.
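The candidate-selection flow can be sketched as follows. The association table, the function names, and the use of a list index to stand for the user's selection are all assumptions made for illustration; the disclosure does not specify how candidates are stored or how the selection is made.

```python
from typing import Dict, List, Set


def candidate_data(associations: Dict[str, Set[str]], first_query: str) -> List[str]:
    """Return the data associated with the first query, in a stable order."""
    return sorted(associations.get(first_query, set()))


def select_second_query(candidates: List[str], chosen_index: int) -> str:
    """Treat the candidate picked by the user as the second query data."""
    if not 0 <= chosen_index < len(candidates):
        raise IndexError("selection is outside the candidate list")
    return candidates[chosen_index]


# Made-up association table for illustration.
associations = {
    "protein:spike": {"strain:A", "patent:P", "paper:X"},
}
candidates = candidate_data(associations, "protein:spike")
second_query = select_second_query(candidates, 1)  # user picks the second entry
```

Restricting the second query to data already associated with the first one means the later path search starts from a pair of nodes that are known to be connected.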
In some alternative embodiments, the node of the coronavirus strain data type through which the communication path passes is a transit node.
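One way to read "transit node" is that the strain node lies strictly between the two query nodes rather than being an endpoint. Under that assumption, which is an interpretation and not a definition taken from the disclosure, a path can be checked as follows. The function name and the `"strain"` type tag are hypothetical.

```python
from typing import Dict, List


def strain_transit_nodes(path: List[str], node_type: Dict[str, str]) -> List[str]:
    """Return strain-type nodes that are interior to the path (not endpoints)."""
    return [n for n in path[1:-1] if node_type.get(n) == "strain"]


node_type = {"protein:spike": "protein", "strain:A": "strain", "patent:P": "patent"}
interior = strain_transit_nodes(["protein:spike", "strain:A", "patent:P"], node_type)
endpoint_only = strain_transit_nodes(["strain:A", "patent:P"], node_type)
```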
It should be noted that, for the implementation details and technical effects of the modules in the knowledge mining device for coronavirus associated data based on strain angle provided by the present disclosure, reference may be made to the descriptions of other embodiments in the present disclosure; they are not repeated here.
Referring now to FIG. 4, a block diagram of a computer system 400 suitable for implementing the electronic device of the present disclosure is shown. The electronic device shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the present disclosure.
As shown in FIG. 4, the computer system 400 includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the system 400. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An Input/Output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a touch screen, a tablet, a keyboard, a mouse, or the like; an output section 407 including a display, such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a Local Area Network (LAN) card, a modem, or the like. The communication section 409 performs communication processing via a network such as the Internet.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program can be downloaded and installed from the network through the communication section 409. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by the CPU 401. It should be noted that the computer readable medium of the present disclosure can be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, and Python, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules referred to in the present disclosure may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including an acquisition module, a search module, and a presentation module. The names of these modules do not, in some cases, limit the modules themselves; for example, the acquisition module may also be described as a "module that acquires first query data and second query data".
As another aspect, the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: acquire coronavirus information data, where the coronavirus information data includes coronavirus biological data and corresponding coronavirus metadata, the coronavirus biological data describes biological information of the coronavirus and includes at least one of coronavirus nucleic acid data, coronavirus protein data, coronavirus crystal structure data, and coronavirus antibody data, and the coronavirus metadata describes attributes of the corresponding coronavirus biological data; standardize the coronavirus metadata according to a preset standardized lexicon to obtain corresponding coronavirus standardized metadata; and determine the association relations between different coronavirus biological data according to the coronavirus standardized metadata to form a coronavirus information integration data set.
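The standardization and association steps described above can be sketched as follows. The lexicon entries, the `virus_name` metadata field, the record identifiers, and the function names are all hypothetical; the disclosure does not specify the lexicon format or the rule used to associate records, so this example simply groups records whose standardized metadata value matches.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical standardized lexicon: lower-cased variant -> standard term.
LEXICON = {
    "2019-ncov": "SARS-CoV-2",
    "hcov-19": "SARS-CoV-2",
    "sars-cov-2": "SARS-CoV-2",
}


def standardize(value: str, lexicon: Dict[str, str]) -> str:
    """Map a metadata value to its standard term, if the lexicon knows it."""
    cleaned = value.strip()
    return lexicon.get(cleaned.lower(), cleaned)


def integrate(records: Iterable[Dict[str, str]], lexicon: Dict[str, str],
              field: str = "virus_name") -> Dict[str, List[str]]:
    """Group record ids by the standardized value of one metadata field."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        groups[standardize(record[field], lexicon)].append(record["id"])
    return dict(groups)


# Made-up records for illustration.
records = [
    {"id": "nucleic:1", "virus_name": "2019-nCoV"},
    {"id": "protein:7", "virus_name": " SARS-CoV-2 "},
    {"id": "antibody:3", "virus_name": "MERS-CoV"},
]
integrated = integrate(records, LEXICON)
```

Here the nucleic acid record and the protein record end up associated because both metadata values normalize to the same standard term, while the third record stays in its own group.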
The foregoing description covers only the preferred embodiments of the present disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept defined above. For example, a technical solution may be formed by replacing the above features with features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (9)

1. A method of knowledge mining of coronavirus associated data based on strain angle, comprising:
acquiring first query data and second query data;
searching communication paths between a first node corresponding to the first query data and a second node corresponding to the second query data based on a preset graph structure, wherein the nodes in the graph structure comprise coronavirus names, coronavirus associated data and coronavirus strain data, edges in the graph structure are provided with preset weights, the weights represent step lengths required by establishing association between the corresponding two nodes, and the communication paths pass through at least one node of a coronavirus strain data type;
and displaying the communication path.
2. The method according to claim 1, wherein the searching communication paths between a first node corresponding to the first query data and a second node corresponding to the second query data based on a preset graph structure comprises:
searching the communication paths passing through at least one coronavirus strain data type node in the graph structure based on a heuristic algorithm of mixed integer programming.
3. The method of claim 1, wherein the coronavirus associated data comprises at least one of coronavirus nucleic acid data, coronavirus protein data, coronavirus crystal structure data, coronavirus antibody data, coronavirus literature data, and coronavirus patent data.
4. The method of claim 1, wherein the weight of an edge in the graph structure is determined by:
determining whether the two nodes corresponding to the edge are directly related;
in response to determining that they are, setting the weight of the edge to 1;
and in response to determining that they are not, determining the number N of intermediaries between the two nodes corresponding to the edge, and setting the weight of the edge to N + 1.
5. The method of claim 1, wherein the acquiring first query data and second query data comprises:
acquiring first query data;
outputting at least one candidate datum, wherein the candidate datum is associated with the first query datum;
and determining the candidate data selected by the user as the second query data in response to the selection operation of the candidate data by the user.
6. A method according to any one of claims 1 to 5 wherein the node of the coronavirus strain data type through which the communication path passes is a transit node.
7. A knowledge mining device for coronavirus associated data based on strain angle, comprising:
an acquisition module configured to acquire first query data and second query data;
a search module configured to search, based on a preset graph structure, for communication paths between a first node corresponding to the first query data and a second node corresponding to the second query data, wherein the nodes in the graph structure comprise coronavirus names and coronavirus associated data, edges in the graph structure have preset weights, the weights represent the step lengths required to establish an association between the two corresponding nodes, and the communication paths pass through at least one node of a coronavirus strain data type;
and a presentation module configured to present the communication paths.
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-6.
9. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN202110726433.5A 2021-06-29 2021-06-29 Method and device for knowledge mining of coronavirus associated data based on strain angle Pending CN113611424A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110726433.5A CN113611424A (en) 2021-06-29 2021-06-29 Method and device for knowledge mining of coronavirus associated data based on strain angle

Publications (1)

Publication Number Publication Date
CN113611424A true CN113611424A (en) 2021-11-05

Family

ID=78303853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110726433.5A Pending CN113611424A (en) 2021-06-29 2021-06-29 Method and device for knowledge mining of coronavirus associated data based on strain angle

Country Status (1)

Country Link
CN (1) CN113611424A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065173A (en) * 2018-07-10 2018-12-21 北京科技大学 The acquisition methods of Knowledge route
CN109964224A (en) * 2016-09-22 2019-07-02 恩芙润斯公司 System, method and the computer-readable medium that significant associated time signal is inferred between life science entity are visualized and indicated for semantic information
CN112541065A (en) * 2020-12-11 2021-03-23 浙江汉德瑞智能科技有限公司 Medical new word discovery processing method based on representation learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周骏等: "HBx 研究文献可视化分析", 医学信息学杂志, vol. 36, no. 3, 31 December 2015 (2015-12-31), pages 47 - 51 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination