CN116775896A - Map construction and retrieval method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116775896A
Authority
CN
China
Prior art keywords
entity
disease
relationship
map
target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310572446.0A
Other languages
Chinese (zh)
Inventor
周立运
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rubik's Cube Medical Technology Suzhou Co ltd
Original Assignee
Rubik's Cube Medical Technology Suzhou Co ltd
Application filed by Rubik's Cube Medical Technology Suzhou Co ltd
Priority to CN202310572446.0A
Publication of CN116775896A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a map construction and retrieval method and device, an electronic device and a storage medium. The method comprises: constructing, based on medical research literature, a disease signal pathway map for each disease; associating the disease signal pathway maps of the individual diseases through the identical molecular entities they contain, to obtain an inter-disease relationship map; and constructing a drug-disease relationship map based on the relationships between disease entities and target entities in the inter-disease relationship map and on predetermined drug-target relationships. Based on the constructed drug-disease relationship map, potential associations between drugs and diseases can be mined, helping users find new indications for drugs efficiently and accurately.

Description

Map construction and retrieval method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for constructing and retrieving a map, an electronic device, and a storage medium.
Background
With the rapid growth of the drug development market, competition in new drug development is intensifying. Developing innovative drugs and generic drugs is highly challenging: it involves a long development cycle and requires large investments of capital, manpower and material resources before a product reaches the market. Compared with developing innovative or generic drugs, developing new indications for existing drugs is a lower-risk and more rewarding option: it does not start from scratch and does not require high development costs, so the risk is low.
At present, because drug research and development information is scattered across a vast body of medical literature, new indication information for drugs is mostly mined manually: R&D information on various drugs is collected from massive literature data and then screened, organized and classified. This approach is time-consuming and labor-intensive, inefficient, and limited by data completeness and individual knowledge, so the reliability and accuracy of drug information mining are poor.
Disclosure of Invention
The invention provides a map construction and retrieval method and device, an electronic device and a storage medium, to overcome the low efficiency, poor reliability and poor accuracy of drug information mining in the prior art.
The invention provides a map construction method, which comprises the following steps:
constructing, based on medical research literature, a disease signal pathway map for each disease;
associating the disease signal pathway maps of the individual diseases through identical molecular entities contained in those maps, to obtain an inter-disease relationship map, wherein the molecular entities are biomarker entities and/or target entities;
constructing a drug-disease relationship map based on the relationships between disease entities and target entities in the inter-disease relationship map and on predetermined drug-target relationships.
According to the map construction method provided by the invention, the step of determining the relationship between a disease entity and a target entity in the inter-disease relationship map comprises:
acquiring the signal paths connecting each disease entity to each target entity in the inter-disease relationship map;
determining, based on the relationships between the nodes in a signal path, the relationship between the disease entity and the target entity connected by that signal path.
According to the map construction method provided by the invention, the determining the relationship between the disease entity and the target entity corresponding to the signal path based on the relationship between the nodes in the signal path comprises the following steps:
counting the number of relationships between nodes in the signal path that are negative regulation relationships;
determining, based on that number, whether the relationship between the disease entity and the target entity connected by the signal path is a positive regulation relationship or a negative regulation relationship.
According to the map construction method provided by the invention, the step of determining the relationship between a disease entity and a target entity in the inter-disease relationship map further comprises:
if the signal path exists in the disease signal pathway map, determining that the relationship between the corresponding disease entity and target entity is an actual association relationship;
otherwise, determining that the relationship between the corresponding disease entity and target entity is a potential association relationship.
According to the map construction method provided by the invention, constructing the disease signal pathway map for each disease based on medical research literature comprises:
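The actual/potential distinction can be illustrated with a minimal Python sketch. It assumes (our interpretation, not stated in the source) that "the disease signal pathway map" means the per-disease map of the disease at the end of the path, that edges are directed from the target toward the disease, and that each map is stored as a plain set of edges; the function name classify_association is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (upstream node, downstream node)


def classify_association(path: List[str], disease_maps: Dict[str, Set[Edge]]) -> str:
    """Label a target-to-disease signal path as an actual or potential association.

    `path` runs from a target entity to a disease entity; `disease_maps` maps each
    disease to the edge set of its own disease signal pathway map.
    """
    disease = path[-1]
    path_edges = set(zip(path, path[1:]))
    own_map = disease_maps.get(disease, set())
    # Every hop already appears in the disease's own map, i.e. it was extracted
    # from literature about this disease.
    if path_edges <= own_map:
        return "actual"
    # Otherwise the path exists only because maps were joined on a shared
    # molecular entity, so the link is a candidate to be verified.
    return "potential"


disease_maps = {
    "disease A": {("molecule 2", "disease A"), ("molecule 3", "disease A")},
    "disease B": {("molecule 1", "molecule 2"), ("molecule 2", "disease B")},
}
print(classify_association(["molecule 3", "disease A"], disease_maps))  # actual
print(classify_association(["molecule 1", "molecule 2", "disease A"], disease_maps))  # potential
```

In this toy data, the path from molecule 1 to disease A only exists because the map of disease B shares molecule 2 with the map of disease A, which is exactly the kind of link the method flags as potential.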
performing entity recognition on sentences in the medical research literature to obtain entity pairs contained in the medical research literature, wherein the entity pairs comprise entities and entity relations among the entities;
constructing the disease signaling pathway map in units of disease based on the entity and the entity relationship.
According to the map construction method provided by the invention, performing entity recognition on the sentences in the medical research literature to obtain the entity pairs contained in the literature comprises:
acquiring title text and abstract text of the medical research literature;
inputting the title text and the abstract text into a sentence classifier to obtain sentence types of each sentence in the title text and the abstract text output by the sentence classifier;
performing entity recognition on the sentences whose sentence type is "to be recognized" to obtain the entity pairs contained in the medical research literature.
According to the map construction method provided by the invention, performing entity recognition to obtain the entity pairs contained in the medical research literature comprises:
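The sentence-filtering step can be sketched in Python as follows. The source uses a trained sentence classifier whose labels and features it does not specify; here a keyword rule stands in for it, and the label name "to_recognize", the cue list and all function names are hypothetical.

```python
import re
from typing import Callable, List

# Sentence type that is passed on to entity recognition (label name is hypothetical).
TO_RECOGNIZE = "to_recognize"

# Hypothetical cue words; the source uses a trained sentence classifier instead.
RELATION_CUES = ("activate", "inhibit", "promote", "suppress", "regulate", "induce")


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_classifier(sentence: str) -> str:
    """Stand-in for the sentence classifier: flag sentences with a relation cue."""
    lowered = sentence.lower()
    return TO_RECOGNIZE if any(cue in lowered for cue in RELATION_CUES) else "other"


def select_sentences(title: str, abstract: str,
                     classify: Callable[[str], str] = keyword_classifier) -> List[str]:
    """Return the title/abstract sentences whose type is 'to be recognized'."""
    sentences = split_sentences(title) + split_sentences(abstract)
    return [s for s in sentences if classify(s) == TO_RECOGNIZE]


title = "IL-6 promotes fibrosis in mice."
abstract = "We enrolled 40 patients. TGF-beta activates SMAD3 signaling. Results are discussed."
print(select_sentences(title, abstract))
# ['IL-6 promotes fibrosis in mice.', 'TGF-beta activates SMAD3 signaling.']
```

Passing the classifier as a parameter keeps the filtering logic independent of whichever model is used, so the keyword rule can be swapped for a trained classifier without other changes.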
performing entity recognition on the sentence to obtain an entity and a pronoun in the sentence;
inputting the entity and the pronoun into a reference relation classifier to obtain a reference relation between the entity and the pronoun output by the reference relation classifier;
when the reference relation indicates that the pronoun refers to the entity, replacing the pronoun in the sentence with the corresponding entity to obtain an optimized sentence;
inputting the optimized sentence and the entities in it into an entity relation classifier to obtain the entity relations between the entities output by the entity relation classifier.
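The pronoun-replacement and relation-classification steps can be sketched in Python with simple rule-based stand-ins. The source uses trained reference relation and entity relation classifiers; here, purely as an assumption, a pronoun is resolved to the nearest preceding entity and the relation is read from cue words between the two entities. Entity recognition is assumed to have been done already, and the pronoun set, cue sets and function names are hypothetical.

```python
from typing import List, Optional

PRONOUNS = {"it", "they", "this"}
POSITIVE_CUES = {"activates", "promotes", "upregulates", "induces"}
NEGATIVE_CUES = {"inhibits", "suppresses", "downregulates", "blocks"}


def _clean(token: str) -> str:
    """Strip surrounding punctuation from a whitespace token."""
    return token.strip(",.;:()")


def resolve_pronouns(sentence: str, entities: List[str]) -> str:
    """Stand-in for the reference relation classifier: replace each pronoun
    with the nearest entity mentioned before it, giving the optimized sentence."""
    tokens = sentence.split()
    last_entity: Optional[str] = None
    for i, tok in enumerate(tokens):
        word = _clean(tok)
        if word in entities:
            last_entity = word
        elif word.lower() in PRONOUNS and last_entity is not None:
            tokens[i] = tok.replace(word, last_entity)
    return " ".join(tokens)


def classify_relation(sentence: str, head: str, tail: str) -> Optional[str]:
    """Stand-in for the entity relation classifier: read the cue words between
    the last mention of `head` before `tail` and `tail`."""
    words = [_clean(t) for t in sentence.split()]
    if tail not in words:
        return None
    t = words.index(tail)
    heads = [i for i, w in enumerate(words[:t]) if w == head]
    if not heads:
        return None
    between = {w.lower() for w in words[heads[-1] + 1:t]}
    if between & NEGATIVE_CUES:
        return "negative regulation"
    if between & POSITIVE_CUES:
        return "positive regulation"
    return None


sentence = "IL-6 is elevated in patients, and it activates STAT3."
optimized = resolve_pronouns(sentence, ["IL-6", "STAT3"])
print(optimized)  # IL-6 is elevated in patients, and IL-6 activates STAT3.
print(classify_relation(optimized, "IL-6", "STAT3"))  # positive regulation
```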
The invention also provides a retrieval method, which comprises the following steps:
acquiring a target entity to be queried;
determining relevant knowledge information about the target entity based on a drug-disease relationship map constructed by the map construction method described in any one of the above.
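A minimal Python sketch of the retrieval step, assuming the drug-disease relationship map is stored as (drug, association type, disease) triples and that "relevant knowledge information" is simply the set of edges touching the queried entity. The source does not specify either the storage format or what the returned knowledge contains; build_index and retrieve are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (drug, association type, disease)


def build_index(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Index every drug-disease edge under both of its endpoint entities."""
    index: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        drug, _, disease = triple
        index[drug].append(triple)
        index[disease].append(triple)
    return dict(index)


def retrieve(index: Dict[str, List[Triple]], entity: str) -> List[Triple]:
    """Return all edges touching the queried entity (empty if unknown)."""
    return index.get(entity, [])


edges = [
    ("drug X", "potential", "disease A"),
    ("drug X", "actual", "disease B"),
    ("drug Y", "actual", "disease A"),
]
index = build_index(edges)
print(retrieve(index, "drug X"))
print(retrieve(index, "disease A"))
```

Querying a drug lists its known and candidate indications; querying a disease lists the drugs linked to it, with "potential" edges marking new-indication candidates.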
The invention also provides a map construction device, which comprises:
an initial map construction unit, configured to construct, based on medical research literature, a disease signal pathway map for each disease;
a map association unit, configured to associate the disease signal pathway maps of the individual diseases through identical molecular entities contained in those maps, to obtain an inter-disease relationship map, wherein the molecular entities are biomarker entities and/or target entities;
a target map construction unit, configured to construct a drug-disease relationship map based on a relationship between a disease entity and a target entity in the inter-disease relationship map, and a predetermined relationship between a drug and a target.
The invention also provides a retrieval device, comprising:
an acquisition unit, configured to acquire a target entity to be queried;
a retrieval unit, configured to determine relevant knowledge information about the target entity based on a drug-disease relationship map constructed by the map construction method described in any one of the above.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the map construction method or the retrieval method as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a map construction method or a retrieval method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a map construction method or a retrieval method as described in any one of the above.
In the map construction and retrieval method and device, electronic device and storage medium provided by the invention, a disease signal pathway map is constructed for each disease from medical research literature, so that information scattered across a large body of literature is extracted comprehensively and accurately into per-disease maps, improving the efficiency of information mining and extraction. Associating the disease signal pathway maps of the individual diseases reveals potential associations between diseases and targets, and the predetermined drug-target relationships then extend these to potential associations between drugs and diseases. The resulting drug-disease relationship map helps users find new indications for drugs efficiently and accurately, improving the efficiency, reliability and accuracy of drug information mining.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a map construction method provided by the invention;
FIG. 2 is a flow chart of a relationship determination step between a disease entity and a target entity in an inter-disease relationship map provided by the present invention;
FIG. 3 is a schematic flow chart of step 220 in the map construction method provided by the present invention;
FIG. 4 is a schematic flow chart of step 110 in the map construction method provided by the present invention;
FIG. 5 is a schematic flow chart of step 111 in the map construction method provided by the present invention;
FIG. 6 is a schematic diagram of the structure of a disease signal pathway map provided by the present invention;
FIG. 7 is a schematic diagram of the relationship map between diseases provided by the invention;
FIG. 8 is a schematic diagram of the structure of a target-disease relationship map provided by the present invention;
FIG. 9 is a schematic diagram of the drug-target association and the drug-disease relationship map provided by the present invention;
FIG. 10 is a flow chart of the retrieval method provided by the invention;
FIG. 11 is a schematic diagram of the structure of the map construction apparatus provided by the present invention;
FIG. 12 is a schematic diagram of the structure of the retrieval device provided by the present invention;
FIG. 13 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Drug development is a challenging and risky undertaking. Innovative drugs take years of research and development, require huge capital and large amounts of manpower and resources, and are closely monitored by regulators. The generic drug market also faces major challenges: generic products have short life cycles and a short time from launch to obsolescence, and requirements such as consistency evaluation have sharply raised generic R&D costs, so enterprises in the generic market face the double pressure of competition and rising development costs.
Developing new indications for existing drugs is a lower-risk and more rewarding option than developing innovative or generic drugs. Mining drug indications means building on a drug's known pharmacological actions and safety profile to find acceptable patient populations with potential therapeutic needs, thereby broadening the drug's range of application. Developing a new indication does not start from scratch and does not require high development costs, so the risk is low.
At present, R&D information on drugs is mostly scattered across massive medical literature, and new indication information is mined by relying on professionals, who collect R&D information on various drugs from these data and screen, sort and classify it to obtain indication information. This approach depends heavily on manual experience, is inefficient, and makes it difficult to guarantee the reliability and accuracy of drug information mining. To address this, embodiments of the invention provide a map construction method that, by constructing a drug-disease relationship map, helps users mine new drug indications efficiently and accurately, thereby improving drug sales and increasing enterprise revenue and profit.
Fig. 1 is a schematic flow chart of a map construction method provided by the invention, as shown in fig. 1, the method includes:
Step 110, constructing, based on medical research literature, a disease signal pathway map for each disease;
Specifically, medical research covers basic medicine, clinical medicine, preventive medicine, health care medicine, rehabilitation medicine and so on, and the medical research literature may be literature related to any of these fields. It may be obtained from literature retrieval databases containing large numbers of medical, biological, health or nursing documents, such as PubMed, Web of Science or MedPeer; the embodiments of the invention do not specifically limit this.
It should be understood that a disease signal pathway map is essentially a semantic network whose nodes represent entities and whose edges represent relationships between entities. Therefore, once a large body of medical research literature has been obtained, the biological entities related to disease pathogenesis and the relationships between them can be mined from it, and a disease signal pathway map can then be constructed for each disease. To mine these entities and relationships efficiently, comprehensively and accurately, the entity types may be defined in advance; for example, the biological entities related to disease pathogenesis may include disease entities, molecular entities and pathophysiological mechanism entities, so that the disease signal pathway map is in effect a map of disease pathogenesis.
After the entities have been mined from the medical research literature, the relationship between each pair of entities can be judged from the text of the literature and the entities linked accordingly, yielding the entity relationships. To ensure accuracy, the entity relationships may be defined in advance based on expert experience and may include, but are not limited to, positive regulation (up-regulation, activation, promotion) and negative regulation (down-regulation, deactivation, inhibition). For example, the entity relationship between molecule A and disease B may be positive or negative regulation, and the entity relationship between molecule A and pathophysiological mechanism C may be activation.
Once the entities and entity relationships have been mined from the literature, the disease signal pathway map can be constructed from them: each entity becomes a node, each entity relationship becomes an edge between two nodes, and the specific relationship between the entities (that is, the content of the entity relationship) becomes an attribute of that edge. Because a disease signal pathway map constructed in this way is organized per disease entity, it lays the groundwork for the subsequent construction of the drug-disease relationship map.
Step 120, associating the disease signal pathway maps of the individual diseases through identical molecular entities contained in those maps, to obtain an inter-disease relationship map, wherein the molecular entities are biomarker entities and/or target entities;
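The node/edge/attribute construction can be sketched in Python as below. As an assumption not stated in the source, each extracted fact is tagged with the disease its source article concerns, which is what lets facts be grouped into one map per disease; the record layout and build_disease_maps are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One extracted fact: (disease the source article concerns, head, head type,
# relation, tail, tail type). Tagging facts with a disease is an assumption.
Record = Tuple[str, str, str, str, str, str]


def build_disease_maps(records: List[Record]) -> Dict[str, dict]:
    """Group extracted entity relations into one signal pathway map per disease.

    Each map stores typed nodes and directed edges whose attribute is the
    specific entity relationship (e.g. 'positive regulation').
    """
    maps: Dict[str, dict] = defaultdict(lambda: {"nodes": {}, "edges": {}})
    for disease, head, head_type, relation, tail, tail_type in records:
        m = maps[disease]
        m["nodes"][head] = head_type  # each entity becomes a typed node
        m["nodes"][tail] = tail_type
        m["edges"][(head, tail)] = relation  # the relation is the edge attribute
    return dict(maps)


records = [
    ("disease A", "molecule 3", "target", "positive regulation", "disease A", "disease"),
    ("disease A", "molecule 2", "biomarker", "positive regulation", "disease A", "disease"),
    ("disease B", "molecule 1", "target", "inhibition", "molecule 2", "biomarker"),
]
maps = build_disease_maps(records)
print(sorted(maps))  # ['disease A', 'disease B']
print(maps["disease A"]["edges"][("molecule 3", "disease A")])  # positive regulation
```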
Specifically, the entities in a disease signal pathway map may include disease entities, molecular entities, pathophysiological mechanism entities and so on, where the molecular entities are biomarker entities and/or target entities; to distinguish them, molecular entity nodes in the disease signal pathway map may be labeled as targets and/or biomarkers. A biomarker entity is a biomolecular entity, such as a nucleic acid, protein (polypeptide) or metabolite, that reflects the physiological or pathological state of an organism. A target entity is a specific molecular entity, inside or outside tissue cells, that interacts with a drug and confers its effect; targets include biological macromolecules such as receptors, enzymes, ion channels and nucleic acids. Because targets interact with drugs, new indications for a drug can be mined by first mining potential associations between diseases and targets and then linking drugs to diseases through those targets, which reveals potential drug-disease associations and hence new indication information.
It can be understood that disease signal pathway maps are constructed per disease, that is, there is one map for each disease, and because diseases are numerous, the number of maps is correspondingly large. To mine potential associations between diseases and targets, the disease signal pathway maps of the individual diseases therefore need to be associated. Since target entities are a kind of molecular entity, the maps can be associated through identical molecular entities, thereby generating the inter-disease relationship map.
When associating the disease signal pathway maps, one disease may be selected as the target disease, its map taken as the target disease signal pathway map, and the maps of the other diseases taken as maps to be associated. All molecular entities in the target disease signal pathway map are collected as target molecular entities, and each target molecular entity is compared in turn with the molecular entities in each map to be associated. Whenever a target molecular entity also appears in a map to be associated, that map is associated with the target disease signal pathway map through the shared molecular entity, thereby generating the inter-disease relationship map.
Step 130, constructing a drug-disease relationship map based on the relationships between disease entities and target entities in the inter-disease relationship map and on predetermined drug-target relationships.
Specifically, because the inter-disease relationship map is obtained by associating the disease signal pathway maps of the individual diseases, its entities are consistent with those in the individual maps and may include disease entities, target entities, biomarker entities, pathophysiological mechanism entities and so on. From the inter-disease relationship map, the disease entities, the target entities and the relationships between them can be mined; these relationships include both the specific entity relationship between a disease entity and a target entity and whether their association is actual or potential. A target-disease relationship map can be constructed from the disease entities, the target entities and the relationships between them, and a drug-disease relationship map can then be constructed from the target-disease relationship map and the predetermined drug-target relationships.
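The association procedure can be sketched in Python for one target disease, assuming each per-disease map is a dict of typed nodes and relation-labelled edges, and that identical molecular entities share the same name so they collapse into one node when maps are merged. The names associate_maps and molecular_entities are hypothetical; repeating the call with each disease as the target would cover all maps.

```python
from typing import Dict, List, Set, Tuple

# Node types treated as molecular entities (target and/or biomarker).
MOLECULAR_TYPES = {"target", "biomarker"}


def molecular_entities(pathway_map: dict) -> Set[str]:
    """Names of all target/biomarker nodes in one disease signal pathway map."""
    return {n for n, t in pathway_map["nodes"].items() if t in MOLECULAR_TYPES}


def associate_maps(target_disease: str,
                   maps: Dict[str, dict]) -> Tuple[dict, List[Tuple[str, str, str]]]:
    """Attach every map that shares a molecular entity with the target disease's map."""
    target_map = maps[target_disease]
    merged = {"nodes": dict(target_map["nodes"]), "edges": dict(target_map["edges"])}
    target_molecules = molecular_entities(target_map)
    links = []  # (target disease, associated disease, shared molecular entity)
    for disease, other in maps.items():
        if disease == target_disease:
            continue
        shared = target_molecules & molecular_entities(other)
        if not shared:
            continue
        # Shared molecules have the same name, so they collapse into one node.
        merged["nodes"].update(other["nodes"])
        merged["edges"].update(other["edges"])
        links.extend((target_disease, disease, m) for m in sorted(shared))
    return merged, links


maps = {
    "disease A": {"nodes": {"disease A": "disease", "molecule 2": "biomarker", "molecule 3": "target"},
                  "edges": {("molecule 2", "disease A"): "positive regulation",
                            ("molecule 3", "disease A"): "positive regulation"}},
    "disease B": {"nodes": {"disease B": "disease", "molecule 1": "target", "molecule 2": "biomarker"},
                  "edges": {("molecule 1", "molecule 2"): "inhibition",
                            ("molecule 2", "disease B"): "positive regulation"}},
    "disease C": {"nodes": {"disease C": "disease", "molecule 9": "target"},
                  "edges": {("molecule 9", "disease C"): "negative regulation"}},
}
merged, links = associate_maps("disease A", maps)
print(links)  # [('disease A', 'disease B', 'molecule 2')]
```

Disease C shares no molecular entity with disease A, so its map stays unattached, while molecule 1 from the map of disease B becomes reachable from disease A through molecule 2.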
Here, the predetermined drug-target relationships are associations between drugs and targets constructed in advance from already-established drug targets. Selecting and validating novel, effective drug targets is a key part of modern drug development, and indication mining builds on that work: the drug-target relationships are already known before indications are mined, so they can be constructed in advance.
To construct the drug-disease relationship map, the target-disease relationship map and the drug-target relationships can be associated through identical target entities, for example as follows. Acquire all target entities from the target-disease relationship map. For each target entity, first acquire from the target-disease relationship map the disease entities associated with it, referred to as target disease entities; then, using the drug-target relationships, find the drug entities associated with that target entity, referred to as target drug entities; finally, associate the target disease entities with the target drug entities.
There may be one or more target disease entities and one or more target drug entities. Because both are derived from the same target entity, every target drug entity is associated with every target disease entity. Once these associations have been obtained for all target entities, the drug-disease relationship map can be constructed.
In the map construction method provided by the embodiments of the invention, a disease signal pathway map is constructed for each disease from medical research literature, so that information scattered across a large body of literature is extracted comprehensively and accurately into per-disease maps, improving the efficiency of information mining and extraction. Associating the disease signal pathway maps of the individual diseases reveals potential associations between diseases and targets, and the predetermined drug-target relationships then extend these to potential associations between drugs and diseases. The resulting drug-disease relationship map helps users mine new indications for drugs efficiently and accurately, improving the efficiency, reliability and accuracy of drug information mining.
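The join on shared target entities can be sketched in Python as follows, assuming target-disease and drug-target relationships are available as plain (target, disease) and (drug, target) pairs. build_drug_disease_map is a hypothetical name, and the relation sign and actual/potential label are omitted to keep the join visible.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def build_drug_disease_map(target_disease: Iterable[Tuple[str, str]],
                           drug_target: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str, str]]:
    """Join target-disease and drug-target edges on the shared target entity.

    Returns (drug, via target, disease) triples: every drug acting on a target is
    linked to every disease associated with that same target.
    """
    diseases_by_target = defaultdict(set)
    for target, disease in target_disease:
        diseases_by_target[target].add(disease)
    drugs_by_target = defaultdict(set)
    for drug, target in drug_target:
        drugs_by_target[target].add(drug)

    links = set()
    for target, diseases in diseases_by_target.items():
        for drug in drugs_by_target.get(target, set()):
            for disease in diseases:
                links.add((drug, target, disease))
    return links


target_disease = [("molecule 1", "disease A"), ("molecule 3", "disease A"), ("molecule 3", "disease B")]
drug_target = [("drug X", "molecule 3"), ("drug Y", "molecule 9")]
print(sorted(build_drug_disease_map(target_disease, drug_target)))
# [('drug X', 'molecule 3', 'disease A'), ('drug X', 'molecule 3', 'disease B')]
```

Keeping the target in each triple records why a drug was linked to a disease, which is useful when a user later inspects a candidate indication.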
Based on the above embodiment, FIG. 2 is a flow chart of the step of determining the relationship between a disease entity and a target entity in the inter-disease relationship map provided by the present invention. The relationships that can be mined from the inter-disease relationship map include both the specific entity relationship between a disease entity and a target entity and whether their association is actual or potential. In this embodiment, the specific entity relationship is determined from the inter-disease relationship map by executing the following steps 210 and 220. As shown in FIG. 2, the step of determining the relationship between a disease entity and a target entity in the inter-disease relationship map comprises:
Step 210, acquiring the signal paths connecting each disease entity to each target entity in the inter-disease relationship map;
Specifically, a signal path is an information pathway between a disease entity and a target entity in the inter-disease relationship map. For example, if molecule 1 is connected to disease A through pathophysiological mechanism 1 and then molecule 2, where molecule 1 is labeled as a target and molecule 2 as a biomarker, the signal path between disease A (the disease entity) and molecule 1 (the target entity) is: molecule 1 - pathophysiological mechanism 1 - molecule 2 - disease A. In another example, if molecule 3 is directly connected to disease A and is labeled as a target, the signal path between disease A (the disease entity) and molecule 3 (the target entity) is: molecule 3 - disease A.
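Step 210 can be sketched in Python on the example above, assuming the inter-disease relationship map is stored as an adjacency dict with edges directed from the target toward the disease (the direction is our assumption). signal_paths is a hypothetical helper that enumerates simple paths by depth-first search; exhaustive enumeration grows quickly on large maps, so a practical system would likely cap path length.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # upstream node -> downstream nodes


def signal_paths(graph: Graph, target: str, disease: str) -> List[List[str]]:
    """Enumerate all simple directed paths from a target entity to a disease entity."""
    paths: List[List[str]] = []
    stack = [(target, [target])]
    while stack:
        node, path = stack.pop()
        if node == disease:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip cycles so every path is simple
                stack.append((nxt, path + [nxt]))
    return paths


graph = {
    "molecule 1": ["pathophysiological mechanism 1"],
    "pathophysiological mechanism 1": ["molecule 2"],
    "molecule 2": ["disease A"],
    "molecule 3": ["disease A"],
}
for target in ("molecule 1", "molecule 3"):
    print(signal_paths(graph, target, "disease A"))
# [['molecule 1', 'pathophysiological mechanism 1', 'molecule 2', 'disease A']]
# [['molecule 3', 'disease A']]
```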
Step 220, determining the relationship between the disease entity and the target entity corresponding to the signal path based on the relationships between the nodes in the signal path.
Specifically, each node is an entity in the signal path. The relationships between nodes, and the relationship between the disease entity and the target entity, are all entity relationships. These may include, but are not limited to, positive regulation (up-regulation, activation, promotion) and negative regulation (down-regulation, deactivation, inhibition). A signal path between a disease entity and a target entity shows that the two are associated, so the specific entity relationship between them can be determined from that signal path.
To determine the relationship between the disease entity and the target entity, the entity relationship between each pair of adjacent nodes in the signal path is obtained in order. The relationship between the disease entity and the target entity is then determined from all the entity relationships collected along the path. For example, suppose the signal path between disease A (the disease entity) and molecule 1 (the target entity) is molecule 1 - pathophysiological mechanism 1 - molecule 2 - disease A. In this path, molecule 1 promotes pathophysiological mechanism 1, pathophysiological mechanism 1 inhibits molecule 2, and molecule 2 positively regulates disease A. The node relationships along the path are therefore promotion, inhibition and positive regulation, and from these the relationship between disease A and molecule 1 is determined to be negative regulation. As another example, suppose the signal path between disease A (the disease entity) and molecule 3 (the target entity) is molecule 3 - disease A, where molecule 3 positively regulates disease A. The relationship between disease A and molecule 3 is then determined to be positive regulation.
In the embodiment of the invention, the signal paths between each disease entity and each target entity are obtained from the inter-disease relationship map. The entity relationship between a disease entity and a target entity is then determined from the relationships between the nodes in the corresponding signal path. This provides the basis for later mining the entity relationships between drug entities and disease entities, from which the drug-disease relationship map is constructed.
Based on the above embodiment, fig. 3 is a schematic flow chart of step 220 in the map construction method provided by the present invention, and as shown in fig. 3, step 220 specifically includes:
Step 221, counting how many of the relationships between the nodes in the signal path are negative regulation relationships;
Step 222, determining, based on that count, whether the relationship between the disease entity and the target entity corresponding to the signal path is a positive regulation relationship or a negative regulation relationship.
Specifically, the relationship between each pair of adjacent nodes in the signal path is either a positive regulation relationship (up-regulation, activation, promotion) or a negative regulation relationship (down-regulation, deactivation, inhibition). To determine the relationship between the disease entity and the target entity, the entity relationships between adjacent nodes are first obtained in order, and the negative regulation relationships among them are counted. Each negative regulation step inverts the direction of the effect, so two inversions cancel out. Accordingly, if the count is even, the relationship between the disease entity and the target entity corresponding to the signal path is determined to be positive regulation. If the count is odd, it is determined to be negative regulation.
Illustratively, suppose the signal path between disease A (the disease entity) and molecule 1 (the target entity) is molecule 1 - pathophysiological mechanism 1 - molecule 2 - disease A. In this path, molecule 1 promotes pathophysiological mechanism 1, pathophysiological mechanism 1 inhibits molecule 2, and molecule 2 positively regulates disease A. Promotion is a positive regulation relationship and inhibition is a negative regulation relationship. Counting the node relationships along the path therefore gives exactly one negative regulation relationship. Because this count is odd, the relationship between disease A and molecule 1 is determined to be negative regulation.
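Steps 221 and 222 amount to a parity check on the negative edges of a path. The following is a minimal sketch of that rule. The relation vocabularies are taken from the lists in the text, and the name `path_relation` is hypothetical. Rejecting relation labels outside those lists is a design choice of this sketch, made so that an unexpected label cannot silently count as positive.

```python
NEGATIVE = {"negative regulation", "down-regulation", "deactivation", "inhibition"}
POSITIVE = {"positive regulation", "up-regulation", "activation", "promotion"}


def path_relation(relations):
    """Combine the adjacent-node relations of one signal path into one relation.

    An even number of negative regulation steps gives positive regulation;
    an odd number gives negative regulation.
    """
    negatives = 0
    for rel in relations:
        if rel in NEGATIVE:
            negatives += 1
        elif rel not in POSITIVE:
            raise ValueError(f"unknown entity relationship: {rel!r}")
    return "positive regulation" if negatives % 2 == 0 else "negative regulation"
```

For the molecule 1 example, `path_relation(["promotion", "inhibition", "positive regulation"])` returns `"negative regulation"`, which matches the text.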
In the embodiment of the invention, the negative regulation relationships between the nodes in a signal path are counted. From that count, the specific entity relationship between the disease entity and the target entity (positive or negative regulation) is determined, which provides a basis for later determining the specific entity relationship between drug entities and disease entities.
Based on any of the above embodiments, the step of determining the relationship between a disease entity and a target entity may further include:
If the signal path exists in the disease signal path map, determining that the relationship between the disease entity corresponding to the signal path and the target entity is an actual association relationship;
otherwise, determining the relationship between the disease entity corresponding to the signal path and the target entity as a potential association relationship.
Specifically, a signal path is an information path connecting a disease entity and a target entity in the inter-disease relationship map. When a disease entity and a target entity lie on the same signal path, they are associated. To discover potential associations between diseases and targets, each signal path is examined further to decide whether the relationship between its disease entity and target entity is an actual association or a potential association.
The inter-disease relationship map is obtained by linking the disease signal path maps of all diseases through shared molecular entities. As a result, it can contain signal paths that do not exist in any of the original per-disease maps. The relationship between the disease entity and the target entity on such a newly added signal path can be determined to be a potential association relationship.
To classify the signal paths in the inter-disease relationship map, all of its signal paths are first obtained and treated as target signal paths. Each target signal path is then compared in turn with the signal paths in the disease signal path map of each disease. If the target signal path is the same as a signal path in a disease signal path map, it already existed and is not newly added. The relationship between its disease entity and target entity is therefore an actual association relationship. If the target signal path differs from every signal path in the disease signal path maps, it is a newly added signal path, and the relationship between its disease entity and target entity is a potential association relationship.
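The comparison above can be sketched as a set-membership test. This is a minimal illustration that assumes "the same signal path" means an identical sequence of nodes. The names `classify_paths` and `disease_maps` are hypothetical. Paths are stored as tuples so they can be hashed.

```python
def classify_paths(target_paths, disease_maps):
    """Label each target signal path as an actual or a potential association.

    target_paths: node sequences found in the inter-disease relationship map.
    disease_maps: mapping disease name -> node sequences in that disease's
    own disease signal path map.
    """
    # Every path already present in some per-disease map, as hashable tuples.
    known = {tuple(path) for paths in disease_maps.values() for path in paths}
    return {
        tuple(path): "actual" if tuple(path) in known else "potential"
        for path in target_paths
    }
```

For example, suppose disease B's map contains molecule 4 - molecule 2 - disease B and disease A's map contains molecule 2 - disease A. Linking the maps through molecule 2 creates the new path molecule 4 - molecule 2 - disease A, which the sketch labels `"potential"`.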
In the embodiment of the invention, each signal path in the inter-disease relationship map is checked against the disease signal path maps of the individual diseases, so that potential associations between diseases and targets can be mined. Potential associations between drugs and diseases can then be mined from the predetermined drug-target relationships to construct the drug-disease relationship map, which helps users discover new indications for drugs efficiently and accurately.
Based on any of the above embodiments, medical research literature may be obtained in advance from a literature retrieval database before step 110 is performed. The specific steps include:
acquiring an initial document set;
screening out secondary processing documents from the initial document set based on the publication types of the initial documents, to obtain a candidate document set;
screening the candidate document set to obtain the medical research literature.
Specifically, the initial document set is the set of all documents in a literature retrieval database. It may be obtained from retrieval databases of different sources, such as PubMed, Web of Science and MedPeer, which is not particularly limited in the embodiment of the present invention.
Secondary processing documents are produced by processing and compiling primary documents, and their content is not comprehensive. Mining entities and entity relationships from them would make the constructed disease signal path map less comprehensive. To improve the comprehensiveness and accuracy of entity and entity relationship mining, the initial document set can first be screened by publication type to remove secondary processing documents. This also reduces the number of documents to be mined, and with it the time and cost of information mining.
In one embodiment, the initial screening by publication type is implemented by rule matching. Each initial document in the initial document set has at least one publication type, and the matching rule may be as follows. If the publication types of an initial document include at least one of the first publication types and none of the second publication types, the document is considered not to be a secondary processing document. Otherwise, it is considered a secondary processing document and is deleted from the initial document set. Here, the first publication types may include "Journal Article", "Letter" and "quick report", and the second publication types may include "Systematic Review", "Meta-Analysis", "Practice Guideline" and "Guideline".
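The matching rule can be written directly as a set test. The sketch below is a minimal version that copies the type lists from the text verbatim, including "quick report" as it appears there. It assumes each document is a dict with a `publication_types` list. The helper names are hypothetical.

```python
FIRST_TYPES = {"Journal Article", "Letter", "quick report"}
SECOND_TYPES = {"Systematic Review", "Meta-Analysis", "Practice Guideline", "Guideline"}


def is_secondary(publication_types):
    """A document is primary only if it has a first type and no second type."""
    types = set(publication_types)
    has_first = bool(types & FIRST_TYPES)
    has_second = bool(types & SECOND_TYPES)
    return not (has_first and not has_second)


def prefilter(documents):
    """Drop secondary processing documents to form the candidate document set."""
    return [doc for doc in documents if not is_secondary(doc["publication_types"])]
```

A record tagged both "Journal Article" and "Meta-Analysis" is removed, because the rule requires that none of the second types be present.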
In another embodiment, the initial screening by publication type is implemented with a text classification model. A common text classification model can sort the initial documents into two categories by publication type: secondary processing documents and non-secondary processing documents. Each initial document in the initial document set is input into the text classification model, which outputs whether the document is a secondary processing document. If it is, the document is deleted from the initial document set.
In yet another embodiment, the initial screening by publication type may combine rule matching with a text classification model, which is not described further here.
The publication types provided by literature retrieval databases lag behind publication, so a recently published document may carry a publication type that has not yet been updated. To make sure newly published medical research documents are screened correctly, the candidate document set obtained from the initial screening is then screened precisely, which avoids document type errors caused by this lag. All the retained documents are then medical research documents. This makes the later mining of entities and entity relationships efficient and accurate, and improves the efficiency and accuracy of constructing the disease signal path map.
For precise screening of the candidate document set, a document classifier built by fine-tuning a pre-trained model may be used. First, the text of each candidate document is preprocessed into a form the model can accept. The converted text is then input into the document classifier, which decides whether the candidate document is a secondary processing document. If it is, the document is deleted from the candidate document set, and the medical research literature is obtained from the candidate documents that remain.
For example, when the initial document set is obtained from PubMed, the pre-trained model may be PubMedBERT. This is a BERT (Bidirectional Encoder Representations from Transformers) model pre-trained for the medical domain on PubMed abstracts and full-text articles, so it understands the terminology and language of the medical field. The PubMedBERT model can be downloaded and imported from the model hub of the Hugging Face website.
After the PubMedBERT model is obtained, it serves as the base model and is fine-tuned on a target dataset. The target dataset is training data built automatically with the rule matching described above. Documents that satisfy the rules are added as positive examples and the others as negative examples, so no manual annotation is needed, which saves a great deal of time and resources. A fully connected output layer is added on top of the pre-trained layers so that the model can classify each training sample (a sample document in the training data) and output whether it is a secondary processing document. During fine-tuning, the data in the target dataset are preprocessed into model inputs. The converted text is fed into the pre-trained model and passes through its input layer and Transformer encoder. The features extracted by the pre-trained layers are then fed into the fully connected layer for classification. The weights and biases of the fully connected layer are updated on the training data to minimize the loss function on the target dataset, which yields the document classifier based on the fine-tuned pre-trained model.
It should be noted that a PubMed document whose publication type is "Journal Article" and which was published less than 12 months ago is generally considered to have a publication type that may still be updated. To avoid screening errors for such documents, they may be screened directly with the document classifier. This ensures that all finally retained documents are medical research documents, and that the entities and entity relationships later mined from them are more accurate and comprehensive.
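The routing condition for recent journal articles can be sketched as a small predicate. This is a minimal sketch under stated assumptions. It counts elapsed whole calendar months and ignores the day of the month, which is a simplification. The name `route_to_classifier` is hypothetical.

```python
from datetime import date


def route_to_classifier(publication_types, published: date, today: date) -> bool:
    """True if a PubMed record should go straight to the document classifier:
    a "Journal Article" published less than 12 months ago."""
    # Whole calendar months elapsed (day of month ignored: a simplification).
    months = (today.year - published.year) * 12 + (today.month - published.month)
    return "Journal Article" in publication_types and months < 12
```

Records for which this returns `True` skip the rule-based prefilter. All other records go through the prefilter first.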
In the embodiment of the invention, the initial document set is first screened by publication type to obtain the candidate document set. The candidate document set is then screened precisely by the document classifier based on the fine-tuned pre-trained model. This ensures that the retained documents are medical research documents and reduces the number of documents to be mined. As a result, entities and entity relationships can be mined efficiently, comprehensively and accurately, which improves the efficiency and accuracy of constructing the disease signal path map.
Based on any of the above embodiments, fig. 4 is a schematic flow chart of step 110 in the map construction method provided by the present invention, and as shown in fig. 4, step 110 specifically includes:
Step 111, performing entity recognition on sentences in the medical research literature to obtain the entity pairs it contains, where an entity pair consists of two entities and the entity relationship between them;
Specifically, a sentence here is a piece of text in the medical research literature. Entities are the most basic elements of the disease signal path map, and different entities are connected by different relationships, namely entity relationships. When two entities are related, the two entities together with the entity relationship between them form an entity pair.
The aim is to mine, efficiently, comprehensively and accurately, the biological entities related to disease pathogenesis and the relationships among them from medical research literature. To that end, entity types can be defined in advance, for example disease entities, molecular entities and pathophysiological mechanism entities. For each predefined entity type, a standard dictionary may be built in advance for entity extraction and normalization. The standard dictionary may be built from MeSH (Medical Subject Headings), HGNC (HUGO Gene Nomenclature Committee, the human gene naming database) and internal data accumulated by the enterprise, which is not particularly limited in the embodiment of the present invention. For example, when entities for the standard dictionary are taken from MeSH, inclusion and exclusion rules suited to the target requirement may be preset and used to filter the MeSH tree structure. Disease entities are one example. MeSH contains subject headings for many diseases, so inclusion and exclusion rules suited to disease entities can be preset and used to filter the disease headings in the MeSH tree structure, producing the disease entries of the standard dictionary.
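Filtering the MeSH tree with inclusion and exclusion rules can be sketched as prefix matching on tree numbers. This is a minimal illustration only. It assumes descriptors have already been parsed into dicts with `name`, `tree_numbers` and optional `synonyms`. The rules are expressed as tree-number prefixes, which is one possible encoding and not taken from the source. The records in the usage example are toy data, not real MeSH entries.

```python
def build_disease_dictionary(descriptors, include_prefixes, exclude_prefixes):
    """Map each lower-cased term (heading or synonym) to its preferred heading.

    A descriptor is kept if any tree number starts with an inclusion prefix
    and no tree number starts with an exclusion prefix.
    """
    dictionary = {}
    for desc in descriptors:
        trees = desc["tree_numbers"]
        included = any(t.startswith(p) for t in trees for p in include_prefixes)
        excluded = any(t.startswith(p) for t in trees for p in exclude_prefixes)
        if included and not excluded:
            for term in [desc["name"], *desc.get("synonyms", [])]:
                dictionary[term.lower()] = desc["name"]  # normalize to the heading
    return dictionary


toy_descriptors = [  # illustrative records; tree numbers are made up
    {"name": "Disease X", "tree_numbers": ["C01.100"], "synonyms": ["X syndrome"]},
    {"name": "Injury Y", "tree_numbers": ["C26.200"]},
    {"name": "Organism Z", "tree_numbers": ["B01.300"]},
]
```

With `include_prefixes=["C"]` and `exclude_prefixes=["C26"]`, only Disease X and its synonym end up in the dictionary.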
Using the entities in the pre-built standard dictionary, entity recognition can be performed on the sentences in the medical research literature to obtain all the entities it contains. The relationships between these entities must then be determined so that the entities can be linked by entity relationships. To determine these relationships accurately and make the resulting entity relationships more reliable, the entity relationship types may be defined in advance based on expert experience. They include, but are not limited to, positive regulation (up-regulation, activation, promotion) and negative regulation (down-regulation, deactivation, inhibition). For example, the entity relationship between molecule A and disease B may be positive or negative regulation, and the relationship between molecule A and pathophysiological mechanism C may be activation.
After all entities in the medical research literature are extracted, the relationships between different entities are determined from the predefined entity relationship types together with the sentences in the literature. This yields the entity relationships between the entities and thus a number of entity pairs.
Step 112, constructing a disease signal pathway map in units of disease based on the entity and entity relationship.
Specifically, after the entities and entity relationships are obtained from each medical research document, the disease signal path map can be constructed from them. Each entity becomes a node of the map, each entity relationship becomes an edge between two entities, and the specific content of the entity relationship becomes an attribute of that edge. To make the map clearer, more comprehensive and more accurate, it can be constructed from both the entity relationships and the heat of each entity pair. For example, the heat of an entity pair can be shown by the thickness of the edge, which makes the display more intuitive and lets users quickly find relevant knowledge in the map.
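The node, edge and attribute layout described above can be sketched with plain dicts. This is a minimal sketch, not the patented storage format. It assumes heat values are already computed per (head, tail) pair, and it maps heat linearly onto an edge width between 1 and 5. The width range and the name `build_pathway_map` are illustrative assumptions.

```python
def build_pathway_map(entity_pairs, heat):
    """Build a disease signal path map as nodes plus attributed edges.

    entity_pairs: list of (head, relation, tail) triples.
    heat: mapping (head, tail) -> heat value of that entity pair.
    """
    graph = {"nodes": set(), "edges": []}
    max_heat = max(heat.values(), default=1.0) or 1.0  # avoid dividing by zero
    for head, relation, tail in entity_pairs:
        graph["nodes"].update((head, tail))
        h = heat.get((head, tail), 0.0)
        graph["edges"].append({
            "source": head,
            "target": tail,
            "relation": relation,  # the edge attribute: specific entity relationship
            "heat": h,
            "width": 1.0 + 4.0 * h / max_heat,  # thicker edge = hotter pair
        })
    return graph
```

A graph library or graph database would normally replace these dicts. The plain structure is used here only to keep the mapping from entities to nodes and from relationships to edges explicit.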
It should be appreciated that the heat of an entity pair may be determined from the citation information of the medical research documents in which it appears. Citation information is information and data about how a document is cited, and it can be used to evaluate the impact and usefulness of the document. It includes, but is not limited to, the impact factor, the JCR (Journal Citation Reports) quartile, the Chinese Academy of Sciences (CAS) journal partition, the citation count and the average monthly citation count. To determine the heat of an entity pair, each medical research document is first scored on its citation information to obtain an impact score. The heat of the entity pair is then obtained from the impact scores of all the medical research documents in which the pair appears.
To obtain the impact score of a medical research document, its citation information is gathered first, i.e. indicators such as the impact factor, JCR quartile, CAS partition, citation count and average monthly citation count. The document is then scored by considering all the indicators together. The score may, for example, range from 1 to 5, and the result is the impact score of the document. Alternatively, the document may be scored on each indicator separately. For the impact factor, for example, the range may be divided into five bands, 0-20%, 21-40%, 41-60%, 61-80% and 81-100%, worth 1, 2, 3, 4 and 5 points respectively. A document whose impact factor is at 57% therefore scores 3 points. Once a score has been obtained for each indicator, the sum of the scores over all indicators is taken as the impact score of the document. The manner of obtaining the impact score is not particularly limited in the embodiment of the present invention.
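The per-indicator banding and summation can be sketched as follows. This is a minimal sketch that assumes each indicator has already been converted to a percentile in [0, 100]. The text does not say how raw indicator values are converted, so that step is left out. The function names are hypothetical.

```python
import math


def band_score(percentile: float) -> int:
    """Map a 0-100 percentile to 1-5 points in 20% bands (0-20 -> 1, ..., 81-100 -> 5)."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie in [0, 100]")
    return max(1, math.ceil(percentile / 20))


def impact_score(indicator_percentiles: dict) -> int:
    """Sum the band scores of all citation indicators of one document."""
    return sum(band_score(p) for p in indicator_percentiles.values())
```

`band_score(57)` returns 3, matching the example in the text. A document at the 57th percentile for impact factor and the 90th for citation count would score 3 + 5 = 8.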
After the impact score of each medical research document is obtained, a second-level weighted scoring is performed over all the medical research documents in which an entity pair appears, giving the heat of that entity pair. The disease signal path map is then constructed based on the heat of the entity pairs.
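The text does not specify the second-level weighting. The sketch below shows one plausible reading: each document in which a pair occurs contributes its impact score once to the pair's heat. Both the counting rule and the name `pair_heat` are assumptions of this sketch.

```python
from collections import defaultdict


def pair_heat(occurrences, impact_scores):
    """Aggregate document impact scores into a heat value per entity pair.

    occurrences: iterable of (pair, doc_id), one item per mention of a pair.
    impact_scores: mapping doc_id -> impact score of that document.
    """
    heat = defaultdict(float)
    seen = set()
    for pair, doc_id in occurrences:
        if (pair, doc_id) in seen:  # count each document once per pair
            continue
        seen.add((pair, doc_id))
        heat[pair] += impact_scores[doc_id]
    return dict(heat)
```

Counting each document once per pair stops a single paper that repeats a finding many times from inflating that pair's heat.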
In the embodiment of the invention, entity recognition on the sentences of each medical research document yields the entities and entity relationships comprehensively and accurately. The disease signal path map is then constructed from the entity relationships and their heat. Information in massive biomedical data can thus be extracted comprehensively and accurately into the map, which improves the efficiency and reliability of information extraction and reduces its time and cost. Marking the heat of each entity relationship in the map also makes the displayed information more intuitive.
Based on the above embodiment, fig. 5 is a schematic flow chart of step 111 in the map construction method provided by the present invention, and as shown in fig. 5, step 111 specifically includes:
Step 1111, acquiring the title text and abstract text of the medical research literature;
Specifically, the full text of a medical research document is long, and most of it is redundant for mining entities and entity relationships. To make entity recognition and extraction more efficient, the extraction range can be limited to the title text and abstract text. After the text of each medical research document is obtained, the title and abstract are extracted from it to give the title text and abstract text.
To improve extraction efficiency further, steps 1112 and 1113 below may be executed. These steps classify the sentences of the title text and abstract text into those containing primary information and those containing only secondary information. Sentences containing primary information are retained, sentences containing only secondary information are discarded, and entity recognition is performed only on the retained sentences.
Step 1112, inputting the title text and abstract text into a sentence classifier to obtain the sentence type of each sentence in the title text and abstract text output by the sentence classifier;
Specifically, the sentence classifier is obtained by training a pre-trained model on manually annotated training data, and it identifies the sentence type of each sentence in the title text and abstract text. There are two sentence types: sentences to be recognized and sentences not to be recognized. A sentence to be recognized contains primary information, i.e. text related to the entities and entity relationships to be extracted. A sentence not to be recognized contains no primary information.
The title text and abstract text of a medical research document are input into the sentence classifier, which determines the sentence type of each sentence. If a sentence contains primary information, it is retained and its type is output as a sentence to be recognized. If it does not, it is discarded.
Before step 1112 is performed, the sentence classifier may be trained as follows. First, a pre-trained model such as PubMedBERT is obtained. Next, a large number of sample title texts and abstract texts are collected, and the sentence type of each sentence in them is annotated manually. Finally, the pre-trained model is trained on the sample title and abstract texts together with the annotated sentence types to obtain the sentence classifier.
Step 1113, performing entity recognition on the sentences whose type is "sentence to be recognized", to obtain the entity pairs contained in the medical research literature.
Specifically, the sentences to be recognized are the sentences of the title text and abstract text that contain primary information and were retained after classification by the sentence classifier. Entity recognition on these sentences has two stages. First, entities are extracted by rule matching against the pre-built standard dictionary. Every entity that appears in the dictionary is extracted, which keeps dictionary-based extraction accurate. Second, many entities or entity aliases may be missing from the pre-built standard dictionary, so a named entity recognition model is also applied to the sentences to be recognized. The model analyzes the semantics of each sentence and extracts the further entities needed to construct the disease signal path map, which makes the final set of extracted entities more comprehensive and accurate.
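The dictionary-based first pass can be sketched with a single regular expression. This is a minimal sketch that assumes a dictionary mapping lower-cased terms to normalized names, such as the one in the MeSH example above. Alternatives are sorted longest first, so at any position the longest dictionary term wins. Matching is case-insensitive and restricted to word boundaries, which are assumptions of this sketch. The name `dictionary_match` is hypothetical.

```python
import re


def dictionary_match(sentence, dictionary):
    """Return (surface text, normalized name, start offset) for each dictionary hit.

    dictionary: mapping lower-cased term -> normalized entity name.
    """
    if not dictionary:
        return []
    # Longest terms first so the regex prefers the longest match at each position.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return [
        (m.group(0), dictionary[m.group(0).lower()], m.start())
        for m in pattern.finditer(sentence)
    ]
```

In "Interleukin-6 drives rheumatoid arthritis." with both "arthritis" and "rheumatoid arthritis" in the dictionary, the longer term is matched. The named entity recognition model then covers entities and aliases the dictionary lacks.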
It should be noted that named entity recognition is a basic information extraction task in natural language processing: it identifies the named entities in a given text and classifies them. The entities to be extracted are of several types, including disease entities, molecular entities and pathophysiological mechanism entities, so the named entity recognition model may use a multi-task architecture. Its input is a sentence to be recognized from the title text or abstract text. Its output is the results of several tasks, each of which extracts one entity type, for example a disease entity extraction task and a molecular entity extraction task.
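The text does not specify the output format of each task head. A common convention is one B/I/O tag per token. The sketch below assumes that convention and shows how the per-task tag sequences could be decoded and merged into typed entity spans. The function names are hypothetical.

```python
def decode_bio(tokens, tags, entity_type):
    """Collect (text, entity_type) spans from one task's B/I/O tag sequence."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":  # a new entity starts here
            if current:
                spans.append((" ".join(current), entity_type))
            current = [token]
        elif tag == "I" and current:  # continue the open entity
            current.append(token)
        else:  # "O", or a stray "I" with no open entity, closes any span
            if current:
                spans.append((" ".join(current), entity_type))
            current = []
    if current:
        spans.append((" ".join(current), entity_type))
    return spans


def merge_task_outputs(tokens, task_tags):
    """task_tags maps an entity type (one task head) to its tag sequence."""
    return [span for entity_type, tags in task_tags.items()
            for span in decode_bio(tokens, tags, entity_type)]
```

Because each entity type has its own head, a token can belong to entities of different types. A single shared tag sequence could not represent that.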
In the embodiment of the invention, restricting the extraction range to the title text and abstract text of each medical research document improves the efficiency of entity recognition and entity relation extraction. Classifying the sentences of the title and abstract with the sentence classifier narrows the extraction range further, which improves the efficiency of entity and entity relation extraction and, in turn, of constructing the disease signal pathway map.
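The narrowing step above can be sketched as a filter over title and abstract sentences. This is a minimal sketch under stated assumptions: the label `"to_be_recognized"`, the regex sentence splitter and `keyword_classifier` are hypothetical stand-ins, the last one for the trained sentence classifier the text refers to.

```python
import re
from typing import Callable, List

KEEP = "to_be_recognized"  # hypothetical label for sentences carrying the main findings


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_sentences(title: str, abstract: str, classify: Callable[[str], str]) -> List[str]:
    """Keep only the title/abstract sentences that the classifier labels KEEP."""
    candidates = [title.strip()] + split_sentences(abstract)
    return [s for s in candidates if classify(s) == KEEP]


def keyword_classifier(sentence: str) -> str:
    # Stand-in for the trained sentence classifier, for illustration only.
    cues = ("promot", "inhibit", "activat", "regulat")
    return KEEP if any(c in sentence.lower() for c in cues) else "background"
```

Only the surviving sentences are passed on to entity recognition, which is where the efficiency gain described above comes from.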
Based on the above embodiment, the title text and abstract text contain many pronouns, and these pronouns may also stand for entities. To make the identification and extraction of entities and entity relations more comprehensive and accurate, a new task can be added to the multi-task NER model: extracting the pronouns in the title text and abstract text. Accordingly, performing entity recognition in step 111 or step 1113 to obtain the entity pairs contained in the medical research literature includes:
performing entity recognition on the sentence to obtain an entity and a pronoun in the sentence;
inputting the entities and pronouns into a reference relation classifier to obtain the reference relation between each entity and pronoun output by the reference relation classifier;
when the reference relation is yes, replacing the pronoun in the sentence with the corresponding entity to obtain an optimized sentence;
inputting the optimized sentence and the entities in it into an entity relation classifier to obtain the entity relations between the entities output by the entity relation classifier.
Specifically, when entity recognition is performed on sentences in the title text and abstract text, entities can be extracted by rule matching against a pre-constructed standard dictionary and by the multi-task NER model, yielding all the entities and pronouns contained in the sentence. The entities must then be associated with the pronouns and with one another. First, a reference relation classifier determines, for every pronoun and every entity, whether the pronoun refers to the entity: the identified entity and pronoun, together with the title text and abstract text, are input into the classifier, which treats the decision as a binary classification. If the reference relation is yes, the pronoun refers to the entity, i.e. it is equivalent to the entity, and can be replaced by it; if the reference relation is no, the pronoun is not equivalent to the entity and may be deleted. The result is an optimized sentence.
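The replacement step can be sketched as follows, assuming the reference relation classifier is available as a yes/no callable. `resolve_pronouns`, the small `PRONOUNS` list and the `drop_unresolved` switch are hypothetical; in this sketch a pronoun that no entity is accepted for is deleted, following the text's handling of a "no" reference relation.

```python
import re
from typing import Callable, List

# Illustrative pronoun subset; the text has the NER model's pronoun task find them.
PRONOUNS = ("they", "it")
PRONOUN_RE = re.compile(r"\b(?:" + "|".join(PRONOUNS) + r")\b", re.IGNORECASE)


def resolve_pronouns(
    sentence: str,
    entities: List[str],
    refers_to: Callable[[str, str, str], bool],
    context: str = "",
    drop_unresolved: bool = True,
) -> str:
    """Replace each pronoun with the first entity the binary classifier accepts."""

    def substitute(m):
        for entity in entities:
            # refers_to(pronoun, entity, context) stands in for the reference classifier
            if refers_to(m.group(0), entity, context or sentence):
                return entity  # reference relation is "yes"
        return "" if drop_unresolved else m.group(0)

    optimized = PRONOUN_RE.sub(substitute, sentence)
    return re.sub(r"\s{2,}", " ", optimized).strip()  # tidy gaps left by deletions
```

In practice `context` would be the full title and abstract, since the antecedent usually sits in an earlier sentence.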
Second, after entities are associated with pronouns, entities must also be associated with one another to obtain the entity relations between them. The same entity often appears in several forms within one medical research document, such as a full name and an abbreviation, so after the optimized sentence is obtained, the relations between different entities of the same entity class can be determined with the entity relation classifier. For example, inputting entity A and entity B from the optimized sentence into the entity relation classifier yields the entity relation between them, which may be any one of the following: entity A is equal to entity B, entity A includes entity B, entity B includes entity A, or entity A and entity B are independent. Because there are several relation types between entities of the same class, an entity relation classifier based on a multi-class classification model can be used for this judgment.
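Once the same-class classifier has labeled entity pairs, the "equal" pairs can be collapsed so that a full name and its abbreviation become one node. The sketch below uses union-find for that merge; the label strings, the function name `merge_aliases` and the rule of keeping the longer surface form as the canonical name are assumptions for illustration, not details from the text.

```python
from itertools import combinations
from typing import Callable, Dict, List

# The four same-class relation types named in the text (label strings are hypothetical).
LABELS = ("equal", "a_includes_b", "b_includes_a", "independent")


def merge_aliases(entities: List[str], classify: Callable[[str, str], str]) -> Dict[str, str]:
    """Map each surface form to a canonical name via union-find over 'equal' pairs."""
    parent = {e: e for e in entities}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(entities, 2):
        if classify(a, b) == "equal":
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the longer string (usually the full name) as the canonical form.
                keep, drop = (ra, rb) if len(ra) >= len(rb) else (rb, ra)
                parent[drop] = keep
    return {e: find(e) for e in entities}
```

The "includes" labels are left untouched here; they describe hierarchy rather than identity, so they would be stored as relations instead of being merged.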
Finally, after pronouns have been associated with entities and the differently expressed forms of the same entity have been associated within each class, the relations among all entities must be judged. This covers relations between entities of the same class, such as disease entity 1 and disease entity 2, as well as relations between entities of different classes, such as a disease entity and a molecular entity. After the optimized sentence is obtained, the sentence and two entities of different classes in it are input into the entity relation classifier, which outputs the entity relation between the two entities. To judge the relations among all entities, the entity relation classifier can be implemented with a token-classification model.
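Judging "the relations among all entities" means querying the classifier once per ordered entity pair. A common way to tell a sentence-level classifier which pair to judge is to wrap the two entities in marker tokens; the sketch below assumes that technique, which the text does not specify. The markers `<e1>`/`<e2>` and the names `mark_pair` and `classify_all_pairs` are hypothetical, and the classifier is a plain callable that returns `None` for "no relation".

```python
from itertools import combinations
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head entity, tail entity, relation)


def mark_pair(sentence: str, head: str, tail: str) -> str:
    """Wrap the first occurrence of head in <e1>..</e1> and tail in <e2>..</e2>.
    Assumes both occur in the sentence and their spans do not overlap."""
    h, t = sentence.find(head), sentence.find(tail)
    if h < 0 or t < 0:
        raise ValueError("both entities must occur in the sentence")
    inserts = [(h, "<e1>"), (h + len(head), "</e1>"), (t, "<e2>"), (t + len(tail), "</e2>")]
    for pos, tag in sorted(inserts, reverse=True):  # right to left keeps offsets valid
        sentence = sentence[:pos] + tag + sentence[pos:]
    return sentence


def classify_all_pairs(
    sentence: str, entities: List[str], classify: Callable[[str], Optional[str]]
) -> List[Triple]:
    """Query the relation classifier for every ordered pair (relations are directed)."""
    triples: List[Triple] = []
    for a, b in combinations(entities, 2):
        for head, tail in ((a, b), (b, a)):
            label = classify(mark_pair(sentence, head, tail))
            if label is not None:
                triples.append((head, tail, label))
    return triples
```

The number of classifier calls grows quadratically with the number of entities, which is another reason the earlier sentence filtering matters.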
In the embodiment of the invention, the pronouns in a sentence are processed first, and where the reference relation between a pronoun and an entity is yes, the pronoun is replaced by that entity. The resulting optimized sentence has clearer semantics, which makes the subsequent recognition of entity relations based on it easier and allows entity relations to be identified and mined more efficiently, comprehensively and accurately.
Based on the above embodiment, performing entity recognition in step 111 or step 1113 to obtain the entity pairs contained in the medical research literature further includes:
filling the optimized sentence and the entities in it into a relation query template to obtain a relation query sentence;
inputting the relation query sentence into a question-answering language model to obtain the entity relations output by the question-answering language model.
Specifically, entity relations can also be predicted with a question-answering language model, such as a large language model (LLM). After the optimized sentence is obtained, the sentence and the entities extracted from it are filled into the relation query template, i.e. the corresponding placeholders in the template are replaced with them, yielding a relation query sentence. The relation query sentence is fed to the question-answering language model, which analyzes it and answers in the specified format, giving the entity relations output by the model.
For example, the relation query template may be:
"The following is a passage of biomedical text and the entities it contains. Based on the semantics of the text, judge the binary relations that may exist between the entities. Input text: [text content]. Input entities: [entity content]. Possible entity relations: [entity relation description]. Output the results line by line in the format '(entity 1, entity 2, relation)'."
In this template, the contents in square brackets are replaced according to the optimized sentence and the extracted entities to obtain the relation query sentence.
To obtain more comprehensive and accurate entity relations, the entity relations output by the question-answering language model can be combined with those obtained from the entity relation classifier in the above embodiment.
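The query-template route can be sketched end to end without the model call itself, which is omitted because the text names no particular LLM or API. `TEMPLATE` is an English rendering of the template above; `build_query`, `parse_answer` and `combine` are hypothetical names, and taking the duplicate-free union of both sources is only one possible reading of "combined".

```python
import re
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

TEMPLATE = (
    "The following is a passage of biomedical text and the entities it contains. "
    "Based on the semantics of the text, judge the binary relations that may exist "
    "between the entities.\n"
    "Input text: {text}\n"
    "Input entities: {entities}\n"
    "Possible entity relations: {relations}\n"
    "Output the results line by line in the format '(entity 1, entity 2, relation)'."
)

# One answer line: "(head, tail, relation)"; names containing commas are not supported.
LINE = re.compile(r"^\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)$")


def build_query(text: str, entities: Iterable[str], relations: Iterable[str]) -> str:
    return TEMPLATE.format(text=text, entities=", ".join(entities), relations=", ".join(relations))


def parse_answer(answer: str, allowed: Set[str]) -> List[Triple]:
    """Keep well-formed lines whose relation is one of the offered labels."""
    triples: List[Triple] = []
    for line in answer.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(3) in allowed:
            triples.append((m.group(1), m.group(2), m.group(3)))
    return triples


def combine(llm: List[Triple], classifier: List[Triple]) -> List[Triple]:
    """Order-preserving union of classifier and LLM triples, without duplicates."""
    return list(dict.fromkeys(classifier + llm))
```

Validating the relation label against the offered list guards against free-form answers, which generative models occasionally produce despite the format instruction.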
Based on any of the above embodiments, the embodiment of the invention provides a map construction method. Disease pathogenesis is organized efficiently, comprehensively and accurately from medical research literature into disease signal pathway maps in units of disease. The disease signal pathway maps of the individual diseases are associated through the molecular entities they share to obtain an inter-disease relationship map, from which potential associations between diseases and targets are mined. Combining these with the predetermined drug-target relations then reveals potential associations between drugs and diseases, from which a drug-disease relationship map is constructed to help users find new indications for drugs efficiently and accurately.
Specifically, fig. 6 is a schematic structural diagram of a disease signal pathway map provided by the present invention, which can be constructed as follows. First, medical research literature is acquired from the PubMed literature retrieval database after excluding secondary literature such as reviews and guidelines. Second, the entities of interest and the entity relations between them are obtained from the medical research literature; the entities may include disease entities, molecular entities and pathophysiological mechanism entities, and the entity relations may include positive regulation (up-regulation, activation, promotion), negative regulation (down-regulation, inactivation, inhibition), and so on. Third, the heat of each entity relation is derived from the bibliographic information of the supporting literature, such as journal impact factor, JCR quartile, Chinese Academy of Sciences (CAS) journal partition, citation count and average monthly citations. Finally, a disease signal pathway map in units of disease is constructed from the entities, entity relations and heat.
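The assembly of one disease's map from document-level triples might look like the sketch below. The text names the inputs to "heat" but gives no formula, so `doc_score` is a purely hypothetical weighting (only impact factor and citation count are used; the JCR and CAS partitions are left out), and the `Document` record and the edge dictionary layout are likewise assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

POSITIVE = {"up-regulation", "activation", "promotion"}
NEGATIVE = {"down-regulation", "inactivation", "inhibition"}


@dataclass
class Document:
    pmid: str
    impact_factor: float
    citations: int
    triples: List[Tuple[str, str, str]]  # (source entity, target entity, relation)


def doc_score(doc: Document) -> float:
    # HYPOTHETICAL weighting: the text lists impact factor, JCR quartile, CAS
    # partition and citation counts as inputs but gives no formula.
    return 1.0 + math.log1p(doc.citations) + doc.impact_factor / 10.0


def build_pathway_map(docs: List[Document]) -> Dict[Tuple[str, str], dict]:
    """One disease's map: each edge keeps its relation, polarity, heat and sources."""
    edges: Dict[Tuple[str, str], dict] = {}
    for doc in docs:
        for src, dst, rel in doc.triples:
            polarity = "+" if rel in POSITIVE else "-" if rel in NEGATIVE else "?"
            edge = edges.setdefault(
                (src, dst), {"relation": rel, "polarity": polarity, "heat": 0.0, "pmids": []}
            )
            edge["heat"] += doc_score(doc)  # heat accumulates over supporting documents
            edge["pmids"].append(doc.pmid)
    return edges
```

If two documents report different relations for the same entity pair, this sketch keeps the first label; how the method resolves such conflicts is not stated in the text.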
As shown in fig. 6, the embodiment of the invention constructs disease signal pathway maps for disease A, disease B and disease C. In each map, every entity forms a node labeled with the entity name, such as molecule 1, pathophysiological mechanism 1, molecule 2 or disease A. Each entity relation forms an edge, and whether it is positive or negative regulation can be shown by arrow direction or arrow style; fig. 6 uses arrow style, with a solid arrow for positive regulation and a hollow arrow for negative regulation. The specific relation between two entities forms the attribute of the edge: for example, the relation between molecule 1 and pathophysiological mechanism 1 in fig. 6 is promotion, so "promotion" is the attribute of the edge connecting these two nodes. In addition, node size can represent the number of medical research documents involving the entity (the larger node for disease A, for example, indicates more documents about disease A), and edge thickness can represent the heat of the entity relation, with thicker edges indicating higher heat.
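The visual encoding of fig. 6 maps naturally onto Graphviz DOT attributes: a filled `normal` arrowhead versus an open `onormal` arrowhead for positive versus negative regulation, `penwidth` for heat and node `width` for document count. The sketch below emits such a DOT string; the scaling constants and the name `to_dot` are assumptions chosen only for illustration.

```python
from typing import Dict, Tuple


def to_dot(name: str, doc_counts: Dict[str, int], edges: Dict[Tuple[str, str], dict]) -> str:
    """Render one disease signal pathway map as Graphviz DOT text."""
    lines = [f'digraph "{name}" {{']
    for node, n_docs in doc_counts.items():
        # Node width grows with the number of supporting documents (scale is arbitrary).
        lines.append(f'  "{node}" [width={0.5 + 0.1 * n_docs:.2f}];')
    for (src, dst), e in edges.items():
        # Solid (filled) arrowhead for positive regulation, hollow for negative.
        head = "normal" if e["polarity"] == "+" else "onormal"
        lines.append(
            f'  "{src}" -> "{dst}" [label="{e["relation"]}", '
            f'arrowhead={head}, penwidth={1 + e["heat"]:.1f}];'
        )
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be rendered with the standard `dot` command-line tool, e.g. `dot -Tsvg map.dot -o map.svg`.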
Further, a molecular entity may be a target entity and/or a biomarker entity, so the molecular nodes in the disease signal pathway map of each disease can be labeled as targets and/or biomarkers. As shown in fig. 6, molecule 1 and molecule 3 are labeled "target", and molecule 2 and molecule 4 are labeled "biomarker".
Fig. 7 is a schematic structural diagram of the inter-disease relationship map provided by the present invention. After the disease signal pathway map of each disease is constructed, the maps can be associated through the molecular entities they share to generate the inter-disease relationship map. As shown in fig. 6, the disease signal pathway map of disease A can be selected as the target disease signal pathway map; all molecular entities in it are obtained, and each is compared in turn with the disease signal pathway maps of disease B and disease C to determine whether they contain that molecular entity. Since the map of disease C contains molecule 2, disease C is associated into the target map through molecule 2; since the map of disease B contains molecule 3, disease B is associated into the target map through molecule 3. After association, the inter-disease relationship map shown in fig. 7 is obtained, in which the heat of each entity relation is the same as in the disease signal pathway maps.
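The association procedure can be sketched on toy data loosely modeled on figs. 6 and 7. `MAPS`, `nodes_of` and `associate` are hypothetical; the relation labels on edges the text does not name (molecule 2 to disease C, molecule 3 to diseases A and B) are assumptions chosen to agree with the results stated later in the text, and heat is omitted for brevity.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, target, relation)

MAPS: Dict[str, List[Edge]] = {
    "disease A": [
        ("molecule 1", "pathophysiological mechanism 1", "promotion"),
        ("pathophysiological mechanism 1", "molecule 2", "inhibition"),
        ("molecule 2", "disease A", "positive regulation"),
        ("molecule 3", "disease A", "positive regulation"),
    ],
    "disease B": [("molecule 3", "disease B", "negative regulation")],
    "disease C": [("molecule 2", "disease C", "positive regulation")],
}


def nodes_of(edges: List[Edge]) -> Set[str]:
    return {n for s, t, _ in edges for n in (s, t)}


def associate(target: str, maps: Dict[str, List[Edge]], molecules: Set[str]):
    """Merge other diseases' maps into the target disease's map via shared molecules."""
    target_molecules = nodes_of(maps[target]) & molecules
    merged = list(maps[target])
    links: Dict[str, Set[str]] = {}
    for disease, edges in maps.items():
        if disease == target:
            continue
        shared = target_molecules & nodes_of(edges)
        if shared:
            links[disease] = shared  # e.g. disease C is linked through molecule 2
            for e in edges:
                if e not in merged:
                    merged.append(e)
    return merged, links
```

With disease A as the target map, disease C is linked through molecule 2 and disease B through molecule 3, matching the walkthrough above.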
Based on the inter-disease relationship map, potential associations between diseases and targets can be mined as follows. Obtain a molecular entity that is connected to at least two disease entities and is labeled only as a biomarker (e.g. molecule 2). If one of the disease signal pathways connected to this molecular entity (e.g. those of disease A and disease C) contains a target entity (e.g. the pathway of disease A contains molecule 1), and that target entity is not associated with the disease entity of another such pathway (e.g. molecule 1 and disease C are not associated in the disease signal pathway maps), then the relation between the target entity and that disease entity is determined to be a potential association (e.g. the relation between molecule 1 and disease C).
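The biomarker-based rule can be sketched on the same toy data. Here "a signal pathway connected with the molecular entity contains a target" is read as "the target lies upstream of the biomarker in that disease's own map"; this reading, the names `reaches` and `potential_via_biomarkers`, and the edge labels in `MAPS` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, target, relation)

MAPS: Dict[str, List[Edge]] = {
    "disease A": [
        ("molecule 1", "pathophysiological mechanism 1", "promotion"),
        ("pathophysiological mechanism 1", "molecule 2", "inhibition"),
        ("molecule 2", "disease A", "positive regulation"),
        ("molecule 3", "disease A", "positive regulation"),
    ],
    "disease B": [("molecule 3", "disease B", "negative regulation")],
    "disease C": [("molecule 2", "disease C", "positive regulation")],
}


def reaches(edges: List[Edge], start: str, goal: str) -> bool:
    """True if a directed path start -> ... -> goal exists in edges."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for s, t, _ in edges:
        adj[s].append(t)
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def potential_via_biomarkers(maps, biomarker_only: Set[str], targets: Set[str]) -> Set[Tuple[str, str]]:
    found: Set[Tuple[str, str]] = set()
    for marker in biomarker_only:
        # Diseases the biomarker links to directly in their own maps.
        diseases = [d for d, edges in maps.items() if any(s == marker and t == d for s, t, _ in edges)]
        if len(diseases) < 2:
            continue
        for src in diseases:
            upstream = {t for t in targets if reaches(maps[src], t, marker)}
            for other in diseases:
                if other != src:
                    # A target with no path to the other disease is a potential association.
                    found |= {(t, other) for t in upstream if not reaches(maps[other], t, other)}
    return found
```

On the toy data this yields only (molecule 1, disease C), the pair named in the example above; molecule 3 is not reported because it does not act through molecule 2.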
As shown in fig. 7, potential associations between diseases and targets can also be mined as follows. From the inter-disease relationship map, obtain the signal pathways connecting each target entity to each disease entity, such as molecule 1-pathophysiological mechanism 1-molecule 2-disease A, molecule 1-pathophysiological mechanism 1-molecule 2-disease C, molecule 3-disease A and molecule 3-disease B. Compare each pathway in turn with the disease signal pathway maps of the individual diseases shown in fig. 6 to determine whether it exists there. The pathways molecule 1-pathophysiological mechanism 1-molecule 2-disease A, molecule 3-disease A and molecule 3-disease B all exist in the maps of fig. 6, so the relations between the corresponding target and disease entities are actual associations: molecule 1 and disease A, molecule 3 and disease A, and molecule 3 and disease B. The pathway molecule 1-pathophysiological mechanism 1-molecule 2-disease C does not exist in the maps of fig. 6, so the relation between molecule 1 and disease C is a potential association.
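The path-comparison rule can be sketched as: enumerate cycle-free paths from each target to each disease in the merged map, then call a pair "actual" if some path uses only edges present in that disease's own map, and "potential" otherwise. `simple_paths`, `classify_target_disease` and the toy `MAPS` (with assumed labels on edges the text does not name) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, target, relation)

MAPS: Dict[str, List[Edge]] = {
    "disease A": [
        ("molecule 1", "pathophysiological mechanism 1", "promotion"),
        ("pathophysiological mechanism 1", "molecule 2", "inhibition"),
        ("molecule 2", "disease A", "positive regulation"),
        ("molecule 3", "disease A", "positive regulation"),
    ],
    "disease B": [("molecule 3", "disease B", "negative regulation")],
    "disease C": [("molecule 2", "disease C", "positive regulation")],
}


def simple_paths(edges: Iterable[Edge], start: str, goal: str) -> Iterator[List[str]]:
    """Depth-first enumeration of cycle-free paths from start to goal."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for s, t, _ in edges:
        adj[s].append(t)

    def walk(node: str, path: List[str]) -> Iterator[List[str]]:
        if node == goal:
            yield path
            return
        for nxt in adj[node]:
            if nxt not in path:
                yield from walk(nxt, path + [nxt])

    yield from walk(start, [start])


def classify_target_disease(maps: Dict[str, List[Edge]], targets: Set[str]) -> Dict[Tuple[str, str], str]:
    merged = {e for edges in maps.values() for e in edges}  # inter-disease map
    result: Dict[Tuple[str, str], str] = {}
    for target in targets:
        for disease, own_edges in maps.items():
            own = {(s, t) for s, t, _ in own_edges}
            for path in simple_paths(merged, target, disease):
                kind = "actual" if set(zip(path, path[1:])) <= own else "potential"
                if result.get((target, disease)) != "actual":  # one actual path suffices
                    result[(target, disease)] = kind
    return result
```

The toy run reproduces the classification in the text: three actual associations and one potential one (molecule 1 and disease C).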
After the potential associations between diseases and targets are mined, they can be highlighted on the inter-disease relationship map. In fig. 7, for example, the nodes for molecule 1 and disease C can be highlighted so that the information is displayed more intuitively.
Fig. 8 is a schematic structural diagram of the target-disease relationship map provided by the present invention, which can be constructed from the inter-disease relationship map. As shown in fig. 8, the target-disease relationship map includes target entities, disease entities and the entity relations between them. It can be generated as follows. First, as shown in fig. 7, obtain the signal pathways connecting each target entity to each disease entity, such as molecule 1-pathophysiological mechanism 1-molecule 2-disease A, molecule 1-pathophysiological mechanism 1-molecule 2-disease C, molecule 3-disease A and molecule 3-disease B. For each pathway, obtain the entity relations between successive nodes and count how many of them are negative regulation. If the count is even, the relation between the target entity and the disease entity of that pathway is positive regulation; if the count is odd, the relation is negative regulation.
For example, for the pathway molecule 1-pathophysiological mechanism 1-molecule 2-disease A, the relations between successive nodes are promotion, inhibition and positive regulation. Promotion is a positive regulation and inhibition is a negative regulation, so the pathway contains one negative regulation, and the relation between molecule 1 and disease A is negative regulation. Similarly, the relation between molecule 1 and disease C is negative regulation, between molecule 3 and disease A positive regulation, and between molecule 3 and disease B negative regulation.
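The parity rule is equivalent to multiplying signs along the path (+1 for each positive step, -1 for each negative step): an even number of negative steps cancels out. A minimal sketch follows; the name `path_sign` is hypothetical, and the relation vocabulary is the one listed in the text plus the generic "positive regulation"/"negative regulation" labels.

```python
from typing import Iterable

POSITIVE = {"up-regulation", "activation", "promotion", "positive regulation"}
NEGATIVE = {"down-regulation", "inactivation", "inhibition", "negative regulation"}


def path_sign(relations: Iterable[str]) -> str:
    """Net effect of a signal pathway from its successive node-to-node relations."""
    relations = list(relations)
    unknown = [r for r in relations if r not in POSITIVE and r not in NEGATIVE]
    if unknown:
        raise ValueError(f"unclassified relation(s): {unknown}")
    negatives = sum(r in NEGATIVE for r in relations)
    # Even count of negative steps -> positive regulation; odd -> negative regulation.
    return "positive regulation" if negatives % 2 == 0 else "negative regulation"
```

Applied to the worked example, `path_sign(["promotion", "inhibition", "positive regulation"])` returns "negative regulation", the relation stated above for molecule 1 and disease A.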
Because the entities in the target-disease relationship map are target entities and disease entities, "molecule 1" can be displayed as "target 1" and "molecule 3" as "target 2" for clarity. In addition, the relation between target 1 and disease C is a potential association mined from the inter-disease relationship map; potential and actual associations can be distinguished by line style (e.g. dashed lines for potential associations and solid lines for actual associations), giving the target-disease relationship map shown in fig. 8.
Fig. 9 is a schematic structural diagram of the drug-target associations and the drug-disease relationship map provided by the present invention. The drug-target associations in fig. 9(a) are predetermined from known drug targets, e.g. drug 1-target 1 and drug 2-target 2. After the target-disease relationship map is constructed, it can be associated with the predetermined drug-target relations through shared target entities to construct the drug-disease relationship map, as follows:
As shown in fig. 8 and fig. 9(a), drug 1 is associated with target 1, and target 1 has a negative-regulation, actual association with disease A and a negative-regulation, potential association with disease C. Associating through target 1 therefore gives drug 1-disease A as negative regulation with an actual association, and drug 1-disease C as negative regulation with a potential association. Similarly, drug 2-disease A is positive regulation with an actual association, and drug 2-disease B is negative regulation with an actual association, giving the drug-disease relationship map shown in fig. 9(b).
Since drug-disease relations are expressed in the same way as target-disease relations, the drug-disease relationship map takes the same form as the target-disease relationship map. After potential drug-disease associations are mined, they can be distinguished from actual associations by line style and made conspicuous by highlighting or similar means, so that users can quickly see the potential associations between drugs and diseases.
Based on any of the above embodiments, fig. 10 is a schematic flowchart of a retrieval method provided by the present invention. As shown in fig. 10, the method includes:
step 1010, obtaining a target entity to be queried;
step 1020, determining relevant knowledge information of the target entity based on a drug-disease relationship map, the drug-disease relationship map being determined by the map construction method described in any of the above.
Specifically, for medical research literature from various sources, the method of the above embodiments can organize disease pathogenesis efficiently, comprehensively and accurately into disease signal pathway maps, mine disease-target relations by associating the disease signal pathway maps of all diseases, and build the drug-disease relationship map from the drug-target associations. On this basis, an information retrieval system can be built so that users can quickly query and obtain relevant knowledge about diseases, targets or drugs.
A user can enter the target entity to be queried on a user terminal such as a mobile phone, computer or tablet, which sends it to the server of the information retrieval system. The target entity to be queried may be any one or a combination of entity types such as a disease, a target or a drug.
Because constructing the drug-disease relationship map also produces the disease signal pathway maps, the inter-disease relationship map and the target-disease relationship map, the server, on receiving a target entity to be queried, can search the disease signal pathway maps, the inter-disease relationship map, the target-disease relationship map and the drug-target relationship map. It locates the node corresponding to the target entity together with the nodes connected to it at every level, extracts from these maps the local map containing that node and its connections, and returns it to the user terminal as the relevant knowledge information for the user to view. For example, when a user queries a target, the disease signal pathway maps and/or inter-disease relationship map and/or target-disease relationship map containing that target can be returned.
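Extracting the "local map" around a queried node can be sketched as a breadth-first search that follows links in both directions, optionally limited to a number of levels. `local_subgraph` and its `max_depth` parameter are hypothetical; `max_depth=None` corresponds to returning sub-nodes at all levels.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (source, target, relation)


def local_subgraph(edges: Iterable[Edge], query: str, max_depth: Optional[int] = None) -> List[Edge]:
    """Edges among the query node and the nodes reachable from it, links followed both ways."""
    edges = list(edges)
    neighbors: Dict[str, Set[str]] = {}
    for s, t, _ in edges:
        neighbors.setdefault(s, set()).add(t)
        neighbors.setdefault(t, set()).add(s)
    depth = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if max_depth is not None and depth[node] >= max_depth:
            continue  # do not expand beyond the requested number of levels
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return [e for e in edges if e[0] in depth and e[1] in depth]
```

Querying "disease C" on the drug-disease map of fig. 9(b) with `max_depth=1` would return only the drug 1-disease C edge; an unknown entity returns an empty list.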
The retrieval method provided by the embodiment of the invention enables fast retrieval and querying of knowledge about diseases, drugs and targets from the constructed maps, helping users obtain new indication information for drugs efficiently and accurately, so that they can optimize their product structure, increase market share and profitability, and achieve sustainable development.
Based on any of the above embodiments, fig. 11 is a schematic structural diagram of a map construction apparatus provided by the present invention. As shown in fig. 11, the apparatus includes:
an initial map construction unit 1110, configured to construct disease signal pathway maps in units of disease based on medical research literature;
a map association unit 1120, configured to associate the disease signal pathway maps of the diseases through the molecular entities they share, to obtain an inter-disease relationship map, the molecular entities being biomarker entities and/or target entities;
a target map construction unit 1130, configured to construct a drug-disease relationship map based on the relations between disease entities and target entities in the inter-disease relationship map and the predetermined drug-target relations.
The map construction apparatus provided by the embodiment of the invention constructs disease signal pathway maps in units of disease from medical research literature, so that the information in a large body of literature is extracted comprehensively and accurately into each disease's map, improving the efficiency of information mining and extraction. Associating the disease signal pathway maps of the diseases reveals potential disease-target associations, and combining these with the predetermined drug-target relations reveals potential drug-disease associations. The resulting drug-disease relationship map helps users discover new indications for drugs efficiently and accurately, improving the efficiency, reliability and accuracy of drug information mining.
Based on any of the above embodiments, the target map construction unit 1130 includes a first relationship determination unit, which includes:
a signal pathway acquisition subunit, configured to acquire the signal pathways connecting each disease entity and each target entity in the inter-disease relationship map;
a relationship determination subunit, configured to determine the relation between the disease entity and the target entity of a signal pathway based on the relations between the nodes in that pathway.
Based on any of the above embodiments, the relationship determination subunit is specifically configured for:
counting the number of negative-regulation relations among the relations between nodes in a signal pathway;
determining, based on that number, whether the relation between the disease entity and the target entity of the signal pathway is a positive regulation relation or a negative regulation relation.
Based on any of the above embodiments, the target map construction unit 1130 further includes a second relationship determination unit, configured for:
if the signal pathway exists in a disease signal pathway map, determining that the relation between the disease entity and the target entity of the pathway is an actual association;
otherwise, determining that the relation between the disease entity and the target entity of the pathway is a potential association.
Based on any of the above embodiments, the apparatus further includes a literature acquisition unit, configured for:
acquiring an initial literature set;
screening out secondary literature from the initial literature set based on the publication type of each document in it, to obtain a candidate literature set;
screening the candidate literature set to obtain the medical research literature.
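The publication-type screen can be sketched as a set-intersection filter over literature records. The record layout (a dict with a `publication_types` list) and the exclusion set are assumptions: the text names reviews and guidelines as examples of secondary literature, and the labels below are PubMed publication-type names chosen to cover those examples. The second screening step is not sketched because the text does not state its criteria.

```python
from typing import Dict, Iterable, List

# ASSUMED exclusion set, using PubMed publication-type labels for reviews and guidelines.
SECONDARY_TYPES = {"Review", "Systematic Review", "Guideline", "Practice Guideline"}


def screen_secondary(records: Iterable[Dict]) -> List[Dict]:
    """Drop any record tagged with a secondary-literature publication type."""
    return [r for r in records if not SECONDARY_TYPES & set(r.get("publication_types", []))]
```

Records without publication-type metadata are kept here; a stricter pipeline might route them to manual review instead.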
Based on any of the above embodiments, the initial map construction unit 1110 includes:
an entity acquisition unit, configured to perform entity recognition on sentences in the medical research literature to obtain the entity pairs contained in it, the entity pairs including entities and the entity relations between them;
a map construction unit, configured to construct disease signal pathway maps in units of disease based on the entities and entity relations.
Based on any of the above embodiments, the entity acquisition unit includes:
a text acquisition subunit, configured to acquire the title text and abstract text of the medical research literature;
a sentence classification subunit, configured to input the title text and abstract text into a sentence classifier to obtain the sentence type, output by the sentence classifier, of each sentence in the title text and abstract text;
an entity recognition subunit, configured to perform entity recognition on sentences whose sentence type is "to be recognized" to obtain the entity pairs contained in the medical research literature.
Based on any of the above embodiments, the entity acquisition unit or the entity recognition subunit is configured for:
performing entity recognition on the sentence to obtain the entities and pronouns in the sentence;
inputting the entities and pronouns into a reference relation classifier to obtain the reference relation between each entity and pronoun output by the reference relation classifier;
when the reference relation is yes, replacing the pronoun in the sentence with the corresponding entity to obtain an optimized sentence;
inputting the optimized sentence and the entities in it into an entity relation classifier to obtain the entity relations between the entities output by the entity relation classifier.
Based on any of the above embodiments, the entity acquisition unit or the entity recognition subunit is further configured for:
filling the optimized sentence and the entities in it into a relation query template to obtain a relation query sentence;
inputting the relation query sentence into a question-answering language model to obtain the entity relations output by the question-answering language model.
Based on any of the above embodiments, Fig. 12 is a schematic structural diagram of the retrieval device provided by the present invention. As shown in Fig. 12, the device includes:
an acquisition unit 1210, configured to acquire a target entity to be queried;
and a retrieval unit 1220, configured to determine relevant knowledge information of the target entity based on a drug-disease relationship map, the drug-disease relationship map being determined by the map construction method described above.
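The source does not define what "relevant knowledge information" contains. As one hedged reading, the sketch below returns the one-hop neighbourhood of the queried entity in a drug-disease relationship map stored as (head, relation, tail) triples. The triple layout, relation names, and example entities are illustrative assumptions.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (head, relation, tail)


def build_index(triples: list[Triple]) -> dict[str, list[tuple[str, str, str]]]:
    """Index every edge under both endpoints as (direction, relation, neighbour)."""
    index: dict[str, list[tuple[str, str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append(("out", relation, tail))
        index[tail].append(("in", relation, head))
    return index


def retrieve(index: dict[str, list[tuple[str, str, str]]], entity: str) -> list[tuple[str, str, str]]:
    """Return the one-hop neighbourhood of the queried entity (empty if unknown)."""
    return sorted(index.get(entity, []))


# Illustrative map; relation names and entities are hypothetical.
drug_disease_map = [
    ("Drug X", "targets", "T1"),
    ("T1", "positive_regulation", "disease D"),
    ("Drug X", "candidate_indication", "disease D"),
]
index = build_index(drug_disease_map)
print(retrieve(index, "T1"))
```

Indexing each edge under both endpoints lets a single lookup answer queries for a drug, a target, or a disease alike.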
With the retrieval device provided by the embodiments of the present invention, relevant knowledge information about diseases, drugs, targets and the like can be retrieved and queried quickly from the constructed maps. This helps users obtain new indication information for drugs efficiently and accurately, so that they can optimize their product structure, increase market share and enterprise profit, and achieve sustainable development goals.
Fig. 13 illustrates the physical structure of an electronic device. As shown in Fig. 13, the electronic device may include a processor 1310, a communications interface 1320, a memory 1330 and a communication bus 1340, where the processor 1310, the communications interface 1320 and the memory 1330 communicate with each other via the communication bus 1340. The processor 1310 may invoke logic instructions in the memory 1330 to perform a map construction method comprising: constructing a disease signal pathway map in units of disease based on medical research literature; associating the disease signal pathway maps of the diseases based on the molecular entities they share, to obtain an inter-disease relationship map, wherein the molecular entities are biomarker entities and/or target entities; and constructing a drug-disease relationship map based on the relationships between disease entities and target entities in the inter-disease relationship map and the predetermined relationships between drugs and targets.
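The two association steps of the map construction method can be sketched as plain set and dictionary operations. Assumptions: each per-disease map is a list of (head, relation, tail) triples, identically named molecular entities are treated as the same node, and the disease-target relations are taken as already determined. All entity names are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (head, relation, tail)


def associate_maps(
    disease_maps: dict[str, list[Triple]], molecular_entities: set[str]
) -> tuple[set[Triple], dict[str, set[str]]]:
    """Merge per-disease maps (same-named molecules become shared nodes) and
    report which molecular entities bridge two or more diseases."""
    merged: set[Triple] = set()
    diseases_by_molecule: dict[str, set[str]] = defaultdict(set)
    for disease, triples in disease_maps.items():
        for head, relation, tail in triples:
            merged.add((head, relation, tail))
            for node in (head, tail):
                if node in molecular_entities:
                    diseases_by_molecule[node].add(disease)
    bridges = {m: ds for m, ds in diseases_by_molecule.items() if len(ds) > 1}
    return merged, bridges


def build_drug_disease_map(
    disease_target: list[Triple], drug_target: list[tuple[str, str]]
) -> list[tuple[str, str, str, str]]:
    """Join (disease, relation, target) with predetermined (drug, target)
    pairs on the target, giving (drug, disease, target, relation)."""
    drugs_by_target: dict[str, set[str]] = defaultdict(set)
    for drug, target in drug_target:
        drugs_by_target[target].add(drug)
    links = []
    for disease, relation, target in disease_target:
        for drug in drugs_by_target.get(target, set()):
            links.append((drug, disease, target, relation))
    return sorted(links)


disease_maps = {
    "disease D": [("M1", "positive_regulation", "T1"), ("T1", "positive_regulation", "disease D")],
    "disease E": [("M2", "negative_regulation", "T1"), ("T1", "positive_regulation", "disease E")],
}
merged, bridges = associate_maps(disease_maps, {"M1", "M2", "T1"})
links = build_drug_disease_map(
    [("disease D", "positive_regulation", "T1"), ("disease E", "positive_regulation", "T1")],
    [("Drug X", "T1")],
)
print(bridges)
print(links)
```

Here the shared target T1 links disease D and disease E, so a drug known to act on T1 becomes connected to both diseases.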
Further, the processor 1310 may invoke logic instructions in the memory 1330 to perform a retrieval method comprising: acquiring a target entity to be queried; and determining relevant knowledge information of the target entity based on a drug-disease relationship map, wherein the drug-disease relationship map is determined by the map construction method described above.
Further, the logic instructions in the memory 1330 may be implemented as software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program, which may be stored on a non-transitory computer-readable storage medium. When executed by a processor, the computer program performs the map construction method provided by the methods described above, the method comprising: constructing a disease signal pathway map in units of disease based on medical research literature; associating the disease signal pathway maps of the diseases based on the molecular entities they share, to obtain an inter-disease relationship map, wherein the molecular entities are biomarker entities and/or target entities; and constructing a drug-disease relationship map based on the relationships between disease entities and target entities in the inter-disease relationship map and the predetermined relationships between drugs and targets.
In addition, when executed by a processor, the computer program can also perform the retrieval method provided by the methods described above, the method comprising: acquiring a target entity to be queried; and determining relevant knowledge information of the target entity based on a drug-disease relationship map, wherein the drug-disease relationship map is determined by the map construction method described above.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the map construction method provided by the methods described above, the method comprising: constructing a disease signal pathway map in units of disease based on medical research literature; associating the disease signal pathway maps of the diseases based on the molecular entities they share, to obtain an inter-disease relationship map, wherein the molecular entities are biomarker entities and/or target entities; and constructing a drug-disease relationship map based on the relationships between disease entities and target entities in the inter-disease relationship map and the predetermined relationships between drugs and targets.
When executed by a processor, the computer program also performs the retrieval method provided by the methods described above, the method comprising: acquiring a target entity to be queried; and determining relevant knowledge information of the target entity based on a drug-disease relationship map, wherein the drug-disease relationship map is determined by the map construction method described above.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A map construction method, characterized by comprising:
constructing a disease signal pathway map in units of a disease based on medical research literature;
associating the disease signal pathway maps of the diseases based on the molecular entities they share, to obtain an inter-disease relationship map, wherein the molecular entities are biomarker entities and/or target entities;
and constructing a drug-disease relationship map based on the relationships between disease entities and target entities in the inter-disease relationship map and the predetermined relationships between drugs and targets.
2. The map construction method according to claim 1, wherein the determining step of the relationship between the disease entity and the target entity in the inter-disease relationship map comprises:
acquiring the signal paths connecting each disease entity and each target entity in the inter-disease relationship map;
and determining the relation between the disease entity and the target entity corresponding to the signal path based on the relation between the nodes in the signal path.
3. The map construction method according to claim 2, wherein the determining the relationship between the disease entity and the target entity corresponding to the signal path based on the relationship between the nodes in the signal path comprises:
counting the number of relationships between nodes in the signal path that are negative regulation relationships;
and determining, based on that number, whether the relationship between the disease entity and the target entity corresponding to the signal path is a positive regulation relationship or a negative regulation relationship.
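Claims 2 and 3 can be illustrated with a path search followed by a sign rule. Two assumptions are made here, neither stated in the claims: edges are directed from regulator to regulated entity (so paths run from the target to the disease), and the sign follows the usual parity reading, where an odd number of negative-regulation edges gives a negative overall relationship and an even number gives a positive one. `max_edges` is a hypothetical cut-off, and node names are placeholders.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (head, relation, tail)
NEGATIVE = "negative_regulation"


def signal_paths(triples: list[Triple], source: str, sink: str, max_edges: int = 6) -> list[list[Triple]]:
    """Enumerate simple directed paths from source to sink, up to max_edges long."""
    graph: dict[str, list[Triple]] = defaultdict(list)
    for edge in triples:
        graph[edge[0]].append(edge)
    paths: list[list[Triple]] = []

    def dfs(node: str, visited: set[str], path: list[Triple]) -> None:
        if node == sink:
            paths.append(list(path))
            return
        if len(path) >= max_edges:
            return
        for edge in graph.get(node, []):
            nxt = edge[2]
            if nxt not in visited:
                visited.add(nxt)
                path.append(edge)
                dfs(nxt, visited, path)
                path.pop()
                visited.remove(nxt)

    dfs(source, {source}, [])
    return paths


def path_relation(path: list[Triple]) -> str:
    """Odd count of negative-regulation edges -> negative; even -> positive."""
    negatives = sum(1 for _, relation, _ in path if relation == NEGATIVE)
    return NEGATIVE if negatives % 2 else "positive_regulation"


triples = [
    ("T1", "positive_regulation", "M1"),
    ("M1", "negative_regulation", "M2"),
    ("M2", "negative_regulation", "disease D"),
    ("T1", "negative_regulation", "M3"),
    ("M3", "positive_regulation", "disease D"),
]
paths = signal_paths(triples, "T1", "disease D")
print([path_relation(p) for p in paths])
```

Under this reading, two inhibitions in series cancel out, which is why the first path is positive while the second is negative.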
4. The method of claim 2, wherein the step of determining the relationship between the disease entity and the target entity in the inter-disease relationship map further comprises:
if the signal path exists in the disease signal pathway map, determining that the relationship between the disease entity and the target entity corresponding to the signal path is an actual association relationship;
otherwise, determining that the relationship between the disease entity and the target entity corresponding to the signal path is a potential association relationship.
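Claim 4 separates paths already present in a disease's own signal pathway map from paths that only appear after the maps are associated. The check below assumes that "exists in the disease signal pathway map" means every edge of the path belongs to that disease's own map. This reading is an interpretation, not something the claim spells out.

```python
Triple = tuple[str, str, str]  # (head, relation, tail)


def association_type(path: list[Triple], disease_map: set[Triple]) -> str:
    """'actual' if every edge of the path is in the disease's own map,
    otherwise 'potential' (the path uses edges contributed by other diseases)."""
    return "actual" if all(edge in disease_map for edge in path) else "potential"


map_d = {("T1", "positive_regulation", "M1"), ("M1", "positive_regulation", "disease D")}
path_inside = [("T1", "positive_regulation", "M1"), ("M1", "positive_regulation", "disease D")]
path_cross = [("T1", "negative_regulation", "M2"), ("M2", "negative_regulation", "disease D")]
print(association_type(path_inside, map_d), association_type(path_cross, map_d))
```

Potential associations are the ones that may point to new indications, since they arise only after the maps of different diseases are joined.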
5. The method of claim 1, wherein constructing a disease signal pathway map in units of disease based on medical research literature comprises:
performing entity recognition on sentences in the medical research literature to obtain entity pairs contained in the medical research literature, wherein the entity pairs comprise entities and entity relations among the entities;
constructing the disease signaling pathway map in units of disease based on the entity and the entity relationship.
6. The method for constructing a map according to claim 5, wherein the entity recognition of the sentence in the medical research literature to obtain the entity pair contained in the medical research literature includes:
acquiring title text and abstract text of the medical research literature;
inputting the title text and the abstract text into a sentence classifier to obtain sentence types of each sentence in the title text and the abstract text output by the sentence classifier;
and performing entity recognition on the sentences whose sentence type is "to be recognized" to obtain the entity pairs contained in the medical research literature.
7. The map construction method according to claim 5 or 6, wherein performing entity recognition to obtain the entity pairs contained in the medical research literature comprises:
performing entity recognition on the sentence to obtain an entity and a pronoun in the sentence;
inputting the entity and the pronoun into a reference relation classifier to obtain a reference relation between the entity and the pronoun output by the reference relation classifier;
under the condition that the reference relation is yes, replacing the pronouns in the sentence with corresponding entities to obtain an optimized sentence;
and inputting the optimized sentence and the entities in the optimized sentence into an entity relation classifier to obtain the entity relations between the entities output by the entity relation classifier.
8. A retrieval method, comprising:
acquiring a target entity to be queried;
determining relevant knowledge information of the target entity based on a drug-disease relationship map, the drug-disease relationship map being determined by the map construction method according to any one of claims 1 to 7.
9. A map construction apparatus, comprising:
an initial map construction unit, configured to construct a disease signal pathway map in units of disease based on medical research literature;
a map association unit, configured to associate the disease signal pathway maps of the diseases based on the molecular entities they share, to obtain an inter-disease relationship map, wherein the molecular entities are biomarker entities and/or target entities;
a target map construction unit, configured to construct a drug-disease relationship map based on a relationship between a disease entity and a target entity in the inter-disease relationship map, and a predetermined relationship between a drug and a target.
10. A search device, comprising:
the acquisition unit is used for acquiring a target entity to be queried;
a retrieval unit, configured to determine relevant knowledge information of the target entity based on a drug-disease relationship map, the drug-disease relationship map being determined by the map construction method according to any one of claims 1 to 7.
CN202310572446.0A 2023-05-19 2023-05-19 Map construction and retrieval method, device, electronic equipment and storage medium Pending CN116775896A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310572446.0A CN116775896A (en) 2023-05-19 2023-05-19 Map construction and retrieval method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310572446.0A CN116775896A (en) 2023-05-19 2023-05-19 Map construction and retrieval method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116775896A (en) 2023-09-19

Family

ID=87992067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310572446.0A Pending CN116775896A (en) 2023-05-19 2023-05-19 Map construction and retrieval method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116775896A (en)

Similar Documents

Publication Publication Date Title
CN111986770B (en) Prescription medication auditing method, device, equipment and storage medium
CN109213870B (en) Document processing
Meystre et al. Automatic trial eligibility surveillance based on unstructured clinical data
Berk et al. Forecasts of violence to inform sentencing decisions
CN114026651A (en) Automatic generation of structured patient data records
Sarker et al. Automatic evidence quality prediction to support evidence-based decision making
US20210375488A1 (en) System and methods for automatic medical knowledge curation
Maraut et al. Identifying author–inventors from Spain: methods and a first insight into results
CN113742493A (en) Method and device for constructing pathological knowledge map
Dos Santos et al. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: A scoping review
Xie et al. A network embedding-based scholar assessment indicator considering four facets: Research topic, author credit allocation, field-normalized journal impact, and published time
CN116775897A (en) Knowledge graph construction and query method and device, electronic equipment and storage medium
Norman Systematic review automation methods
CN116719840A (en) Medical information pushing method based on post-medical-record structured processing
CN114547346B (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN115938608A (en) Clinical decision early warning method and system based on prompt learning model
CN116775896A (en) Map construction and retrieval method, device, electronic equipment and storage medium
CN105956362B (en) A kind of believable case history structural method and system
CN112735584A (en) Malignant tumor diagnosis and treatment auxiliary decision generation method and device
CN114121293A (en) Clinical trial information mining and inquiring method and device
CN114238639A (en) Construction method and device of medical term standardized framework and electronic equipment
Cevallos et al. Fake news detection on COVID 19 tweets via supervised learning approach
CN113870998A (en) Interrogation method, device, electronic equipment and storage medium
Kasthurirathne et al. Machine Learning Approaches to Identify Nicknames from A Statewide Health Information Exchange
CN116895385A (en) Target information analysis method and target information query method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination