CN114927168B - Construction method of biomechanical regulation and control bone reconstruction text mining interaction website - Google Patents

Info

Publication number
CN114927168B
CN114927168B CN202210606098.XA CN202210606098A CN114927168B CN 114927168 B CN114927168 B CN 114927168B CN 202210606098 A CN202210606098 A CN 202210606098A CN 114927168 B CN114927168 B CN 114927168B
Authority
CN
China
Prior art keywords
text
gene
biomechanical
database
mining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210606098.XA
Other languages
Chinese (zh)
Other versions
CN114927168A (en
Inventor
经典
蔡靖仪
赵志河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202210606098.XA
Publication of CN114927168A
Application granted
Publication of CN114927168B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/338Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30Data warehousing; Computing architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Bioethics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a method for constructing a text-mining interactive website on biomechanically regulated bone reconstruction, comprising the following steps: S1, screening text words carrying gene information in documents according to related terms, acquiring gene-molecule interaction pairs, and constructing a literature database; S2, calculating the correlation between a target retrieval factor and the classical mechanosensitive pathways using a weight algorithm based on the gene-molecule interaction pairs in the literature database; and S3, visually displaying the correlation between the target retrieval factor and the classical mechanosensitive pathways, displaying the gene molecules in the classical mechanosensitive pathways as interconnected nodes, and linking to the corresponding documents in the literature database when a connecting line between nodes is clicked. Based on open literature resources, the website mines a text database with a natural language processing strategy, creates the first bone-related biomechanical text database, introduces a visual presentation, and establishes a new analysis strategy based on biological pathways.

Description

Construction method of biomechanical regulation and control bone reconstruction text mining interaction website
Technical Field
The application relates to the technical field of biomechanical website construction, and in particular to a method for constructing a text-mining interactive website on biomechanically regulated bone reconstruction.
Background
Congenital hypoplasia or dysplasia of bone tissue and bone defects or loss are relatively common clinical problems that greatly affect patients' appearance, psychological health and quality of life. Treatments based on biomechanical principles, such as mechanical stimulation and stress-induced stretching, are currently relatively safe, reliable, efficient and economical countermeasures. Clarifying the biomolecular mechanism of bone reconstruction under mechanical stimulation is therefore a primary prerequisite for developing precise and efficient treatment. The field of biomechanically regulated bone reconstruction has accumulated massive research data, but the information is scattered and difficult to integrate; constructing a knowledge-network platform for efficiently acquiring important information is thus an important means of advancing research in the field.
Elucidating the response of bone-related cells to biomechanical stimuli is a fundamental precondition for studies of bone physiology and pathology. Open, shared knowledge platforms have greatly promoted the development of modern science, but the ever-increasing number of publications and the sheer volume of information make it increasingly difficult for researchers to sort and mine the literature manually. In the big-data era, using machine language processing and natural language processing (NLP) tools to integrate and sort the relevant biomedical literature is an efficient, reliable and highly promising approach.
Currently, computer language tools such as Tagger, iTextMine and Geneshot can distinguish terms and expressions in biomedical texts, which makes computer language processing of biomedical texts possible. In recent years, tools such as LION LBD and GLAD4U have used NLP to mine biological texts, integrate and sort data, and provide research-related information.
However, in the field of bone-related biomechanics research, the above text research tools have difficulty playing an effective role, mainly in the following respects:
1. Programming ability limitations: most existing text processing tools, such as Tagger, iTextMine and Geneshot, are aimed at users with some programming ability and require some knowledge of natural language processing, so most biomedical researchers find them difficult to operate.
2. Background database redundancy: biological processes are precise and condition-dependent. Although existing NLP tools can extract and structure large amounts of stored data, most of them use unfiltered background databases, which brings in irrelevant information and causes false positives in the results. For specific biological fields, especially relatively small research fields such as biomechanics, it is difficult to obtain good search results from a general medical research background library. Researchers therefore need a targeted NLP tool better suited to bone-related biomechanical studies.
3. Lack of visual display: for complex interaction networks, plain text cannot provide as clear and logical a framework as a graphical display, so a visualization is needed to sort out the connections and interactions between molecules, allowing researchers to quickly understand pathway information and locate the required target.
Disclosure of Invention
To solve the above technical problems, the application provides a method for constructing a text-mining interactive website on biomechanically regulated bone reconstruction.
The application is realized by the following technical scheme:
A method for constructing a text-mining interactive website on biomechanically regulated bone reconstruction, the method comprising:
S1, screening text words carrying gene information in documents according to related terms, acquiring gene-molecule interaction pairs, and constructing a literature database;
S2, calculating the correlation between a target retrieval factor and the classical mechanosensitive pathways using a weight algorithm based on the gene-molecule interaction pairs in the literature database;
and S3, visually displaying the correlation between the target retrieval factor and the classical mechanosensitive pathways, displaying the gene molecules in the classical mechanosensitive pathways as interconnected nodes, and linking to the corresponding documents in the literature database when a connecting line between nodes is clicked.
Further, between step S1 and step S2, a deep neural network is trained on the PMC database, text keywords carrying biological information are screened, and a corpus is constructed.
Further, the biological information includes mechanical type, research species and cell type.
Preferably, the related terms in step S1 include biomechanical and bone-related terms.
Further, step S1 includes computer language normalization and preprocessing of the text words of the gene information.
Further, step S1 also includes identifying the text words of the gene information using PubTator and converting them into official names by calling the API of the NCBI Gene database.
Further, the formula of the weight algorithm in the step S2 is as follows:
wherein $r(g, p)$ is the correlation coefficient between gene $g$ and classical mechanosensitive pathway $p$; $N_i$ is the total number of related entities, in the literature database, of the $i$-th gene in classical mechanosensitive pathway $p$; $N_p$ is the total number of related entities, in the literature database, of all genes of classical mechanosensitive pathway $p$; and $\Omega_g$ and $\Omega_p$ denote the sets of gene $g$ and of classical mechanosensitive pathway $p$, respectively.
Preferably, the classical mechanosensitive pathway comprises at least one of Hippo, BMP, TGF-beta, Wnt, Notch, PI3K/Akt, MAPK and Ras.
Further, step S3 also includes visually displaying the pathway information of the target retrieval factor in the KEGG database.
Further, step S3 also includes visually displaying the gene-molecule interaction pairs of the target retrieval factor in the STRING database.
Compared with the prior art, the application has the following beneficial effects:
1. A web tool provides an open search portal, so that users can conveniently define the search range without mastering complex computer programming.
2. Strict inclusion criteria for the literature database restrict it to bone-related biomechanical information. For the complex bone-related biomechanical regulation network, false-positive information is largely filtered out, making the results more reliable and effective.
3. A visual network diagram supports interactive operation, presents computer-mined literature to researchers, and promotes the transmission and understanding of information in a more user-friendly way.
4. Combining classical mechanosensitive pathways with the text mining results lets users locate a target gene or gene set through biological pathways. Based on the literature database and the weight algorithm, the degree of association between the target retrieval factor and each classical mechanosensitive pathway is calculated, and pathway and gene interaction searches are also provided, making gene navigation more convincing and meaningful.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application.
FIG. 1 is a flow chart of the present application;
FIG. 2 is a schematic diagram of deep neural network training of a corpus;
FIG. 3 is a search window interface diagram of the present application;
FIG. 4 is a diagram of a search result interface of the present application;
FIG. 5 is a schematic view of panels 1-3 of FIG. 4;
FIG. 6 is a schematic view of panels 4-6 of FIG. 4;
FIG. 7 is a schematic view of panel 7 of FIG. 4.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present application more clear, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments. It will be apparent that the described embodiments are some, but not all, of the embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
1. Database construction and text tagging
The website of the application centers on biomechanics and bone-related research and uses a keyword set as the inclusion criterion for literature resources. A total of 34937 articles published from 2010 through 2020 were screened. After computer language normalization and preprocessing of the text words, the gene information in each article is first identified by PubTator and then converted into an official name by calling the API of the NCBI Gene database.
NCBI provides the E-utilities API for its Entrez system, which allows access to all Entrez databases, including PubMed, PMC, Gene and Protein, and facilitates batch processing and large-scale text retrieval (https://www.ncbi.nlm.nih.gov/home/develop/api/). The text data consist of the title and abstract of each article, which are first tokenized, parsed and normalized with the text processing library Natural Language Toolkit (NLTK, http://www.nltk.org/) to avoid ambiguous descriptions and ensure that subsequent processing can recognize them. Named entity recognition (NER) is then performed to extract the required details from each paper. PubTator (https://www.ncbi.nlm.nih.gov/research/pubtator/), a mature biomedical term recognition tool that performs well on ambiguous and complex biomedical names, is used to label the genes and proteins appearing in the text database. Each gene ID is then converted into a standard name with Biopython (https://biopython.org/) by accessing the NCBI Gene database.
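The gene-labeling step can be illustrated with a minimal sketch that collects gene annotations from PubTator-style output. It assumes the pipe- and tab-delimited export layout (title and abstract lines, followed by tab-separated annotation lines); the sample record and the function name `genes_by_article` are invented for illustration and are not taken from the application. Converting the collected gene IDs into official names would be a separate call to the NCBI Gene database, which is not shown here.

```python
from collections import defaultdict

# Invented record in a PubTator-style layout (illustrative only, not real data).
SAMPLE = (
    "1|t|Cyclic stretch activates YAP1 in osteoblasts\n"
    "1|a|Mechanical loading increased RUNX2 expression through YAP1.\n"
    "1\t25\t29\tYAP1\tGene\t10413\n"
    "1\t74\t79\tRUNX2\tGene\t860\n"
    "1\t99\t103\tYAP1\tGene\t10413\n"
)


def genes_by_article(pubtator_text):
    """Map each article ID to the set of (mention, gene_id) pairs tagged as Gene."""
    genes = defaultdict(set)
    for line in pubtator_text.splitlines():
        fields = line.split("\t")
        # Annotation lines have six tab-separated fields; title and abstract
        # lines contain no tabs and are skipped.
        if len(fields) >= 6 and fields[4] == "Gene":
            pmid, _start, _end, mention, _type, gene_id = fields[:6]
            genes[pmid].add((mention, gene_id))
    return dict(genes)


print(genes_by_article(SAMPLE))
```

Repeated mentions of the same gene collapse into one entry per article, which is the granularity needed before name normalization.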
For other specialized terms, such as force type, cell type and species, a self-built corpus is established to extract information on mechanical type, research species, cell type and the like; the normalized text is then compared with the corpus to identify named entities. Because retrieved data are categorized against this self-built corpus, a user can change the search range in the web page options to specify a given force condition or set of cell lines, which helps to obtain more specific results.
The self-built corpus is realized as follows: as shown in FIG. 2, a deep neural network based on the pre-trained language model BERT is designed, and its network parameters are optimized. The main parameters are: batch size 32; epochs 4; learning rate 5e-5; hidden_size 128. The network is trained on 13.5 million words from the full-text English corpus PMC and 4.5 million words from the biological literature corpus PubMed to obtain a model that extracts text keywords carrying biological information.
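Comparing normalized text against a corpus can be sketched as simple dictionary matching. The categories follow the description (force type, species, cell type), but the term lists, the plural handling and the name `tag_terms` are hypothetical; the actual corpus is produced by the trained model and is not listed in the application.

```python
import re

# Hypothetical corpus entries, for illustration only.
CORPUS = {
    "force_type": ["compression", "tension", "fluid shear stress"],
    "species": ["mouse", "rat", "human"],
    "cell_type": ["osteoblast", "osteoclast", "osteocyte"],
}


def tag_terms(text, corpus=CORPUS):
    """Return the corpus terms found in the text, grouped by category."""
    normalized = text.lower()
    found = {}
    for category, terms in corpus.items():
        # Whole-word match, tolerating a simple plural "s".
        hits = [
            term for term in terms
            if re.search(r"\b" + re.escape(term) + r"s?\b", normalized)
        ]
        if hits:
            found[category] = hits
    return found


print(tag_terms("Fluid shear stress promotes differentiation of mouse osteoblasts."))
```

Tags of this kind are what would let a search be restricted to a given force condition or cell line.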
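For reference, the four stated training parameters can be gathered in a small configuration object. This is only a sketch: the class name `FineTuneConfig` and the helper `total_steps` are hypothetical, and no model code from the application is implied.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Training parameters as stated in the description."""
    batch_size: int = 32
    epochs: int = 4
    learning_rate: float = 5e-5
    hidden_size: int = 128

    def total_steps(self, num_examples: int) -> int:
        """Number of optimizer steps for a data set of the given size."""
        return math.ceil(num_examples / self.batch_size) * self.epochs


config = FineTuneConfig()
print(config, config.total_steps(1000))
```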
2. Mechanical biological pathway interactions
The way cells and tissues sense and transmit mechanical information depends on interactions between genes, and cascades of these interactions constitute biological signaling pathways. As shown in FIG. 2, in the pathway navigation section, the present embodiment first presents the typical pathways of this process and the interactions between them.
As shown in FIG. 3, in the pathway navigation section, the website of this embodiment shows classical mechanosensitive pathways, such as the Hippo, BMP, TGF-beta, Wnt, Notch, PI3K/Akt, MAPK and Ras signaling pathways, and explores their interactions in mechanotransduction, providing users with background information on mechanobiology.
In this way, the present embodiment provides the user with general background information by sorting out trusted pathways and their interactions. Combining the trusted pathways related to Hippo, BMP, Wnt, GPCR, TGF-beta, IGF, integrins and cell junctions makes the navigation of genes in mechanosensing and mechanotransduction more convincing and meaningful.
Second, knowledge of a single molecule is generally isolated and limited; linking molecules to pathways better helps researchers understand and further explore their mechanisms of action. Thus, in one possible design, the canonical pathways are combined with the text mining results so that a user can locate a target gene or gene set through biological processes. By matching the submitted genes to the annotated gene set of each pathway, the correlations between genes and mechanically related pathways are scored, and possible links are provided based on text mining.
To obtain a reasonable scoring system, the application calculates the correlation between the target retrieval factor and each classical mechanosensitive pathway based on the molecular interaction pairs in the literature database, which helps researchers quickly locate the relevant biological signal transduction modes. The score is calculated as follows:
In the above formula, $r(g, p)$ is the correlation coefficient between gene $g$ and classical mechanosensitive pathway $p$; $N_i$ is the total number of related entities, in the literature database, of the $i$-th gene in classical mechanosensitive pathway $p$; $N_p$ is the total number of related entities, in the literature database, of all genes of classical mechanosensitive pathway $p$; and $\Omega_g$ and $\Omega_p$ denote the sets of gene $g$ and of classical mechanosensitive pathway $p$, respectively.
The weight algorithm highlights the importance of a pathway's star molecules, which is consistent with the logic of text data mining: when the target retrieval molecule is linked to a star molecule of a pathway, it is considered more likely to be associated with that pathway.
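The scoring formula itself does not appear in this text, so the sketch below rests on an assumption: reading only the symbol definitions, one plausible form is $r(g, p) = \sum_{i \in \Omega_g \cap \Omega_p} N_i / N_p$, that is, the share of the pathway's total entity count contributed by the pathway genes that are also linked to gene $g$. This is not confirmed as the application's formula; the function name and the counts are invented.

```python
def pathway_correlation(gene_partners, pathway_counts):
    """Assumed score: sum of N_i over genes shared by g and p, divided by N_p.

    gene_partners  -- set of genes linked to the target gene g (assumed Omega_g)
    pathway_counts -- {gene: N_i} for the genes of pathway p (keys assumed Omega_p)
    """
    n_p = sum(pathway_counts.values())
    if n_p == 0:
        return 0.0
    shared = set(gene_partners) & set(pathway_counts)
    return sum(pathway_counts[gene] for gene in shared) / n_p


# Invented counts: YAP1 acts as the heavily studied "star" molecule.
hippo_counts = {"YAP1": 60, "TAZ": 30, "LATS1": 10}
print(pathway_correlation({"YAP1", "LATS1", "RUNX2"}, hippo_counts))
```

Under this reading, sharing a heavily studied pathway member raises the score more than sharing a rarely mentioned one, which matches the stated intent of emphasizing star molecules.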
3. Visual website framework
To support cross-platform visualization, the web architecture of the website is based on the Django framework, the back-end database is implemented with MySQL, and Semantic UI is used for the front end.
As an NLP web tool, the website of the application combines demonstration and prediction strategies and offers an effective and reliable way to sort out the connections and crosstalk between molecules involved in mechanosensing and mechanotransduction in bone.
The website uses a graph network to display the molecules in all mechanical pathways as interconnected nodes; clicking the connecting line between two nodes links to the corresponding original documents. This function is realized through interaction between the web front end and the server database, which is prior art and is not described further here.
Entities retrieved from the literature database can be sub-categorized by the self-built corpus described above, allowing the user to focus on a particular force type or cell line and thereby enabling more accurate and targeted literature-based discovery. The website also adopts a pathway-fitting method: based on the weight algorithm, the system displays the correlation score between the target retrieval molecule and each classical mechanical pathway according to the NLP results and connects the user's target molecule with the components of the classical pathways, which makes it better suited to biomedical research.
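The application leaves the node and edge exchange to prior art. As one hedged illustration, the server could send the front end a JSON graph in which each edge carries links to its supporting documents. The payload layout, the function name `to_graph_json` and the placeholder article IDs are assumptions, not details from the application.

```python
import json


def to_graph_json(pairs):
    """Build a node/edge payload from (gene_a, gene_b, pmid) tuples."""
    nodes, edges = set(), {}
    for gene_a, gene_b, pmid in pairs:
        nodes.update((gene_a, gene_b))
        # Treat edges as undirected: (A, B) and (B, A) share one entry.
        key = tuple(sorted((gene_a, gene_b)))
        edges.setdefault(key, []).append(pmid)
    return json.dumps({
        "nodes": [{"id": name} for name in sorted(nodes)],
        "edges": [
            {
                "source": a,
                "target": b,
                "documents": [f"https://pubmed.ncbi.nlm.nih.gov/{p}/" for p in pmids],
            }
            for (a, b), pmids in sorted(edges.items())
        ],
    })


# Placeholder article IDs, for illustration only.
print(to_graph_json([("YAP1", "TEAD1", "111"), ("TEAD1", "YAP1", "222")]))
```

A click handler on an edge would then only need to open the URLs listed under `documents`.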
4. Correlation identification and visualization
According to the scope defined by the user, the website automatically searches for the entities related to the target retrieval molecule and for the correlations between those entities and the pathways, and visualizes them. The graph is interactive: the user can customize the layout and view detailed information on each entity. After an edge between entities is clicked, a pop-up window displays the supporting information and the source article, with the corresponding sentence highlighted in red. Access to the original text lets the user judge the importance and reliability of the connections found by artificial intelligence, which is both efficient and accurate. Hierarchical search extracts second- and third-layer relations to enlarge the network, facilitating the discovery of new molecules.
As described above, the curated pathway map and bone mechanobiology depend largely on several pathways of sequential reactions and interactions. In view of this, the present embodiment identifies well-evidenced, trusted classical pathways involved in mechanosensitivity and mechanotransduction. The summarized pathways and their interactions, with supporting evidence, are visualized through charts and svg.js, a lightweight library for manipulating and animating SVG. The elements of each pathway are looked up in KEGG (Kyoto Encyclopedia of Genes and Genomes) and compared with the data set of this embodiment to produce the list of items each pathway contains; the correlation between a target gene and each pathway can then be ranked by the correlation coefficient and displayed visually.
In addition to scoring, the website also provides an interactive option to add all or selected components of a target pathway to the NLP network, forming a molecule-to-pathway network and thereby revealing more indirect connections.
The result interface is described in detail below:
Referring to FIGS. 4-7, the left side of the result interface shows the mechanical pathway association search results. As shown in FIG. 5, panel 1 shows the correlation of the target retrieval molecule with each classical mechanical pathway; in panel 2, the user can select pathways of interest and add their molecules to the network for a combined search; in panel 3, the user can quickly view the pathway information of the target retrieval molecule in the KEGG database to understand its routes of action more fully. As shown in FIG. 6, the middle of the result page mainly visualizes the molecular interaction information and also provides several "String" button options. STRING is a database of protein interaction information predicted from research evidence and algorithms; integrating STRING results with the original NLP network can suggest ideas for research on emerging molecules. With the hierarchical search function, the user can expand the relation network to the 2nd and 3rd layers, enlarging the search range and facilitating the discovery of new molecules in a pathway. Icons on the page change the display mode of the molecular association diagram, download and save the current diagram, and reset the diagram to its previous display. To retrieve the corresponding source documents, the user clicks the connection between two nodes; as shown in FIG. 7, a pop-up window shows the relation and the corresponding documents, with the corresponding sentence highlighted in red.
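Expanding the network to a second or third layer can be sketched as a depth-limited breadth-first search over the interaction pairs. The description does not say how the expansion is implemented, so the approach, the function name `expand_network` and the example edges are illustrative assumptions.

```python
from collections import deque


def expand_network(edges, seed, max_layer):
    """Return {gene: layer} for genes within max_layer hops of the seed."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    layer = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if layer[node] == max_layer:
            continue  # do not expand beyond the requested layer
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in layer:
                layer[neighbor] = layer[node] + 1
                queue.append(neighbor)
    return layer


# Invented interaction pairs, for illustration only.
pairs = [("YAP1", "TEAD1"), ("TEAD1", "CTGF"), ("CTGF", "ITGB1"), ("YAP1", "LATS1")]
print(expand_network(pairs, "YAP1", max_layer=2))
```

Raising `max_layer` from 2 to 3 would bring in ITGB1 in this example, mirroring how a third-layer search enlarges the displayed network.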
Instead of identifying grammar data by a machine learning method to perform relation extraction, the present embodiment selects to let the user tell the relation embedded in the corpus, instead of the machine, in order to guarantee the credibility of the prediction result. Based on the relation retrieval and visualization of each normalized sentence, entities are associated through co-occurrence and then the corresponding sentence is marked. The co-occurrence score records the number of items that correspond to the co-occurrence tag. These sentences and corresponding entities are stored in a relational database and are implemented by sqlite (https:// www.sqlite.org/index. Html). The string (https:// string-db. Org /) is used for a comprehensive search for targets, and the present embodiment provides a secondary and/or string search option that can facilitate more results.
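The co-occurrence step can be sketched with Python's built-in `sqlite3` module: every pair of entities tagged in the same sentence is stored with that sentence, and the score is the number of stored rows for the pair. The table layout, the function names and the sample sentences are assumptions made for illustration; the application does not give its schema.

```python
import sqlite3
from itertools import combinations


def build_cooccurrence(sentences):
    """Store entity pairs per sentence; sentences = (pmid, text, [entities])."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cooccurrence "
        "(entity_a TEXT, entity_b TEXT, pmid TEXT, sentence TEXT)"
    )
    for pmid, text, entities in sentences:
        # Sort so each unordered pair is always stored in the same order.
        for a, b in combinations(sorted(set(entities)), 2):
            conn.execute(
                "INSERT INTO cooccurrence VALUES (?, ?, ?, ?)", (a, b, pmid, text)
            )
    conn.commit()
    return conn


def cooccurrence_score(conn, a, b):
    """Number of sentences in which the two entities appear together."""
    a, b = sorted((a, b))
    row = conn.execute(
        "SELECT COUNT(*) FROM cooccurrence WHERE entity_a = ? AND entity_b = ?",
        (a, b),
    ).fetchone()
    return row[0]


# Invented sentences and placeholder article IDs.
conn = build_cooccurrence([
    ("1", "YAP1 regulates RUNX2 under stretch.", ["YAP1", "RUNX2"]),
    ("2", "RUNX2 expression depends on YAP1.", ["RUNX2", "YAP1"]),
    ("2", "YAP1 and TAZ respond to stiffness.", ["YAP1", "TAZ"]),
])
print(cooccurrence_score(conn, "YAP1", "RUNX2"))
```

Keeping the sentence text alongside each pair is what allows the supporting sentence to be shown to the user, who then judges the relation.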
The application method of the website comprises the following steps: the user inputs the target retrieval factors in the retrieval window interface of the website, and a part of classical mechanical sensitive paths are displayed in the retrieval window interface, so that background information in the field of mechanical biology is provided for the user, and the user can be helped to quickly determine the classical mechanical sensitive paths. After the input is completed, the interface is converted into a search result interface, and the search result interface is divided into 7 plates for display. The left side of the interface is provided with the plates 1-3, the middle of the interface is provided with the plates 4-6, and the right side of the interface is provided with the plates 1-7.
The plate 1 adopts a histogram to show the correlation between the target retrieval molecules and each classical mechanical sensitive channel; plate 2 shows the interesting path selected by the user, and path molecules can be added into the network for merging and searching; at panel 3, the user can quickly learn the access information of the target retrieval molecule in the KEGG database to more fully understand the route of action of the target retrieval molecule.
By clicking the hierarchical search function provided by the tile 5, the user can zoom in on the relational network to layers 2 and 3, expanding the network search range. The plate 6 provides functional icons for changing the display mode, downloading pictures and resetting pictures, and clicking the icons can change the display mode of the molecular association diagram, download and save the current association diagram and restore the display of the previous association diagram.
For retrieval of the corresponding source document, the user can click on the connections between nodes in the tile 4 by means of a mouse, and the popup window in the tile 7 will show its relevance and the corresponding document, the corresponding sentence also being highlighted in red.
In summary, the website of the application utilizes Natural Language Processing (NLP) strategy to mine text database based on open literature resources and constructs a web page interaction tool. The first bone related biomechanical text database is created, a visual mode is innovatively introduced, and complex and obscure text information is imaged; calculating the correlation between the target retrieval factor and the classical mechanical sensitive path by adopting a self-created weight algorithm, and establishing a brand new analysis strategy based on the biological path; meanwhile, a webpage interaction tool is introduced, the biological literature exploration process is visualized and simplified, the molecular mechanism research of the bone related mechanics biology can be greatly promoted, and more effective data processing and knowledge sharing modes are promoted.
The application no longer relies on unfiltered resources; instead, it targets the skeletal mechanobiology process and retrieves classified information from a self-built library. Users can thus explore all mechanics-related articles, or narrow their target to a specific force type, cell line, or species. As a discussion-oriented biomedical platform that facilitates knowledge sharing, it offers researchers unprecedented scope and a vast amount of information. Meanwhile, the application adopts a pathway-centered strategy and uses a weighted scoring and merging algorithm, making it possible to navigate and explore a single gene or a gene set within the mechanobiological process.
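A pathway-centered weighted score of this kind might be sketched as follows. The exact formula of the weight algorithm is not reproduced in this text, so the ratio used here is an assumed form chosen only for illustration: the related-entity counts of the pathway genes associated with the query gene, divided by the related-entity counts of all genes in the pathway. The function name `pathway_score`, its parameters, and the sample counts are all hypothetical.

```python
def pathway_score(related_counts, gene_partners, pathway_genes):
    """Assumed score: share of a pathway's literature weight linked to a gene.

    related_counts: {gene: number of related entities in the literature database}
    gene_partners:  set of genes associated with the query gene
    pathway_genes:  set of genes belonging to the pathway
    """
    total = sum(related_counts.get(g, 0) for g in pathway_genes)
    if total == 0:
        return 0.0  # pathway has no literature support in the database
    shared = gene_partners & pathway_genes
    return sum(related_counts.get(g, 0) for g in shared) / total


# Hypothetical counts, for illustration only.
COUNTS = {"A": 6, "B": 3, "C": 1}
PATHWAY = {"A", "B", "C"}
PARTNERS = {"A", "C", "X"}  # "X" lies outside the pathway and is ignored

score = pathway_score(COUNTS, PARTNERS, PATHWAY)  # (6 + 1) / (6 + 3 + 1)
```

Computing one such score per pathway would give the values for a histogram of the query gene against each pathway; for a gene set, the partner sets could be merged before scoring, which is one possible reading of the "merging" step.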
The foregoing detailed description of the application has been presented for purposes of illustration and description, and it should be understood that the application is not limited to the particular embodiments disclosed, but is intended to cover all modifications, equivalents, alternatives, and improvements within the spirit and principles of the application.

Claims (9)

1. A biomechanical-regulated bone reconstruction text-mining interactive website construction method, characterized by comprising the following steps:
S1, screening the text words of gene information in documents according to related terms, acquiring gene-molecule interaction relation pairs, and constructing a literature database;
S2, calculating the correlation between a target retrieval factor and the classical mechanosensitive pathways by a weight algorithm, based on the gene-molecule interaction relation pairs in the literature database;
S3, visually displaying the correlation between the target retrieval factor and the classical mechanosensitive pathways, displaying the gene molecules in the classical mechanosensitive pathways as interconnected nodes, and linking to the corresponding documents in the literature database when a connecting line between nodes is clicked;
the formula of the weight algorithm in step S2 is as follows:
[formula not reproduced in the source text]
where the symbols of the formula denote, in order: the correlation coefficient of gene g and classical mechanosensitive pathway p; the total number of related entities of the i-th gene of classical mechanosensitive pathway p in the literature database; the total number of related entities of all genes of classical mechanosensitive pathway p in the literature database; and the sets of gene g and classical mechanosensitive pathway p, respectively.
2. The biomechanical-regulated bone reconstruction text-mining interactive website construction method according to claim 1, characterized in that: between step S1 and step S2, the method further comprises training a deep neural network on the PMC database, screening text keywords carrying biological information, and constructing a corpus.
3. The biomechanical-regulated bone reconstruction text-mining interactive website construction method according to claim 2, characterized in that: the biological information includes mechanical type, research species, and cell type.
4. The biomechanical-regulated bone reconstruction text-mining interactive website construction method according to claim 1 or 2, characterized in that: the related terms in step S1 include biomechanical terms and bone-related terms.
5. The biomechanical-regulated bone reconstruction text-mining interactive website construction method according to claim 4, characterized in that: step S1 comprises performing computer-language normalization and preprocessing on the text words of the gene information.
6. The biomechanical-regulated bone reconstruction text-mining interactive website construction method according to claim 5, characterized in that: step S1 further comprises identifying the text words of the gene information with PubTator, and converting them into official names by calling the API of the NCBI Gene database.
7. The biomechanical-regulated bone reconstruction text-mining interactive website construction method according to claim 1, characterized in that: the classical mechanosensitive pathways comprise at least one of Hippo, BMP, TGF-β, Wnt, Notch, PI3K/Akt, MAPK, and Ras.
8. The biomechanical-regulated bone reconstruction text-mining interactive website construction method according to claim 1, characterized in that: step S3 further comprises visually displaying the pathway information of the target retrieval factor in the KEGG database.
9. The biomechanical-regulated bone reconstruction text-mining interactive website construction method according to claim 1 or 8, characterized in that: step S3 further comprises visually displaying the gene-molecule interaction relation pairs of the target retrieval factor in the STRING database.
CN202210606098.XA 2022-05-31 2022-05-31 Construction method of biomechanical regulation and control bone reconstruction text mining interaction website Active CN114927168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210606098.XA CN114927168B (en) 2022-05-31 2022-05-31 Construction method of biomechanical regulation and control bone reconstruction text mining interaction website

Publications (2)

Publication Number Publication Date
CN114927168A (en) 2022-08-19
CN114927168B (en) 2023-08-29

Family

ID=82813152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210606098.XA Active CN114927168B (en) 2022-05-31 2022-05-31 Construction method of biomechanical regulation and control bone reconstruction text mining interaction website

Country Status (1)

Country Link
CN (1) CN114927168B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7428554B1 (en) * 2000-05-23 2008-09-23 Ocimum Biosolutions, Inc. System and method for determining matching patterns within gene expression data
CN104978347A (en) * 2014-04-11 2015-10-14 中国中医科学院中医临床基础医学研究所 Data mining method and data mining system for sensitive keywords in Chinese biomedical literature database
CN107346372A (en) * 2017-06-19 2017-11-14 苏州班凯基因科技有限公司 A kind of database and its construction method understood applied to gene mutation
CN109545284A (en) * 2018-10-16 2019-03-29 中国人民解放军军事科学院军事医学研究院 Drug integrated information database building method and system based on drug and target information
CN110286233A (en) * 2019-06-27 2019-09-27 山西大学 A kind of biomarker metabolic pathway and analysis method and application
CN111797296A (en) * 2020-07-08 2020-10-20 中国人民解放军军事科学院军事医学研究院 Method and system for mining poison-target literature knowledge based on network crawling
CN112029710A (en) * 2020-08-31 2020-12-04 上海交通大学医学院附属第九人民医院 Screening method of direct mechanical response cell subset and application thereof
CN112289372A (en) * 2020-12-15 2021-01-29 武汉华美生物工程有限公司 Protein structure design method and device based on deep learning
CN114168708A (en) * 2021-11-15 2022-03-11 哈尔滨工业大学 Personalized biological channel retrieval method based on multi-domain characteristics

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11948662B2 (en) * 2017-02-17 2024-04-02 The Regents Of The University Of California Metabolite, annotation, and gene integration system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
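Bao Zhenshen. Research and application of enrichment analysis methods for disease-related signaling pathways. China Doctoral Dissertations Full-text Database, Basic Sciences. 2022, (No. 2), A006-136. *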

Also Published As

Publication number Publication date
CN114927168A (en) 2022-08-19

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant