CN117494811A - Knowledge graph construction method and system for Chinese medicine books - Google Patents

Info

Publication number
CN117494811A
Authority
CN
China
Prior art keywords
knowledge
medicinal material
keyword
classical
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311549672.3A
Other languages
Chinese (zh)
Other versions
CN117494811B (en
Inventor
赵静
赵亚茹
吴冰
樊静
刘松
刘冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Dajing Tcm Information Technology Co ltd
Original Assignee
Nanjing Dajing Tcm Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Dajing Tcm Information Technology Co ltd
Priority to CN202311549672.3A
Publication of CN117494811A
Application granted
Publication of CN117494811B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a knowledge graph construction method and system for classical books of traditional Chinese medicine, relating to the technical field of knowledge graph construction. The method comprises: acquiring a first classical book knowledge category from a classical book knowledge database; retrieving a first classical book catalog of a first classical book in a preset classical book set, and crawling the catalog to build a first keyword set; acquiring a first page number range corresponding to a first keyword, wherein the first keyword is any keyword in the first keyword set; when the first keyword exists in the first professional vocabulary set, extracting the first page number range from the first classical book and storing it in the first classical book knowledge category to form a first graph relation; and constructing a first graph branch based on the first graph relation, and constructing a target knowledge graph according to the first graph branch. The invention addresses the technical problems that traditional methods struggle to organize and associate the large amount of traditional Chinese medicine knowledge contained in classical books, that this knowledge lacks structured information, and that searching for specific information is therefore inefficient.

Description

Knowledge graph construction method and system for Chinese medicine books
Technical Field
The invention relates to the technical field of knowledge graph construction, and in particular to a knowledge graph construction method and system for classical books of traditional Chinese medicine.
Background
The field of traditional Chinese medicine comprises many classical books that contain a large amount of traditional medical knowledge. This knowledge is usually scattered across different books and exists in unstructured or semi-structured form, which causes several problems. On the one hand, because the knowledge is scattered across different classical books, integrating it and searching for specific information is difficult; the content of the classical books usually exists as plain text without structured information, so the knowledge is hard to access and use quickly. On the other hand, the volume of information in the classical books is very large, so manual sorting and summarization is time-consuming and error-prone; the books also lack explicit links to one another, so a full understanding of a topic often requires reading across several books.
Therefore, a new method is needed that integrates the knowledge in a plurality of classical books into a structured knowledge graph, so that traditional Chinese medicine knowledge is easier to search, understand and apply, and so that knowledge can be associated and connected across books.
Disclosure of Invention
The present application provides a knowledge graph construction method and system for classical books of traditional Chinese medicine, which address the technical problems that traditional methods struggle to effectively organize and associate the large amount of traditional Chinese medicine knowledge contained in the classical books, and that the lack of structured information makes searching for specific information inefficient.
In view of the above problems, the present application provides a knowledge graph construction method and system for Chinese medical classics.
In a first aspect of the present disclosure, a knowledge graph construction method for classical books of traditional Chinese medicine is provided, the method comprising: acquiring a first classical book knowledge category from a classical book knowledge database, wherein the first classical book knowledge category corresponds to a first professional vocabulary set; retrieving a first classical book catalog of a first classical book in a preset classical book set, and crawling the first classical book catalog to build a first keyword set; acquiring a first page number range corresponding to a first keyword in the first classical book, wherein the first keyword is any keyword in the first keyword set; when the first keyword exists in the first professional vocabulary set, extracting the first page number range from the first classical book and storing it in the first classical book knowledge category to form a first graph relation; and constructing a first graph branch based on the first graph relation, and constructing a target knowledge graph according to the first graph branch.
In another aspect of the disclosure, a knowledge graph construction system for classical books of traditional Chinese medicine is provided, the system being used to implement the above method and comprising: a knowledge category acquisition module, configured to acquire a first classical book knowledge category from the classical book knowledge database, wherein the first classical book knowledge category corresponds to a first professional vocabulary set; a keyword set construction module, configured to retrieve a first classical book catalog of a first classical book in a preset classical book set and to crawl the first classical book catalog to build a first keyword set; a page range acquisition module, configured to acquire a first page number range corresponding to a first keyword in the first classical book, wherein the first keyword is any keyword in the first keyword set; a page range storage module, configured to extract the first page number range from the first classical book and store it in the first classical book knowledge category when the first keyword exists in the first professional vocabulary set, so as to form a first graph relation; and a knowledge graph construction module, configured to construct a first graph branch based on the first graph relation and to construct a target knowledge graph according to the first graph branch.
One or more technical solutions provided in the present application have at least the following technical effects or advantages:
A first classical book knowledge category, which corresponds to a first professional vocabulary set, is acquired from the classical book knowledge database. Extracting the key information and professional terms of traditional Chinese medicine classical books in this way helps organize and categorize knowledge and provides a clear knowledge structure. A first classical book is selected from a preset classical book set, its catalog is crawled, and the keywords in the catalog are extracted for subsequent knowledge extraction. A first keyword is selected from the first keyword set and its corresponding first page number range is acquired, which limits the scope of extraction. When the first keyword exists in the first professional vocabulary set, the corresponding first page number range is extracted from the first classical book and stored in the first classical book knowledge category to form a first graph relation, so that professional terms in the field of traditional Chinese medicine are extracted accurately, helping to ensure the accuracy of the knowledge graph. Based on the first graph relation, a first graph branch is constructed that organizes the related knowledge in structured form, and the target traditional Chinese medicine knowledge graph is constructed according to the organization and structure of the first graph branch. Classical book knowledge is thereby integrated and presented in structured form, making knowledge in the field of traditional Chinese medicine easier to search, understand and apply. In summary, the method automatically extracts, sorts and organizes knowledge from traditional Chinese medicine classical books and constructs a knowledge graph from it, so that this knowledge can be better organized, searched and applied, which facilitates research and application in the field.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the present application are more apparent, specific embodiments of the present application are set forth below.
Drawings
Fig. 1 is a schematic flow chart of a knowledge graph construction method for Chinese traditional medical classics according to an embodiment of the application;
Fig. 2 is a schematic structural diagram of a knowledge graph construction system for Chinese traditional medical classics according to an embodiment of the present application.
Description of reference numerals: knowledge category acquisition module 10, keyword set construction module 20, page range acquisition module 30, page range storage module 40, knowledge graph construction module 50.
Detailed Description
By providing a knowledge graph construction method for classical books of traditional Chinese medicine, the present application addresses the technical problems that traditional methods struggle to effectively organize and associate the large amount of traditional Chinese medicine knowledge contained in the classical books, and that the lack of structured information makes searching for specific information inefficient.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
Example 1
As shown in fig. 1, an embodiment of the present application provides a method for constructing a knowledge graph of a classical book of traditional Chinese medicine, where the method includes:
acquiring a first classical book knowledge category from a classical book knowledge database, wherein the first classical book knowledge category corresponds to a first professional vocabulary set;
A classical book knowledge database is accessed. This is an established online database of classical book literature that contains a plurality of classical books covering various aspects of traditional Chinese medicine. One classical book knowledge category is selected as the first classical book knowledge category according to the needs of the project. For the first classical book knowledge category, the professional terms and words related to that category are extracted from the classical book literature by natural language processing to obtain the corresponding first professional vocabulary set. A correspondence is then established between the first classical book knowledge category and the first professional vocabulary set; that is, the selected category is associated with the professional vocabulary extracted from the literature to form a category-vocabulary mapping, which lays the foundation for subsequent knowledge graph construction.
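The category-vocabulary mapping described above can be pictured as a simple lookup table. The following is a minimal sketch only, assuming the vocabulary has already been extracted; the names `category_vocabulary` and `vocabulary_for` and the sample terms are hypothetical and are not taken from the application.

```python
# Hypothetical category-to-vocabulary mapping; the terms are illustrative only.
category_vocabulary: dict[str, set[str]] = {
    "medicinal material knowledge": {"Honeysuckle", "Dandelion", "Purslane"},
    "basic theory": {"yin and yang", "five elements", "meridians"},
}


def vocabulary_for(category: str) -> set[str]:
    """Return the professional vocabulary set mapped to a knowledge category."""
    return category_vocabulary.get(category, set())


# Select one category as the "first" category and look up its vocabulary set.
first_category = "medicinal material knowledge"
first_vocabulary = vocabulary_for(first_category)
```

An unknown category yields an empty set here, which is one possible design choice; raising an error would be equally reasonable.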
Further, the classical book knowledge database is constructed from knowledge data organized under preset classical book knowledge categories, which include a basic theory category, a medicinal material knowledge category, an acupuncture and moxibustion category, a clinical knowledge category and a health care category.
The classical book knowledge database contains preset classical book knowledge categories, which are used to organize and classify the knowledge in traditional Chinese medicine classical books and are divided according to the knowledge structure and topics of the field. They may include, but are not limited to, the following: the basic theory category, covering the basic theories of traditional Chinese medicine such as yin and yang, the five elements and the meridians; the medicinal material knowledge category, covering knowledge about herbs, prescriptions and pharmaceuticals; the acupuncture and moxibustion category, covering knowledge of acupuncture, moxibustion and massage therapy; the clinical knowledge category, covering diagnostic and therapeutic methods for various diseases; and the health care category, covering knowledge of traditional Chinese medicine health preservation and health care.
For each preset classical book knowledge category, data from the corresponding classical books is imported, including text data, pictures, reference documents and the like. The imported data is matched against the relevant categories, classified and labeled, and assigned to the corresponding classical book knowledge category, thereby establishing the classical book knowledge database. With such a database, the knowledge in traditional Chinese medicine classical books can be better organized, managed and retrieved, which facilitates research and practice in the field.
retrieving a first classical book catalog of a first classical book in a preset classical book set, and crawling the first classical book catalog to build a first keyword set;
A preset classical book set is accessed. The set includes, but is not limited to, the Huangdi Neijing (Yellow Emperor's Inner Classic), the Shennong Bencao Jing (Shennong's Classic of Materia Medica), "Lei Pacific Royal", the Shanghan Zabing Lun (Treatise on Cold Damage and Miscellaneous Diseases) and the like. One classical book is selected from the preset set as the first classical book, and its catalog is obtained by reading a text file; the catalog usually lists the chapters, titles or other dividing structures of the book together with their hierarchical relations. For the first classical book catalog, natural language processing and text processing techniques are used to extract the keywords of each catalog entry, where a keyword may be a chapter title or another designated identifying word. The extracted keywords are then sorted and merged into a unified keyword set, namely the first keyword set.
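As an illustration of the catalog step, the sketch below parses a plain-text catalog into (title, page) entries and collects the titles as the keyword set. It assumes, purely for illustration, that each catalog line has the form "title, dot leader, page number"; the sample text and the names `parse_catalog` and `build_keyword_set` are hypothetical, and the application does not specify a catalog format.

```python
import re

# Hypothetical catalog text: "<title> <dot leader> <start page>" on each line.
CATALOG_TEXT = """\
Honeysuckle ...... 12
Dandelion ...... 15
Purslane ...... 19
"""

# A title, then two or more dots, then the page number at the end of the line.
ENTRY = re.compile(r"^(?P<title>.+?)\s*\.{2,}\s*(?P<page>\d+)\s*$")


def parse_catalog(text: str) -> list[tuple[str, int]]:
    """Return (title, page) pairs for every line that looks like a catalog entry."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append((match["title"].strip(), int(match["page"])))
    return entries


def build_keyword_set(entries: list[tuple[str, int]]) -> set[str]:
    """Merge the entry titles into a single, de-duplicated keyword set."""
    return {title for title, _ in entries}


first_catalog = parse_catalog(CATALOG_TEXT)
first_keyword_set = build_keyword_set(first_catalog)
```

Using chapter titles directly as keywords is the simplest reading of the text; a real catalog would likely need tokenization or term extraction on each entry.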
acquiring a first page number range corresponding to a first keyword in the first classical book, wherein the first keyword is any keyword in the first keyword set;
Further, acquiring the first page number range corresponding to the first keyword in the first classical book includes:
locating a first position of the first keyword in the first classical book catalog;
matching the first position with corresponding first page number information;
randomly extracting a first classical book page of the first classical book;
counting a first occurrence frequency of the first keyword in the first classical book page;
when the first occurrence frequency reaches a preset frequency value, recording the first classical book page as second page number information;
the first page number information and the second page number information together forming the first page number range.
One keyword is randomly selected from the first keyword set as the first keyword. A text search is used to find the first keyword in the first classical book catalog and determine its specific position in the catalog, which is recorded as the first position. The first position includes a chapter name or other identifying information for subsequent use.
The first position of the first keyword in the catalog is matched with the corresponding page number information to determine the first page number information, which comprises either a page range associated with the first position or the number of a specific page. This keeps the matching accurate, so that the content containing the keyword can be extracted precisely from the classical book. The matched first page number information is recorded for later use.
A page of the classical book is randomly selected as the first classical book page, and its text content is extracted using a text extraction technique.
Text matching or search is then applied to the extracted text of the first classical book page to identify the first keyword and count its occurrences, giving the first occurrence frequency. A larger first occurrence frequency means that the first keyword occurs more often on the page.
A preset frequency value is set according to actual conditions and specific requirements; it is the frequency the first keyword must reach. The first occurrence frequency of the first keyword is compared with the preset frequency value. If the first occurrence frequency is greater than or equal to the preset frequency value, the preset requirement is met, the first classical book page is marked as second page number information, and its page number is recorded for subsequent use.
The previously recorded first page number information and second page number information are combined into a set of page numbers containing the first keyword, which serves as the first page number range.
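The page range steps above can be sketched as follows. This is only an illustrative reading, assuming the catalog maps each keyword to a single page number and the book text is available per page; the function name `first_page_range`, the parameter names and the sample data are hypothetical. The number of randomly drawn pages (`sample_size`) is also an assumption, since the text describes drawing one page at a time.

```python
import random


def first_page_range(keyword, catalog, pages, min_count, sample_size, rng):
    """Combine the catalog page with randomly sampled pages that mention the keyword.

    catalog: dict mapping a keyword to the page number given in the catalog.
    pages:   dict mapping a page number to that page's text.
    """
    page_range = set()

    # First page number information: the page the catalog assigns to the keyword.
    if keyword in catalog:
        page_range.add(catalog[keyword])

    # Second page number information: randomly drawn pages on which the keyword
    # occurs at least `min_count` times (the preset frequency value).
    sampled = rng.sample(sorted(pages), min(sample_size, len(pages)))
    for number in sampled:
        if pages[number].count(keyword) >= min_count:
            page_range.add(number)

    return sorted(page_range)


# Illustrative data only.
catalog = {"Dandelion": 15}
pages = {
    15: "Dandelion clears heat.",
    16: "Dandelion with honey; Dandelion decoction; Dandelion root.",
    17: "Purslane only.",
}
result = first_page_range("Dandelion", catalog, pages,
                          min_count=2, sample_size=3, rng=random.Random(0))
```

Because `sample_size` covers every page in this small example, the result does not depend on the random seed; with fewer samples, the second page number information would vary from run to run.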
when the first keyword exists in the first professional vocabulary set, extracting the first page number range from the first classical book and storing it in the first classical book knowledge category to form a first graph relation;
The first keyword is compared with the elements of the first professional vocabulary set to check whether it is present in that set. If it is, the first page number range is extracted, and the extracted range together with the corresponding text content is stored in the first classical book knowledge category to form a first graph relation. The first graph relation may be a data structure such as an associative array, with the keyword as the index and the corresponding page range as the value.
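A minimal sketch of the associative-array form mentioned above is given below. It assumes a nested dictionary keyed first by knowledge category and then by keyword; the function name `store_relation` and the record fields `book` and `pages` are hypothetical choices for illustration.

```python
def store_relation(knowledge_base, category, vocabulary, keyword, page_range, book):
    """Store a keyword's page range under a category if the keyword is a known term.

    Returns True when the relation was stored, False when the keyword is not in
    the professional vocabulary set and is therefore skipped.
    """
    if keyword not in vocabulary:
        return False
    relations = knowledge_base.setdefault(category, {})
    relations.setdefault(keyword, []).append(
        {"book": book, "pages": list(page_range)}
    )
    return True


# Illustrative data only.
knowledge_base = {}
vocabulary = {"Dandelion", "Honeysuckle"}
stored = store_relation(knowledge_base, "medicinal material knowledge",
                        vocabulary, "Dandelion", [15, 16], "first classical book")
skipped = store_relation(knowledge_base, "medicinal material knowledge",
                         vocabulary, "Preface", [1], "first classical book")
```

Keeping a list of records per keyword leaves room for the same keyword to be found in a second classical book later without overwriting the first entry.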
constructing a first graph branch based on the first graph relation, and constructing a target knowledge graph according to the first graph branch.
Taking the first graph relation of the first classical book knowledge category as the starting point, a new graph branch is created. The branch has the first classical book knowledge category at its core and includes related information such as the first page number range. The above steps are repeated for each classical book knowledge category to obtain the graph branches of all categories. All graph branches are then integrated into the target knowledge graph, in which different branches represent different classical books and branches can be connected through shared keywords, topics or content. The resulting target knowledge graph contains the association relations among the classical books and supports better understanding and browsing of knowledge in the field.
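One way to picture how branches are connected through shared keywords is sketched below. It assumes each branch is reduced to a label and the set of keywords it contains; the function name `link_branches` and the sample branches are hypothetical, and the application does not prescribe this particular linking rule.

```python
from itertools import combinations


def link_branches(branches: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return, for every pair of branches, the keywords they have in common.

    Pairs with no shared keyword are omitted, so each returned key is an edge
    between two branches of the target knowledge graph.
    """
    links = {}
    for left, right in combinations(sorted(branches), 2):
        shared = branches[left] & branches[right]
        if shared:
            links[(left, right)] = shared
    return links


# Illustrative branches only.
branches = {
    "medicinal material knowledge": {"Dandelion", "Honeysuckle"},
    "clinical knowledge": {"Dandelion", "fever"},
    "basic theory": {"yin and yang"},
}
links = link_branches(branches)
```

Sorting the branch labels first makes the pair order deterministic, so each connection appears exactly once.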
Further, constructing the first graph branch based on the first graph relation includes:
retrieving a second classical book catalog of a second classical book in the preset classical book set;
when a second keyword in a second keyword set corresponding to the second classical book catalog exists in the first professional vocabulary set, extracting a second page number range from the second classical book and storing it in the first classical book knowledge category to form a second graph relation;
and constructing the first graph branch according to the first graph relation and the second graph relation.
The second graph relation is constructed for the second classical book in the preset classical book set in exactly the same way as the first graph relation, and the details are not repeated here for brevity.
For the established first graph relation, first basic classical book information of the first classical book is acquired, including author information, writing time, the influence of the book and the like. The first classical book is evaluated comprehensively on the basis of this information to obtain a first classical book comprehensive index. A second classical book comprehensive index is obtained for the second classical book in the same way. The first graph relation and the second graph relation are then sorted and integrated according to the two comprehensive indexes to obtain the first graph branch.
Further, constructing the first graph branch according to the first graph relation and the second graph relation includes:
respectively acquiring first basic classical book information of the first classical book and second basic classical book information of the second classical book;
wherein the first basic classical book information comprises a first writing time, a first author influence factor and a first classical book influence factor, and the second basic classical book information comprises a second writing time, a second author influence factor and a second classical book influence factor;
weighting the first writing time, the first author influence factor and the first classical book influence factor to obtain a first classical book comprehensive index of the first classical book;
weighting the second writing time, the second author influence factor and the second classical book influence factor to obtain a second classical book comprehensive index of the second classical book;
sorting and adjusting the first graph relation and the second graph relation according to the first classical book comprehensive index and the second classical book comprehensive index to obtain a first branch sequence;
and merging the first branch sequence to obtain the first graph branch.
The first classical book is accessed and its basic information is acquired, including the first writing time, the first author influence factor and the first classical book influence factor. Likewise, the second classical book is accessed and its basic information is acquired, including the second writing time, the second author influence factor and the second classical book influence factor.
The writing time is the time at which the classical book was completed and indicates when the book appeared. The author influence factor is an index that measures the influence of the book's author in the relevant field and can be evaluated from the author's academic achievements, the number of citations, the frequency of citation by other scholars and the like. The classical book influence factor is an index that measures the influence of the book itself in the relevant field and can be evaluated from the number of times the book is cited, the frequency of citation by other scholars, how widely it is applied and the like.
Weights are set for the first writing time, the first author influence factor and the first classical book influence factor according to specific requirements, to reflect their importance in the comprehensive index. By way of example, the weight of the first writing time is set to 0.2, the weight of the first author influence factor to 0.3, and the weight of the first classical book influence factor to 0.5; this is only one possible example, and the actual weight allocation is set according to actual requirements. The first writing time, the first author influence factor and the first classical book influence factor are each multiplied by the corresponding weight, and the products are added to obtain the first classical book comprehensive index, so that the first classical book is evaluated comprehensively according to the importance of each component.
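The weighted sum described above can be written out as a short sketch. The weights 0.2, 0.3 and 0.5 are the example values given in the text; the assumption that all three inputs have already been normalized to a common 0 to 1 scale, the function name `comprehensive_index` and the sample scores are hypothetical.

```python
import math

# Example weights from the text; the actual allocation depends on requirements.
WEIGHTS = {"writing_time": 0.2, "author_influence": 0.3, "book_influence": 0.5}


def comprehensive_index(writing_time, author_influence, book_influence,
                        weights=WEIGHTS):
    """Weighted sum of the three factors, each assumed to be scaled to [0, 1]."""
    return (weights["writing_time"] * writing_time
            + weights["author_influence"] * author_influence
            + weights["book_influence"] * book_influence)


# Illustrative scores only: 0.2*0.5 + 0.3*0.8 + 0.5*0.9 = 0.79
first_index = comprehensive_index(0.5, 0.8, 0.9)
```

How a writing time is converted into a score is not specified in the text, so any such normalization would be a separate design decision.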
The second classical book comprehensive index of the second classical book is obtained in the same way as the first classical book comprehensive index, and the details are not repeated here for brevity.
Using the first classical book comprehensive index and the second classical book comprehensive index, the corresponding graph relations are sorted from high to low by index value. For example, if the first classical book comprehensive index is larger, the first graph relation is placed first, so that highly specialized and authoritative classical book knowledge is displayed preferentially in the knowledge graph. The first branch sequence is constructed from the sorted graph relations.
The first graph branch is obtained from the graph relations in the first branch sequence, so as to display high-quality, highly specialized classical book knowledge.
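The ordering step can be sketched as a sort on the comprehensive index. The example below assumes each graph relation is paired with an index that has already been computed; the function name `build_branch` and the sample values are hypothetical.

```python
def build_branch(indexed_relations):
    """Order graph relations from highest to lowest comprehensive index.

    indexed_relations: list of (comprehensive_index, relation) pairs.
    Returns the relations alone, in display order, as the graph branch.
    """
    ordered = sorted(indexed_relations, key=lambda item: item[0], reverse=True)
    return [relation for _, relation in ordered]


# Illustrative data only.
second_relation = {"book": "second classical book", "pages": [40, 41]}
first_relation = {"book": "first classical book", "pages": [15, 16]}
first_branch = build_branch([(0.62, second_relation), (0.79, first_relation)])
```

Python's `sorted` is stable, so relations with equal indexes keep their original relative order.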
Further, the method also comprises the following steps:
reading preset medicinal material types;
crawling images to build a first medicinal material image set for a first medicinal material in the preset medicinal material types;
establishing a first medicinal material name set of the first medicinal material, wherein the first medicinal material name set comprises a first medicinal material name;
and locating a first position of the first medicinal material name in the target knowledge graph, and linking the first medicinal material image set to the first position.
The preset medicinal material types are a predefined list of medicinal material types. The list is divided by function into types such as heat-clearing medicines, blood-circulation-promoting medicines and digestion-promoting medicines, and each type contains a plurality of medicinal material names, such as centella asiatica, dandelion and purslane.
A first medicinal material is randomly selected from one of the preset medicinal material types as the target to be processed. Web crawler technology is used to search the Internet for images of the selected first medicinal material. The images may cover different angles, varieties and states so as to represent the medicinal material as comprehensively as possible. The acquired images are stored in a folder to build the first medicinal material image set, in which every image is associated with the first medicinal material.
The names of the selected first medicinal material are extracted, including both its common names and its professional names; a medicinal material such as honeysuckle, for example, may be recorded under more than one name. All names of the first medicinal material are merged into the first medicinal material name set, and one name in the set may be the professional name of the first medicinal material.
The constructed target knowledge graph is opened and searched using the first medicinal material name to find the position associated with that name. A link pointing to the image set is created in the knowledge graph, so that the previously built first medicinal material image set is linked to that position. The image set can then be accessed conveniently from within the knowledge graph, which enhances the visualization and information presentation of the knowledge graph.
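The linking step might look like the sketch below, which assumes the graph's nodes are stored in a dictionary keyed by name and that the image set is a list of file paths. The function name `link_images`, the node layout and the paths are hypothetical.

```python
def link_images(graph_nodes, names, image_paths):
    """Attach an image set to the first graph node matching any known name.

    graph_nodes: dict mapping a node name to a dict of node attributes.
    names:       names of the medicinal material, tried in order.
    Returns the name of the node that was linked, or None if none matched.
    """
    for name in names:
        node = graph_nodes.get(name)
        if node is not None:
            node.setdefault("images", []).extend(image_paths)
            return name
    return None


# Illustrative data only.
graph_nodes = {"Dandelion": {"pages": [15, 16]}}
linked = link_images(graph_nodes,
                     ["Taraxacum", "Dandelion"],
                     ["images/dandelion_01.jpg", "images/dandelion_02.jpg"])
```

Trying every name in the name set is what allows a node stored under a common name to be found even when the lookup starts from a professional name.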
Further, the method further comprises:
extracting features from the images in the first medicinal material image set in sequence to obtain a first medicinal material image feature set;
constructing an automatic medicinal material identification module according to the first medicinal material image set and the first medicinal material image feature set;
acquiring an image of an arbitrary medicinal material;
performing feature extraction on the arbitrary medicinal material image through a medicinal material feature extraction unit in the automatic medicinal material identification module to obtain arbitrary medicinal material features;
determining, by the automatic medicinal material identification module, the type of the arbitrary medicinal material based on the arbitrary medicinal material features.
Computer vision techniques and image processing algorithms are used to extract features from each image in the first medicinal material image set one by one. The features include color histograms, texture features, shape features, edge features and the like. The features of all images are collected into a set to form the first medicinal material image feature set.
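As one possible illustration of the per-image feature extraction, the sketch below computes only a color histogram, one of the feature types listed above. The pixel-list image representation, the bin count and the image names are assumptions chosen to keep the example self-contained; texture, shape and edge features would be appended to the same vector in a fuller version.

```python
def color_histogram(pixels, bins=4):
    """Return a normalized per-channel histogram for a list of (r, g, b) pixels.

    Each 0-255 channel is split into `bins` equal-width buckets, and the three
    channel histograms are concatenated into one feature vector.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    width = 256 // bins
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            bucket = min(value // width, bins - 1)
            hist[channel * bins + bucket] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def build_feature_set(image_set):
    """Map each image name to its feature vector."""
    return {name: color_histogram(pixels) for name, pixels in image_set.items()}


# Illustrative two-pixel "images"; real images would be decoded from files.
image_set = {
    "dandelion_01": [(250, 220, 10), (240, 210, 30)],
    "purslane_01": [(20, 130, 40), (30, 140, 50)],
}
feature_set = build_feature_set(image_set)
```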
The first medicinal material image set and the corresponding feature set are used as the construction data set, which is divided into a training set and a validation set in a certain proportion, for example 80% for training and 20% for validation. A suitable deep learning model, such as a convolutional neural network (CNN), is selected and the network structure of the automatic recognition model is built. The model is trained on the training set so that it learns to recognize the first medicinal material by learning from and classifying on image features. The validation set is used to check the accuracy of the model, and the model parameters are adjusted and optimized according to the validation results so that the model identifies the first medicinal material with high accuracy. When training is complete and the performance meets the requirements, the automatic medicinal material identification module is obtained; this module converts any input medicinal material image into features and identifies it automatically.
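The 80%/20% division mentioned above can be sketched as a seeded shuffle followed by a cut. The function name, the fixed seed and the sample tuples are illustrative assumptions; the application does not prescribe how the split is performed.

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle samples and split them into (training, validation) lists."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Illustrative (file name, label) pairs.
samples = [(f"img_{i:03d}.jpg", "dandelion") for i in range(10)]
train, val = split_dataset(samples)
```

Shuffling before the cut avoids a validation set made up only of the images crawled last, which may share a source or viewpoint.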
An arbitrary medicinal material image is selected for feature extraction, for example an image uploaded by a user or an image obtained from another source. The image is input into the constructed automatic medicinal material identification module, and the medicinal material feature extraction unit in the module processes it to extract features of the medicinal material, including different types of features such as color, texture, shape and edge. After feature extraction is complete, the obtained features are combined into a feature set that serves as the feature representation of the arbitrary medicinal material.
The extracted features of the arbitrary medicinal material are matched against the known medicinal material features stored in the model. The medicinal material to which the arbitrary medicinal material belongs is determined by comparing the similarity between feature vectors: the higher the similarity, the more closely the two medicinal materials match. The identification result is then output, and the identified type is labeled to indicate the specific type of medicinal material in the image.
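The similarity comparison can be illustrated with cosine similarity over feature vectors. The text does not name a specific similarity measure, so cosine similarity is an assumption here, as are the function names and the example vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query, known_features):
    """Return (name, similarity) for the known vector most similar to query."""
    if not known_features:
        raise ValueError("no known medicinal material features")
    scores = {
        name: cosine_similarity(query, vector)
        for name, vector in known_features.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Illustrative stored features and query vector.
known = {"dandelion": [0.9, 0.1, 0.0], "purslane": [0.1, 0.8, 0.1]}
name, score = identify([0.8, 0.2, 0.0], known)
```

In practice a minimum similarity threshold would usually be added, so that an image of an unknown material is reported as unrecognized instead of being assigned to the nearest known type.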
Accurate identification by the model provides users with accurate medicinal material type information and helps them better understand and identify various medicinal materials.
Further, the medicinal material feature extraction unit comprises a convolution layer, dilated convolution layers and a pooling layer, and the method further comprises:
the dilated convolution layers comprise N dilated convolution layers, wherein N is an integer greater than 1;
processing, through a first dilated convolution layer of the N dilated convolution layers, the arbitrary medicinal material image features obtained by the convolution layer to obtain arbitrary medicinal material image dilated features, wherein the first dilated convolution layer corresponds to a first dilation rate, and the arbitrary medicinal material image features are the features of the arbitrary medicinal material image extracted by the convolution layer;
analyzing, by the pooling layer, the arbitrary medicinal material image dilated features and determining the type of the arbitrary medicinal material.
The medicinal material feature extraction unit comprises a convolution layer, dilated convolution layers and a pooling layer. The convolution layer is one of the image processing layers commonly used in deep learning. It slides a convolution kernel over the input image and computes a group of features at each position. These features capture local information in the image, such as edges, textures and colors, and stacking several convolution layers extracts abstract features at different levels, which helps image classification and recognition. The dilated convolution layer is an extension of the convolution layer that introduces a dilation rate to enlarge the receptive field without increasing the size of the convolution kernel. This allows the model to capture image information over a wider area and helps it process global and contextual information in the image; dilated convolution layers are commonly used in image segmentation and feature extraction. The pooling layer reduces the dimensions of the feature map, which reduces the amount of computation, improves the invariance of the model and retains the most important features.
The dilated convolution layers comprise N layers, where N is an integer greater than 1. N is the number of dilated convolution layers used in the system and is usually a hyperparameter that can be adjusted according to the task requirements and the model design. Increasing the number of dilated convolution layers increases the model's ability to perceive global information in the image, which helps it understand and identify image features.
Of the N dilated convolution layers, the first dilated convolution layer corresponds to a first dilation rate. The dilation rate defines the spacing between the input positions sampled by adjacent kernel elements and thereby enlarges the receptive field. The dilation rate can be any positive integer and is set according to the requirements of the task and the model.
In the previous step, the convolution layer has extracted features from the arbitrary medicinal material image in the form of feature maps, each of which corresponds to a particular type of feature. The first dilated convolution layer processes these features using the first dilation rate. Specifically, a convolution operation is performed on the feature maps with the dilation rate applied to enlarge the receptive field, which helps capture wider image information, including global features. Processing by the first dilated convolution layer yields the dilated features of the arbitrary medicinal material image. These features reflect wider information in the image and are important for the automatic medicinal material identification task.
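The effect of the expansion (dilation) rate on the receptive field can be shown with a small one-dimensional sketch. Image layers are two-dimensional and would normally come from a deep learning framework; the pure-Python functions, the kernel and the signal below are illustrative assumptions only.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by one application of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D dilated convolution in cross-correlation form."""
    if dilation < 1:
        raise ValueError("dilation must be a positive integer")
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for k, weight in enumerate(kernel):
            # Adjacent kernel taps read inputs that are `dilation` steps apart.
            acc += weight * signal[start + k * dilation]
        out.append(acc)
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # same three weights in both calls
dense = dilated_conv1d(signal, kernel, dilation=1)  # each output sees 3 inputs
wide = dilated_conv1d(signal, kernel, dilation=2)   # each output sees 5 inputs
```

With the same three weights, a rate of 2 widens the span of each output from 3 to 5 input positions, which is the sense in which the receptive field grows without enlarging the kernel.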
The dilated features are input into the pooling layer, which uses max pooling or average pooling to reduce the dimensions of the feature map, so that the most salient features are retained and the computational complexity is reduced. The pooling operation yields pooled features, which are a more abstract and compact representation of the input image. The pooled features are used for the final medicinal material classification decision, and a recognition result is output that determines which type the arbitrary medicinal material belongs to, giving the medicinal material type of the arbitrary medicinal material.
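A minimal sketch of the pooling and decision step follows. Max pooling is one of the two operations named above; the linear scorer used for the final decision, the class weights and the feature map values are assumptions added for illustration, since the text does not specify how the pooled features are mapped to a type.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling over a 2-D list; ragged edges are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def classify(pooled, class_weights):
    """Pick the label whose (hypothetical) weights score the pooled features highest."""
    flat = [value for row in pooled for value in row]
    scores = {
        label: sum(w * v for w, v in zip(weights, flat))
        for label, weights in class_weights.items()
    }
    return max(scores, key=scores.get)


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool2d(feature_map)
# Hypothetical per-class weights over the four pooled values.
class_weights = {
    "heat-clearing": [1, 0, 0, 0],
    "digestion-promoting": [0, 0, 0, 1],
}
label = classify(pooled, class_weights)
```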
In summary, the knowledge graph construction method and system for traditional Chinese medicine classics provided by the embodiments of the application have the following technical effects:
1. A first classic knowledge category is obtained from the classic knowledge database, and the category corresponds to a first professional vocabulary set. By extracting key information and professional terms from traditional Chinese medicine classics, this helps organize and classify knowledge and provides a clear knowledge structure;
2. A first classic is selected from the preset classic set, its catalogue is crawled, and the keywords in it are extracted for subsequent knowledge extraction. A first keyword is selected from the first keyword set for accurate knowledge extraction, and its corresponding first page number range is acquired, which helps limit the scope of extraction;
3. When the first keyword exists in the first professional vocabulary set, the corresponding first page number range in the first classic is intercepted and stored in the first classic knowledge category to form a first graph relation, so that professional terms in the field of traditional Chinese medicine are extracted accurately and the accuracy of the knowledge graph is ensured;
4. A first graph branch is constructed based on the first graph relation so that related knowledge is organized in a structured way, and the target traditional Chinese medicine knowledge graph is constructed according to the organization and structure of the first graph branch. This integrates the knowledge in the classics and presents it in structured form, making knowledge in the field of traditional Chinese medicine easier to search, understand and apply.
Overall, the knowledge graph construction method for traditional Chinese medicine classics automatically extracts, sorts and organizes knowledge from the classics and constructs a knowledge graph, so that traditional Chinese medicine knowledge can be better organized, searched and applied, which facilitates research and application in the field.
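The core flow summarized above (catalogue keyword, vocabulary check, page range, relation, branch) can be sketched as follows. The tuple layout of a relation, the dictionary form of a branch, and all names and page numbers are illustrative assumptions, not the data model of the application.

```python
def build_relations(category, vocabulary, catalogue, book_title):
    """Return relation tuples for catalogue keywords found in the vocabulary.

    catalogue maps each keyword to its (start_page, end_page) range.
    """
    relations = []
    for keyword, page_range in catalogue.items():
        if keyword in vocabulary:  # only professional terms enter the graph
            relations.append((category, keyword, book_title, page_range))
    return relations


def build_branch(relations):
    """Group relations by knowledge category to form one branch per category."""
    branch = {}
    for category, keyword, book_title, page_range in relations:
        branch.setdefault(category, []).append((keyword, book_title, page_range))
    return branch


# Hypothetical inputs.
vocabulary = {"dandelion", "purslane"}
catalogue = {"preface": (1, 3), "dandelion": (12, 15)}
relations = build_relations(
    "medicinal material knowledge", vocabulary, catalogue, "Classic A"
)
branch = build_branch(relations)
```

The catalogue entry "preface" is skipped because it is not in the professional vocabulary set, which is how the vocabulary check keeps non-domain headings out of the graph.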
Example two
Based on the same inventive concept as the knowledge graph construction method for traditional Chinese medicine classics in the foregoing embodiment, as shown in Fig. 2, the present application provides a knowledge graph construction system for traditional Chinese medicine classics, the system comprising:
the knowledge category obtaining module 10, configured to obtain a first classic knowledge category from a classic knowledge database, where the first classic knowledge category corresponds to a first professional vocabulary set;
the keyword set building module 20, configured to retrieve a first classic catalogue of a first classic in a preset classic set, and to crawl and build a first keyword set of the first classic catalogue;
the page range obtaining module 30, configured to obtain a first page number range corresponding to a first keyword in the first classic, where the first keyword is any keyword in the first keyword set;
the page range storage module 40, configured to intercept the first page number range in the first classic and store it in the first classic knowledge category when the first keyword exists in the first professional vocabulary set, so as to form a first graph relation;
the knowledge graph construction module 50, configured to construct a first graph branch based on the first graph relation, and to construct a target knowledge graph according to the first graph branch.
Further, the classic knowledge database is a database built from knowledge data covering preset classic knowledge categories, the preset classic knowledge categories including a basic theory category, a medicinal material knowledge category, an acupuncture and moxibustion category, a clinical knowledge category and a health care category.
Further, the system also comprises a first page range acquisition module for executing the following operation steps:
locating a first position of the first keyword in the first classic catalogue;
matching the first position with the corresponding first page number information;
randomly extracting a first classic page of the first classic;
counting a first occurrence frequency of the first keyword in the first classic page;
recording the first classic page as second page number information when the first occurrence frequency reaches a preset frequency value;
the first page number information and the second page number information together forming the first page number range.
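The page range steps above can be sketched as follows. The steps describe randomly extracting one page; applying the same check to a random sample of several pages, the threshold value, the dictionary inputs and the function name are assumptions made for this sketch.

```python
import random


def first_page_range(keyword, catalogue, pages, threshold=3, sample_size=None, seed=0):
    """Combine the catalogue page of keyword with sampled high-frequency pages.

    catalogue maps keyword -> page number; pages maps page number -> page text.
    """
    if keyword not in catalogue:
        return []
    first_info = {catalogue[keyword]}  # first page number information
    page_numbers = sorted(pages)
    if sample_size is None or sample_size > len(page_numbers):
        sample_size = len(page_numbers)
    sampled = random.Random(seed).sample(page_numbers, sample_size)
    # Second page number information: sampled pages where the keyword is frequent.
    second_info = {n for n in sampled if pages[n].count(keyword) >= threshold}
    return sorted(first_info | second_info)


# Hypothetical catalogue and page texts.
catalogue = {"dandelion": 12}
pages = {
    12: "dandelion: nature and flavor",
    13: "dandelion clears heat; dandelion is bitter; dandelion is cold",
    14: "purslane: nature and flavor",
}
page_range = first_page_range("dandelion", catalogue, pages)
```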
Further, the system also comprises a first graph branch building module for executing the following operation steps:
retrieving a second classic catalogue of a second classic in the preset classic set;
when a second keyword in a second keyword set corresponding to the second classic catalogue exists in the first professional vocabulary set, intercepting a second page number range of the second classic and storing it in the first classic knowledge category to form a second graph relation;
constructing the first graph branch according to the first graph relation and the second graph relation.
Further, the system also comprises a first graph branch acquisition module for executing the following operation steps:
respectively acquiring first basic classic information of the first classic and second basic classic information of the second classic;
wherein the first basic classic information comprises a first writing time, a first author influence factor and a first classic influence factor, and the second basic classic information comprises a second writing time, a second author influence factor and a second classic influence factor;
weighting the first writing time, the first author influence factor and the first classic influence factor to obtain a first classic composite index of the first classic;
weighting the second writing time, the second author influence factor and the second classic influence factor to obtain a second classic composite index of the second classic;
sorting and adjusting the first graph relation and the second graph relation according to the first classic composite index and the second classic composite index to obtain a first branch sequence;
combining the first branch sequence to obtain the first graph branch.
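The weighting and ordering steps can be illustrated with a weighted sum. The weight values, the normalization of all three factors to the range 0 to 1, the descending sort order and the dictionary field names are assumptions for this sketch; the application states only that the three factors are weighted and that the relations are sorted by the resulting index.

```python
def composite_index(info, weights):
    """Weighted sum of the basic information factors of one classic."""
    return sum(weights[key] * info[key] for key in weights)


def order_relations(relations, book_info, weights):
    """Sort graph relations by the composite index of their source classic."""
    return sorted(
        relations,
        key=lambda rel: composite_index(book_info[rel["book"]], weights),
        reverse=True,  # assumed: higher index comes first in the branch
    )


# Hypothetical weights and normalized factor scores.
weights = {"writing_time": 0.2, "author_influence": 0.3, "book_influence": 0.5}
book_info = {
    "Classic A": {"writing_time": 0.9, "author_influence": 0.5, "book_influence": 0.6},
    "Classic B": {"writing_time": 0.4, "author_influence": 0.9, "book_influence": 0.9},
}
relations = [
    {"book": "Classic A", "keyword": "dandelion", "pages": (12, 15)},
    {"book": "Classic B", "keyword": "dandelion", "pages": (40, 42)},
]
branch_sequence = order_relations(relations, book_info, weights)
```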
Further, the system also includes a first position location module for executing the following operation steps:
reading preset medicinal material types;
crawling and building a first medicinal material image set of a first medicinal material in the preset medicinal material types;
establishing a first medicinal material name set of the first medicinal material, wherein the first medicinal material name set comprises a first medicinal material name;
locating a first position of the first medicinal material name in the target knowledge graph, and linking the first medicinal material image set to the first position.
Further, the system further comprises a medicinal material type acquisition module for executing the following operation steps:
extracting features from the images in the first medicinal material image set in sequence to obtain a first medicinal material image feature set;
constructing an automatic medicinal material identification module according to the first medicinal material image set and the first medicinal material image feature set;
acquiring an image of an arbitrary medicinal material;
performing feature extraction on the arbitrary medicinal material image through a medicinal material feature extraction unit in the automatic medicinal material identification module to obtain arbitrary medicinal material features;
determining, by the automatic medicinal material identification module, the type of the arbitrary medicinal material based on the arbitrary medicinal material features.
Further, the system further comprises a medicinal material type determining module for executing the following operation steps:
the dilated convolution layers comprise N dilated convolution layers, wherein N is an integer greater than 1;
processing, through a first dilated convolution layer of the N dilated convolution layers, the arbitrary medicinal material image features obtained by the convolution layer to obtain arbitrary medicinal material image dilated features, wherein the first dilated convolution layer corresponds to a first dilation rate, and the arbitrary medicinal material image features are the features of the arbitrary medicinal material image extracted by the convolution layer;
analyzing, by the pooling layer, the arbitrary medicinal material image dilated features and determining the type of the arbitrary medicinal material.
Those skilled in the art can clearly understand the knowledge graph construction system of this embodiment from the foregoing detailed description of the knowledge graph construction method for traditional Chinese medicine classics. Because the system disclosed in this embodiment corresponds to the method disclosed in the foregoing embodiment, its description is relatively brief, and the relevant details can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A knowledge graph construction method for traditional Chinese medicine classics, characterized by comprising the following steps:
acquiring a first classic knowledge category from a classic knowledge database, wherein the first classic knowledge category corresponds to a first professional vocabulary set;
retrieving a first classic catalogue of a first classic in a preset classic set, and crawling and building a first keyword set of the first classic catalogue;
acquiring a first page number range corresponding to a first keyword in the first classic, wherein the first keyword is any keyword in the first keyword set;
when the first keyword exists in the first professional vocabulary set, intercepting the first page number range in the first classic and storing it in the first classic knowledge category to form a first graph relation;
constructing a first graph branch based on the first graph relation, and constructing a target knowledge graph according to the first graph branch.
2. The method of claim 1, wherein the classic knowledge database is a database built from knowledge data covering preset classic knowledge categories, the preset classic knowledge categories including a basic theory category, a medicinal material knowledge category, an acupuncture and moxibustion category, a clinical knowledge category and a health care category.
3. The method of claim 1, wherein acquiring the first page number range corresponding to the first keyword in the first classic comprises:
locating a first position of the first keyword in the first classic catalogue;
matching the first position with the corresponding first page number information;
randomly extracting a first classic page of the first classic;
counting a first occurrence frequency of the first keyword in the first classic page;
recording the first classic page as second page number information when the first occurrence frequency reaches a preset frequency value;
the first page number information and the second page number information together forming the first page number range.
4. The method of claim 1, wherein constructing the first graph branch based on the first graph relation comprises:
retrieving a second classic catalogue of a second classic in the preset classic set;
when a second keyword in a second keyword set corresponding to the second classic catalogue exists in the first professional vocabulary set, intercepting a second page number range of the second classic and storing it in the first classic knowledge category to form a second graph relation;
constructing the first graph branch according to the first graph relation and the second graph relation.
5. The method of claim 4, wherein constructing the first graph branch according to the first graph relation and the second graph relation comprises:
respectively acquiring first basic classic information of the first classic and second basic classic information of the second classic;
wherein the first basic classic information comprises a first writing time, a first author influence factor and a first classic influence factor, and the second basic classic information comprises a second writing time, a second author influence factor and a second classic influence factor;
weighting the first writing time, the first author influence factor and the first classic influence factor to obtain a first classic composite index of the first classic;
weighting the second writing time, the second author influence factor and the second classic influence factor to obtain a second classic composite index of the second classic;
sorting and adjusting the first graph relation and the second graph relation according to the first classic composite index and the second classic composite index to obtain a first branch sequence;
combining the first branch sequence to obtain the first graph branch.
6. The method according to claim 1, wherein the method further comprises:
reading preset medicinal material types;
crawling and building a first medicinal material image set of a first medicinal material in the preset medicinal material types;
establishing a first medicinal material name set of the first medicinal material, wherein the first medicinal material name set comprises a first medicinal material name;
locating a first position of the first medicinal material name in the target knowledge graph, and linking the first medicinal material image set to the first position.
7. The method of claim 6, wherein the method further comprises:
extracting features from the images in the first medicinal material image set in sequence to obtain a first medicinal material image feature set;
constructing an automatic medicinal material identification module according to the first medicinal material image set and the first medicinal material image feature set;
acquiring an image of an arbitrary medicinal material;
performing feature extraction on the arbitrary medicinal material image through a medicinal material feature extraction unit in the automatic medicinal material identification module to obtain arbitrary medicinal material features;
determining, by the automatic medicinal material identification module, the type of the arbitrary medicinal material based on the arbitrary medicinal material features.
8. The method of claim 7, wherein the medicinal material feature extraction unit comprises a convolution layer, dilated convolution layers and a pooling layer, the method further comprising:
the dilated convolution layers comprise N dilated convolution layers, wherein N is an integer greater than 1;
processing, through a first dilated convolution layer of the N dilated convolution layers, the arbitrary medicinal material image features obtained by the convolution layer to obtain arbitrary medicinal material image dilated features, wherein the first dilated convolution layer corresponds to a first dilation rate, and the arbitrary medicinal material image features are the features of the arbitrary medicinal material image extracted by the convolution layer;
analyzing, by the pooling layer, the arbitrary medicinal material image dilated features and determining the type of the arbitrary medicinal material.
9. A knowledge graph construction system for traditional Chinese medicine classics, characterized in that the system is used to implement the knowledge graph construction method for traditional Chinese medicine classics according to any one of claims 1-8, and comprises:
a knowledge category acquisition module, used for acquiring a first classic knowledge category from a classic knowledge database, wherein the first classic knowledge category corresponds to a first professional vocabulary set;
a keyword set building module, used for retrieving a first classic catalogue of a first classic in a preset classic set, and for crawling and building a first keyword set of the first classic catalogue;
a page range acquisition module, used for acquiring a first page number range corresponding to a first keyword in the first classic, wherein the first keyword is any keyword in the first keyword set;
a page range storage module, used for intercepting the first page number range in the first classic and storing it in the first classic knowledge category when the first keyword exists in the first professional vocabulary set, so as to form a first graph relation;
a knowledge graph construction module, used for constructing a first graph branch based on the first graph relation and constructing a target knowledge graph according to the first graph branch.
CN202311549672.3A 2023-11-20 2023-11-20 Knowledge graph construction method and system for Chinese medicine books Active CN117494811B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311549672.3A CN117494811B (en) 2023-11-20 2023-11-20 Knowledge graph construction method and system for Chinese medicine books

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311549672.3A CN117494811B (en) 2023-11-20 2023-11-20 Knowledge graph construction method and system for Chinese medicine books

Publications (2)

Publication Number Publication Date
CN117494811A true CN117494811A (en) 2024-02-02
CN117494811B CN117494811B (en) 2024-05-28

Family

ID=89682763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311549672.3A Active CN117494811B (en) 2023-11-20 2023-11-20 Knowledge graph construction method and system for Chinese medicine books

Country Status (1)

Country Link
CN (1) CN117494811B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101122905A (en) * 2006-08-08 2008-02-13 王宏源 Method for associating classical book database with historical geographic information system for supporting four bytes
CN103729402A (en) * 2013-11-22 2014-04-16 浙江大学 Method for establishing mapping knowledge domain based on book catalogue
CN108597587A (en) * 2018-04-26 2018-09-28 南京大经中医药信息技术有限公司 A kind of veteran TCM doctor's experience intelligence succession and clinical aid decision-making system and method
CN109190113A (en) * 2018-08-10 2019-01-11 北京科技大学 A kind of knowledge mapping construction method of theory of traditional Chinese medical science ancient books and records
CN109740168A (en) * 2019-01-09 2019-05-10 北京邮电大学 A kind of classic of TCM ancient Chinese prose interpretation method based on knowledge of TCM map and attention mechanism
CN110888989A (en) * 2019-10-25 2020-03-17 江苏智风教育科技有限公司 Intelligent learning platform and construction method thereof
CN110888991A (en) * 2019-11-28 2020-03-17 哈尔滨工程大学 Sectional semantic annotation method in weak annotation environment
CN111723213A (en) * 2020-06-02 2020-09-29 广东小天才科技有限公司 Learning data acquisition method, electronic device and computer-readable storage medium
CN112614565A (en) * 2020-12-04 2021-04-06 杨茜 Traditional Chinese medicine classic famous prescription intelligent recommendation method based on knowledge-graph technology
CN112749284A (en) * 2020-12-31 2021-05-04 平安科技(深圳)有限公司 Knowledge graph construction method, device, equipment and storage medium
WO2021103492A1 (en) * 2019-11-28 2021-06-03 福建亿榕信息技术有限公司 Risk prediction method and system for business operations
CN113342989A (en) * 2021-05-24 2021-09-03 北京航空航天大学 Knowledge graph construction method and device of patent data, storage medium and terminal
CN114496119A (en) * 2022-01-27 2022-05-13 医灯续焰(上海)生物科技有限公司 Method and device for tracing evolution relationship of prescription and server
CN114595344A (en) * 2022-05-09 2022-06-07 北京市农林科学院信息技术研究中心 Crop variety management-oriented knowledge graph construction method and device
WO2022198756A1 (en) * 2021-03-23 2022-09-29 平安科技(深圳)有限公司 Information pushing method and apparatus based on hot event, computer device, and storage medium
CN116821376A (en) * 2023-08-30 2023-09-29 北京华琦远航国际咨询有限公司 Knowledge graph construction method and system in coal mine safety production field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
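Wang Xiaoyu; Zhang Xiaofan; Hai Xinghua; Liu Fang; Guo Yi; Tian Lu; Sun Qing: "Visual analysis of tuina for preventive treatment of disease based on CiteSpace", Chinese Journal of Acupuncture and Moxibustion (Electronic Edition) (中华针灸电子杂志), no. 03, 15 August 2020 (2020-08-15) *
Na Yisha; Yuan Mei; Du Xiuping: "Research on hot issues in instructional design based on word frequency analysis and co-word clustering", Modern Educational Technology (现代教育技术), no. 03, 15 March 2013 (2013-03-15) *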

Also Published As

Publication number Publication date
CN117494811B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN107391906B (en) Healthy diet knowledge network construction method based on neural network and map structure
Goëau et al. Pl@ntNet mobile app
CN112487202A (en) Chinese medical named entity recognition method and device fusing knowledge map and BERT
CN111078852A (en) College leading-edge scientific research team detection system based on machine learning
Khoo et al. Augmenting Dublin core digital library metadata with Dewey decimal classification
Purificato et al. Multimedia and geographic data integration for cultural heritage information retrieval
Taneva et al. Gathering and ranking photos of named entities with high precision, high recall, and diversity
Benavent et al. FCA-based knowledge representation and local generalized linear models to address relevance and diversity in diverse social images
Tsai et al. Qualitative evaluation of automatic assignment of keywords to images
CN117494811B (en) Knowledge graph construction method and system for Chinese medicine books
CN103440261A (en) System and method for searching biomedical flow chart basing on content and structure
Zheng et al. Discovering discriminative patches for free-hand sketch analysis
Tong et al. A document exploring system on LDA topic model for Wikipedia articles
Barai et al. Image Annotation System Using Visual and Textual Features.
Pocco et al. Exploring scientific literature by textual and image content using DRIFT
c Neethu et al. Retrieval of images using data mining techniques
Chauhan et al. Efficient layer-wise feature incremental approach for content-based image retrieval system
Karczmarczyk et al. Linguistic query based quality evaluation of selected image search engines
Carta et al. CulturAI: Semantic Enrichment of Cultural Data Leveraging Artificial Intelligence
Miao Knowledge Mapping of Medicinal Plants Based on Artificial Neural Network.
Badghaiya et al. Image classification using tag and segmentation based retrieval
Yamamuro et al. Exsight-multimedia information retrieval system
Omar et al. WAY-LOOK4: A CBIR system based on class signature of the images' color and texture features
Vermilyer Intelligent user interface agents in content-based image retrieval
Jeyasekhar et al. Towards Effective Relevance Feedback Methods in Content-Based Image Retrieval Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant