CN117648424A - System for acquiring domain knowledge of natural medicinal materials - Google Patents


Publication number
CN117648424A
Authority
CN
China
Prior art keywords
user
dialogue
natural
medicinal material
information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311710151.1A
Other languages
Chinese (zh)
Inventor
许田
张岳
杨子杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Westlake University
Original Assignee
Westlake University
Application filed by Westlake University
Priority to CN202311710151.1A
Publication of CN117648424A
Legal status: Pending (current)


Abstract

The application relates to a system for acquiring domain-specific knowledge of natural medicinal materials. The system comprises a dialogue application, a first learning model, a search engine and a natural medicinal material domain-specific knowledge base.

The dialogue application provides a user interaction interface. It receives user questions whose intent is to obtain natural medicinal material domain-specific knowledge, and it presents the answers generated by the first learning model to the user.

The first learning model follows one of two paths. It may generate an answer to the latest user question directly from the dialogue history with the user. Alternatively, it uses the search engine to retrieve information from the knowledge base by at least one of coreference-based graph search, vector search and full-text search, obtaining background knowledge associated with the question. It then generates the answer from the dialogue history in which that background knowledge has been embedded.

The system lets users intelligently and accurately obtain authoritative, accurate, standardized and comprehensive natural medicinal material domain-specific knowledge through a friendly, conversational interface.

Description

System for acquiring domain knowledge of natural medicinal materials
Technical Field
The application relates to the field of information processing for natural medicinal materials, and in particular to a system for acquiring domain-specific knowledge of natural medicinal materials.
Background
Natural medicinal materials (NMMs) have long been recognized as a powerful library of therapeutic agents, owing to the diversity and biological relevance of the compounds they produce. These compounds play a key role in treating a broad range of pathological conditions, from infectious diseases to cancer, and remain a rich source of new drug leads. Natural medicinal materials also have a long clinical history in many regions, such as China, India and the Arab world, which demonstrates their lasting relevance to global healthcare.

Despite this contribution, natural medicinal materials are intrinsically complex. For example, two materials may share the same species-based source and the same medicinal part yet differ only in processing method. They are then in fact different natural medicinal materials, but their names are usually not strictly distinguished. As a result, information retrieved with conventional internet search tools or existing databases is often one-sided or even wrong.
Take "Ma Huang" (Ephedrae Herba) as an example. The Chinese Pharmacopoeia (2020 edition) specifies that this term refers to a natural medicinal material derived from several different species: Ephedra sinica (Cao Ma Huang), Ephedra intermedia (Zhong Ma Huang) and Ephedra equisetina (Mu Zei Ma Huang). However, querying "Ma Huang" or "Ephedra" through an internet search engine often returns only incomplete or misleading entries, such as "Ephedra is a medicinal preparation from the plant Ephedra sinica".

Moreover, knowledge of natural medicinal materials is not stored in an authoritative and rigorous form. Existing dialogue platforms built on internet search therefore fail to help users acquire this domain-specific knowledge, and may even aggravate the inaccuracy. For example, such a platform may state with confidence that Ma Huang has a single species-based source.

Incorrect or inaccurate domain-specific knowledge seriously obstructs scientific research on natural medicinal materials. It inevitably undermines the reliability and validity of academic conclusions and may even hinder research progress in the field.
Therefore, the prior art offers no tool that enables users, including professionals in the field, to acquire accurate, standardized and comprehensive knowledge of natural medicinal materials conveniently, efficiently and intelligently.
Disclosure of Invention
The present application is provided to solve the above-mentioned problems occurring in the prior art.
There is a need for a system for acquiring natural medicinal material domain-specific knowledge that enables users, including professionals in the field, to acquire accurate, standardized and comprehensive knowledge of natural medicinal materials conveniently, efficiently and intelligently.
According to a first aspect of the present application, there is provided a system for acquiring natural medicinal material domain-specific knowledge. The system comprises a dialogue application, a first learning model, a search engine and a natural medicinal material domain-specific knowledge base.

The dialogue application is configured to:
- provide a user interaction interface for a user and receive dialogue information entered by the user on that interface, the dialogue information comprising a user question for acquiring natural medicinal material domain-specific knowledge; and
- present to the user the answers to the user questions generated by the first learning model.

The first learning model is configured to:
- determine, based on the dialogue history with the user, whether the dialogue history is sufficient to answer the latest user question;
- if it is, generate an answer to the latest user question based on the dialogue history; and
- if it is not, perform first processing on the dialogue information containing the latest user question and interact with the search engine using the processed dialogue information.

The search engine is configured to:
- based on the processed dialogue information, retrieve information from the natural medicinal material domain-specific knowledge base using at least one of coreference-based graph search, vector search and full-text search, so as to acquire background knowledge associated with the latest user question;
- embed the background knowledge into the user's dialogue information to generate a dialogue history consisting of each round of dialogue information and its corresponding background knowledge; and
- return the dialogue history to the first learning model.
The system combines a dialogue application, a learning model, a search engine and a knowledge base dedicated to the natural medicinal material domain. It thereby provides users with a dedicated tool for acquiring natural medicinal material domain-specific knowledge. Through a friendly conversational interface, users can intelligently and accurately obtain the authoritative, accurate, standardized and comprehensive domain-specific knowledge they need.

This gives scientific research in the field a consistent understanding and a unified knowledge base. It safeguards the validity and credibility of academic conclusions and promotes the healthy development of research in the field.
The foregoing is only an overview of the technical solutions of the present application. Specific embodiments are described in detail below. They are intended to make the technical means of the application clearer, so that it can be implemented according to the specification, and to make the above and other objects, features and advantages of the application more readily understood.
Drawings
In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. The same reference numerals with letter suffixes or different letter suffixes may represent different instances of similar components. The accompanying drawings illustrate various embodiments by way of example in general and not by way of limitation, and together with the description and claims serve to explain the disclosed embodiments. Such embodiments are illustrative and not intended to be exhaustive or exclusive of the present apparatus or method.
Fig. 1 shows a partial block diagram of a system for acquiring natural medicinal material domain-specific knowledge according to an embodiment of the present application.
Fig. 2 (a) shows a schematic diagram of a user interaction interface containing dialogue guidance information according to an embodiment of the present application.
Fig. 2 (b) shows a schematic diagram of an example question and its answers according to an embodiment of the present application.
Fig. 3 (a) shows a schematic diagram of dialog information containing user-customized information according to an embodiment of the present application.
Fig. 3 (b) shows another schematic diagram of dialog information containing user-customized information according to an embodiment of the present application.
Fig. 4 (a) shows a schematic diagram of a user interaction interface containing irrelevant questions according to an embodiment of the present application.
Fig. 4 (b) shows another schematic diagram of a user interaction interface containing irrelevant questions according to an embodiment of the present application.
Fig. 5 (a) shows a schematic diagram of a user interaction interface containing multiple rounds of dialog information according to an embodiment of the present application.
Fig. 5 (b) shows another schematic diagram of a user interaction interface containing multiple rounds of dialog information according to an embodiment of the present application.
Fig. 6 shows a schematic diagram of a co-referent headword graph according to an embodiment of the present application.
Fig. 7 shows a schematic flow diagram of coreference-based graph search according to an embodiment of the present application.
Fig. 8 shows a flow diagram of vector search according to an embodiment of the present application.
Fig. 9 shows a flow diagram of full-text search according to an embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the technical solution of the present application, the application is described in detail below with reference to the accompanying drawings and specific embodiments. These embodiments are not intended to limit the present application.
The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises" and the like means that elements preceding the word encompass the elements recited after the word, and not exclude the possibility of also encompassing other elements. The order in which the steps of the methods described in the present application with reference to the accompanying drawings are performed is not intended to be limiting. As long as the logical relationship between the steps is not affected, several steps may be integrated into a single step, the single step may be decomposed into multiple steps, or the execution order of the steps may be exchanged according to specific requirements.
It should also be understood that the term "and/or" in this application merely describes an association between objects and indicates that three relationships may exist. For example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" in this application generally indicates an "or" relationship between the associated objects.
A system for acquiring natural medicinal material domain-specific knowledge according to an embodiment of the present application is described below. Fig. 1 shows a partial block diagram of the system.
As shown in Fig. 1, a system 100 for acquiring natural medicinal material domain-specific knowledge includes at least a dialogue application 101, a first learning model 102, a search engine 103 and a natural medicinal material domain-specific knowledge base 104.
The dialogue application 101 may be configured to provide a user interaction interface 1011 for a user and to receive dialogue information entered by the user on the user interaction interface 1011. In some embodiments, the dialogue information may include, but is not limited to, a user question intended to acquire natural medicinal material domain-specific knowledge. The dialogue application 101 is further configured to present to the user the answers to the user questions generated by the first learning model 102.
The first learning model 102 may be configured to determine, based on the dialogue history with the user, whether the dialogue history is sufficient to answer the latest user question. If it is, the model generates an answer to the latest user question based on the dialogue history. For example only, assume the dialogue history is as follows:
user_message_1: What is the systematic name of Ma Huang?
background_knowledge_1:
{
  "nmmsn": {
    "nmmsn": "Ephedra equisetina vel intermedia vel sinica Stem-herbaceous",
    "nmmsn_zh": {
      "zh": "Mu Zei Ma Huang or Zhong Ma Huang or Cao Ma Huang herbaceous stem",
      "pinyin": "mù zéi má huáng huò zhōng má huáng huò cǎo má huáng cǎo zhì jīng"
    },
    "nmmsn_name_element": {
      "nmm_type": "plant",
      "species_origins": [
        [ "Ephedra equisetina", "Mu Zei Ma Huang" ],
        "or",
        [ "Ephedra intermedia", "Zhong Ma Huang" ],
        "or",
        [ "Ephedra sinica", "Cao Ma Huang" ]
      ],
      "media_parts": [ [ "Stem-herbaceous", "herbaceous stem" ] ],
      "special_descriptions": [],
      "processing_methods": []
    }
  }
}
The systematic name of Ma Huang is Ephedra equisetina vel intermedia vel sinica Stem-herbaceous.
New user message:
user_message_2: What is the Chinese systematic name of Ma Huang?
The dialogue history above already contains the Chinese systematic name of Ma Huang, namely:
{
  "nmmsn_zh": {
    "zh": "Mu Zei Ma Huang or Zhong Ma Huang or Cao Ma Huang herbaceous stem",
    "pinyin": "mù zéi má huáng huò zhōng má huáng huò cǎo má huáng cǎo zhì jīng"
  }
}
Thus, the first learning model can generate an answer to the latest user question from this dialogue history:
The Chinese systematic name of Ma Huang is Mu Zei Ma Huang or Zhong Ma Huang or Cao Ma Huang herbaceous stem (mù zéi má huáng huò zhōng má huáng huò cǎo má huáng cǎo zhì jīng).
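As a rough illustration of answering from history, the sketch below looks up a requested field in the background knowledge already embedded in earlier rounds. It is a toy stand-in, not the first learning model's actual judgment. The structure of `history`, the helper names `find_field` and `answer_from_history`, and the idea that a question maps to a field name such as `nmmsn_zh` are all assumptions made for illustration.

```python
# Hypothetical dialogue history: each round holds the user message and the
# background knowledge (parsed JSON) that was embedded for that round.
history = [
    {
        "user_message": "What is the systematic name of Ma Huang?",
        "background_knowledge": {
            "nmmsn": {
                "nmmsn": "Ephedra equisetina vel intermedia vel sinica Stem-herbaceous",
                "nmmsn_zh": {
                    "zh": "Mu Zei Ma Huang or Zhong Ma Huang or Cao Ma Huang herbaceous stem",
                    "pinyin": "mù zéi má huáng huò zhōng má huáng huò cǎo má huáng cǎo zhì jīng",
                },
            }
        },
    }
]


def find_field(node, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        for value in node.values():
            found = find_field(value, key)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_field(item, key)
            if found is not None:
                return found
    return None


def answer_from_history(history, field):
    """Return the field if any earlier round already holds it (newest first), else None."""
    for round_ in reversed(history):
        value = find_field(round_["background_knowledge"], field)
        if value is not None:
            return value
    return None  # history insufficient: retrieval would be needed


print(answer_from_history(history, "nmmsn_zh")["pinyin"])
```

A `None` result corresponds to the "insufficient history" branch described next.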
In other embodiments, the first learning model 102 may determine that the dialogue history is insufficient to answer the latest user question. In that case, it performs first processing on the dialogue information containing the latest user question and interacts with the search engine 103 using the processed dialogue information.

The purpose of the first processing is to adapt the dialogue information to the search engine 103. It includes, but is not limited to:
- extracting from the dialogue information keywords for the natural medicinal material domain-specific knowledge the user is seeking; or
- converting the dialogue information into search terms in a specific format that the search engine 103 can readily use to search the natural medicinal material domain-specific knowledge base 104.

The application does not specifically limit this processing.

The search engine 103 is configured to:
- based on the processed dialogue information, retrieve information from the natural medicinal material domain-specific knowledge base 104 using at least one of coreference-based graph search, vector search and full-text search, so as to acquire background knowledge associated with the latest user question;
- embed the background knowledge into the user's dialogue information to generate a dialogue history consisting of each round of dialogue information and its corresponding background knowledge; and
- return the dialogue history to the first learning model 102.
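The control flow above can be sketched as follows: a sufficiency check, first processing, retrieval, embedding background knowledge into the round, then answer generation. This is a hedged sketch only. `Round`, `first_processing`, `handle_question` and the toy stand-ins are hypothetical names. The lower-casing used as "first processing" is only a placeholder for whatever keyword extraction or query formatting the system applies. The callables represent the learning model and search engine, not any real API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Round:
    """One round of dialogue plus the background knowledge embedded in it."""
    user_message: str
    background_knowledge: list[str] = field(default_factory=list)


def first_processing(message: str) -> str:
    """Placeholder 'first processing': turn a question into a search query."""
    return message.lower().replace("?", "").strip()


def handle_question(
    history: list[Round],
    question: str,
    is_sufficient: Callable[[list[Round], str], bool],
    search: Callable[[str], list[str]],
    generate: Callable[[list[Round], str], str],
) -> str:
    """Answer from history when it suffices; otherwise retrieve first."""
    if is_sufficient(history, question):
        history.append(Round(question))
    else:
        knowledge = search(first_processing(question))  # graph/vector/full-text
        history.append(Round(question, knowledge))  # embed into this round
    return generate(history, question)


# Toy stand-ins for the knowledge base, search engine and learning model.
TOY_KB = {"ma huang": ["nmmsn: Ephedra equisetina vel intermedia vel sinica Stem-herbaceous"]}


def toy_search(query: str) -> list[str]:
    return [fact for term, facts in TOY_KB.items() if term in query for fact in facts]


def toy_is_sufficient(history: list[Round], question: str) -> bool:
    return any(r.background_knowledge for r in history)


def toy_generate(history: list[Round], question: str) -> str:
    return "; ".join(k for r in history for k in r.background_knowledge)


history: list[Round] = []
print(handle_question(history, "What is the systematic name of Ma Huang?",
                      toy_is_sufficient, toy_search, toy_generate))
```

Passing the model and search engine in as callables keeps the routing logic separate from any particular model or retrieval backend.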
The present application combines a dialogue application, a learning model, a search engine and a knowledge base dedicated to the professional field of natural medicinal materials. Together they give users a dedicated system for acquiring natural medicinal material domain-specific knowledge.

A user who wants to learn about natural medicinal materials no longer needs to search the internet blindly with a general-purpose search engine and judge the accuracy of the results alone. Instead, the user can ask questions conversationally through a friendly interface, without needing to understand how the knowledge is retrieved. The user can thus intelligently and accurately obtain the authoritative, accurate, standardized and comprehensive domain-specific knowledge they need.

Because knowledge retrieval is seamlessly integrated into the dialogue, the system offers a good user experience. It also gives scientific research in the field a consistent understanding and a unified knowledge base. This safeguards the validity and credibility of academic conclusions and promotes the healthy development of research in the field.
The dialogue application according to embodiments of the present application is dedicated to knowledge acquisition in the field of natural medicinal materials. Unlike general-purpose dialogue applications, it may therefore provide dialogue guidance information on the user interface. The dialogue guidance information includes example questions, and the user can click an example question to view its answer.
Fig. 2 (a) shows a schematic diagram of a user interaction interface containing dialogue guidance information according to an embodiment of the present application. As shown in Fig. 2 (a), the dialogue application provides three example questions:
- "What is the systematic name of Ma Huang?"
- "What is the NMM ID of Ma Huang?"
- "What is the species-based source of Ma Huang?"

The user can view the answer to any of them by clicking on it.

Fig. 2 (b) shows a schematic diagram of an example question and its answer according to an embodiment of the present application. As shown in Fig. 2 (b), when the user clicks the first example question, "What is the systematic name of Ma Huang?", the system presents the corresponding answer on the user interface: the systematic name of Ma Huang is Mu Zei Ma Huang or Zhong Ma Huang or Cao Ma Huang herbaceous stem (Ephedra equisetina vel intermedia vel sinica Stem-herbaceous).
Dialogue guidance information helps users, especially new users or users with limited experience in the field, learn typical ways of using the dialogue application. In addition, the dialogue application can build a multi-angle, comprehensive library of example questions based on the typical natural medicinal material expertise available in the system. It can also offer a switching function such as "change example questions", so that the user receives questioning assistance that is as relevant as possible.
In some embodiments, the dialogue information between the user and the system may be in natural language, such as text entered by the user, voice input, or text converted from voice input. The dialogue information may also be in different natural languages. That is, the system according to embodiments of the present application can learn and process natural language, including across languages. This is of great significance for promoting international communication and standardization in the field of natural medicinal materials.
The dialogue application according to embodiments of the present application also supports user customization through the dialogue. For example only, the dialogue information may include information on the answer style the user wants, covering at least one of the language, the format and the length of the answer. In that case, the first learning model generates an answer that matches this answer-style information, and that answer is presented to the user.

Fig. 3 (a) shows a schematic diagram of dialogue information containing user-customized information according to an embodiment of the present application. In Fig. 3 (a), the user specifies the answer language with "please answer in French", and the dialogue application answers the user's question in French accordingly.

In some embodiments, the answer languages the user may specify include, but are not limited to, Chinese and English. For example, a question asked in English can be answered in Chinese, and vice versa. The answer language can be specified independently of the language of the current user interface.
In other embodiments, the user may also specify the format of the answer, including but not limited to JSON. Fig. 3 (b) shows another schematic diagram of dialogue information containing user-customized information according to an embodiment of the present application. In Fig. 3 (b), the user requested a JSON-format answer in the dialogue information. The application therefore answers "What is the species-based source of Ma Huang?" in JSON format, namely: { "species-based source": [ "Ephedra equisetina", "or", "Ephedra intermedia", "or", "Ephedra sinica" ] }.

In other embodiments, where the natural medicinal material domain-specific knowledge base and related components support it, the system can also answer in other formats on request. This lets the user easily integrate natural medicinal material domain-specific knowledge in a specific format into other programs or databases, which further increases the practical value of the system.
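The JSON answer in Fig. 3 (b) corresponds closely to the `species_origins` field of the knowledge-base record shown earlier. The sketch below derives that answer deterministically from the record, keeping the Latin name of each pair and the "or" connectives. In the described system the answer is generated by the first learning model, so this post-processing function (`species_answer`) is only an illustrative assumption.

```python
import json

# species_origins as stored in the example record: [Latin name, common name]
# pairs separated by the connective "or".
species_origins = [
    ["Ephedra equisetina", "Mu Zei Ma Huang"],
    "or",
    ["Ephedra intermedia", "Zhong Ma Huang"],
    "or",
    ["Ephedra sinica", "Cao Ma Huang"],
]


def species_answer(origins) -> str:
    """Keep the Latin name of each pair, keep connectives, serialize as JSON."""
    flat = [item[0] if isinstance(item, list) else item for item in origins]
    return json.dumps({"species-based source": flat})


print(species_answer(species_origins))
```

Keeping the "or" connectives in the output preserves the point that Ma Huang has several alternative species-based sources rather than one.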
In other embodiments, the user may specify the length of the answer as needed. For example, the user can request a "short answer" or a "detailed answer", or specify the number of words in the answer. The application does not specifically limit this.
In other embodiments, the dialogue application according to embodiments of the present application does not answer user questions unrelated to natural medicinal materials, Chinese medicinal materials or Chinese medicine. Fig. 4 (a) and Fig. 4 (b) show schematic diagrams of Chinese and English user interaction interfaces, respectively, containing irrelevant questions according to an embodiment of the present application.
As shown in Fig. 4 (a) and Fig. 4 (b), the dialogue application may receive, via the user interaction interface, Chinese or English dialogue information unrelated to natural medicinal material domain-specific knowledge. In that case it declines to answer, and it may additionally prompt the user to ask a question related to natural medicinal material domain-specific knowledge. This improves the safety and professionalism of the dialogue application and avoids giving the user inaccurate or misleading information.
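The application does not state how irrelevant questions are detected. Purely as an illustration, the sketch below uses a naive keyword gate. `DOMAIN_TERMS`, `REFUSAL` and `gate` are hypothetical. A real deployment would more plausibly rely on the learning model itself or on a trained classifier rather than on substring matching.

```python
from typing import Optional

# Hypothetical domain vocabulary; a real system would need a far larger list
# (or, more plausibly, a learned relevance judgment).
DOMAIN_TERMS = {
    "natural medicinal material", "chinese medicinal material", "chinese medicine",
    "pharmacopoeia", "ma huang", "ephedra", "ginseng", "artemisia annua",
}

REFUSAL = ("Sorry, I can only answer questions about natural medicinal materials, "
           "Chinese medicinal materials or Chinese medicine. Please ask a related question.")


def is_in_domain(question: str) -> bool:
    """Naive case-insensitive substring check against the domain vocabulary."""
    text = question.lower()
    return any(term in text for term in DOMAIN_TERMS)


def gate(question: str) -> Optional[str]:
    """Return a refusal-and-prompt message for off-topic questions, else None."""
    return None if is_in_domain(question) else REFUSAL
```

Returning the refusal text, rather than raising an error, mirrors the described behaviour of declining while prompting the user toward an in-domain question.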
In some embodiments, the dialogue between the user and the dialogue application spans multiple rounds. The dialogue application is then further configured to receive multiple rounds of dialogue information entered by the user on the user interaction interface. It presents to the user the answers, generated by the first learning model, to the user questions contained in each round.

Fig. 5 (a) and Fig. 5 (b) show schematic diagrams of Chinese and English user interaction interfaces, respectively, containing multiple rounds of dialogue information according to an embodiment of the present application. The questions in the multiple rounds may be related to or independent of one another; the application does not limit this.

This can be combined with the dialogue guidance function on the user interaction interface. In a multi-round dialogue, the application may select or infer example questions from the previous rounds with the user, and present updated example questions as the dialogue progresses. This helps the user acquire and learn the needed natural medicinal material domain-specific knowledge more accurately and in greater depth.
Furthermore, the dialogue application may be configured to provide options on the user interaction interface for bookmarking, downloading and citing the user's historical dialogue records. With these options, the user can add historical dialogue records to their private data, download the original data associated with the natural medicinal material knowledge, or cite the page of each dialogue record.
In other embodiments, the dialogue application may be further configured to provide user rating options on the user interaction interface. The rating information submitted by users can then be used to improve the training of the first learning model. Taking Fig. 5 (a) as an example, a rating option can be provided for each round's answer, allowing the user to rate every answer.
Each of the dialogue applications shown in Fig. 2 (a) to Fig. 5 (b) may be accessed by the user through a web page. No dedicated application needs to be installed; the user only needs to visit a specific web address. This makes it convenient to use the system on a variety of terminal devices.
The natural medicinal material domain-specific knowledge base according to embodiments of the present application is built specifically to facilitate the acquisition of natural medicinal material domain-specific knowledge. It is configured to include:
- systematic names of natural medicinal materials;
- structured and standardized natural medicinal material knowledge;
- natural medicinal material terms;
- a natural medicinal material relation set;
- texts related to natural medicinal materials, and the like.
The texts related to natural medicinal materials include at least the Chinese Pharmacopoeia (2020 edition, Volume I) or an updated edition of the Chinese Pharmacopoeia currently in force.
The structured and standardized natural medicinal material knowledge is obtained by structuring and standardizing the information on natural medicinal materials in the Chinese Pharmacopoeia (2020 edition, Volume I) or its updated edition in force. It covers all natural medicinal materials in that edition.
The system therefore integrates the most authoritative information source in the field of natural medicinal materials. This is the basis for providing users with professional, standardized and comprehensive natural medicinal material domain-specific knowledge.
In addition, the search engine and the knowledge base need to work together to retrieve natural medicinal material domain-specific knowledge effectively. To this end, the knowledge base is further configured to store the knowledge in a format such as JSON, with the following basic data structure:
{
  "_id": "NMM ID",
  "field": "value"
}
Here, the NMM ID is the unique ID of the natural medicinal material in the knowledge base. "field" can be any attribute associated with the NMM ID, and "value" is the corresponding value of that attribute.
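A minimal sketch of loading records with this basic structure and indexing them by NMM ID, assuming one JSON record per line. The ID value `NMM-EXAMPLE-0001` and the function name `load_records` are placeholders; the real ID format is not given here.

```python
import json


def load_records(lines):
    """Parse one JSON record per line and index it by its unique NMM ID."""
    store = {}
    for line in lines:
        record = json.loads(line)
        nmm_id = record["_id"]  # the NMM ID is the record's unique key
        if nmm_id in store:
            raise ValueError(f"duplicate NMM ID: {nmm_id}")
        store[nmm_id] = record
    return store


# Placeholder record; the ID format and field values are illustrative only.
record_text = ('{"_id": "NMM-EXAMPLE-0001", '
               '"nmmsn": "Ephedra equisetina vel intermedia vel sinica Stem-herbaceous"}')

store = load_records([record_text])
print(store["NMM-EXAMPLE-0001"]["nmmsn"])
```

Rejecting duplicate IDs at load time is one simple way to enforce the "unique ID" property the text describes.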
Further, the JSON records of natural medicinal material domain-specific knowledge may have a hierarchically nested data structure, as in the following example:
{
  "_id": "NMM ID",
  "field 1": {
    "field 2": {
      "field 3": "value",
      "field 4": "value"
    },
    "field 5": "value"
  }
}
In embodiments of the present application, the nesting depth of the JSON data structure is kept as shallow as possible, for example no more than 4, 8, 16, 32 or 64 levels. In addition, closely related fields can be stored under a common parent field. In the example above, field 3 and field 4 are closely related and are stored under field 2. When both need to be retrieved, only the value of field 2 needs to be fetched, which makes searches with the search engine more efficient.

Controlling the nesting depth prevents two problems: poor navigation caused by excessive levels when users browse the knowledge, and performance degradation caused by searching deep structures.
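The depth limit can be checked mechanically. The sketch below counts nesting levels with the outermost object as level 1, so the nested example above has depth 3, and rejects records over a chosen limit. The counting convention, the limit of 4 and the names `nesting_depth` and `check_depth` are assumptions for illustration. The sketch also shows that reading `field 2` returns the closely related `field 3` and `field 4` together.

```python
def nesting_depth(node) -> int:
    """Count nested object/array levels; the outermost object is level 1."""
    if isinstance(node, dict):
        return 1 + max((nesting_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((nesting_depth(v) for v in node), default=0)
    return 0  # scalars add no level


MAX_DEPTH = 4  # assumed limit; the text lists 4, 8, 16, 32 or 64 as options


def check_depth(record, limit=MAX_DEPTH) -> int:
    """Return the record's depth, or raise if it exceeds the limit."""
    depth = nesting_depth(record)
    if depth > limit:
        raise ValueError(f"nesting depth {depth} exceeds limit {limit}")
    return depth


example = {
    "_id": "NMM ID",
    "field 1": {
        "field 2": {"field 3": "value", "field 4": "value"},
        "field 5": "value",
    },
}

print(check_depth(example))           # 3
print(example["field 1"]["field 2"])  # both closely related fields at once
```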
The natural medicinal material relation set in the natural medicinal material specific domain knowledge base according to the embodiments of the present application comprises multiple natural medicinal material relations, each of which can be expressed as a manually annotated [source object, relation, target object] triple. The source and target objects include at least natural medicinal material terms, and the relations include at least synonym, inclusion, derived/superior, derived/inferior, and peer relations. By way of example only: a synonym relation is [traditional Chinese medicine, Synonym, traditional Chinese medicine]; an inclusion relation is [natural medicinal material, includes, processed natural medicinal material]; a derived/superior relation is [Artemisia annua, derived/superior, plant Artemisia annua]; a derived/inferior relation is [Artemisia annua, derived/inferior, Artemisia annua segments]; and a peer relation is [Ginseng radix, peer, Ginseng radix leaf]. The source and target objects may also be nouns or terms in different languages; for example, synonym relations may include cross-language synonyms such as [natural medicinal material, cross-language synonym, Natural Medicinal Material]. Further examples are not listed here.
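To make the triple format concrete, here is a minimal Python sketch of how annotated rows might be parsed into typed triples. The `Relation` enum, its string labels, and `parse_triple` are hypothetical names chosen for illustration; the text fixes only the relation kinds, not how they are encoded.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relation kinds named in the text; the string labels are illustrative only."""
    SYNONYM = "synonym"
    CROSS_LANGUAGE_SYNONYM = "cross-language synonym"
    INCLUDES = "includes"
    DERIVED_SUPERIOR = "derived/superior"
    DERIVED_INFERIOR = "derived/inferior"
    PEER = "peer"


@dataclass(frozen=True)
class RelationTriple:
    source: str
    relation: Relation
    target: str


def parse_triple(row):
    """Turn one annotated [source, relation, target] row into a typed triple.

    Relation labels are matched case-insensitively; a malformed row or an
    unknown label raises ValueError.
    """
    if len(row) != 3:
        raise ValueError(f"expected 3 elements, got {len(row)}")
    source, label, target = (part.strip() for part in row)
    return RelationTriple(source, Relation(label.lower()), target)
```

Typing the relation as an enum means a mistyped label in the manual annotations is rejected at load time rather than silently creating a new relation kind.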
In some embodiments, each natural medicinal material term in the natural medicinal material specific domain knowledge base has a unique co-referent primary term. Co-referent terms are a group of terms that denote the same concept, and the representative term chosen to stand for that group is called the co-referent primary term of the concept. Labeling every natural medicinal material term with its unique co-referent primary term effectively avoids ambiguity and improves the standardization of the natural medicinal material specific domain knowledge.
Based on the natural medicinal material relation set, the search engine retrieves every natural medicinal material relation of the synonym type and uses them to construct a co-referent primary term graph (Coreference Primary Term Graph, CPTG). The constructed graph contains all co-referent primary terms and reflects the synonym relations among all natural medicinal material terms. Hence, the CPTG can essentially be regarded as a directed acyclic graph (Directed Acyclic Graph).
Fig. 6 shows a schematic diagram of a co-referent primary term graph according to an embodiment of the present application. When the search engine retrieves every natural medicinal material relation of the synonym type from the relation set, the synonym relations associated with the term "Herba Ephedrae" in the search results may include, for example, [Ephedrae Herba, Synonym, Herba Ephedrae], [Ephedra, Synonym, Herba Ephedrae], [Ma huang, Synonym, Herba Ephedrae], [Herba Ephedrae, Synonym, nmm-0006], [Ma-huang, Synonym, nmm-0006], [Herba Equiseti hiemalis or Chinese Ephedra or herb stems, Synonym, nmm-0006], and [Ephedra equisetina vel intermedia vel sinica Stem-herbaceous, Synonym, nmm-0006]. From these search results, the (partial) co-referent primary term graph shown in Fig. 6 can be constructed. As Fig. 6 shows, the NMM ID serves as the co-referent primary term of each natural medicinal material term in the knowledge base. In the constructed graph, all natural medicinal material terms denoting the same concept, together with every term that has a synonym relation (including a cross-language synonym relation) with them, ultimately point to that co-referent primary term and can be resolved to it by the search engine. Specifically, in Fig. 6, Ephedrae Herba, Ephedra, Ma huang, Herba Ephedrae, Ma-huang, Herba Equiseti hiemalis or Chinese Ephedra or herb stems, and Ephedra equisetina vel intermedia vel sinica Stem-herbaceous all point to the co-referent primary term nmm-0006.
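The construction and lookup described above can be sketched as follows, assuming (as in the Fig. 6 example) that synonym edges point toward the co-referent primary term and that a term with no outgoing synonym edge is itself a primary term. `build_cptg` and `resolve_primary_term` are hypothetical names, and the triples are the Fig. 6 examples written as plain tuples.

```python
from collections import defaultdict


def build_cptg(triples, synonym_labels=("Synonym",)):
    """Build a co-referent primary term graph: term -> set of synonym targets.

    Only triples whose relation is one of synonym_labels contribute edges.
    """
    graph = defaultdict(set)
    for source, relation, target in triples:
        if relation in synonym_labels:
            graph[source].add(target)
    return graph


def resolve_primary_term(graph, term):
    """Follow synonym edges until a node without outgoing edges is reached.

    A term with no outgoing edge is treated as its own primary term. Raises
    ValueError on a cycle or when a term points to more than one target.
    """
    seen = set()
    current = term
    while graph.get(current):
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        targets = graph[current]
        if len(targets) != 1:
            raise ValueError(f"{current!r} is ambiguous: {sorted(targets)}")
        (current,) = targets
    return current


if __name__ == "__main__":
    fig6 = [
        ("Ephedrae Herba", "Synonym", "Herba Ephedrae"),
        ("Ma huang", "Synonym", "Herba Ephedrae"),
        ("Herba Ephedrae", "Synonym", "nmm-0006"),
        ("Ma-huang", "Synonym", "nmm-0006"),
    ]
    graph = build_cptg(fig6)
    for name in ("Ephedrae Herba", "Ma huang", "Ma-huang"):
        print(name, "->", resolve_primary_term(graph, name))
```

Field names can be resolved through the same graph (for example an edge from "System name" to "nmmsn"), which is how a search field maps to its field ID in step 702.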
With the natural medicinal material specific domain knowledge base and the search engine configured as described above, the process of searching the knowledge base with the search engine to obtain the natural medicinal material specific domain knowledge the user needs is described in detail below with reference to Figs. 7 to 9. By way of example only, a simplified NMM knowledge document about "ephedra" is as follows:
[
{
"_id": "nmm-0006",
"snnmm": {
"nmm_id": "nmm-0006",
"nmmsn": {
"nmmsn": "Ephedra equisetina vel intermedia vel sinica Stem-herbaceous",
"nmmsn_zh": {
"zh": "equisetum or Chinese ephedra or herb ephedra grass stem",
"pinyin": "mù zéi má huáng huò zhōng má huáng huò cǎo má huáng cǎo zhì jīng"
},
"nmmsn_name_element": {
"nmm_type": "plant",
"species_origins": [
[ "Ephedra equisetina", "herba Equiseti hiemalis" ],
"or",
[ "Ephedra intermedia", "Chinese ephedra" ],
"or",
[ "Ephedra sinica", "herba Ephedrae" ]
],
"media_parts": [ [ "Stem-herbaceous", "herbaceous stem" ] ],
"special_descriptions": [],
"processing_methods": []
}
},
"nmmgn": {
"nmmgn": "Ma-huang",
"nmmgn_zh": {
"zh": "ephedra",
"pinyin": "má huáng"
}
}
}
}
]
Fig. 7 shows a schematic flow diagram of co-reference-based graph search according to an embodiment of the present application. In some embodiments, the first learning model is further configured with information extraction capabilities, so the first processing that it performs on dialogue information containing the latest user question may specifically include extracting natural medicinal material entities and search fields from that dialogue information. When the first learning model determines that the dialogue history is insufficient to answer the latest user question, it must interact with the search engine, and before doing so it performs the first processing on the dialogue information containing the latest user question. The first processing can differ according to the search mode of the search engine; for example, the co-reference-based graph search mode corresponds to a first processing that extracts natural medicinal material entities and search fields from the dialogue information. For example only, assume that the dialogue history is as follows:
user_message_1: What is the NMM ID of ephedra?
background_knowledge_1:
{
"nmm_id": "nmm-0006"
}
The NMM ID of ephedra is nmm-0006.
New User Message:
user_message_2: What is the system name of ephedra?
Because the above dialogue history contains no information about the "system name of ephedra", the first learning model needs to interact with the search engine. As shown in Fig. 7, in step 701, when the natural medicinal material specific domain knowledge base is searched using co-reference-based graph search, the first learning model first extracts the natural medicinal material entity and the search field from the dialogue information containing the latest user question.
For the user question "What is the system name of ephedra?", "ephedra" is the natural medicinal material entity and "system name" is the search field; that is, the first learning model may extract the following NMM entity names and field names:
{
"nmm_list": [ "ephedra" ]
}
{
"field_list": [ "System name" ]
}
Next, in step 702, the search engine looks up the natural medicinal material entity and the search field in the co-referent primary term graph to obtain a first co-referent primary term corresponding to the entity and a second co-referent primary term corresponding to the search field. Taking the NMM ID as the co-referent primary term of a natural medicinal material entity and the field ID as the co-referent primary term of a search field, looking up "ephedra" (the entity) and "system name" (the search field) in the graph yields the first co-referent primary term "nmm-0006" and the second co-referent primary term "nmmsn":
{
"nmm_to_nmm_id": {
"ephedra": "nmm-0006"
}
}
And
{
"field_to_field_id": {
"System name": "nmmsn"
}
}
It should be noted that the co-referent primary terms of natural medicinal material entities and of search fields may also be preset to other co-referent terms, as long as each co-referent primary term is a unique representation within the system.
In step 703, the search engine constructs a search instruction based on the first and second co-referent primary terms. For the above example, the following search instruction can be constructed by combining "ephedra" (the natural medicinal material entity) and "system name" (the search field) with their co-referent primary terms:
{
"queries": [
{
"nmm": "ephedra",
"field_list": [ "System name" ],
"nmm_id": "nmm-0006",
"field_id_list": ["nmmsn"]
}
]
}
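Step 703 amounts to pairing each extracted surface form with its resolved primary term. A minimal Python sketch, assuming the step-702 lookups are available as plain dictionaries and that every extracted field applies to every entity (as in the parallel extraction variant); `build_search_instruction` is a hypothetical name.

```python
def build_search_instruction(nmm_list, field_list, nmm_to_id, field_to_id):
    """Assemble a step-703 style "queries" instruction, one query per entity.

    The surface forms ("nmm", "field_list") are kept next to the resolved
    co-referent primary terms so they can later be embedded in the dialogue
    history as context. A term missing from either lookup raises KeyError.
    """
    return {
        "queries": [
            {
                "nmm": nmm,
                "field_list": list(field_list),
                "nmm_id": nmm_to_id[nmm],
                "field_id_list": [field_to_id[field] for field in field_list],
            }
            for nmm in nmm_list
        ]
    }
```

Called with `["ephedra"]`, `["System name"]` and the two lookups from step 702, this yields the instruction shown above.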
Notably, the search instruction "queries" still retains the nmm and field_list fields. These two fields are not used directly in the subsequent search, but they may be embedded in the dialogue history used when the first learning model generates its answer, which helps indicate the search context to the first learning model and thereby improves its accuracy.
In some embodiments, two copies of the first learning model can extract NMM entities and search fields in parallel, after which the results are combined. On the one hand, this greatly reduces the time needed to construct the search instruction; on the other hand, because NMM entity extraction and search field extraction are independent of each other, each can be trained separately when the first learning model is trained, which better improves its accuracy. With this approach, however, when a user question contains more than one NMM entity and/or search field, all extracted fields must be searched for every NMM entity.
In other embodiments, a single first learning model extracts NMM entities and search fields simultaneously to construct the search instruction. This allows the fields to be searched to be specified individually for each NMM, which greatly reduces the search space and increases search speed.
In other embodiments, NMM entities and search fields may be extracted serially. For example, the NMM entities may be extracted first; the user question and the extracted NMM entities are then combined, the first learning model extracts the search fields from this combined input, and a search instruction containing each NMM entity and its corresponding search fields is generated. Alternatively, the search fields may be extracted first; the user question and the extracted search fields are then combined, the first learning model extracts the NMM entities from this combined input, and a search instruction containing each NMM entity and its corresponding search fields is generated.
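The parallel and serial variants differ only in how the two extraction calls are scheduled. The sketch below illustrates this in Python with `concurrent.futures`; `extract_entities` and `extract_fields` are hypothetical keyword-matching stand-ins for calls to the first learning model, not the model itself.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for two calls to the first learning model.
KNOWN_ENTITIES = ("ephedra", "ginseng")
KNOWN_FIELDS = ("system name", "species-based source")


def extract_entities(text):
    return [e for e in KNOWN_ENTITIES if e in text.lower()]


def extract_fields(text):
    return [f for f in KNOWN_FIELDS if f in text.lower()]


def extract_parallel(question):
    """Run both extractions concurrently, then pair every entity with every field."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        entities = pool.submit(extract_entities, question)
        fields = pool.submit(extract_fields, question)
        entity_list, field_list = entities.result(), fields.result()
    return {entity: list(field_list) for entity in entity_list}


def extract_serial(question):
    """Extract entities first, then extract fields from the question joined with each entity."""
    return {entity: extract_fields(f"{question}\nentity: {entity}")
            for entity in extract_entities(question)}
```

The parallel form pairs every entity with every field, which is why the text notes that all fields must then be searched for each entity; the serial form lets each entity receive its own field list.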
Each of these ways of constructing search instructions can be trained in a supervised manner as described above, and the details are not repeated here.
In step 704, based on the search instruction, the search engine retrieves information from the natural medicinal material specific domain knowledge base and uses the retrieval results as background knowledge associated with the latest user question. The co-referent primary term graph used above is constructed, in advance or in real time, by using the search engine to retrieve every natural medicinal material relation of the synonym type in the relation set; it contains all co-referent primary terms and reflects the synonym relations among natural medicinal material terms.
It should be noted that when the co-referent primary term graph is constructed in advance with the search engine, it can be updated whenever the terms or other sets in the natural medicinal material specific domain knowledge base are updated, so that the graph always reflects the latest state.
Based on the search instruction, the search engine retrieves the following background knowledge from the natural medicinal material specific domain knowledge base:
{
"search_results": [
{
"nmm_id": "nmm-0006",
"field_results": [
{
"field_id": "nmmsn",
"field_value": [["Ephedra equisetina vel intermedia vel sinica Stem-herbaceous"]]
}
]
}
]
}
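Step 704 can be pictured as a lookup of each requested field ID inside the document keyed by the NMM ID. In this Python sketch the `FIELD_PATHS` table mapping field IDs to nested paths is an assumption (the text does not say how field IDs are located in a document), and `get_path` and `run_search` are hypothetical names; the output mirrors the `search_results` structure above.

```python
# Hypothetical mapping from field IDs to their paths inside a knowledge document.
FIELD_PATHS = {
    "nmmsn": ("snnmm", "nmmsn", "nmmsn"),
    "nmmgn": ("snnmm", "nmmgn", "nmmgn"),
    "nmm_type": ("snnmm", "nmmsn", "nmmsn_name_element", "nmm_type"),
}


def get_path(node, path):
    """Walk nested dictionaries along path; return None if any key is missing."""
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def run_search(instruction, knowledge_base):
    """Execute a "queries" instruction against documents keyed by NMM ID."""
    results = []
    for query in instruction["queries"]:
        doc = knowledge_base.get(query["nmm_id"])
        if doc is None:
            continue  # unknown NMM ID: nothing to report
        field_results = []
        for field_id in query["field_id_list"]:
            path = FIELD_PATHS.get(field_id)
            value = get_path(doc, path) if path else None
            if value is not None:
                # Wrapped in a nested list to mirror the example output above.
                field_results.append({"field_id": field_id, "field_value": [[value]]})
        results.append({"nmm_id": query["nmm_id"], "field_results": field_results})
    return {"search_results": results}
```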
The background knowledge obtained in this way is embedded into the user's dialogue information to generate the following dialogue history, consisting of each round of dialogue information and its corresponding background knowledge:
Chat History:
……
New User Message:
user_message_2: What is the system name of ephedra?
background_knowledge_2:
{
"queries": [
{
"nmm": "ephedra",
"field_list": [ "System name" ],
"nmm_id": "nmm-0006",
"field_id_list": ["nmmsn"]
}
]
}
{
"search_results": [
{
"nmm_id": "nmm-0006",
"field_results": [
{
"field_id": "nmmsn",
"field_value": [["Ephedra equisetina vel intermedia vel sinica Stem-herbaceous"]]
}
]
}
]
}
The dialogue history is then returned to the first learning model, which generates the answer to the user question as follows:
The system name of ephedra is Ephedra equisetina vel intermedia vel sinica Stem-herbaceous.
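Embedding background knowledge into the dialogue and rendering the history can be sketched as plain string assembly. The layout below imitates the Chat History / New User Message example above; the function names and the use of indented JSON for background blocks are assumptions for illustration.

```python
import json


def append_round(history, turn, user_message, background_blocks, answer=None):
    """Append one dialogue round: user message, its background knowledge, optional answer."""
    lines = [f"user_message_{turn}: {user_message}", f"background_knowledge_{turn}:"]
    # Each background block (search instruction, search results, ...) is serialized as JSON.
    lines += [json.dumps(block, ensure_ascii=False, indent=1) for block in background_blocks]
    if answer is not None:
        lines.append(answer)
    history.append("\n".join(lines))
    return history


def render_prompt(history):
    """Render earlier rounds under "Chat History:" and the newest round under "New User Message:"."""
    *previous, newest = history
    return "Chat History:\n" + "\n".join(previous) + "\nNew User Message:\n" + newest
```

Keeping each round's background knowledge in the history is what later lets the first learning model decide that a follow-up question can be answered without a new search.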
The co-reference-based graph search shown in Fig. 7 is the preferred search mode of the system for acquiring natural medicinal material specific domain knowledge according to the embodiments of the present application, because it makes the fullest use of the structured knowledge in the natural medicinal material specific domain knowledge base. Through the co-referent primary terms, the corresponding standardized information in the other sets of the knowledge base can be reached, which makes this mode well suited to application scenarios such as cross-language translation of vocabulary and terms, standardized queries, and disambiguation.
In addition to the co-reference-based graph search described above, the search engine of the present application can also perform vector search. To support vector search, the natural medicinal material specific domain knowledge base may be further configured to store the natural medicinal material knowledge as a slice dataset composed of multiple pieces of slice data, with the first vector embedding representation of each piece of slice data stored in a vector database and each piece of slice data associated with its corresponding co-referent primary term.
Slicing means dividing the natural medicinal material knowledge in advance into many small blocks, each of which is one piece of slice data. The granularity of slice data can vary: a slice may contain one or more fields related to a natural medicinal material, or only part of the information in one field, which provides maximum flexibility, and the present application places no particular limitation on this. Taking nmm-0006 as an example again, its knowledge set can be converted into the following slice dataset:
[
{
"content": {
"nmm_id": "nmm-0006",
"nmmsn": "Ephedra equisetina vel intermedia vel sinica Stem-herbaceous",
"nmmsn_zh": {
"zh": "equisetum or Chinese ephedra or herb ephedra grass stem",
"pinyin": "mù zéi má huáng huò zhōng má huáng huò cǎo má huáng cǎo zhì jīng"
}
},
"embedding": [...]
},
{
"content": {
"nmm_id": "nmm-0006",
"nmmgn": "Ma-huang",
"nmmgn_zh": {
"zh": "ephedra",
"pinyin": "má huáng"
}
},
"embedding": [...]
},
{
"content": {
"nmm_id": "nmm-0006",
"nmm_type": "plant"
},
"embedding": [...]
},
{
"content": {
"nmm_id": "nmm-0006",
"species_origins": [
[ "Ephedra equisetina", "herba Equiseti hiemalis" ],
"or",
[ "Ephedra intermedia", "Chinese ephedra" ],
"or",
[ "Ephedra sinica", "herba Ephedrae" ]
]
},
"embedding": [...]
},
{
"content": {
"nmm_id": "nmm-0006",
"media_parts": [ [ "Stem-herbaceous", "herbaceous stem" ] ]
},
"embedding": [...]
},
{
"content": {
"nmm_id": "nmm-0006",
"special_descriptions": []
},
"embedding": [...]
},
{
"content": {
"nmm_id": "nmm-0006",
"processing_methods": []
},
"embedding": [...]
}
]
As the slice dataset above shows, each piece of slice data is associated with its co-referent primary term (the NMM ID) and corresponds to a first vector embedding representation, which is stored in the vector database for retrieval.
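Slicing can be automated. The Python sketch below implements one possible granularity, assumed for illustration: it descends into composite objects, keeps any object without nested objects (such as a zh/pinyin pair or the name-element block) whole, and tags every slice with the document's NMM ID. Its slices therefore differ from the hand-made example above (for instance, the name-element fields stay together while a name and its Chinese form are separate slices). `slice_document` is a hypothetical name, and the `embedding` slot is left empty for the embedding model to fill.

```python
def slice_document(doc, skip_keys=("_id", "nmm_id")):
    """Split one NMM knowledge document into slices tagged with its NMM ID.

    Objects that contain further objects are descended into; every other value
    (strings, lists, and flat objects such as {"zh": ..., "pinyin": ...}) becomes
    one slice. The "embedding" slot is filled later by the embedding model.
    """
    nmm_id = doc["_id"]
    slices = []

    def walk(node):
        for key, value in node.items():
            if key in skip_keys:
                continue
            if isinstance(value, dict) and any(isinstance(v, dict) for v in value.values()):
                walk(value)
            else:
                slices.append({"content": {"nmm_id": nmm_id, key: value}, "embedding": None})

    walk(doc)
    return slices
```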
Correspondingly, the first learning model is further configured with information extraction capability and the capability to generate vector embedding representations of text, and the first processing of the dialogue information containing the latest user question specifically includes: extracting natural medicinal material entities from the dialogue information; looking them up in the co-referent primary term graph with the search engine to obtain the first co-referent primary terms corresponding to those entities; constructing a user question that contains the first co-referent primary terms; and generating a second vector embedding representation of that user question. Note that the first vector embedding representations of the slice data in the natural medicinal material knowledge slice dataset and the second vector embedding representation of the user question should be produced by the same embedding method, for example the same word embedding (Word Embedding) model or any other suitable model that generates vector embedding representations from text; the present application places no limitation on this.
Fig. 8 shows a flow diagram of vector search according to an embodiment of the present application. As shown in Fig. 8, in step 801, when the natural medicinal material specific domain knowledge base is searched by vector search, the first learning model first extracts natural medicinal material entities from the dialogue information, looks up the co-referent primary term graph with the search engine to obtain the first co-referent primary terms corresponding to those entities, constructs a user question containing the first co-referent primary terms, and generates a second vector embedding representation of that user question. Take the following user question as the dialogue information:
{
"user_message": "What is the species-based source of ephedra?"
}
The first learning model extracts natural medicinal material entities from the dialogue information:
[
{
"nmm_list": [ "ephedra" ]
}
]
The search engine is then used to look up the co-referent primary term graph and obtain the first co-referent primary term corresponding to the natural medicinal material entity, namely the NMM ID:
{
"nmm_to_nmm_id": {
"ephedra": "nmm-0006"
}
}
As described with reference to Fig. 7, the co-referent primary term graph is constructed, in advance or in real time, by using the search engine to retrieve every natural medicinal material relation of the synonym type in the relation set; it contains all co-referent primary terms and reflects the synonym relations among natural medicinal material terms.
A natural-language user question containing the first co-referent primary term is then constructed as follows:
{
"user_message": "What is the species-based source of ephedra?\n{\"nmm_to_nmm_id\": {\"ephedra\": \"nmm-0006\"}}"
}
Next, the second vector embedding representation of the user question containing the first co-referent primary term is generated as follows:
{
"user_message": "What is the species-based source of ephedra?\n{\"nmm_to_nmm_id\": {\"ephedra\": \"nmm-0006\"}}",
"user_message_embedding": [...]
}
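Constructing the augmented question and its embedding might look like the following Python sketch. `augment_question` reproduces the message format shown above; `toy_embedding` is only a hashed bag-of-words stand-in, assumed for illustration, for whichever embedding model the system actually uses (the text leaves the model open and requires only that questions and slices share it).

```python
import hashlib
import json
import math


def augment_question(question, nmm_to_id):
    """Append the entity -> NMM ID mapping to the question, as in the example above."""
    return question + "\n" + json.dumps({"nmm_to_nmm_id": nmm_to_id}, ensure_ascii=False)


def toy_embedding(text, dim=64):
    """Hashed bag-of-words vector, L2-normalised (a stand-in for a real embedding model)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

Whatever model replaces `toy_embedding`, the same function must embed both the slice data and the augmented question, otherwise the two kinds of vectors are not comparable.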
As shown in Fig. 8, in step 802, the search engine uses the second vector embedding representation (user_message_embedding) to retrieve the matching first vector embedding representations from the vector database, and the set of slice data corresponding to the matched first vector embedding representations (the content of each match) is used as background knowledge associated with the latest user question. In this embodiment, the slice data corresponding to the matched first vector embedding representation is as follows:
[
{
"content": {
"nmm_id": "nmm-0006",
"species_origins": [
[ "Ephedra equisetina", "herba Equiseti hiemalis" ],
"or",
[ "Ephedra intermedia", "Chinese ephedra" ],
"or",
[ "Ephedra sinica", "herba Ephedrae" ]
]
},
"embedding": [...]
}
]
Specifically, the similarity between user_message_embedding and every first vector embedding representation in the vector database can be compared using methods including, but not limited to, cosine similarity between vectors, so as to obtain all slice data whose relevance to the user question meets a preset criterion. Beyond quickly and accurately identifying the natural medicinal material information associated with the user question, the search engine's parameters can be configured so that it flexibly returns 1 or n pieces of slice data.
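A minimal Python sketch of this matching step, assuming cosine similarity, a similarity threshold as the "preset criterion", and a `top_n` parameter for the configurable number of returned slices. The names and default values are illustrative, and a production system would use the vector database's own nearest-neighbour search rather than this linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_slices(query_vec, slices, threshold=0.5, top_n=1):
    """Return the content of up to top_n slices whose similarity meets the threshold."""
    scored = [(cosine_similarity(query_vec, s["embedding"]), s) for s in slices]
    scored = [pair for pair in scored if pair[0] >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [s["content"] for _, s in scored[:top_n]]
```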
On this basis, background knowledge can be further embedded into the dialogue information of the user to generate the following dialogue history consisting of each round of dialogue information together with the corresponding background knowledge:
Chat History:
……
New User Message:
user_message_2: What is the species-based source of ephedra?
background_knowledge_2:
{
"nmm_to_nmm_id": {
"ephedra": "nmm-0006"
}
}
[
{
"content": {
"nmm_id": "nmm-0006",
"species_origins": [
[ "Ephedra equisetina", "herba Equiseti hiemalis" ],
"or",
[ "Ephedra intermedia", "Chinese ephedra" ],
"or",
[ "Ephedra sinica", "herba Ephedrae" ]
]
}
}
]
The dialogue history can then be returned to the first learning model, which generates the answer to the user question as follows:
The species-based source of ephedra is Ephedra equisetina, Ephedra intermedia, or Ephedra sinica.
The vector search mode shown in Fig. 8 accepts a whole sentence or paragraph as search input and maps the text into a vector space by embedding, so that the resulting vector embedding representation captures the semantic essence of the sentence or paragraph. By judging the similarity between vector embedding representations, knowledge and information semantically close to the query text can then be identified and retrieved quickly, giving the user more relevant and accurate answers.
In other embodiments, full-text search may also be used to obtain answers to user questions. To this end, the natural medicinal material specific domain knowledge base may be further configured with predefined index fields for full-text search, and stores the natural medicinal material specific domain knowledge as searchable documents with an inverted index for each index field. In the inverted index, the searchable documents sharing an index field are sorted by the relevance/importance of each document, and the index fields are stored in an easily searched dictionary tree, so that an index field can be found quickly in the tree and the corresponding documents located from it. The first learning model is further configured with word segmentation, information extraction, and document summarization capabilities, and the first processing of the dialogue information containing the latest user question includes extracting full-text search keywords based on word segmentation of the dialogue information.
Fig. 9 shows a flow diagram of full text searching according to an embodiment of the present application. As shown in fig. 9, in step 901, a full-text search keyword is first extracted by a first learning model on the basis of word segmentation of the dialogue information.
In step 902, in the case of performing information retrieval on the natural medicinal material private domain knowledge base by using full-text search, a search engine retrieves a matched searchable document based on the full-text search keyword and the inverted index.
In step 903, the matched searchable documents are used as background knowledge associated with the latest user question; alternatively, the first learning model extracts document summaries and/or key information from each set of matched searchable documents, and the generated summaries and/or key information are used as that background knowledge.
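The indexing and lookup in steps 901 to 903 can be sketched in Python as below. This is a simplification under stated assumptions: a regex tokenizer stands in for the first learning model's word segmentation (it would not segment Chinese text correctly), a hash map replaces the dictionary tree, and documents are ranked by the number of distinct keywords they contain rather than by a stored relevance/importance order. All function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for model-based word segmentation."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each token to the set of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def full_text_search(index, keywords):
    """Rank documents by how many distinct keywords they contain, ties broken by ID."""
    hits = defaultdict(int)
    for keyword in {k.lower() for k in keywords}:
        for doc_id in index.get(keyword, ()):
            hits[doc_id] += 1
    return sorted(hits, key=lambda d: (-hits[d], d))
```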
In other embodiments, when full-text search is used to retrieve information from the natural medicinal material specific domain knowledge base, the search engine may be further configured to search the full-text search keywords with other, external search systems to generate Internet search results, and to fuse those results with the document summaries and/or key information generated from the knowledge base retrieval to produce the background knowledge associated with the latest user question, with the Internet search results given lower priority in the fusion.
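The text fixes only that Internet results rank below knowledge-base results in the fusion. One way to realize that ordering is sketched below in Python; the de-duplication by exact text, the `source` labels, and the `max_items` cap are assumptions added for illustration, and `fuse_background` is a hypothetical name.

```python
def fuse_background(kb_items, web_items, max_items=5):
    """Merge knowledge-base and Internet results, knowledge-base items always first.

    Exact-text duplicates are dropped, keeping the higher-priority copy, and the
    fused list is capped at max_items.
    """
    fused, seen = [], set()
    for source, items in (("knowledge_base", kb_items), ("internet", web_items)):
        for text in items:
            if text in seen:
                continue
            seen.add(text)
            fused.append({"source": source, "text": text})
            if len(fused) >= max_items:
                return fused
    return fused
```

Because the cap is applied in priority order, Internet results only fill whatever room the knowledge-base results leave.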
Full-text search allows related information to be found and retrieved efficiently and suits application scenarios such as approximate matching, phrase variations, misspellings, and typographical errors. Because an inverted index is built in advance for every searchable document in the natural medicinal material specific domain knowledge base, the full-text mode gives the system greater flexibility and user-friendliness.
In some embodiments, which one or more of co-reference-based graph search, vector search, and full-text search is used may be chosen by the user by selecting a search mode, or may be determined by the first learning model or the like from features of the dialogue information with the user; the present application places no limitation on this. Using these search modes singly or in combination allows the search results to be both accurate and comprehensive, improving the overall usability of the system.
Furthermore, although exemplary embodiments have been described herein, the scope thereof includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of the various embodiments across), adaptations or alterations as pertains to the present application. Elements in the claims are to be construed broadly based on the language employed in the claims and are not limited to examples described in the present specification or during the practice of the present application, which examples are to be construed as non-exclusive. It is intended, therefore, that the specification and examples be considered as exemplary only, with a true scope and spirit being indicated by the following claims and their full scope of equivalents.
The above description is intended to be illustrative and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in combination with each other, and other embodiments may be devised by those of ordinary skill in the art upon reading the above description. In addition, in the above detailed description, various features may be grouped together to streamline the application. This is not to be interpreted as an intention that an unclaimed disclosed feature is essential to any claim. Rather, the subject matter of the present application may lie in less than all of the features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the detailed description as examples or embodiments, with each claim standing on its own as a separate embodiment, and it is contemplated that these embodiments may be combined with one another in various combinations or permutations. The scope of the invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
The above embodiments are only exemplary embodiments of the present application and are not intended to limit the present invention, the scope of which is defined by the claims. Various modifications and equivalent arrangements may be made to the present invention by those skilled in the art, which modifications and equivalents are also considered to be within the scope of the present invention.

Claims (17)

1. A system for acquiring natural medicine specific domain knowledge is characterized in that the system comprises a dialogue application program, a first learning model, a search engine and a natural medicine specific domain knowledge base,
the dialog application is configured to:
providing a user interaction interface for a user, and receiving dialogue information input by the user on the user interaction interface, wherein the dialogue information comprises a user question for acquiring natural medicinal material specific domain knowledge; and
presenting answers to the user questions generated by the first learning model to the user;
the first learning model is configured to:
determining, based on the dialogue history with the user, whether the dialogue history is sufficient to answer the latest user question;
generating an answer to the latest user question based on the dialogue history if the dialogue history is judged to be sufficient to answer the latest user question;
in the case that the first learning model determines that the dialogue history is insufficient to answer the latest user question, performing first processing on the dialogue information containing the latest user question, and interacting with the search engine using the dialogue information after the first processing;
the search engine is configured to:
based on the first-processed dialogue information, retrieving information from the natural medicinal material specific domain knowledge base using at least one of co-reference-based graph search, vector search, and full-text search to acquire background knowledge associated with the latest user question; embedding the background knowledge into the dialogue information of the user to generate a dialogue history consisting of each round of dialogue information and its corresponding background knowledge; and returning the dialogue history to the first learning model.
2. The system of claim 1, wherein the conversation application is further configured to: provide dialogue guide information for the user on the user interaction interface, wherein the dialogue guide information comprises example questions that the user can click to view their answers.
3. The system according to claim 1 or 2, wherein the dialogue information is dialogue information in natural language.
4. A system according to claim 3, wherein the dialogue information is dialogue information in the form of natural language of different languages.
5. The system of claim 1 or 2, wherein the conversation application is further configured to:
presenting to the user, in the case that the dialogue information contains answer style information expected by the user, an answer to the user question generated by the first learning model that matches the answer style information, wherein the answer style information comprises at least one of the language of the answer, the format of the answer, and the length of the answer.
6. The system of claim 5, wherein the language of the answer includes at least Chinese and English.
7. The system of claim 5, wherein the format of the answer comprises at least JSON.
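Claims 5 to 7 leave open how a requested answer style is carried to the output. One way to picture it, assuming the style has already been extracted from the dialogue, is a small style object applied when rendering the answer. `AnswerStyle` and `render_answer` are hypothetical names; length limits are applied only to plain text here, because truncating a JSON string would make it invalid.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AnswerStyle:
    language: str = "en"            # e.g. "zh" or "en" (claim 6)
    fmt: str = "text"               # "text" or "json" (claim 7)
    max_chars: Optional[int] = None  # optional length limit (claim 5)


def render_answer(fields: Dict[str, str], style: AnswerStyle) -> str:
    """Render already-generated answer fields according to the requested style."""
    if style.fmt == "json":
        # ensure_ascii=False keeps Chinese characters readable in the output
        return json.dumps({"language": style.language, **fields}, ensure_ascii=False)
    text = "\n".join(f"{key}: {value}" for key, value in fields.items())
    return text if style.max_chars is None else text[: style.max_chars]
```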
8. The system of claim 1 or 2, wherein the dialogue information comprises multiple rounds, and the dialogue application is further configured to:
receive multiple rounds of dialogue information input by the user on the user interaction interface; and
present to the user the answers, generated by the first learning model, to the user questions contained in each round of dialogue information.
9. The system of claim 1 or 2, wherein the dialogue application is further configured to decline to answer user questions unrelated to natural herbs, Chinese herbal medicines, or Chinese medicine.
10. The system of claim 1 or 2, wherein the dialogue application is further configured to:
provide options on the user interaction interface for bookmarking, downloading, and citing the user's historical dialogue records, so that the user can add a historical dialogue record to the user's private data, download the original data associated with the natural medicinal material knowledge, or cite the page of each dialogue record.
11. The system of claim 1 or 2, wherein the dialogue application is further configured to:
provide user evaluation options on the user interaction interface, so that the evaluation information submitted by the user can be used to further train and improve the first learning model.
12. The system of claim 1 or 2, wherein the dialogue application is further configured to be invoked by the user by accessing a web page.
13. The system of claim 1 or 2, wherein the natural medicinal material domain-specific knowledge base is configured to contain systematic nomenclature of natural medicinal materials, structured and standardized natural medicinal material knowledge, natural medicinal material terms, a natural medicinal material relation set, and natural medicinal material related texts, wherein:
the natural medicinal material related texts include at least the Chinese Pharmacopoeia (2020 Edition), Volume I, or an updated implemented edition of the Chinese Pharmacopoeia;
the structured and standardized natural medicinal material knowledge is obtained by structuring and standardizing the information on natural medicinal materials in the Chinese Pharmacopoeia (2020 Edition), Volume I, or its updated implemented edition, and covers all natural medicinal materials therein;
each natural medicinal material term has a unique corresponding coreference head word;
the natural medicinal material relation set comprises multiple groups of natural medicinal material relations, each expressed as a manually annotated [source object, relation, target object] triple, wherein the source object and the target object comprise at least natural medicinal material terms, and the relations comprise at least synonym relations, inclusion relations, derivation/superordinate relations, derivation/subordinate relations, and peer relations.
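The relation triples of claim 13, and the requirement that every term maps to exactly one coreference head word, can be sketched with a union-find over the synonym triples. This is an illustration under stated assumptions: the class and function names are hypothetical, the sample triples are illustrative rather than taken from the knowledge base, and choosing the lexicographically smallest term as head word is an arbitrary deterministic rule (the source does not say how head words are chosen).

```python
from enum import Enum
from typing import Dict, Iterable, NamedTuple


class Relation(Enum):
    """Relation types listed in claim 13."""
    SYNONYM = "synonym"
    INCLUSION = "inclusion"
    DERIVED_SUPERORDINATE = "derivation/superordinate"
    DERIVED_SUBORDINATE = "derivation/subordinate"
    PEER = "peer"


class Triple(NamedTuple):
    """A manually annotated [source object, relation, target object] triple."""
    source: str
    relation: Relation
    target: str


def head_word_map(triples: Iterable[Triple]) -> Dict[str, str]:
    """Map every term to one head word shared by its synonym cluster.

    ASSUMPTION: the head word is the lexicographically smallest term in the cluster.
    """
    parent: Dict[str, str] = {}

    def find(term: str) -> str:
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for t in triples:
        a, b = find(t.source), find(t.target)  # registers both terms
        if t.relation is Relation.SYNONYM and a != b:
            parent[max(a, b)] = min(a, b)  # smaller root becomes the cluster's head word
    return {term: find(term) for term in list(parent)}
```

Only synonym triples merge clusters; peer, inclusion, and derivation triples register their terms but leave each term's head word unchanged.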
14. The system of claim 13, wherein:
the first learning model is further configured to have information extraction capability, and the first processing of the dialogue information containing the latest user question specifically comprises extracting natural medicinal material entities and search fields from the dialogue information;
the search engine is further configured to, when coreference-based graph search is used to retrieve information from the natural medicinal material domain-specific knowledge base:
look up the natural medicinal material entity and the search field in the coreference head-word graph to obtain a first coreference head word corresponding to the natural medicinal material entity and a second coreference head word corresponding to the search field;
construct a search instruction based on the first coreference head word and the second coreference head word; and
perform information retrieval in the natural medicinal material domain-specific knowledge base based on the search instruction, and use the retrieval result as the background knowledge associated with the latest user question; wherein
the coreference head-word graph is constructed, in advance or in real time, by using the search engine to retrieve each group of natural medicinal material relations that has a synonym relation from the natural medicinal material relation set; the graph contains all coreference head words and reflects the synonym relations among the natural medicinal material terms.
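The graph-search path of claim 14 resolves both the extracted entity and the search field to head words before querying, so that any synonym of a material or field reaches the same record. A minimal sketch, assuming the head-word graph has been flattened into a term-to-head-word dict and the knowledge base is keyed by (entity head word, field head word); `graph_search`, the tuple "search instruction", and the sample field names are hypothetical stand-ins, since the source does not specify the query language.

```python
from typing import Dict, Optional, Tuple

HeadWords = Dict[str, str]                 # term -> coreference head word
KnowledgeBase = Dict[Tuple[str, str], str]  # (entity head, field head) -> record text


def graph_search(entity: str, search_field: str,
                 head_words: HeadWords, kb: KnowledgeBase) -> Optional[str]:
    """Resolve the entity and the field to head words, then query the knowledge base."""
    first_head = head_words.get(entity)          # head word for the medicinal material entity
    second_head = head_words.get(search_field)   # head word for the search field
    if first_head is None or second_head is None:
        return None  # unresolved term: a caller could fall back to vector or full-text search
    instruction = (first_head, second_head)      # stand-in for the "search instruction"
    return kb.get(instruction)
```

For example, with `"人参"` and `"Renshen"` both mapped to `"Ginseng Radix et Rhizoma"`, a question about either name retrieves the same monograph section.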
15. The system of claim 13, wherein the natural medicinal material domain-specific knowledge base is further configured to store the natural medicinal material knowledge as a slice data set consisting of multiple pieces of slice data, and to store a first vector embedding corresponding to each piece of slice data in a vector database, each piece of slice data being associated with a corresponding coreference head word;
the first learning model is further configured to have information extraction capability and the capability of generating vector embeddings of text, and the first processing of the dialogue information containing the latest user question specifically comprises: extracting natural medicinal material entities from the dialogue information; searching the coreference head-word graph using the search engine to obtain a first coreference head word corresponding to each natural medicinal material entity; constructing a user question containing the first coreference head word; and generating a second vector embedding corresponding to that user question;
the search engine is further configured to, when vector search is used to retrieve information from the natural medicinal material domain-specific knowledge base: retrieve, based on the second vector embedding, matching first vector embeddings in the vector database, and use the set of slice data corresponding to the matching first vector embeddings as the background knowledge associated with the latest user question; wherein
the coreference head-word graph is constructed, in advance or in real time, by using the search engine to retrieve each group of natural medicinal material relations that has a synonym relation from the natural medicinal material relation set; the graph contains all coreference head words and reflects the synonym relations among the natural medicinal material terms.
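The vector-search path of claim 15 rewrites the question with the entity's head word, embeds it, and ranks slices by similarity. The sketch below assumes a toy bag-of-words `embed` in place of the learning model's embedding capability, cosine similarity as the matching criterion, and slices embedded together with their head word; all names are hypothetical, and a real deployment would precompute the slice embeddings in a vector database rather than compute them per query.

```python
import math
from typing import Dict, List, Sequence, Tuple

Slice = Tuple[str, str]  # (associated coreference head word, slice text)


def embed(text: str, vocab: Sequence[str]) -> List[float]:
    """Toy bag-of-words vector; a stand-in for a learned text-embedding model."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(question: str, entity: str, head_words: Dict[str, str],
                  slices: List[Slice], vocab: Sequence[str], k: int = 3) -> List[Slice]:
    # Rewrite the question so it contains the entity's coreference head word.
    rewritten = question.replace(entity, head_words.get(entity, entity))
    query_vec = embed(rewritten, vocab)  # "second vector embedding"
    # "First vector embeddings", computed here from head word + slice text.
    scored = [(cosine(query_vec, embed(f"{head} {text}", vocab)), (head, text))
              for head, text in slices]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:k] if score > 0]
```

Substituting the head word before embedding is what lets a question phrased with a pinyin or colloquial name land near slices filed under the canonical name.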
16. The system of claim 13, wherein the natural medicinal material domain-specific knowledge base is further configured to predefine index fields for full-text search and to store the natural medicinal material domain knowledge as searchable documents with inverted indexes on those index fields;
the first learning model is further configured to have word segmentation capability, information extraction capability, and the capability of generating document summaries, and the first processing of the dialogue information containing the latest user question specifically comprises: extracting full-text search keywords after performing word segmentation on the dialogue information;
the search engine is further configured to, when full-text search is used to retrieve information from the natural medicinal material domain-specific knowledge base: search for matching searchable documents based on the full-text search keywords and the inverted indexes; and either use the matching searchable documents as the background knowledge associated with the latest user question, or use the first learning model to extract document summaries and/or key information from each set of matching searchable documents and use the generated summaries and/or key information as that background knowledge.
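An inverted index restricted to predefined index fields, as in claim 16, can be sketched in a few lines. Assumptions: whitespace tokenization stands in for word segmentation (Chinese text would need a real segmenter, which the claim assigns to the first learning model), and documents are ranked by how many distinct keywords they match; `build_inverted_index` and `full_text_search` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_inverted_index(docs: Dict[str, Dict[str, str]],
                         index_fields: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each token found in the chosen index fields to the ids of documents containing it."""
    fields = list(index_fields)
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for name in fields:
            for token in doc.get(name, "").lower().split():
                index[token].add(doc_id)
    return dict(index)


def full_text_search(keywords: Iterable[str], index: Dict[str, Set[str]]) -> List[str]:
    """Return matching document ids, most keywords matched first, ties broken by id."""
    hits: Dict[str, int] = defaultdict(int)
    for kw in set(k.lower() for k in keywords):
        for doc_id in index.get(kw, ()):
            hits[doc_id] += 1
    return sorted(hits, key=lambda d: (-hits[d], d))
```

Fields left out of `index_fields` are never tokenized, so they cannot produce matches; this is the practical effect of defining index fields in advance.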
17. The system of claim 16, wherein the search engine is further configured to, when full-text search is used to retrieve information from the natural medicinal material domain-specific knowledge base:
search the full-text search keywords using other external search systems to generate Internet search results; and
fuse the Internet search results with the document summaries and/or key information generated by retrieving information from the natural medicinal material domain-specific knowledge base, to generate the background knowledge associated with the latest user question, wherein the Internet search results are given lower priority in the fusion.
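Claim 17 states only that Internet results have lower fusion priority. A simple reading, assumed here rather than taken from the source, is an ordered merge: knowledge-base items first, then Internet items, with duplicates dropped and the result capped at a fixed budget. `fuse_background` and the source labels are hypothetical.

```python
from typing import List, Set, Tuple


def fuse_background(kb_items: List[str], web_items: List[str],
                    max_items: int) -> List[Tuple[str, str]]:
    """Merge background snippets, keeping knowledge-base items ahead of Internet items."""
    fused: List[Tuple[str, str]] = []
    seen: Set[str] = set()
    # Iteration order encodes priority: the curated knowledge base comes first.
    for source, items in (("knowledge_base", kb_items), ("internet", web_items)):
        for text in items:
            if text not in seen:
                seen.add(text)
                fused.append((source, text))
    return fused[:max_items]
```

Labelling each snippet with its source also lets the answer-generation step distinguish authoritative pharmacopoeia content from general web material.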
CN202311710151.1A 2023-12-13 2023-12-13 System for acquiring domain knowledge of natural medicinal materials Pending CN117648424A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311710151.1A CN117648424A (en) 2023-12-13 2023-12-13 System for acquiring domain knowledge of natural medicinal materials

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311710151.1A CN117648424A (en) 2023-12-13 2023-12-13 System for acquiring domain knowledge of natural medicinal materials

Publications (1)

Publication Number Publication Date
CN117648424A (en) 2024-03-05

Family

ID=90049350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311710151.1A Pending CN117648424A (en) 2023-12-13 2023-12-13 System for acquiring domain knowledge of natural medicinal materials

Country Status (1)

Country Link
CN (1) CN117648424A (en)

Similar Documents

Publication Publication Date Title
US8463593B2 (en) Natural language hypernym weighting for word sense disambiguation
EP2181405B1 (en) Automatic expanded language search
Gupta et al. Biperpedia: An ontology for search applications
US20040098250A1 (en) Semantic search system and method
US20060161564A1 (en) Method and system for locating information in the invisible or deep world wide web
US20180004838A1 (en) System and method for language sensitive contextual searching
CN105045852A (en) Full-text search engine system for teaching resources
WO2009029903A2 (en) Coreference resolution in an ambiguity-sensitive natural language processing system
US20230014700A1 (en) Pre-emptive graph search for guided natural language interactions with connected data systems
Rodrigues et al. Advanced applications of natural language processing for performing information extraction
Golub et al. Subject indexing in humanities: a comparison between a local university repository and an international bibliographic service
US8229970B2 (en) Efficient storage and retrieval of posting lists
US8082240B2 (en) System for retrieving information units
Willems et al. From science to practice: bringing innovations to agronomy and forestry.
KR101037091B1 (en) Ontology Based Semantic Search System and Method for Authority Heading of Various Languages via Automatic Language Translation
CN117648424A (en) System for acquiring domain knowledge of natural medicinal materials
Yousaf et al. How to identify appropriate key-value pairs for querying osm
US11520989B1 (en) Natural language processing with keywords
Vicente-Diez et al. Temporal semantics extraction for improving web search
Buitelaar et al. Integrating different strategies for cross-language information retrieval in the MIETTA project
CN110457435A (en) A kind of patent novelty analysis system and its analysis method
Jaaniso Automatic mapping of free texts to bioinformatics ontology terms
Thapa Use Case Driven Evaluation of Database Systems for ILDA
Shidha et al. Chem Text Mining-An Outline
Mosteghanemi et al. Towards a multidimensional information retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination