CN114860887A - Disease content pushing method, device, equipment and medium based on intelligent association

Publication number
CN114860887A
Authority
CN
China
Prior art keywords
disease
sentence
retrieved
user
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210589107.9A
Other languages
Chinese (zh)
Inventor
邓楚华
白丹丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangjian Information Technology Shenzhen Co Ltd
Original Assignee
Kangjian Information Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kangjian Information Technology Shenzhen Co Ltd
Priority to CN202210589107.9A
Publication of CN114860887A
Priority to PCT/CN2022/121588 (WO2023226262A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and provides a disease content pushing method, device, equipment and medium based on intelligent association. The method comprises the following steps: in response to a retrieval request of a user, obtaining a sentence to be retrieved input by the user; performing word segmentation on the sentence to be retrieved by using a forward maximum matching word segmentation algorithm to obtain a plurality of search words contained in the sentence, wherein each search word is composed of a plurality of consecutive different Chinese characters in the sentence; looking up and saving, in a pre-stored mapping document, the disease content corresponding to each search word; and sorting the disease contents according to preset weights and pushing them to the user in descending order of weight. The invention matches the label of the user with the label of the retrieval content, achieves personalized pushing in which different users receive different content, and pushes query results accurately.

Description

Disease content pushing method, device, equipment and medium based on intelligent association
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a disease content pushing method, device, equipment and medium based on intelligent association.
Background
At present, most products on the market offer a retrieval function, such as commodity search in an online mall module, disease and medicine search in a prescription system, disease search for patients, and doctor search for consultations. Typically, a user inputs several keywords, and the search system matches content against the keywords and pushes the matching content to the user.
The inventors have realized that most such search systems can only find content that contains the keywords and cannot find content that is related to the keywords but does not contain them. When too few keywords are supplied, little content can be retrieved, and the retrieved content cannot be tailored to the individual user.
Disclosure of Invention
The invention aims to provide a disease content pushing method, device, equipment and medium based on intelligent association, so as to solve the technical problems that most existing retrieval systems can only find content containing the keywords and cannot find content that is related to the keywords but does not contain them, that little content can be retrieved when too few keywords are supplied, and that the retrieved content cannot be tailored to different users.
In a first aspect, a disease content pushing method based on intelligent association is provided, including:
in response to a retrieval request of a user, obtaining a sentence to be retrieved input by the user;
performing word segmentation on the sentence to be retrieved by using a forward maximum matching word segmentation algorithm to obtain a plurality of search words contained in the sentence, wherein each search word is composed of a plurality of consecutive different Chinese characters in the sentence;
looking up and saving, in a pre-stored mapping document, the disease content corresponding to each search word, wherein each search word corresponds to one or more disease contents, and the disease contents represent information associated with the search word;
and sorting the disease contents according to preset weights, and pushing the disease contents to the user in descending order of weight.
In a second aspect, an intelligent association-based disease content pushing apparatus is provided, including:
a retrieval sentence acquisition module, used for obtaining, in response to a retrieval request of a user, a sentence to be retrieved input by the user;
a search word acquisition module, used for performing word segmentation on the sentence to be retrieved by using a forward maximum matching word segmentation algorithm to obtain a plurality of search words contained in the sentence, wherein each search word is composed of a plurality of consecutive different Chinese characters in the sentence;
a disease content retrieval module, used for looking up and saving, in a pre-stored mapping document, the disease content corresponding to each search word, wherein each search word corresponds to one or more disease contents, and the disease contents represent information associated with the search word;
and a disease content pushing module, used for sorting the disease contents according to preset weights and then pushing the disease contents to the user in descending order of weight.
In a third aspect, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the above disease content pushing method based on intelligent association are implemented.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored; when executed by a processor, the computer program implements the steps of the above disease content pushing method based on intelligent association.
With the disease content pushing method, device, equipment and medium based on intelligent association, the sentence to be retrieved input by the user is obtained in response to the user's retrieval request. Word segmentation is performed on the sentence by using a forward maximum matching word segmentation algorithm to obtain a plurality of search words contained in the sentence, wherein each search word is composed of a plurality of consecutive different Chinese characters in the sentence. The disease content corresponding to each search word is then looked up in a pre-stored mapping document and saved, wherein each search word corresponds to one or more disease contents, and the disease contents represent information associated with the search word. After the disease contents are sorted according to preset weights, they are pushed to the user in descending order of weight, so that different users receive different disease content. This avoids the limitation of a single retrieval mode, achieves personalized pushing, pushes query results accurately, and lets the user see the best-matching result first. A search can be run on only a few keywords input by the user and still return more content, including information that does not contain the keywords, so a user who remembers only a few keywords can still obtain a satisfactory result, and massive data can be searched efficiently, quickly and in real time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without inventive labor:
FIG. 1 is a schematic diagram illustrating an application environment of a method for pushing disease content based on intelligent association according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method for pushing disease content based on intelligent association according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating step S20 according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the step S30 according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating the step S32 according to an embodiment of the present invention;
FIG. 6 is a block diagram of a disease content pushing device based on intelligent association according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a computer device according to an embodiment of the present invention;
FIG. 8 is another schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The disease content pushing method based on intelligent association provided by the embodiment of the invention can be applied to the application environment shown in FIG. 1, in which a client communicates with a server through a network. The server pushes the corresponding disease content according to the retrieval request input at the client: it obtains the sentence to be retrieved input by the user in response to the user's retrieval request; performs word segmentation on the sentence by using a forward maximum matching word segmentation algorithm to obtain a plurality of search words contained in the sentence, wherein each search word is composed of a plurality of consecutive different Chinese characters in the sentence; looks up and saves, in a pre-stored mapping document, the disease content corresponding to each search word, wherein each search word corresponds to one or more disease contents, and the disease contents represent information associated with the search word; and, after sorting the disease contents according to preset weights, pushes them to the user in descending order of weight, so that different users receive different disease content. In the prior art, disease content is usually pushed through a single retrieval mode, so a user can only find content that contains the keywords and cannot find content that is related to the keywords but does not contain them. Retrieval is also slow: once the data reaches a certain volume, retrieval speed drops, front-end loading is delayed, and user experience suffers.
To address these problems, the user label is matched against the labels of the retrieval content pre-stored in the mapping document maintained in the system, the relevant disease content is retrieved by means of an inverted index, and content with a higher weight is pushed first. This avoids the limitation of a single retrieval mode, achieves personalized pushing, pushes query results accurately, and lets the user see the best-matching result first. A search can be run on only a few keywords input by the user and still return more content, including information that does not contain the keywords, so a user who remembers only a few keywords can still obtain a satisfactory result, and massive data can be searched efficiently, quickly and in real time. The client may be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device. The server can be implemented as an independent server or as a server cluster composed of a plurality of servers. The present invention is described in detail below with reference to specific examples.
Referring to fig. 2, fig. 2 is a schematic flow chart of a disease content pushing method based on intelligent association according to an embodiment of the present invention, including the following steps:
S10, in response to a retrieval request of the user, obtaining the sentence to be retrieved input by the user.
In this embodiment, when a user wants to search for content in the retrieval software, the user may input a sentence to be retrieved, for example "what is diabetes" or "how to prevent diabetes", in the search box of the retrieval software. It can be understood that the retrieval software in this embodiment may be any software capable of meeting the user's retrieval needs, including but not limited to browsers, commodity search software, and health apps. The retrieval software can receive the sentence to be retrieved input by the user in real time, analyze and process it, and push the retrieved content to the user. Those skilled in the art can select suitable retrieval software according to actual retrieval requirements.
In this embodiment, after step S10, the method further includes: obtaining the retrieval account of the user, and obtaining the user label of the user according to the retrieval account, wherein the user label is doctor, patient or guest. Doctors, patients and guests have different permissions when logging in to and browsing the system, and their retrieval accounts differ accordingly. The account information of the user can be collected through event tracking, and the identity of the user can be determined from the account when the user logs in to the browsing system. A user whose label is guest can browse the disease content in the system directly without registering. After a user whose label is doctor logs in to the browsing system, the user can, according to the doctor's permissions, query the various medicines that treat the same disease, so as to better provide a suitable medicine for a patient. After a user whose label is patient logs in to the browsing system, the system can push common treatment methods and the corresponding hospital department to the user, so that the user can find the department in time and be diagnosed as soon as possible.
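As a minimal sketch of the label lookup described above, the following assumes a hypothetical in-memory account registry. The names `UserLabel`, `ACCOUNTS` and `user_label_for`, and the account ids, are illustrative and do not come from the patent; the third label (an unregistered visitor) is called `GUEST` here.

```python
from enum import Enum
from typing import Optional


class UserLabel(Enum):
    DOCTOR = "doctor"
    PATIENT = "patient"
    GUEST = "guest"  # unregistered visitor


# Hypothetical account registry: account id -> role recorded at registration.
ACCOUNTS = {
    "acct-1001": UserLabel.DOCTOR,
    "acct-2002": UserLabel.PATIENT,
}


def user_label_for(account_id: Optional[str]) -> UserLabel:
    """Return the label for an account; anonymous or unknown accounts are guests."""
    if account_id is None:
        return UserLabel.GUEST
    return ACCOUNTS.get(account_id, UserLabel.GUEST)
```

Falling back to the guest label for unknown accounts is a design assumption that matches the statement that such users can browse without registering.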
S20, performing word segmentation on the sentence to be retrieved by using a forward maximum matching word segmentation algorithm to obtain a plurality of search words contained in the sentence, wherein each search word is composed of a plurality of consecutive different Chinese characters in the sentence.
In step S20, performing word segmentation on the sentence to be retrieved by using the forward maximum matching word segmentation algorithm to obtain the plurality of search words contained in the sentence includes the following steps:
S21, taking the first Chinese character of the sentence to be retrieved as a starting point;
S22, retrieving from a pre-stored dictionary the longest phrase that matches the sentence to be retrieved from the starting point, and saving the phrase as the current search word;
S23, cutting the current search word out of the sentence to be retrieved to obtain the remaining sentence to be retrieved, and repeating steps S21 to S22 to obtain further search words until the sentence has been completely segmented.
In this embodiment, word segmentation of the sentence to be retrieved with the forward maximum matching algorithm can be carried out by the elasticsearch-analysis-ik tokenizer, and the words in the dictionary pre-stored in the tokenizer can be updated at any time. Elasticsearch is an open-source search server based on Lucene that provides a distributed, multi-user full-text search engine and, in this application, enables real-time retrieval and analysis of massive data. Because an existing database or search component is used, the retrieval method provided by this embodiment can be conveniently integrated with existing database systems and web publishing systems, which improves system stability. Specifically, taking the first Chinese character of the sentence to be retrieved as the starting point, several consecutive Chinese characters of the sentence are matched from left to right against the phrases pre-stored in the dictionary; if they match, those characters are taken as the current search word and cut out of the sentence, yielding the remaining sentence to be retrieved. For example, when the sentence to be retrieved is "如何预防糖尿病" ("how to prevent diabetes"), matching starts from its first character "如" and proceeds character by character against the phrases pre-stored in the dictionary. This first retrieves the dictionary phrases beginning with "如", such as "how to prevent", "if there is diabetes", "how to treat", and "how to diagnose". The second character "何" is then matched against these phrases, narrowing them to the phrases beginning with "如何" ("how"), such as "how to prevent", "how to treat", and "how to diagnose". Next, the phrases whose third character is "预" are selected, and the matching process is repeated until the dictionary phrase "如何预防" ("how to prevent") is found to be identical to the search word formed by the first 4 Chinese characters of the sentence. "如何预防" is cut out of the sentence, leaving "糖尿病" ("diabetes") as the remaining sentence to be retrieved. The remaining sentence is then processed in the same way, finally yielding 2 search words: "how to prevent" and "diabetes". Because the Chinese word segmentation of the elasticsearch-analysis-ik tokenizer is mature, using the forward maximum matching algorithm built into it allows the Chinese content of the sentence to be segmented conveniently, accurately, and efficiently.
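Steps S21 to S23 can be sketched in plain Python as follows. This is an illustrative stand-alone version of forward maximum matching, not the tokenizer's own code: it tries the longest candidate first and shortens it one character at a time, which gives the same result as the character-by-character narrowing described above. The function name, the tiny dictionary, and the Chinese strings (an assumed back-translation of the "how to prevent diabetes" example) are hypothetical.

```python
from typing import List, Set


def forward_max_match(sentence: str, dictionary: Set[str]) -> List[str]:
    """Segment `sentence` left to right, always taking the longest dictionary phrase."""
    max_len = max((len(word) for word in dictionary), default=1)
    words: List[str] = []
    start = 0
    while start < len(sentence):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(sentence) - start), 0, -1):
            candidate = sentence[start:start + size]
            # A single character is kept even if it is not in the dictionary,
            # so the loop always advances.
            if candidate in dictionary or size == 1:
                words.append(candidate)
                start += size
                break
    return words


# Hypothetical dictionary; a production dictionary would be far larger.
DICTIONARY = {"如何预防", "如何治疗", "如何诊断", "糖尿病"}

print(forward_max_match("如何预防糖尿病", DICTIONARY))  # ['如何预防', '糖尿病']
```

Keeping unmatched single characters as their own tokens is one common fallback; a real tokenizer may handle out-of-dictionary text differently.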
In this embodiment, before word segmentation is performed on the sentence to be retrieved, the method further includes: cleaning the sentence to be retrieved by deleting the punctuation marks, special symbols and spaces in it. Illustratively, if the sentence to be retrieved is "how to treat diabetes ?", the necessary data cleaning is performed first, removing the question mark and the space, to obtain "how to treat diabetes". The punctuation marks in this embodiment may be any symbols that assist written language, such as commas, periods, colons and question marks, and the special symbols may be tab characters, unit symbols, arrow symbols and the like. Cleaning removes the various non-Chinese characters from the sentence to be retrieved, which greatly improves retrieval efficiency and allows disease content to be pushed to the user quickly.
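The cleaning step can be sketched as below. This is a minimal illustration that assumes Unicode general categories are an acceptable way to recognise punctuation (`P*`), symbols (`S*`) and separators (`Z*`); the function name `clean_query` is hypothetical, and the patent does not specify how the characters are detected.

```python
import unicodedata


def clean_query(sentence: str) -> str:
    """Drop punctuation, symbols, and whitespace; keep all other characters."""
    kept = []
    for ch in sentence:
        category = unicodedata.category(ch)  # e.g. "Po" punctuation, "Sm" math symbol
        if ch.isspace() or category[0] in ("P", "S", "Z"):
            continue
        kept.append(ch)
    return "".join(kept)
```

For a Chinese query such as "如何治疗 糖尿病？" this removes the space and the full-width question mark before segmentation. A stricter variant could keep only CJK characters, at the cost of dropping Latin drug names.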
S30, looking up and saving, in the pre-stored mapping document, the disease content corresponding to each search word, wherein each search word corresponds to one or more disease contents, and the disease contents represent information associated with the search word.
In step S30, looking up and saving the disease content corresponding to each search word in the pre-stored mapping document includes the following process:
S31, searching the mapping document, and matching one search word against each primary category pre-stored in the mapping document to obtain the primary category corresponding to the current search word, wherein the mapping document pre-stores a plurality of primary categories and each primary category comprises a plurality of secondary categories; the primary categories represent disease types; the secondary categories represent the therapeutic drugs, preventive measures and specific disease types corresponding to the current primary category; each secondary category pre-stores a secondary category label and an address, the secondary category label being doctor, patient or guest, and the address representing the storage location of the disease content corresponding to the secondary category;
S32, searching, by means of an inverted index, the plurality of secondary categories contained in the primary category for the disease content corresponding to the current search word;
S33, selecting another search word and repeating steps S31 to S32 until all the search words have been searched, thereby obtaining the disease content corresponding to each search word.
Further, step S32 includes the following processes:
S321, matching the user label of the user against the secondary category label of each secondary category to obtain the secondary categories whose labels are consistent with the user label, and taking them as the secondary categories to be retrieved;
S322, looking up the disease content corresponding to the current search word according to the address pre-stored in each secondary category to be retrieved.
In this embodiment, after the plurality of search words is obtained by word segmentation, the disease content corresponding to each search word is looked up in the mapping document by means of an inverted index. For full-text search, an inverted index stores a mapping from each phrase to its storage locations in a document or a group of documents, and it is the most commonly used data structure in document retrieval systems. Through the inverted index, the list of documents containing a phrase can be obtained quickly from the phrase itself, and the corresponding document information can then be fetched. Specifically, each search word can be matched one by one against each primary category in the mapping document by keyword matching to obtain the primary category that matches the search word. The primary categories are the names of diseases, such as diabetes and heart disease. Illustratively, Table 1 lists the correspondence between primary and secondary categories in the mapping document. When the search word is diabetes, the matched primary category is diabetes. When the secondary category is a specific disease type, it can be type 1 diabetes, type 2 diabetes or gestational diabetes. When the secondary category is a therapeutic drug, it may be metformin, pioglitazone or flovose. When the secondary category is a preventive measure, it can be weight control, adherence to exercise or timely blood glucose monitoring. In the table, "type 1 diabetes (doc2)" indicates that the disease content related to type 1 diabetes is stored in document 2. The user label is matched against the secondary category labels to obtain the secondary categories that fit the user label, and the content at the address of each such secondary category is taken as the disease content for the search word.
TABLE 1
Figure BDA0003664387090000081
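The inverted index described above can be illustrated with a short sketch that maps each phrase to the identifiers of the documents containing it. The function name, the document ids and the phrase lists are hypothetical sample data, and a search engine such as Elasticsearch builds a far more elaborate structure internally.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each phrase to the ids of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, phrases in docs.items():
        for phrase in phrases:
            index[phrase].add(doc_id)
    return index


# Hypothetical, already-segmented documents.
DOCS = {
    "doc2": ["type 1 diabetes", "insulin"],
    "doc3": ["type 2 diabetes", "insulin resistance"],
    "doc9": ["exercise", "type 2 diabetes"],
}

INDEX = build_inverted_index(DOCS)
print(sorted(INDEX["type 2 diabetes"]))  # ['doc3', 'doc9']
```

A lookup is then a single dictionary access per phrase, which is why retrieval time depends on the number of matching documents rather than on the size of the corpus.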
As an example, when the user's label is doctor and the input sentence to be retrieved is "diabetes", the retrieved disease content can be type 2 diabetes, the therapeutic drug biglitazone rosiglitazone, and the preventive measure of adherence to exercise: for instance, the profile of type 2 diabetes stored in document 3, the usage and side effects of biglitazone rosiglitazone stored in document 6, and the recommended duration of exercise stored in document 9. When the user's label is patient and the same sentence is input, the retrieved information can be type 1 diabetes, metformin as the therapeutic drug, and weight control as the preventive measure. It is understood that the secondary categories may also include the departments that treat a disease, the basic manifestations of a disease, and so on, and those skilled in the art may adapt the secondary categories to actual needs. Matching the secondary category label against the user label allows query results to be pushed accurately and quickly, so that the user sees the best-matching result first. With the inverted index, even information that does not contain the search word can be found, which achieves personalized pushing and makes the pushed disease content more targeted.
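Steps S31, S321 and S322 can be sketched with a small in-memory mapping document. The structure below is an assumption made for illustration: the class and function names are hypothetical, the addresses doc2, doc3 and doc9 follow the example in the text, and the address for weight control is invented because the text does not give one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SecondaryCategory:
    name: str      # a specific disease type, a drug, or a preventive measure
    label: str     # "doctor", "patient", or "guest"
    address: str   # where the disease content is stored


# Illustrative mapping document: primary category -> secondary categories.
MAPPING: Dict[str, List[SecondaryCategory]] = {
    "diabetes": [
        SecondaryCategory("type 1 diabetes", "patient", "doc2"),
        SecondaryCategory("type 2 diabetes", "doctor", "doc3"),
        SecondaryCategory("adherence to exercise", "doctor", "doc9"),
        SecondaryCategory("weight control", "patient", "doc7"),  # address assumed
    ],
}


def lookup(search_word: str, user_label: str) -> List[str]:
    """Return the addresses whose secondary-category label matches the user label."""
    secondary = MAPPING.get(search_word, [])  # S31: match the primary category
    to_search = [c for c in secondary if c.label == user_label]  # S321: filter by label
    return [c.address for c in to_search]  # S322: follow the stored addresses
```

With this sample data, `lookup("diabetes", "doctor")` yields the addresses of the type 2 diabetes and exercise content, while the same query with the patient label yields the type 1 diabetes and weight control content.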
S40, sorting the disease contents according to preset weights, and pushing the disease contents to the user in descending order of weight.
In this embodiment, after the disease contents for the user are obtained, since each disease content has a preset weight, the disease contents can be sorted from the highest weight to the lowest, and the disease contents with high weights are displayed first. For example, when a user whose label is guest retrieves "diabetes", the resulting disease contents and weights may be: drugs for treating diabetes, with a weight of 0.7, and measures for preventing diabetes, with a weight of 0.85. The user is then shown the preventive measures for diabetes first, followed by the drugs for treating diabetes, and so on. This achieves personalized pushing, pushes query results accurately, and lets the user see the best-matching result first. In addition, because the retrieval uses Elasticsearch, massive data can be retrieved quickly: for a data volume on the order of ten million records, presenting the results of a "diabetes" query takes more than 5 seconds with the traditional method and less than 1 second with this method.
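The ordering in this paragraph, with the highest weight shown first, can be sketched as follows. The function name and the tuple layout are assumptions; the two weights are the ones given in the example.

```python
from typing import List, Tuple


def rank_by_weight(contents: List[Tuple[str, float]]) -> List[str]:
    """Return content titles ordered from the highest weight to the lowest."""
    ordered = sorted(contents, key=lambda item: item[1], reverse=True)
    return [title for title, _weight in ordered]


results = [
    ("drugs for treating diabetes", 0.7),
    ("measures for preventing diabetes", 0.85),
]
print(rank_by_weight(results))
# ['measures for preventing diabetes', 'drugs for treating diabetes']
```

Python's `sorted` is stable, so contents with equal weights keep the order in which they were retrieved.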
In this embodiment, the obtaining process of the mapping file is as follows:
S301, obtaining keywords of a sentence to be retrieved, and crawling data corresponding to the keywords by adopting a distributed crawler technology;
S302, analyzing the crawled data and constructing the mapping file.
Illustratively, when the target to be retrieved is "treatment of heart disease", the keywords are "heart disease" and "treatment". A distributed crawler then crawls network channels such as news websites, microblogs and WeChat according to the keywords, so as to obtain a large amount of data related to the keywords. Optionally, the distributed crawler may use the Scrapy framework. After the data corresponding to the target to be retrieved is obtained, it is labeled according to its content. For example, if the data describes a drug for treating heart disease, such as Suxiaojiuxin pill, then because it is an over-the-counter drug and a common drug for treating heart disease, the data may be labeled with the patient label. The mapping file is constructed from the data parsed and labeled in this way.
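The parsing and labeling in step S302 might look roughly like the following Python sketch. It does not perform any crawling. The record fields (`disease`, `kind`, `over_the_counter`, `source`), the function names, and the fallback rule that assigns the doctor label to everything that is not an over-the-counter drug are all assumptions made for illustration; the embodiment only states that an over-the-counter drug such as Suxiaojiuxin pill may be labeled for patients.

```python
from collections import defaultdict


def label_record(record):
    """Assign a label to one crawled record.

    Assumed rule: over-the-counter therapeutic drugs are labeled 'patient';
    every other record is labeled 'doctor' (the fallback is hypothetical).
    """
    if record["kind"] == "therapeutic drug" and record.get("over_the_counter"):
        return "patient"
    return "doctor"


def build_mapping(records):
    """Group labeled records under their disease (the primary category)."""
    mapping = defaultdict(list)
    for record in records:
        mapping[record["disease"]].append({
            "secondary_category": record["kind"],
            "name": record["name"],
            "label": label_record(record),
            "address": record["source"],  # where the full content is stored
        })
    return dict(mapping)


# Hypothetical crawled records; the addresses are placeholders.
crawled = [
    {"disease": "heart disease", "kind": "therapeutic drug",
     "name": "Suxiaojiuxin pill", "over_the_counter": True, "source": "document 1"},
    {"disease": "heart disease", "kind": "specific disease type",
     "name": "coronary heart disease", "source": "document 2"},
]

mapping_file = build_mapping(crawled)
print(mapping_file["heart disease"][0]["label"])
```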
It is understood that the disease content pushing method based on intelligent association in the present embodiment can be applied to a variety of different fields, such as: speech recognition, medical diagnosis, testing of applications, etc.
Therefore, in the above scheme, disease content is pushed in response to a retrieval request input by the user. In response to the retrieval request, the sentence to be retrieved input by the user is obtained. Word segmentation is performed on the sentence to be retrieved using a forward maximum matching word segmentation algorithm to obtain a plurality of search words contained in the sentence, where each search word is composed of a plurality of consecutive, different Chinese characters in the sentence to be retrieved. The disease content corresponding to each search word is then searched for in a pre-stored mapping document and saved, where each search word corresponds to one or more disease contents, and the disease contents represent information associated with the search word. After the disease contents are sorted according to their preset weights, they are pushed to the user in order of weight, with higher-weight content first, so that the pushed disease content differs from person to person. Existing approaches to pushing disease content usually rely on a single retrieval mode, so a user can only find content that contains the keywords and cannot find content that is related to the keywords but does not contain them. Retrieval is also slow: once the data reaches a certain volume, retrieval speed drops, front-end loading is delayed, and the user experience suffers. To address these problems, the user label is matched against the labels of the retrieval content pre-stored in the mapping file maintained in the system, the related disease content is retrieved using an inverted index, and the retrieval content with higher weight is pushed preferentially according to the weights of the retrieval content.
This avoids the limitation of a single retrieval mode, personalizes the pushed content for each user, and pushes query results accurately, so that the user sees the best-matching result first. A search can be performed with only a small number of keywords input by the user and still return more content, including information that does not contain the keywords. A user who remembers only a few keywords can therefore still find a satisfactory result, and efficient, fast, real-time retrieval over massive data is achieved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, an intelligent association-based disease content pushing device is provided, which corresponds one-to-one to the intelligent association-based disease content pushing method in the above embodiments. As shown in Fig. 6, the device includes a retrieval statement obtaining module 111, a search term obtaining module 112, a disease content retrieval module 113, and a disease content pushing module 114. The functional modules are explained in detail as follows:
a retrieval statement obtaining module 111, configured to, in response to a retrieval request of a user, acquire a sentence to be retrieved input by the user, where the user has a user tag, and the user tag is a doctor, a patient, or a visitor;
a search term obtaining module 112, configured to perform word segmentation processing on the sentence to be retrieved by using a forward maximum matching word segmentation algorithm, to obtain a plurality of search terms contained in the sentence to be retrieved, where each search term is formed by a plurality of consecutive different Chinese characters in the sentence to be retrieved;
a disease content retrieval module 113, configured to search for and save, in a pre-stored mapping document, the disease content corresponding to each search term, where each search term corresponds to one or more disease contents, and the disease contents represent information associated with the search term;
a disease content pushing module 114, configured to sort the disease contents according to preset weights, and then push the disease contents to the user in descending order of weight, so that the highest-weight content is shown first.
In an embodiment, the retrieval statement obtaining module 111 is further configured to:
obtaining a retrieval account of the user, and obtaining the user label of the user according to the retrieval account, wherein the user label is a doctor, a patient or a tourist.
In an embodiment, the term obtaining module 112 is specifically configured to:
taking the first Chinese character of the sentence to be retrieved as a starting point;
retrieving, from a pre-stored dictionary, the longest phrase that matches the sentence to be retrieved from the starting point, and using it as the current search word;
and splitting the current search word off from the sentence to be retrieved to obtain the remaining sentence to be retrieved, and repeating the above steps to obtain further search words until the sentence to be retrieved has been completely segmented.
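The three steps above can be sketched in Python as follows. This is a minimal illustration of forward maximum matching, not the patented implementation: the function name, the sample dictionary, and the fallback that emits a single character when no dictionary entry matches are assumptions.

```python
def forward_max_match(sentence, dictionary, max_len=None):
    """Segment `sentence` by repeatedly taking the longest dictionary entry
    that starts at the current position (forward maximum matching)."""
    if max_len is None:
        # Longest dictionary entry bounds the window; default to 1 if empty.
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(1, max_len)

    words = []
    start = 0
    while start < len(sentence):
        # Try the longest window first and shrink it one character at a time.
        for size in range(min(max_len, len(sentence) - start), 0, -1):
            candidate = sentence[start:start + size]
            # Assumed fallback: an unmatched single character becomes its own word.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                start += size
                break
    return words


# Hypothetical dictionary: 心脏 (heart), 心脏病 (heart disease), 治疗 (treatment).
sample_dictionary = {"心脏", "心脏病", "治疗"}
print(forward_max_match("心脏病的治疗", sample_dictionary))
```

For the sample sentence the longest match 心脏病 is preferred over the shorter 心脏, which is the point of maximum matching. The particle 的 is not in the sample dictionary, so the fallback emits it as a single character; a real system would presumably filter such words out.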
In an embodiment, the term obtaining module 112 is further configured to:
cleaning the sentence to be retrieved by deleting the punctuation marks, special symbols and spaces in the sentence to be retrieved.
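One simple way to perform this cleaning is a regular expression, sketched below in Python. The function name is hypothetical, and the choice to treat every non-word character (plus the underscore) as noise is an assumption; the embodiment does not specify the exact character set.

```python
import re

# In Python 3, \W on str patterns is Unicode-aware: Chinese characters, letters
# and digits count as word characters, while punctuation, symbols and
# whitespace (including full-width forms) do not.
_NOISE = re.compile(r"[\W_]+")


def clean_sentence(sentence):
    """Delete punctuation marks, special symbols and spaces from the sentence."""
    return _NOISE.sub("", sentence)


print(clean_sentence(" 心脏病 的治疗？！"))
```

Cleaning before segmentation keeps punctuation and spaces from interrupting dictionary matches.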
In an embodiment, the disease content retrieving module 113 is specifically configured to:
searching the mapping document, and matching one search word with each pre-stored primary category in the mapping document to obtain a primary category corresponding to the current search word, wherein the mapping document is pre-stored with a plurality of primary categories, each primary category comprises a plurality of secondary categories, the primary categories represent types of diseases, the secondary categories represent therapeutic drugs, preventive measures and specific disease types corresponding to the current primary categories, each secondary category is pre-stored with a secondary category label and an address, the secondary category label is a doctor, a patient or a tourist, and the address represents a storage position of disease content corresponding to the secondary category;
searching the disease content corresponding to the current search word in a plurality of secondary categories contained in the primary category by adopting an inverted index mode;
and selecting another search term, and repeating the steps until all the search terms are searched, so as to obtain the disease content corresponding to each search term.
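The lookup described above can be illustrated with a small Python sketch. The structure of `MAPPING`, the function names, and the decision to post each address under both the primary category and the secondary category name are assumptions made for illustration; the document addresses are taken from the diabetes example in the method embodiment.

```python
from collections import defaultdict

# Hypothetical mapping document: primary category -> secondary categories.
MAPPING = {
    "diabetes": [
        {"name": "type 2 diabetes", "label": "doctor", "address": "document 3"},
        {"name": "rosiglitazone", "label": "doctor", "address": "document 6"},
        {"name": "exercise", "label": "doctor", "address": "document 9"},
    ],
}


def build_inverted_index(mapping):
    """Map each term to the addresses of the content associated with it."""
    index = defaultdict(list)
    for primary, secondaries in mapping.items():
        for secondary in secondaries:
            # Post the address under the primary term as well, so a search for
            # the disease also reaches content that never mentions it by name.
            for term in (primary, secondary["name"]):
                if secondary["address"] not in index[term]:
                    index[term].append(secondary["address"])
    return index


def search(search_words, index):
    """Look up every search word in turn and collect its addresses."""
    return {word: index.get(word, []) for word in search_words}


inverted_index = build_inverted_index(MAPPING)
print(search(["diabetes", "exercise"], inverted_index))
```

In this sketch a search for "diabetes" returns document 9 even if the text about exercise duration never contains the word "diabetes", which is the behavior the embodiment attributes to the inverted index.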
In an embodiment, the disease content retrieving module 113 is further configured to:
matching the user label of the user with the secondary category labels corresponding to the secondary categories to obtain the secondary categories with the secondary category labels consistent with the user labels, and taking the secondary categories as the secondary categories to be retrieved;
and searching the disease content corresponding to the current search word according to the address prestored in the secondary category to be searched.
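The label matching and address lookup can be sketched in Python as follows. The function name, the `CONTENT_STORE` dictionary standing in for the storage behind each address, and the addresses "document 12" and "document 13" are hypothetical; "document 3" and the doctor/patient split follow the diabetes example in the method embodiment.

```python
# Hypothetical mapping document: each secondary category has a label and an address.
MAPPING = {
    "diabetes": [
        {"name": "type 2 diabetes", "label": "doctor", "address": "document 3"},
        {"name": "type 1 diabetes", "label": "patient", "address": "document 12"},
        {"name": "metformin", "label": "patient", "address": "document 13"},
    ],
}

# Hypothetical store that resolves an address to the stored disease content.
CONTENT_STORE = {
    "document 3": "profile of type 2 diabetes",
    "document 12": "profile of type 1 diabetes",
    "document 13": "usage and side effects of metformin",
}


def contents_for_user(search_word, user_label, mapping, store):
    """Return the disease content whose secondary category label matches the user label."""
    secondaries = mapping.get(search_word, [])
    # Keep only the secondary categories labeled for this kind of user.
    to_retrieve = [s for s in secondaries if s["label"] == user_label]
    # Resolve each pre-stored address to the content stored at that location.
    return [store[s["address"]] for s in to_retrieve if s["address"] in store]


print(contents_for_user("diabetes", "doctor", MAPPING, CONTENT_STORE))
print(contents_for_user("diabetes", "patient", MAPPING, CONTENT_STORE))
```

The same search word therefore yields different content for a doctor and for a patient, because the filter runs before any content is fetched.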
In an embodiment, the disease content retrieving module 113 is further configured to:
obtaining keywords of a sentence to be retrieved, and crawling data corresponding to the keywords by adopting a distributed crawler technology;
and analyzing the crawled data to construct the mapping file.
The invention provides an intelligent association-based disease content pushing device, which obtains the sentence to be retrieved input by a user in response to the user's retrieval request. Word segmentation is performed on the sentence to be retrieved using a forward maximum matching word segmentation algorithm to obtain a plurality of search words contained in the sentence, where each search word is composed of a plurality of consecutive, different Chinese characters in the sentence to be retrieved. The disease content corresponding to each search word is then searched for in a pre-stored mapping document and saved, where each search word corresponds to one or more disease contents, and the disease contents represent information associated with the search word. After the disease contents are sorted according to their preset weights, they are pushed to the user in order of weight, with higher-weight content first, so that the pushed disease content differs from person to person. Existing approaches to pushing disease content usually rely on a single retrieval mode, so a user can only find content that contains the keywords and cannot find content that is related to the keywords but does not contain them. Retrieval is also slow: once the data reaches a certain volume, retrieval speed drops, front-end loading is delayed, and the user experience suffers. To address these problems, the user label is matched against the labels of the retrieval content pre-stored in the mapping file maintained in the system, the related disease content is retrieved using an inverted index, and the retrieval content with higher weight is pushed preferentially according to the weights of the retrieval content.
This avoids the limitation of a single retrieval mode, personalizes the pushed content for each user, and pushes query results accurately, so that the user sees the best-matching result first. A search can be performed with only a small number of keywords input by the user and still return more content, including information that does not contain the keywords. A user who remembers only a few keywords can therefore still find a satisfactory result, and efficient, fast, real-time retrieval over massive data is achieved.
For specific limitations of the intelligent association-based disease content pushing device, reference may be made to the above limitations of the intelligent association-based disease content pushing method, and details are not repeated here. The modules in the intelligent association-based disease content pushing device may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile and/or volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external client through a network connection. The computer program, when executed by the processor, implements the server-side functions or steps of the intelligent association-based disease content pushing method.
In one embodiment, a computer device is provided, which may be a client, and its internal structure may be as shown in Fig. 8. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external server through a network connection. The computer program, when executed by the processor, implements the client-side functions or steps of the intelligent association-based disease content pushing method.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
responding to a retrieval request of a user, and obtaining a sentence to be retrieved input by the user;
performing word segmentation processing on the sentence to be retrieved by using a forward maximum matching word segmentation algorithm to obtain a plurality of retrieval words contained in the sentence to be retrieved, wherein each retrieval word is composed of a plurality of continuous different Chinese characters in the sentence to be retrieved;
searching and storing disease contents corresponding to each search term in a pre-stored mapping document, wherein each search term corresponds to one or more disease contents, and the disease contents represent information associated with the search terms;
and after sequencing the disease contents according to the preset weight, pushing the disease contents to the user according to the ascending order of the weight.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
responding to a retrieval request of a user, and obtaining a sentence to be retrieved input by the user;
performing word segmentation processing on the sentence to be retrieved by using a forward maximum matching word segmentation algorithm to obtain a plurality of retrieval words contained in the sentence to be retrieved, wherein each retrieval word is composed of a plurality of continuous different Chinese characters in the sentence to be retrieved;
searching and storing disease contents corresponding to each search term in a pre-stored mapping document, wherein each search term corresponds to one or more disease contents, and the disease contents represent information associated with the search terms;
and after sequencing the disease contents according to the preset weight, pushing the disease contents to the user according to the ascending order of the weight.
It should be noted that, the functions or steps that can be implemented by the computer-readable storage medium or the computer device can be referred to the related descriptions of the server side and the client side in the foregoing method embodiments, and are not described here one by one to avoid repetition.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A disease content pushing method based on intelligent association is characterized by comprising the following steps:
responding to a retrieval request of a user, and obtaining a sentence to be retrieved input by the user;
performing word segmentation processing on the sentence to be retrieved by using a forward maximum matching word segmentation algorithm to obtain a plurality of retrieval words contained in the sentence to be retrieved, wherein each retrieval word is composed of a plurality of continuous different Chinese characters in the sentence to be retrieved;
searching and storing disease contents corresponding to each search term in a pre-stored mapping document, wherein each search term corresponds to one or more disease contents, and the disease contents represent information associated with the search terms;
and after sequencing the disease contents according to the preset weight, pushing the disease contents to the user according to the ascending order of the weight.
2. The intelligent association-based disease content pushing method according to claim 1, wherein after obtaining the sentence to be retrieved input by the user, the method further comprises: obtaining a retrieval account of the user, and obtaining a user label of the user according to the retrieval account, wherein the user label is a doctor, a patient or a tourist.
3. The intelligent association-based disease content pushing method according to claim 2, wherein the step of searching and saving the disease content corresponding to each search term in a pre-stored mapping document comprises the following steps:
S31, searching the mapping document, and matching one search word with each pre-stored primary category in the mapping document to obtain a primary category corresponding to the current search word, wherein the mapping document pre-stores a plurality of primary categories, each primary category comprises a plurality of secondary categories, the primary categories represent disease types, the secondary categories represent treatment medicines, preventive measures and specific disease types corresponding to the current primary categories, each secondary category pre-stores a secondary category label and an address, the secondary category label is a doctor, a patient or a tourist, and the address represents the storage position of the disease content corresponding to the secondary category;
S32, searching disease contents corresponding to the current search word in a plurality of secondary categories contained in the primary category in an inverted index mode;
S33, selecting another search word, and repeating steps S31 to S32 until all the search words have been searched, to obtain the disease content corresponding to each search word.
4. The intelligent association-based disease content pushing method according to claim 3, wherein the searching for the disease content corresponding to the current search term in the plurality of secondary categories included in the primary category by means of the inverted index comprises the following steps:
matching the user label of the user with the secondary category labels corresponding to the secondary categories to obtain the secondary categories with the secondary category labels consistent with the user labels, and taking the secondary categories as secondary categories to be retrieved;
and searching the disease content corresponding to the current search word according to the address prestored in the secondary category to be searched.
5. The intelligent association-based disease content pushing method according to claim 1, wherein the using a forward maximum matching word segmentation algorithm to perform word segmentation on the sentence to be retrieved to obtain a plurality of search words contained in the sentence to be retrieved comprises the following steps:
S21, taking the first Chinese character of the sentence to be retrieved as a starting point;
S22, retrieving, from a pre-stored dictionary, the longest phrase that matches the sentence to be retrieved from the starting point, and storing the phrase as the current search word;
S23, splitting the current search word off from the sentence to be retrieved to obtain the remaining sentence to be retrieved, and repeating steps S21 to S22 to obtain further search words until the sentence to be retrieved has been completely segmented.
6. The intelligent association-based disease content pushing method according to claim 1, wherein before performing word segmentation processing on the sentence to be retrieved, the method further comprises: and cleaning the sentence to be retrieved, and deleting punctuation marks, special marks and spaces in the sentence to be retrieved.
7. The intelligent association based disease content pushing method according to claim 1, wherein the mapping file is obtained by:
obtaining a keyword of a sentence to be retrieved, and crawling data corresponding to the keyword by adopting a distributed crawler technology;
and analyzing the crawled data to construct the mapping file.
8. An intelligent association-based disease content pushing device, characterized by comprising:
the retrieval statement acquisition module is used for responding to a retrieval request of a user and acquiring a statement to be retrieved input by the user;
the retrieval word acquisition module is used for performing word segmentation processing on the sentence to be retrieved by using a forward maximum matching word segmentation algorithm to obtain a plurality of retrieval words contained in the sentence to be retrieved, wherein each retrieval word is composed of a plurality of continuous different Chinese characters in the sentence to be retrieved;
the disease content retrieval module is used for searching and storing disease contents corresponding to each search word in a pre-stored mapping document, wherein each search word corresponds to one or more disease contents, and the disease contents represent information associated with the search words;
and the disease content pushing module is used for sequencing the disease contents according to the preset weight and then pushing the disease contents to the user according to the ascending order of the weight.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202210589107.9A 2022-05-26 2022-05-26 Disease content pushing method, device, equipment and medium based on intelligent association Pending CN114860887A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210589107.9A CN114860887A (en) 2022-05-26 2022-05-26 Disease content pushing method, device, equipment and medium based on intelligent association
PCT/CN2022/121588 WO2023226262A1 (en) 2022-05-26 2022-09-27 Intelligent association-based disease content pushing method and apparatus, device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210589107.9A CN114860887A (en) 2022-05-26 2022-05-26 Disease content pushing method, device, equipment and medium based on intelligent association

Publications (1)

Publication Number Publication Date
CN114860887A true CN114860887A (en) 2022-08-05

Family

ID=82642250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210589107.9A Pending CN114860887A (en) 2022-05-26 2022-05-26 Disease content pushing method, device, equipment and medium based on intelligent association

Country Status (2)

Country Link
CN (1) CN114860887A (en)
WO (1) WO2023226262A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023226262A1 (en) * 2022-05-26 2023-11-30 康键信息技术(深圳)有限公司 Intelligent association-based disease content pushing method and apparatus, device, and medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117349452B (en) * 2023-12-04 2024-02-09 长春中医药大学 Information service system for traditional Chinese medicine retrieval
CN117763129B (en) * 2024-02-22 2024-05-28 神州医疗科技股份有限公司 Medical record retrieval method and system based on generated pre-training model

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678576B (en) * 2013-12-11 2016-08-17 华中师范大学 The text retrieval system analyzed based on dynamic semantics
US11061948B2 (en) * 2016-09-22 2021-07-13 Verizon Media Inc. Method and system for next word prediction
CN107832442A (en) * 2017-11-17 2018-03-23 陆光辉 A kind of traditional Chinese medicine information query system and method
CN111984851B (en) * 2020-09-03 2023-11-14 深圳平安智慧医健科技有限公司 Medical data searching method, device, electronic device and storage medium
CN114860887A (en) * 2022-05-26 2022-08-05 康键信息技术(深圳)有限公司 Disease content pushing method, device, equipment and medium based on intelligent association

Also Published As

Publication number Publication date
WO2023226262A1 (en) 2023-11-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination