CN113761351A - Article recommendation method and device, electronic equipment and computer storage medium - Google Patents

Info

Publication number
CN113761351A
Authority
CN
China
Prior art keywords
target patient
disease
article
category
articles
Prior art date
Legal status
Pending
Application number
CN202110294595.6A
Other languages
Chinese (zh)
Inventor
王雨
Current Assignee
Beijing Jingdong Tuoxian Technology Co Ltd
Original Assignee
Beijing Jingdong Tuoxian Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application provide an article recommendation method, an article recommendation device, electronic equipment and a computer storage medium, wherein the method comprises the following steps: acquiring historical questionnaire data of a target patient; obtaining the disease category of the target patient according to the historical questionnaire data of the target patient; acquiring a plurality of articles, and determining a disease type label of each article in the plurality of articles; and determining the articles recommended to the target patient based on the disease category of the target patient and the disease type label of each article.

Description

Article recommendation method and device, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to an article recommendation method and apparatus, an electronic device, and a computer storage medium.
Background
With the rapid development of computer science and the explosive growth of information technology in recent years, more and more commercial fields have expanded from offline to online; among them, internet hospitals have developed particularly rapidly. With the rapid iteration of products, providing a content service that recommends patient education articles to patients, and thereby improving the patient's user experience, has become a new requirement.
In the related art, most methods for recommending patient education articles either recommend at random or give preference only according to the quality of the articles. However, both approaches consider only information of the article dimension and discard a large amount of information of the patient dimension; as a result, the recommended articles do not match the patient, and the accuracy of recommending articles to patients is reduced.
Disclosure of Invention
The application provides an article recommendation method, an article recommendation device, electronic equipment and a computer storage medium, which can solve the problem of articles not matching the patient when articles are recommended to patients, and can improve the accuracy of recommending articles to patients.
The technical solution of the application is implemented as follows:
An embodiment of the application provides an article recommendation method, which comprises the following steps:
acquiring historical questionnaire data of a target patient; obtaining the disease category of the target patient according to the historical questionnaire data of the target patient;
acquiring a plurality of articles, and determining a disease type label of each article in the articles;
determining recommended articles to the target patient based on the disease category of the target patient and the disease type label of each article.
In some embodiments, the obtaining the disease category of the target patient according to the historical questionnaire data of the target patient includes:
marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient to obtain the feature vector of the target patient; the ordering category frequency represents the frequency of different categories of questionnaires downloaded by the target patient within a set time period;
and inputting the feature vector of the target patient into a pre-trained classification model to obtain the disease category of the target patient.
In some embodiments, the marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient to obtain the feature vector of the target patient includes:
marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient and the preset disease categories;
accumulating the feature value of a first disease category if the target patient has placed an order for the first disease category; the first disease category is any one of the preset disease categories;
if the target patient has not placed an order for the first disease category, the feature value of the first disease category remains unchanged;
and obtaining a feature vector of the target patient based on the feature value of each disease category corresponding to the target patient.
In some embodiments, the classification model is trained by:
acquiring a historical questionnaire data set of patients, and dividing the historical questionnaire data set into a training data set and a test data set; the training data set includes: historical questionnaire data for a plurality of patients and disease category labels for the plurality of patients;
according to the historical questionnaire data of the patients, marking the ordering category frequency of the patients to obtain the feature vectors of the patients;
training a classification model through the feature vectors of the patients and the disease category labels to obtain an initial classification model;
and adjusting the initial classification model according to the test data set to obtain a trained classification model.
In some embodiments, the classification model is a K-Nearest Neighbor (KNN) model.
In some embodiments, the determining a disease type label for each article of the plurality of articles comprises:
extracting keywords of the articles to obtain keywords of each article in the articles;
and performing labeling processing on each article based on the keywords of each article to obtain the disease type label of each article.
The embodiment of the application also provides an article recommending device, which comprises a first obtaining module, a second obtaining module and a recommending module, wherein,
the first acquisition module is used for acquiring historical questionnaire data of a target patient; obtaining the disease category of the target patient according to the historical questionnaire data of the target patient;
the second acquisition module is used for acquiring a plurality of articles and determining a disease type label of each article in the articles;
and the recommending module is used for determining the articles recommended to the target patient based on the disease category of the target patient and the disease type label of each article.
The embodiment of the application provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to implement the article recommendation method provided by one or more of the above technical solutions.
The embodiment of the application provides a computer storage medium, wherein a computer program is stored in the computer storage medium; the computer program can implement the article recommendation method provided by one or more of the above technical solutions after being executed.
The embodiments of the application provide an article recommendation method, an article recommendation device, electronic equipment and a computer storage medium, wherein the method comprises the following steps: acquiring historical questionnaire data of a target patient; obtaining the disease category of the target patient according to the historical questionnaire data of the target patient; acquiring a plurality of articles, and determining a disease type label of each article in the plurality of articles; and determining articles recommended to the target patient based on the disease category of the target patient and the disease type label of each article. In this way, the disease category is determined from the historical questionnaire data of the target patient, and articles are recommended to the target patient in a targeted manner by combining the disease type label of each article; in other words, by combining information of the target patient dimension and the article dimension, the problem of articles not matching the patient can be solved, and the accuracy of recommending articles to the patient is improved.
Drawings
FIG. 1a is a flowchart illustrating an article recommendation method in an embodiment of the present application;
FIG. 1b is a schematic diagram of article recommendation performed in an embodiment of the present application;
FIG. 2a is a schematic diagram of the KNN algorithm-based determination of patient disease category labels in an embodiment of the present application;
FIG. 2b is a flowchart illustrating another article recommendation method in an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a structure of an article recommendation device according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the examples provided herein are merely illustrative of the present application and are not intended to limit the present application. In addition, the following examples are provided as partial examples for implementing the present application, not all examples for implementing the present application, and the technical solutions described in the examples of the present application may be implemented in any combination without conflict.
It should be noted that in the embodiments of the present application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a method or apparatus that comprises a list of elements includes not only the elements explicitly recited, but also other elements not explicitly listed or inherent to the method or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other related elements (e.g., steps in a method or units in a device, such as portions of circuitry, processors, programs, software, etc.) in the method or device that includes the element.
The term "and/or" herein is merely an associative relationship that describes an associated object, meaning that three relationships may exist, e.g., I and/or J, may mean: the three cases of the single existence of I, the simultaneous existence of I and J and the single existence of J. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of I, J, R, and may mean including any one or more elements selected from the group consisting of I, J and R.
For example, the article recommendation method provided in the embodiments of the present application includes a series of steps, but is not limited to the described steps; similarly, the article recommendation device provided in the embodiments of the present application includes a series of modules, but is not limited to the modules explicitly described, and may further include modules required to acquire relevant time series data or to perform processing based on the time series data.
Embodiments of the application are applicable to computer systems composed of terminal devices and servers, and are operational with numerous other general purpose or special purpose computing system environments or configurations. Here, the terminal devices may be thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, and the like, and the servers may be server computer systems, minicomputers, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
With the advent of the data processing era, how to mine useful information from massive data has become an important research topic. Accordingly, deep learning algorithms have been developed on the basis of data-driven machine learning. For this application scenario, the following embodiments are proposed.
In some embodiments of the present Application, the article recommendation method may be implemented by a Processor in the article recommendation Device, where the Processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor.
Fig. 1a is a schematic flow chart of an article recommendation method in an embodiment of the present application, and as shown in fig. 1a, the method includes the following steps:
Step 100: acquiring historical questionnaire data of a target patient; and obtaining the disease category of the target patient according to the historical questionnaire data of the target patient.
In the embodiments of the application, the target patient may be one or more patients who have questionnaire download records at an internet hospital; the internet hospital provides functions such as consultation, follow-up visits and chronic disease management, is backed by a physical hospital, and provides medical services to patients online; that is, some simple problems can be resolved online without the patient having to visit the physical hospital.
In one embodiment, the historical questionnaire data represents the data in the questionnaires downloaded by the target patient over a set period of time; illustratively, the data may include the disease category, duration of illness, etc. of the target patient.
In the embodiments of the application, the set time period of the historical questionnaire data is not limited; for example, the historical questionnaire data may cover the target patient's previous three years or previous six months.
In some embodiments, deriving the disease category of the target patient from historical questionnaire data of the target patient may include: marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient to obtain the feature vector of the target patient; the ordering category frequency represents the frequency of different categories of questionnaires downloaded by the target patient within a set time period; and inputting the feature vector of the target patient into a pre-trained classification model to obtain the disease category of the target patient.
In the embodiments of the application, questionnaires of different categories correspond to different disease categories. Since the historical questionnaire data of the target patient includes all the questionnaire data downloaded within the set time period, the ordering category frequency of the target patient can be determined according to the historical questionnaire data of the target patient.
In one embodiment, assume that patient 1 downloaded three questionnaires in the previous six months; if two of the questionnaires both correspond to disease category A, the ordering category frequency of disease category A is 2; if the remaining questionnaire corresponds to disease category B, the ordering category frequency of disease category B is 1.
For example, marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient to obtain the feature vector of the target patient may include: marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient and the preset disease categories; accumulating the feature value of a first disease category if the target patient has placed an order for the first disease category, the first disease category being any one of the preset disease categories; keeping the feature value of the first disease category unchanged if the target patient has not placed an order for the first disease category; and obtaining the feature vector of the target patient based on the feature value of each disease category corresponding to the target patient.
In the embodiments of the application, the preset disease categories need to be determined before the ordering category frequency of the target patient is marked; the preset disease categories may include a plurality of different categories of diseases, for example heart disease, diabetes, cancer, depression, and the like.
The embodiments of the present application do not limit the number of disease categories included in the preset disease categories; however, since many disease categories exist in real life, in order to improve the efficiency of subsequent processing, the preset disease categories may include the common disease categories, with relatively rare disease categories represented uniformly by an "other" disease category. That is, the preset disease categories may consist of a plurality of common disease categories together with the "other" disease category.
In one embodiment, each of the preset disease categories corresponds to one feature, that is, the preset disease categories may be represented as a feature vector composed of a plurality of features; for example, if the preset disease categories include 11 disease categories, namely 10 common disease categories and the "other" disease category, they may be represented as an 11-dimensional feature vector.
In the embodiments of the application, before the ordering category frequency of the target patient is marked, the feature value of each feature is 0; when it is determined from the historical questionnaire data that the target patient has placed an order for a certain disease category among the preset disease categories, the feature value of that disease category is accumulated; here, the value added at each accumulation is a fixed value whose size is not limited and may be, for example, 1, 2, or 3. When it is determined that the target patient has not placed an order for a certain disease category, the feature value of that disease category remains unchanged; if the target patient has never placed an order for a disease category, the feature value of that disease category is 0.
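The accumulation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the category list `PRESET_CATEGORIES`, the function name `build_feature_vector` and the increment of 1 are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical preset disease categories; the last entry collects rare diseases.
PRESET_CATEGORIES = ["heart disease", "diabetes", "cancer", "depression", "other"]

def build_feature_vector(order_categories, preset=PRESET_CATEGORIES, step=1):
    """Count how often each preset category appears in a patient's history.

    order_categories: disease category of each historical questionnaire.
    step: the fixed value added per order (1 here, but any constant works).
    """
    counts = Counter()
    for category in order_categories:
        # Categories outside the preset list fall into the catch-all "other".
        key = category if category in preset else "other"
        counts[key] += step
    # Categories never ordered keep their initial value of 0.
    return [counts[c] for c in preset]

vector = build_feature_vector(["diabetes", "diabetes", "heart disease"])
print(vector)  # [1, 2, 0, 0, 0]
```

Each position of the resulting list corresponds to one preset disease category, so the vector length equals the number of preset categories.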
In the embodiments of the application, after the ordering category frequency of the target patient is marked, the feature value of each feature in the feature vector can be determined, and the feature vector of the target patient is thereby obtained; the disease category of the target patient can then be obtained by inputting the feature vector of the target patient into a pre-trained classification model.
In some embodiments, the classification model is trained by: acquiring a historical questionnaire data set of patients, and dividing the historical questionnaire data set into a training data set and a test data set; the training data set includes: historical questionnaire data for a plurality of patients and disease category labels for the plurality of patients; marking the ordering category frequency of the plurality of patients according to their historical questionnaire data to obtain feature vectors of the plurality of patients; training the classification model with the feature vectors of the plurality of patients and the disease category labels to obtain an initial classification model; and adjusting the initial classification model according to the test data set to obtain a trained classification model.
Illustratively, obtaining the historical questionnaire data set of patients may include: collecting the historical questionnaire data of each patient from an internet hospital; performing a data cleaning operation on the collected data, so that invalid data such as expired data, incomplete data and repeated data are filtered out and the reliability of the data is ensured; storing the cleaned historical questionnaire data of each patient at the patient end of the internet hospital; and then obtaining the historical questionnaire data set of patients from the patient end of the internet hospital.
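The cleaning step might look roughly like the sketch below. The record layout (keys `order_id`, `patient_id`, `category`, `date`), the function name `clean_records` and the three-year window are assumptions made for illustration; the source does not specify them.

```python
from datetime import date

def clean_records(records, today, max_age_days=3 * 365):
    """Drop expired, incomplete, and repeated questionnaire records.

    Each record is a dict with hypothetical keys: "order_id", "patient_id",
    "category", and "date" (a datetime.date).
    """
    required = ("order_id", "patient_id", "category", "date")
    seen_ids = set()
    cleaned = []
    for record in records:
        # Incomplete: a required field is missing or empty.
        if any(not record.get(field) for field in required):
            continue
        # Expired: older than the configured time window.
        if (today - record["date"]).days > max_age_days:
            continue
        # Repeated: the same order has already been kept.
        if record["order_id"] in seen_ids:
            continue
        seen_ids.add(record["order_id"])
        cleaned.append(record)
    return cleaned

raw = [
    {"order_id": "o1", "patient_id": "p1", "category": "diabetes", "date": date(2021, 1, 10)},
    {"order_id": "o1", "patient_id": "p1", "category": "diabetes", "date": date(2021, 1, 10)},
    {"order_id": "o2", "patient_id": "p1", "category": "", "date": date(2021, 2, 1)},
    {"order_id": "o3", "patient_id": "p2", "category": "cancer", "date": date(2015, 1, 1)},
]
kept = clean_records(raw, today=date(2021, 3, 1))
print([r["order_id"] for r in kept])  # ['o1']
```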
In the embodiments of the application, the training data set is used for training the classification model, so that the classification model can meet the set requirement; in order to ensure the classification effect of the classification model, after the historical questionnaire data set of patients is divided into a training data set and a test data set, the size of the training data set is usually greater than that of the test data set; illustratively, 80% of the historical questionnaire data set can be used as the training data set and 20% as the test data set.
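An 80/20 division of this kind can be sketched as below. The shuffle and the fixed seed are assumptions added for reproducibility; the source only states the proportions.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Ten hypothetical (feature_vector, disease_category_label) pairs.
samples = [([i, 0], "diabetes") for i in range(10)]
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))  # 8 2
```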
In some embodiments, the classification model is trained by supervised learning, i.e., for each input X there is a corresponding actual value Y; here, the input X represents the feature vector of each patient in the training data set, and the actual value Y represents the disease category label corresponding to that feature vector. The loss function measures the difference between the model's output for the input X and the actual value Y; this loss is propagated back through the network, and the whole training process of the neural network is a process of continuously reducing the value of the loss function.
In the embodiment of the application, after the initial classification model is obtained through the training process, the feature vector of each patient in the test data set is input into the initial classification model, and the predicted disease category of each patient is output; and comparing the predicted disease category with the disease category label actually corresponding to the patient, further adjusting parameters in the initial classification model according to the comparison result, and taking the adjusted initial classification model as the trained classification model.
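One simple way to read the comparison-and-adjustment step is to measure accuracy on the test data set and keep the best-performing configuration. The sketch below takes that reading; the helper names `accuracy` and `pick_best` and the two toy predictors are hypothetical, and the source does not say which parameters are adjusted or how.

```python
def accuracy(predict, test_set):
    """Fraction of test samples whose predicted category matches the label."""
    if not test_set:
        return 0.0
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)

def pick_best(candidates, test_set):
    """Return the (name, predict) candidate with the highest accuracy."""
    return max(candidates.items(), key=lambda item: accuracy(item[1], test_set))

CATEGORIES = ["heart disease", "diabetes"]

def always_diabetes(features):
    return "diabetes"

def largest_count(features):
    # Predict the category with the highest ordering frequency.
    return CATEGORIES[features.index(max(features))]

test_set = [([0, 2], "diabetes"), ([3, 1], "heart disease"), ([1, 0], "heart disease")]
name, _ = pick_best(
    {"always_diabetes": always_diabetes, "largest_count": largest_count}, test_set
)
print(name)  # largest_count
```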
Therefore, the classification model is trained through the training data set, so that the classification result of the classification model can accurately reflect the corresponding disease types of different patients; the test data set is used for further verifying the performance of the classification model and ensuring the reliability of the classification result.
The embodiments of the present application do not limit the type of the classification model; for example, a KNN model or another machine learning model may be used.
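For the KNN option, a minimal sketch is shown below. Euclidean distance, majority voting and k = 3 are assumptions chosen for the example; the source names the KNN algorithm but not its distance metric or the value of k.

```python
import math
from collections import Counter

def knn_predict(train_set, features, k=3):
    """Predict a disease category by majority vote among the k nearest patients.

    train_set: list of (feature_vector, disease_category_label) pairs.
    """
    # Sort training patients by Euclidean distance to the query vector.
    neighbours = sorted(train_set, key=lambda pair: math.dist(pair[0], features))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Feature order (hypothetical): [heart disease, diabetes, cancer].
train_set = [
    ([2, 0, 0], "heart disease"),
    ([3, 0, 0], "heart disease"),
    ([0, 2, 0], "diabetes"),
    ([0, 3, 1], "diabetes"),
    ([0, 0, 2], "cancer"),
]
print(knn_predict(train_set, [0, 2, 1], k=3))  # diabetes
```

Because KNN stores the training vectors and defers all computation to prediction time, "training" here amounts to keeping `train_set` available.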
Step 101: a plurality of articles is obtained, and a disease type label of each article in the plurality of articles is determined.
In the embodiments of the application, an article is a patient education article, whose content includes, for example, disease knowledge for patients and precautions for taking medicine; that is, the plurality of articles are patient education articles for various disease types; illustratively, there may be patient education articles for disease categories such as heart disease, diabetes, cancer, and depression. Here, each kind of article among the plurality of articles may include one or more articles related to the same category of disease.
In one embodiment, the plurality of articles may be obtained from a content center station; the content center station is a platform that manages the plurality of articles and pushes articles to different patients.
Illustratively, obtaining the plurality of articles may include: collecting articles of various disease types from an internet hospital; performing a data cleaning operation on the collected articles, so that invalid articles such as outdated articles, incomplete articles and repeated articles are filtered out and the reliability of the articles is ensured; storing the cleaned articles of various disease types in the content center station of the internet hospital; and then obtaining the plurality of articles from the content center station of the internet hospital.
In some embodiments, determining the disease type label for each article of the plurality of articles may include: extracting keywords of various articles to obtain keywords of each article in the various articles; and performing labeling processing on each article based on the keywords of each article to obtain the disease type label of each article.
Exemplarily, after the plurality of articles are acquired from the content center station of the internet hospital, keyword extraction is performed on each of the articles by using a keyword extraction algorithm to obtain the keywords of each article; here, a keyword is a word related to the disease type corresponding to the article; illustratively, the keyword extraction algorithm may be the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm.
For example, in determining whether a word in an article is a keyword, not only the occurrence frequency of the word but also whether the word is a common word needs to be considered; therefore, words need to be weighted: less common words that appear frequently in some articles but rarely in other articles are given a higher weight, while common words that appear in many articles are given a lower weight; the weights may be determined according to the TF-IDF algorithm.
Exemplarily, in the process of extracting keywords from each article by using the TF-IDF algorithm, data preprocessing operations such as word segmentation, part-of-speech tagging and stop word removal are first performed on each article to obtain a plurality of candidate keywords; the term frequency and the inverse document frequency of each candidate keyword are then calculated, the weight of each candidate keyword is determined as the product of the term frequency and the inverse document frequency, the candidate keywords are sorted by weight, and the top-ranked candidate keywords are taken as the keywords of the article.
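The weighting and ranking can be sketched as follows. The example assumes the articles have already been segmented and stripped of stop words, and it uses the unsmoothed variant idf = log(N / df), which is one common choice among several; the function name `top_keywords` is hypothetical.

```python
import math
from collections import Counter

def top_keywords(articles, top_n=3):
    """Rank each article's candidate words by TF-IDF and keep the top_n.

    articles: list of token lists, assumed to be already segmented and
    stripped of stop words.
    """
    n_docs = len(articles)
    # Document frequency: in how many articles each word appears.
    doc_freq = Counter(word for tokens in articles for word in set(tokens))
    results = []
    for tokens in articles:
        term_counts = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        }
        # Sort by weight (descending), then alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_n])
    return results

articles = [
    ["insulin", "blood", "sugar", "insulin", "diet"],
    ["heart", "blood", "pressure", "heart", "diet"],
    ["tumor", "chemotherapy", "tumor", "diet"],
]
print(top_keywords(articles, top_n=2))
# [['insulin', 'sugar'], ['heart', 'pressure'], ['tumor', 'chemotherapy']]
```

Note how "diet", which appears in every article, receives a weight of zero and never ranks as a keyword.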
In the embodiments of the application, the keywords extracted by the TF-IDF algorithm can reflect the disease type of each article; therefore, each article is labeled according to its keywords and assigned the corresponding disease type label; here, if an article cannot be classified by its keywords, it is assigned to the "other" type. That is, each article of the plurality of articles has a corresponding disease type label.
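The source does not say how keywords are mapped to labels; one simple possibility is to compare them with a per-label vocabulary, as sketched below. The lexicon `LABEL_KEYWORDS` and the function `label_article` are hypothetical.

```python
# Hypothetical keyword lexicon for each disease type label.
LABEL_KEYWORDS = {
    "heart disease": {"heart", "cardiac", "arrhythmia"},
    "diabetes": {"insulin", "glucose", "diabetes"},
    "cancer": {"tumor", "chemotherapy", "cancer"},
    "depression": {"depression", "mood", "antidepressant"},
}

def label_article(keywords, lexicon=LABEL_KEYWORDS):
    """Return the label whose lexicon overlaps most with the keywords."""
    best_label, best_hits = "other", 0
    for label, vocabulary in lexicon.items():
        hits = len(vocabulary & set(keywords))
        if hits > best_hits:
            best_label, best_hits = label, hits
    # No overlap with any lexicon: the article stays in the "other" type.
    return best_label

print(label_article(["insulin", "sugar"]))   # diabetes
print(label_article(["sleep", "exercise"]))  # other
```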
Illustratively, the two steps of step 100 and step 101 are independent of each other and can be executed according to practical situations, and the execution order is not limited in the embodiments of the present application.
Step 102: the articles recommended to the target patient are determined based on the disease category of the target patient and the disease type label of each article.
In the present embodiment, the disease category of the target patient can be determined according to step 100; according to step 101, the disease type label of each article in the content center can be determined; by matching the disease category of the target patient with the disease type labels of the articles, articles related to the disease category of the target patient can be recommended to the target patient in a targeted manner, and the user experience is improved.
In one embodiment, fig. 1b is a schematic diagram of article recommendation performed in the embodiment of the present application. As shown in fig. 1b, when the disease category of the target patient is determined to be heart disease, the content center recommends heart disease-related articles to the target patient; when the disease category is determined to be depression, the content center recommends depression-related articles; when the disease category is determined to be diabetes, the content center recommends diabetes-related articles; when the disease category is determined to be cancer, the content center recommends cancer-related articles; and the content center recommends unclassified articles of the "other" type to target patients regardless of their disease category.
The embodiment of the application provides an article recommendation method, an article recommendation device, electronic equipment and a computer storage medium, wherein the method comprises the following steps: acquiring historical questionnaire data of a target patient; obtaining the disease category of the target patient according to the historical questionnaire data of the target patient; acquiring a plurality of articles, and determining a disease type label of each article in the plurality of articles; and determining articles recommended to the target patient based on the disease category of the target patient and the disease type label of each article. Thus, the disease category is determined according to the historical questionnaire data of the target patient, and articles are recommended to the target patient in a targeted manner by combining the disease type label of each article; in other words, by combining information from the patient dimension and the article dimension, the problem of articles not matching the patient can be solved, and the accuracy of recommending articles to the patient is improved.
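The matching shown in fig. 1b can be sketched as a simple filter. This is an illustrative assumption rather than the application's implementation; the function name `recommend`, the article titles, and the "other" label string are hypothetical.

```python
def recommend(disease_category, articles, other_label="other"):
    """articles: list of (title, disease_type_label) pairs.
    Returns titles whose label matches the patient's disease category,
    followed by unclassified ("other") articles, which go to every patient."""
    matched = [title for title, label in articles if label == disease_category]
    unclassified = [title for title, label in articles if label == other_label]
    return matched + unclassified

# Hypothetical labeled articles held by the content center.
articles = [
    ("Living with arrhythmia", "heart disease"),
    ("Managing blood glucose", "diabetes"),
    ("Healthy sleep habits", "other"),
]
for_diabetes_patient = recommend("diabetes", articles)
```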
In order to further embody the purpose of the present application, a model of the KNN algorithm is taken as an example to further illustrate the present application on the basis of the above-described embodiments.
The KNN algorithm is one of typical algorithms in machine learning, which classifies by measuring distances between different feature values. For any n-dimensional input feature vector, the feature vector corresponds to a sample point in the feature space, and the output is a category label or a predicted value corresponding to the feature vector.
Fig. 2a is a schematic diagram of determining the disease category label of a patient based on the KNN algorithm in the embodiment of the present application. As shown in fig. 2a, ω1, ω2, and ω3 respectively represent three disease categories in the training data set; here, x denotes the sample point in the feature space corresponding to the feature vector of the patient. If the value of the hyperparameter K is 5, the most frequent category among the 5 points closest to x is ω1. As can be seen, the KNN algorithm predicts the disease category of the patient as ω1.
In the following, the KNN algorithm is briefly explained:
The input data includes a training data set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_N, y_N)\}$, where $x_i \in \mathcal{X} \subseteq \mathbb{R}^n$ is an n-dimensional feature vector; $y_i \in \mathcal{Y} = \{c_1, c_2, \ldots, c_M\}$ is the category label of that feature vector; and $i = 1, 2, \ldots, N$, where N is the number of samples in the training data set. The input further includes each n-dimensional feature vector in the test data set, and the output is the predicted category of each feature vector in the test data set. Taking patients as an example, N is the number of patients in the training data set, the input further includes the feature vector of each patient in the test data set, and the output is the predicted disease category of each patient in the test data set.
When the KNN algorithm is used for classification, two parameters need to be specified: the value of the hyperparameter K and the measure of the distance between sample points.
In the KNN algorithm process, a hyper-parameter K needs to be set, and the influence of the selection of the K value on the classification result is significant. When the K value is too small, only the sample points with a short distance will affect the classification result, the classification result will be very sensitive to the neighboring points, and if the neighboring points are noise points, the classification result will be erroneous, thereby causing the over-fitting situation of the classification model. When the K value is selected too large, a small number of noise points will not affect the classification result, the robustness of the classification model is significantly improved, but sample points with longer distances will also affect the classification result, thereby causing the classification model to generate an under-fitting condition. Therefore, a cross-validation method is required to select the K value, so that the situations of over-fitting and under-fitting are prevented.
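The application does not specify which cross-validation scheme is used. As one possible illustration, the sketch below selects K by leave-one-out cross-validation on a small labeled set; the helper names (`knn_predict`, `loo_accuracy`, `select_k`) and the toy data are assumptions.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (feature_vector, label). Majority vote among the
    k training points nearest to query (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out sample i
        if knn_predict(rest, x, k) == y:
            correct += 1
    return correct / len(data)

def select_k(data, candidates):
    """Return the candidate k with the highest leave-one-out accuracy;
    the smallest k wins ties because candidates are scanned in ascending order."""
    return max(sorted(candidates), key=lambda k: loo_accuracy(data, k))

# Toy data: two clusters plus one mislabeled (noisy) point inside cluster "A".
data = [
    ((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
    ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B"),
    ((0.5, 0.5), "B"),
]
best_k = select_k(data, [1, 3, 5])
```

With K = 1 the noisy point becomes the nearest neighbor of every point in cluster "A" and misclassifies them, which is the sensitivity to noise described above; a larger K outvotes it.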
The distance measure between two sample points in the feature space represents the degree of similarity between them: the shorter the distance, the higher the similarity, and conversely, the longer the distance, the lower the similarity. For two sample points in the n-dimensional feature space, $w = (w_1, w_2, \ldots, w_n)$ and $v = (v_1, v_2, \ldots, v_n)$, the Minkowski distance between w and v can be expressed as equation (1):
$$d(w, v) = \left( \sum_{i=1}^{n} |w_i - v_i|^{p} \right)^{1/p} \tag{1}$$
where p is a variable parameter; when p = 1, it is called the Manhattan distance; when p = 2, it is called the Euclidean distance; when p → ∞, it is called the Chebyshev distance.
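As a small worked illustration of equation (1), the following sketch computes the Minkowski distance for a chosen p and treats p = ∞ as the Chebyshev limit. The function name `minkowski` is hypothetical.

```python
import math

def minkowski(w, v, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(w) != len(v):
        raise ValueError("vectors must have the same dimension")
    diffs = [abs(a - b) for a, b in zip(w, v)]
    if math.isinf(p):
        # Limit p -> infinity: the largest coordinate difference (Chebyshev).
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

manhattan = minkowski((0, 0), (3, 4), p=1)         # |3| + |4|
euclidean = minkowski((0, 0), (3, 4), p=2)         # sqrt(9 + 16)
chebyshev = minkowski((0, 0), (3, 4), p=math.inf)  # max(3, 4)
```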
In one embodiment, the step of determining the disease category of the target patient using the KNN algorithm is performed as follows:
First, for a target patient sample point that has not yet been assigned a disease category, the Euclidean distances between the target patient sample point and the sample points in the training data set are calculated; next, the distances are sorted in increasing order; then, the K sample points most similar to the target patient sample point are selected, and the disease category of the target patient sample point is determined from these K sample points according to the majority voting principle. The above procedure is repeated to obtain the disease classification result of each patient.
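The three steps above can be sketched as follows. This is a minimal illustration under assumed data: the feature vectors are hypothetical order counts per disease category, and `knn_classify` is not a name used in the application.

```python
import math
from collections import Counter

def knn_classify(training_set, target, k=5):
    """training_set: list of (feature_vector, disease_category) pairs.
    Returns the majority disease category among the k nearest samples."""
    # 1. Euclidean distance from the target to every training sample.
    distances = [(math.dist(features, target), label)
                 for features, label in training_set]
    # 2. Sort in increasing order of distance.
    distances.sort(key=lambda pair: pair[0])
    # 3. Majority vote among the K nearest samples.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors: order counts for
# (heart disease, depression, diabetes).
training_set = [
    ((5, 0, 0), "heart disease"), ((4, 1, 0), "heart disease"),
    ((0, 6, 1), "depression"), ((1, 5, 0), "depression"),
    ((0, 0, 4), "diabetes"), ((1, 0, 5), "diabetes"), ((0, 1, 3), "diabetes"),
]
predicted = knn_classify(training_set, target=(0, 1, 4), k=3)
```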
Fig. 2b is a schematic flow chart of another article recommendation method in the embodiment of the present application, and as shown in fig. 2b, the flow chart includes the following steps:
Step A1: data acquisition.
In one embodiment, historical questionnaire data of each patient and articles of various disease types are collected from an internet hospital.
Step A2: data cleaning.
In one embodiment, a data cleaning operation is performed on the collected historical questionnaire data of each patient and the collected articles of various disease types.
Step A3: data storage.
In one embodiment, the cleaned historical questionnaire data of each patient and the cleaned articles are stored locally; here, the cleaned historical questionnaire data of each patient is stored at the patient side of the internet hospital, and the cleaned articles of each disease type are stored in the content center of the internet hospital.
Step A4: the disease category of the target patient is determined.
In one embodiment, the disease category of the target patient is determined by a model based on the KNN algorithm; first, a historical questionnaire data set of patients is acquired from the patient side of the internet hospital, and the patients are divided into a plurality of clusters of different disease categories; then, for a target patient sample point that has not yet been assigned a disease category, the Euclidean distance between the target patient sample point and each sample point in the clusters of different disease categories in the training data set is calculated; finally, the K sample points closest to the target patient sample point are selected, and the disease category of the target patient sample point is determined from these K sample points according to the majority voting principle.
Step A5: a disease type label for the article is determined.
In one embodiment, articles of various disease types are obtained from the content center of an internet hospital; the TF-IDF algorithm can be used to determine the disease type label of each article: keywords are extracted from the articles of different disease types, labeling processing is performed on each article according to its keywords, and each article is classified under the corresponding disease type label.
Step A6: article recommendation.
In one embodiment, article recommendations may be implemented by matching the disease category of the target patient to the disease type tags of the articles.
Therefore, in the embodiment of the application, the historical questionnaire data of different patients at the patient side is divided into clusters based on the KNN algorithm model; in the content center, articles of different disease types are labeled based on keywords extracted with TF-IDF; and articles are recommended to patients by integrating the information from both ends.
Fig. 3 is a schematic structural diagram of an article recommendation device according to an embodiment of the present application, and as shown in fig. 3, the device includes: a first obtaining module 300, a second obtaining module 301 and a recommending module 302, wherein:
a first obtaining module 300, configured to obtain historical questionnaire data of a target patient; obtaining the disease category of the target patient according to the historical questionnaire data of the target patient;
a second obtaining module 301, configured to obtain multiple articles, and determine a disease type tag of each article in the multiple articles;
a recommending module 302 for determining articles recommended to the target patient based on the disease category of the target patient and the disease type label of each article.
In some embodiments, the first obtaining module 300 is configured to obtain a disease category of the target patient according to the historical questionnaire data of the target patient, and includes:
marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient to obtain the feature vector of the target patient; the ordering category frequency represents the frequency of the different categories of questionnaires ordered by the target patient within a set time period;
and inputting the characteristic vector of the target patient into a pre-trained classification model to obtain the disease category of the target patient.
In some embodiments, the first obtaining module 300 is configured to mark the order category frequency of the target patient according to the historical questionnaire data of the target patient, and obtain the feature vector of the target patient, including:
marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient and the preset disease categories;
accumulating the feature value of a first disease category in the case where the target patient places an order for the first disease category; the first disease category is any one of the preset disease categories;
in the case where the target patient has not placed an order for the first disease category, the feature value of the first disease category remains unchanged;
and obtaining a feature vector of the target patient based on the feature value of each disease category corresponding to the target patient.
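The accumulation described above amounts to counting orders per preset disease category. A minimal sketch follows, assuming each order within the time window is represented only by its category name; `DISEASE_CATEGORIES` and `build_feature_vector` are hypothetical.

```python
from collections import Counter

# Hypothetical preset disease categories, in a fixed order.
DISEASE_CATEGORIES = ["heart disease", "depression", "diabetes", "cancer"]

def build_feature_vector(order_categories, categories=DISEASE_CATEGORIES):
    """order_categories: iterable of category names, one per order placed
    by the patient within the set time period.
    Each order increments the feature value of its category; categories
    with no orders keep their initial value of 0."""
    counts = Counter(c for c in order_categories if c in categories)
    return [counts[c] for c in categories]

vector = build_feature_vector(["diabetes", "diabetes", "heart disease"])
```

The resulting list has one entry per preset category and can be passed directly to a distance-based classifier as the patient's feature vector.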
In some embodiments, the classification model is trained by:
acquiring a historical questionnaire data set of patients, and dividing the historical questionnaire data set into a training data set and a test data set; the training data set includes: historical questionnaire data of a plurality of patients and disease category labels of the plurality of patients;
according to historical questionnaire data of a plurality of patients, marking the ordering category frequency of the plurality of patients to obtain feature vectors of the plurality of patients;
training the classification model through the feature vectors of a plurality of patients and the disease category labels to obtain an initial classification model;
and adjusting the initial classification model according to the test data set to obtain a trained classification model.
In some embodiments, the classification model is a model of a K-nearest neighbor algorithm.
In some embodiments, the second obtaining module 301, configured to determine the disease type tag of each article of the plurality of articles, includes:
performing keyword extraction on the plurality of articles to obtain the keywords of each article in the plurality of articles;
and performing labeling processing on each article based on the keywords of each article to obtain the disease type label of each article.
In practical applications, the first obtaining module 300, the second obtaining module 301 and the recommending module 302 may be implemented by a processor located in an electronic device, and the processor may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller and a microprocessor.
In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Based on such understanding, the technical solution of the present embodiment, in essence, or the part of it that contributes to the related art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method of the present embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Specifically, the computer program instructions corresponding to the article recommendation method in the present embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive, and when the computer program instructions corresponding to the article recommendation method in the storage medium are read or executed by an electronic device, the article recommendation method in any of the foregoing embodiments is implemented.
Based on the same technical concept as the foregoing embodiments, fig. 4 shows an electronic device 400 provided in the embodiment of the present application, which may include a memory 401 and a processor 402, wherein:
a memory 401 for storing computer programs and data;
a processor 402 for executing the computer program stored in the memory to implement any one of the article recommendation methods of the previous embodiments.
In practical applications, the memory 401 may be a volatile memory, such as a RAM; or a non-volatile memory, such as a ROM, a flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memories; and it provides instructions and data to the processor 402.
The processor 402 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor. It is understood that, for different article recommendation devices, other electronic components may be used to implement the above processor functions, and the embodiments of the present application are not specifically limited in this respect.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present application may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments tends to emphasize the differences between the embodiments; for parts that are the same or similar, the embodiments may be referred to one another, and for brevity these parts are not described again herein.
The methods disclosed in the method embodiments provided by the present application can be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in various product embodiments provided by the application can be combined arbitrarily to obtain new product embodiments without conflict.
The features disclosed in the various method or apparatus embodiments provided herein may be combined in any combination to arrive at new method or apparatus embodiments without conflict.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (10)

1. An article recommendation method, comprising:
acquiring historical questionnaire data of a target patient; obtaining the disease category of the target patient according to the historical questionnaire data of the target patient;
acquiring a plurality of articles, and determining a disease type label of each article in the articles;
determining articles recommended to the target patient based on the disease category of the target patient and the disease type label of each article.
2. The method of claim 1, wherein the deriving the disease category of the target patient from historical questionnaire data of the target patient comprises:
marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient to obtain the feature vector of the target patient; the ordering category frequency represents the frequency of the different categories of questionnaires ordered by the target patient within a set time period;
and inputting the feature vector of the target patient into a pre-trained classification model to obtain the disease category of the target patient.
3. The method of claim 2, wherein the step of marking the frequency of ordering categories of the target patient according to the historical questionnaire data of the target patient to obtain the feature vector of the target patient comprises:
marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient and the preset disease categories;
accumulating the feature value of a first disease category if the target patient places an order for the first disease category; the first disease category is any one of the preset disease categories;
in the event that the target patient has not placed an order for the first disease category, the feature value of the first disease category remains unchanged;
and obtaining a feature vector of the target patient based on the feature value of each disease category corresponding to the target patient.
4. The method of claim 2, wherein the classification model is trained by:
acquiring a historical questionnaire data set of patients, and dividing the historical questionnaire data set into a training data set and a test data set; the training data set includes: historical questionnaire data of a plurality of patients and disease category labels of the plurality of patients;
according to the historical questionnaire data of the patients, marking the ordering category frequency of the patients to obtain the feature vectors of the patients;
training a classification model through the feature vectors of the patients and the disease category labels to obtain an initial classification model;
and adjusting the initial classification model according to the test data set to obtain a trained classification model.
5. The method according to any one of claims 1 to 4, wherein the classification model is a model of a K-nearest neighbor algorithm.
6. The method of claim 1, wherein determining the disease type label for each of the plurality of articles comprises:
extracting keywords of the articles to obtain keywords of each article in the articles;
and performing labeling processing on each article based on the keywords of each article to obtain the disease type label of each article.
7. An article recommendation apparatus, comprising:
the first acquisition module is used for acquiring historical questionnaire data of a target patient; obtaining the disease category of the target patient according to the historical questionnaire data of the target patient;
the second acquisition module is used for acquiring a plurality of articles and determining a disease type label of each article in the articles;
and the recommending module is used for determining the articles recommended to the target patient based on the disease category of the target patient and the disease type label of each article.
8. The apparatus of claim 7, wherein the first obtaining module is configured to obtain the disease category of the target patient according to the historical questionnaire data of the target patient, and comprises:
marking the ordering category frequency of the target patient according to the historical questionnaire data of the target patient to obtain the feature vector of the target patient;
and inputting the feature vector of the target patient into a pre-trained classification model to obtain the disease category of the target patient.
9. An electronic device, characterized in that the device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, which when executing the program implements the method of any of claims 1 to 6.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program realizes the method of any one of claims 1 to 6 when executed by a processor.
CN202110294595.6A 2021-03-19 2021-03-19 Article recommendation method and device, electronic equipment and computer storage medium Pending CN113761351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110294595.6A CN113761351A (en) 2021-03-19 2021-03-19 Article recommendation method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110294595.6A CN113761351A (en) 2021-03-19 2021-03-19 Article recommendation method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN113761351A true CN113761351A (en) 2021-12-07

Family

ID=78786762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110294595.6A Pending CN113761351A (en) 2021-03-19 2021-03-19 Article recommendation method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN113761351A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination