CN112951405A - Method, device and equipment for realizing feature sorting - Google Patents

Publication number
CN112951405A
Authority
CN
China
Prior art keywords
symptom
feature
characteristic
disease
importance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110104867.1A
Other languages
Chinese (zh)
Other versions
CN112951405B (en)
Inventor
何峻青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN202110104867.1A
Publication of CN112951405A
Application granted
Publication of CN112951405B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Primary Health Care (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Pathology (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The embodiments of the application disclose a method, an apparatus and a device for implementing feature selection. A current symptom feature set of a user is acquired to determine each first symptom feature and its feature value. The conditional probabilities of the first symptom feature and of the second symptom feature under each disease are obtained from a medical knowledge graph, and the global importance and/or the discriminatory importance of the second symptom feature is calculated. The importance of the second symptom feature is then determined from its global importance and/or discriminatory importance, the second symptom features are sorted by importance, and those meeting a preset condition in the sorted result are selected as candidate symptom features. This strengthens the correlation among the symptom features that are asked about, reduces the computational complexity of determining the second symptom features, reduces the number of queries needed to determine the predicted disease, and improves the efficiency of symptom feature inquiry.

Description

Method, device and equipment for realizing feature sorting
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, and a device for implementing feature sorting.
Background
An automatic inquiry system acquires a user's symptom features over multiple rounds of questioning, predicts the diseases the user may have from the symptom features the user has entered, and lets the user seek follow-up medical consultation or treatment based on the predicted disease.
During an inquiry, the automatic inquiry system uses the symptom features the user has already entered to select the next symptom features to ask about. Existing selection methods are computationally complex, and the symptom features asked about in two consecutive queries are only weakly correlated, which makes for a poor user experience. Moreover, only one symptom can be asked about per query, so many queries are needed before a disease can be predicted, and disease prediction is therefore inefficient.
Disclosure of Invention
In view of this, embodiments of the present application provide a method, an apparatus, and a device for implementing feature sorting, which can determine and predict a disease by using fewer queries, and improve query efficiency of symptom features.
In order to solve the above problem, the technical solution provided by the embodiment of the present application is as follows:
a method of enabling feature selection, the method comprising:
acquiring a current symptom feature set of a user, wherein the current symptom feature set of the user comprises at least one first symptom feature and feature values corresponding to the first symptom features; the characteristic value corresponding to the first symptom characteristic is used for indicating whether the first symptom characteristic appears or not;
obtaining, from a medical knowledge graph, the conditional probability of each first symptom feature under each disease and the conditional probability of a second symptom feature under each disease; the second symptom feature is any symptom feature in the medical knowledge graph other than the first symptom features;
calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under each disease; and/or calculating the discriminatory importance of the second symptom feature according to the conditional probability of the first symptom feature under each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under each disease;
determining the importance of the second symptom feature based on the global importance and/or the discriminatory importance of the second symptom feature;
and sorting, by importance, all second symptom features for which an importance has been determined, and determining the second symptom features whose rank meets a preset condition as candidate symptom features.
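The final sorting step can be sketched in Python as follows. This is only an illustration: the source does not say what the preset condition is, so the sketch assumes it means "rank within the first `top_k`", and the function name `select_candidates` and the scores are hypothetical.

```python
from typing import Dict, List


def select_candidates(importance: Dict[str, float], top_k: int = 2) -> List[str]:
    """Sort second symptom features by importance (descending) and keep top_k.

    Assumption: the "preset condition" is taken to be "among the top_k ranks".
    """
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    return ranked[:top_k]


# Hypothetical importance scores for three second symptom features.
scores = {"fever": 1.4, "vomiting": 0.3, "runny nose": 0.9}
candidates = select_candidates(scores, top_k=2)
print(candidates)
```

Returning more than one candidate is what would let a system ask about several symptom features in a single query.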
In one possible implementation, the calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease includes:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
adding the conditional probabilities of the second symptom features under the conditions of the respective associated diseases to obtain the global importance of the second symptom features.
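A minimal sketch of this summation, assuming the knowledge graph can be read as a nested mapping from disease to P(symptom | disease), and assuming a disease counts as "associated" when that probability is greater than zero. The table `COND_PROB` and its values are invented for illustration.

```python
from typing import Dict

# Hypothetical excerpt of a medical knowledge graph: P(symptom | disease).
COND_PROB: Dict[str, Dict[str, float]] = {
    "flu": {"fever": 0.9, "cough": 0.8},
    "cold": {"cough": 0.7, "runny nose": 0.8},
    "gastritis": {"vomiting": 0.6},
}


def global_importance(symptom: str, cond_prob: Dict[str, Dict[str, float]]) -> float:
    """Sum P(symptom | disease) over the diseases associated with the symptom."""
    total = 0.0
    for probs in cond_prob.values():
        p = probs.get(symptom, 0.0)
        if p > 0.0:  # assumed association criterion
            total += p
    return total


cough_score = global_importance("cough", COND_PROB)  # sums the flu and cold entries
print(round(cough_score, 3))
```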
In one possible implementation, the calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease includes:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
taking the conditional probability of the second symptom feature under each associated disease as the variable of an objective function to obtain a function value of the objective function for each associated disease; the objective function increases monotonically for variable values greater than zero, and the rate at which the function value increases grows as the variable increases;
and adding the function values to obtain the global importance of the second symptom characteristic.
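The effect of such an objective function can be illustrated with a small sketch. The source does not name a specific function, so squaring is used here purely as an assumed example of a function that is increasing for positive inputs and grows faster as the input grows.

```python
from typing import Callable, Iterable


def square(p: float) -> float:
    """Assumed objective function: increasing for p > 0, with increasing slope."""
    return p * p


def global_importance(probs: Iterable[float],
                      objective: Callable[[float], float] = square) -> float:
    """Sum objective(P(symptom | disease)) over the associated diseases."""
    return sum(objective(p) for p in probs if p > 0.0)


# Two hypothetical symptoms whose plain probability sums are both 1.0.
focused = global_importance([0.9, 0.1])  # strongly tied to one disease
diffuse = global_importance([0.5, 0.5])  # weakly tied to two diseases
print(focused > diffuse)
```

With a plain sum the two symptoms would tie; the assumed objective function favors the symptom that is strongly associated with a particular disease.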
In one possible implementation, the method further includes:
acquiring disease prediction rules from the medical knowledge graph, wherein each disease prediction rule comprises a correspondence between a condition and an action, the condition is a target feature value corresponding to a target symptom feature, and the action specifies how the evaluation value of the corresponding disease is processed;
determining disease prediction rules including the second symptom features as target disease prediction rules, calculating reciprocal values of the number of the target symptom features included in each target disease prediction rule, and adding the reciprocal values to obtain a rule evaluation value of the second symptom features;
and adding the obtained global importance of the second symptom characteristic to the rule evaluation value of the second symptom characteristic to obtain the updated global importance of the second symptom characteristic.
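The rule evaluation value can be sketched as below. The rule representation is an assumption: each rule is reduced to its condition, written as a mapping from target symptom feature to target feature value, and the action part is omitted because it does not enter this calculation.

```python
from typing import Dict, List

# Hypothetical rule conditions: target symptom feature -> target feature value.
RULES: List[Dict[str, int]] = [
    {"fever": 1, "cough": 1},
    {"cough": 1, "chest pain": 1, "dyspnea": 1},
    {"vomiting": 1},
]


def rule_evaluation_value(symptom: str, rules: List[Dict[str, int]]) -> float:
    """Sum 1 / (number of target symptom features) over rules containing the symptom."""
    return sum(1.0 / len(condition) for condition in rules if symptom in condition)


def updated_global_importance(global_imp: float, symptom: str,
                              rules: List[Dict[str, int]]) -> float:
    """Add the rule evaluation value to a previously computed global importance."""
    return global_imp + rule_evaluation_value(symptom, rules)


cough_rule_value = rule_evaluation_value("cough", RULES)  # 1/2 + 1/3
cough_updated = updated_global_importance(1.5, "cough", RULES)
print(round(cough_rule_value, 3), round(cough_updated, 3))
```

A symptom that appears in short rules contributes more per rule, since confirming it satisfies a larger share of that rule's condition.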
In one possible implementation manner, the calculating the discriminatory importance of the second symptom feature according to the conditional probability of the first symptom feature under the condition of each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under the condition of each disease includes:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
calculating a probability value of a target associated disease under the condition of the current symptom feature set of the user according to the conditional probability of the first symptom feature under the condition of each disease and the feature value corresponding to each first symptom feature; the target associated disease is each of the associated diseases;
multiplying the probability value of the target associated disease under the condition of the current symptom feature set of the user by the uncertainty of the second symptom feature for the target associated disease to obtain a discriminatory importance coefficient of the second symptom feature for the target associated disease; the uncertainty of the second symptom feature for the target associated disease is calculated according to the conditional probability of the second symptom feature under the condition of the target associated disease;
and adding the discriminatory importance coefficients of the second symptom feature for the respective associated diseases to obtain a first summation result, and calculating the difference between 1 and the first summation result to obtain the discriminatory importance of the second symptom feature.
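A hedged sketch of this calculation follows. The passage above does not give the uncertainty formula, only that it is derived from P(symptom | disease), so the sketch substitutes an assumed stand-in, 1 - |2p - 1|, which is largest at p = 0.5 and zero at p = 0 or p = 1. The disease probabilities passed in are hypothetical values that would come from the posterior step.

```python
from typing import Dict


def uncertainty(p: float) -> float:
    """Assumed uncertainty measure (not from the source): 1 - |2p - 1|."""
    return 1.0 - abs(2.0 * p - 1.0)


def discriminatory_importance(disease_prob: Dict[str, float],
                              symptom_prob: Dict[str, float]) -> float:
    """1 minus the sum over associated diseases of P(disease | set) * uncertainty."""
    first_sum = sum(
        disease_prob.get(disease, 0.0) * uncertainty(p)
        for disease, p in symptom_prob.items()
    )
    return 1.0 - first_sum


# Hypothetical inputs: P(disease | current symptom set) and P(symptom | disease).
posterior = {"flu": 0.6, "cold": 0.4}
fever_given_disease = {"flu": 0.9, "cold": 0.5}
fever_importance = discriminatory_importance(posterior, fever_given_disease)
print(round(fever_importance, 3))
```

Each term is a single multiplication per associated disease, which is consistent with the source's claim that this quantity is cheap to compute.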
In a possible implementation manner, the calculating, according to the conditional probability of the first symptom feature under the condition of each disease and the feature value corresponding to each first symptom feature, a probability value of a target associated disease under the condition of a current symptom feature set of a user includes:
when the feature value corresponding to the target first symptom feature is 1, determining the conditional probability of the target first symptom feature under the condition of the target associated disease as the probability value of the target first symptom feature; the target first symptom characteristic is each of the first symptom characteristics;
when the feature value corresponding to the target first symptom feature is 0, determining the difference between 1 and the conditional probability of the target first symptom feature under the condition of the target associated disease as the probability value of the target first symptom feature;
multiplying the probability values of the first symptom characteristics to obtain a pseudo posterior probability value of the target associated disease under the condition of the current symptom characteristic set of the user;
calculating the sum of the pseudo posterior probability values of the associated diseases under the condition of the current symptom feature set of the user to obtain a second summation result;
and dividing the pseudo posterior probability value of the target associated disease under the condition of the current symptom feature set of the user by the second summation result to obtain the probability value of the target associated disease under the condition of the current symptom feature set of the user.
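The steps above can be sketched as follows. The conditional probability table is hypothetical, a symptom missing from a disease's entry is assumed to have probability 0, and the all-zero fallback is an added safeguard that the source does not describe.

```python
from typing import Dict


def disease_probabilities(observed: Dict[str, int],
                          cond_prob: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Normalize the per-disease products of first-symptom probability values."""
    pseudo: Dict[str, float] = {}
    for disease, probs in cond_prob.items():
        value = 1.0
        for symptom, feature_value in observed.items():
            p = probs.get(symptom, 0.0)  # assumed 0 when the graph has no entry
            # Feature value 1: use p; feature value 0: use 1 - p.
            value *= p if feature_value == 1 else 1.0 - p
        pseudo[disease] = value  # pseudo posterior probability value
    total = sum(pseudo.values())  # the second summation result
    if total == 0.0:
        return {disease: 0.0 for disease in pseudo}  # safeguard, not in the source
    return {disease: value / total for disease, value in pseudo.items()}


# Hypothetical associated diseases with P(symptom | disease).
table = {
    "flu": {"cough": 0.8, "fever": 0.9},
    "cold": {"cough": 0.7, "fever": 0.2},
}
# The user reported a cough (1) and no fever (0).
result = disease_probabilities({"cough": 1, "fever": 0}, table)
print({disease: round(p, 3) for disease, p in result.items()})
```

The value is called a pseudo posterior because, as described, the product uses no disease prior; normalizing over the associated diseases makes the values sum to 1.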
In one possible implementation, the determining the importance of the second symptom feature according to the global importance and/or the discriminatory importance of the second symptom feature includes:
performing a weighted summation of the global importance and the discriminatory importance of the second symptom feature to determine the importance of the second symptom feature.
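As a small illustration of the weighted summation, assuming two scalar weights that the source does not specify (the parameter names and default values here are hypothetical):

```python
def combined_importance(global_imp: float, discriminatory_imp: float,
                        w_global: float = 0.5, w_discriminatory: float = 0.5) -> float:
    """Weighted sum of global importance and discriminatory importance."""
    return w_global * global_imp + w_discriminatory * discriminatory_imp


# Hypothetical scores for one second symptom feature.
importance = combined_importance(1.5, 0.48, w_global=0.4, w_discriminatory=0.6)
print(round(importance, 3))
```

Setting one weight to zero would reduce this to using only the other score, which matches the "and/or" wording of the method.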
An apparatus to enable feature selection, the apparatus comprising:
a first obtaining unit configured to obtain a current symptom feature set of a user, wherein the current symptom feature set of the user comprises at least one first symptom feature and a feature value corresponding to each first symptom feature; the feature value corresponding to a first symptom feature indicates whether that first symptom feature appears;
a second acquisition unit configured to acquire, from the medical knowledge map, a conditional probability of the first symptom feature under the condition of each disease and a conditional probability of the second symptom feature under the condition of each disease; the second symptom characteristic is any symptom characteristic in the medical knowledge-map that excludes the first symptom characteristic;
a first calculation unit configured to calculate the global importance of the second symptom feature according to the conditional probability of the second symptom feature under each disease; and/or to calculate the discriminatory importance of the second symptom feature according to the conditional probability of the first symptom feature under each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under each disease;
a first determination unit configured to determine the importance of the second symptom feature according to the global importance and/or the discriminatory importance of the second symptom feature;
and a sorting unit configured to sort, by importance, the second symptom features for which an importance has been determined, and to determine the second symptom features whose rank meets a preset condition as candidate symptom features.
In a possible implementation manner, the first calculating unit is specifically configured to determine, as the associated disease, a disease associated with the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease;
adding the conditional probabilities of the second symptom features under the conditions of the respective associated diseases to obtain the global importance of the second symptom features.
In a possible implementation manner, the first calculating unit is specifically configured to determine, as the associated disease, a disease associated with the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease;
respectively taking the conditional probability of the second symptom characteristic under each condition of the associated diseases as a variable of an objective function to obtain a function value of the objective function; the objective function is monotonically increased in an interval with the variable larger than zero, and the amplitude of the function value is increased along with the increase of the variable;
and adding the function values to obtain the global importance of the second symptom characteristic.
In one possible implementation, the apparatus further includes:
a third obtaining unit, configured to obtain a disease prediction rule from the medical knowledge graph, where the disease prediction rule includes a correspondence between a condition and an action, where the condition is a target feature value corresponding to a target symptom feature, and the action is a processing manner of an evaluation value for a corresponding disease;
a second calculation unit configured to determine a disease prediction rule including the second symptom feature as a target disease prediction rule, calculate a reciprocal value of the number of target symptom features included in each target disease prediction rule, and add the respective reciprocal values to obtain a rule evaluation value of the second symptom feature;
and the third calculating unit is used for adding the obtained global importance of the second symptom characteristic and the rule evaluation value of the second symptom characteristic to obtain the updated global importance of the second symptom characteristic.
In one possible implementation manner, the first computing unit includes:
a first determining subunit configured to determine, as an associated disease, a disease associated with the second symptom feature, based on the conditional probability of the second symptom feature under the condition of each disease;
the first calculating subunit is used for calculating a probability value of a target associated disease under the condition of the current symptom feature set of the user according to the conditional probability of the first symptom feature under the condition of each disease and the feature value corresponding to each first symptom feature; the target associated disease is each of the associated diseases;
a second calculating subunit configured to multiply the probability value of the target associated disease under the condition of the current symptom feature set of the user by the uncertainty of the second symptom feature for the target associated disease to obtain the discriminatory importance coefficient of the second symptom feature for the target associated disease; the uncertainty of the second symptom feature for the target associated disease is calculated according to the conditional probability of the second symptom feature under the condition of the target associated disease;
and a third calculating subunit configured to add the discriminatory importance coefficients of the second symptom feature for the respective associated diseases to obtain a first summation result, and to calculate the difference between 1 and the first summation result to obtain the discriminatory importance of the second symptom feature.
In one possible implementation manner, the first computing subunit includes:
the second determining subunit is used for determining the conditional probability of the target first symptom characteristic under the condition of the target associated disease as the probability value of the target first symptom characteristic when the characteristic value corresponding to the target first symptom characteristic is 1; the target first symptom characteristic is each of the first symptom characteristics;
a third determining subunit, configured to determine, when the feature value corresponding to the target first symptom feature is 0, a difference between 1 and a conditional probability of the target first symptom feature under the condition of the target-related disease as a probability value of the target first symptom feature;
the fourth calculating subunit is used for multiplying the probability values of the first symptom features to obtain a pseudo posterior probability value of the target associated disease under the condition of the current symptom feature set of the user;
the fifth calculating subunit is used for calculating the sum of the pseudo posterior probability values of the associated diseases under the condition of the current symptom feature set of the user to obtain a second summation result;
and the sixth calculating subunit is used for dividing the pseudo posterior probability value of the target associated disease under the condition of the current symptom feature set of the user by the second summation result to obtain the probability value of the target associated disease under the condition of the current symptom feature set of the user.
In a possible implementation manner, the first determining unit is specifically configured to perform a weighted summation of the global importance and the discriminatory importance of the second symptom feature to determine the importance of the second symptom feature.
An apparatus for implementing feature selection comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by one or more processors, the one or more programs including instructions for:
acquiring a current symptom feature set of a user, wherein the current symptom feature set of the user comprises at least one first symptom feature and feature values corresponding to the first symptom features; the characteristic value corresponding to the first symptom characteristic is used for indicating whether the first symptom characteristic appears or not;
obtaining from the medical knowledge map a conditional probability of a first symptom feature under the condition of each disease and a conditional probability of a second symptom feature under the condition of each disease; the second symptom characteristic is any symptom characteristic in the medical knowledge-map that excludes the first symptom characteristic;
calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under each disease; and/or calculating the discriminatory importance of the second symptom feature according to the conditional probability of the first symptom feature under each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under each disease;
determining the importance of the second symptom feature based on the global importance and/or the discriminatory importance of the second symptom feature;
and sorting, by importance, all second symptom features for which an importance has been determined, and determining the second symptom features whose rank meets a preset condition as candidate symptom features.
A computer-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the above-described method of implementing feature selection.
Therefore, the embodiment of the application has the following beneficial effects:
According to the method, apparatus and device for implementing feature selection provided by the embodiments of the present application, a current symptom feature set of a user is acquired to determine the first symptom features and the feature values indicating whether each first symptom feature appears. The conditional probability of the first symptom feature under each disease and the conditional probability of the second symptom feature under each disease are obtained from the medical knowledge graph. The global importance of the second symptom feature is calculated from the conditional probability of the second symptom feature under each disease; it reflects the correlation between the second symptom feature and the diseases. Because global importance takes into account how strongly symptom features are correlated with candidate diseases, introducing it improves the semantic correlation between the symptom features that are asked about. Additionally or alternatively, the discriminatory importance of the second symptom feature is calculated from the conditional probability of the first symptom feature under each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under each disease; it reflects the ability of the second symptom feature to distinguish between diseases. Calculating the discriminatory importance as provided by the embodiments requires little computation, so the result is obtained quickly and the ranking of the second symptom features is determined more efficiently. The importance of the second symptom feature is then determined from its global importance and/or discriminatory importance.
Finally, the second symptom features are sorted by their importance, and those meeting a preset condition in the sorted result are selected as candidate symptom features. In this way, semantically coherent symptom features can be obtained for inquiry, the computational complexity of determining the second symptom features is reduced, the number of queries needed to determine the predicted disease is reduced, and the efficiency of symptom feature inquiry is improved.
Drawings
Fig. 1 is a schematic diagram of a framework of an exemplary application scenario provided in an embodiment of the present application;
Fig. 2 is a flowchart of a method for implementing feature selection according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an apparatus for implementing feature selection according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of another apparatus for implementing feature selection according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanying the drawings are described in detail below.
In order to facilitate understanding and explaining the technical solutions provided by the embodiments of the present application, the following description will first describe the background art of the present application.
After studying conventional automatic inquiry systems, the inventor found that they calculate the discriminative power of symptom features using information entropy and sort the symptom features to be asked about by that discriminative power. Entropy-based ranking requires a large amount of computation and has high computational complexity, so determining the ranking of symptom features is inefficient. In addition, when symptom features are ranked by discriminative power alone, the symptom features selected for inquiry are poorly related to one another. For example, when the symptom feature acquired from the user is "cough", the most discriminative symptom feature may be determined to be "vomiting". However, "vomiting" and "cough" are not strongly correlated, so the inquiry is prone to semantic jumps between symptom features, which degrades the user experience. Moreover, limited by the ranking method, a conventional automatic inquiry system can ask about only one symptom feature per query and needs many queries to determine the predicted disease, so determining the predicted disease is inefficient.
Based on this, the embodiments of the present application provide a method for implementing feature selection. A current symptom feature set of a user is first acquired to determine the first symptom features and the feature values indicating whether each first symptom feature appears. The conditional probability of the first symptom feature under each disease and the conditional probability of the second symptom feature under each disease are obtained from the medical knowledge graph. The global importance of the second symptom feature, which reflects the correlation between the second symptom feature and the diseases, is calculated from the conditional probability of the second symptom feature under each disease. Additionally or alternatively, the discriminatory importance of the second symptom feature is calculated from the conditional probability of the first symptom feature under each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under each disease. Calculating the discriminatory importance requires little computation, so the result is obtained quickly and the ranking of the second symptom features is determined more efficiently. The importance of the second symptom feature is then determined from its global importance and/or discriminatory importance. Using the global importance strengthens the correlation between symptom features and candidate diseases and improves the semantic correlation between the symptom features.
Finally, the second symptom features are sorted by their importance, and those meeting a preset condition in the sorted result are selected as candidate symptom features. In this way, the computational complexity of determining the second symptom features is reduced, semantically coherent symptom features are obtained for inquiry, the number of queries needed to determine the predicted disease is reduced, and the efficiency of symptom feature inquiry is improved.
Referring to fig. 1, the figure is a schematic diagram of a framework of an exemplary application scenario provided in an embodiment of the present application. The method for implementing feature selection provided in the embodiment of the present application may be applied to the server 20.
In practical applications, the server 20 obtains the nth query result sent by the client 10 and obtains the current symptom feature set of the user from it, where n is a positive integer. The current symptom feature set of the user includes at least one first symptom feature and, for each first symptom feature, a feature value indicating whether that feature appears. The server 20 acquires, from the medical knowledge map, the conditional probability of the first symptom feature under each disease and the conditional probability of the second symptom feature under each disease, where the second symptom feature is any symptom feature other than the first symptom feature. The global importance of the second symptom feature is then calculated using the conditional probability of the second symptom feature under each disease; and/or the identification importance of the second symptom feature is calculated using the conditional probability of the first symptom feature under each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under each disease. The importance of the second symptom feature is determined using its global importance and/or identification importance. The second symptom features are sorted by importance, and those meeting a preset condition are determined as candidate symptom features.
The server 20 generates the (n+1)th query information according to the determined candidate symptom features, transmits it to the client 10 for display, and receives the corresponding (n+1)th query result fed back by the client 10. Based on the received (n+1)th query result and the historical query results, the server 20 determines the symptom features for the (n+2)th query or determines the predicted disease.
Those skilled in the art will appreciate that the block diagram shown in fig. 1 is only one example in which embodiments of the present application may be implemented. The scope of applicability of the embodiments of the present application is not limited in any way by this framework.
It is noted that the client 10 may be any user device, whether existing now, under development, or developed in the future, that is capable of interacting through any form of wired and/or wireless connection (e.g., Wi-Fi, LAN, cellular, coaxial, etc.), including but not limited to: smart wearable devices, smart phones, non-smart phones, tablets, laptop personal computers, desktop personal computers, minicomputers, midrange computers, mainframe computers, and the like. The embodiments of the present application are not limited in any way in this respect. It should also be noted that the server 20 in the embodiment of the present application may be any device, whether existing now, under development, or developed in the future, that is capable of providing an information recommendation application service to a user. The embodiments of the present application are not limited in any way in this respect.
In order to facilitate understanding of the technical solutions provided by the embodiments of the present application, a method for implementing feature selection provided by the embodiments of the present application will be described below with reference to the accompanying drawings.
Referring to fig. 2, which is a flowchart of a method for implementing feature selection according to an embodiment of the present application, the method may include S201 to S205:
S201: acquiring a current symptom feature set of a user, wherein the current symptom feature set of the user comprises at least one first symptom feature and feature values corresponding to the first symptom features; the feature value corresponding to the first symptom feature characterizes whether the first symptom feature appears.
The user's current symptom feature set may be generated based on the obtained query results of the user. It includes at least one first symptom feature and the feature value corresponding to each first symptom feature. A first symptom feature is a symptom feature whose presence or absence has been determined through inquiry, and its feature value indicates whether the user exhibits that first symptom feature.
The value of the characteristic value corresponding to the first symptom characteristic may be 0 or 1. Wherein a characteristic value of "0" may indicate that the first symptom characteristic is not present, and a characteristic value of "1" may indicate that the first symptom characteristic is present. For example, when the query result fed back by the user has "cough occurred", the first symptom feature "cough" of the corresponding current symptom features corresponds to a feature value of "1". When the query result fed back by the user has "no fever", the first symptom characteristic "fever" of the corresponding current symptom characteristics corresponds to a characteristic value of "0".
The current symptom feature set of the user can be updated according to the inquiry result of the symptom features, the symptom features in the new inquiry result are determined as first symptom features, and feature values corresponding to the first symptom features are determined according to the inquiry result.
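The update described above can be sketched as follows. This is a minimal illustration, assuming the current symptom feature set is held as a mapping from feature name to a 0/1 feature value; the function name `update_symptom_set` and the boolean query-result format are hypothetical and not taken from the application.

```python
# Hypothetical representation: symptom feature name -> 1 (appears) or 0 (does not appear).
def update_symptom_set(current, query_result):
    """Return a new feature set with the latest query answers merged in.

    `query_result` maps each queried symptom feature to True/False
    (an assumed format for the user's feedback).
    """
    updated = dict(current)  # leave the caller's set untouched
    for feature, present in query_result.items():
        updated[feature] = 1 if present else 0
    return updated


current_set = update_symptom_set({}, {"cough": True})            # "cough occurred"
current_set = update_symptom_set(current_set, {"fever": False})  # "no fever"
print(current_set)  # {'cough': 1, 'fever': 0}
```

Every feature present as a key in this mapping is a first symptom feature; features absent from it have not yet been queried.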
S202: obtaining from the medical knowledge map a conditional probability of a first symptom feature under the condition of each disease and a conditional probability of a second symptom feature under the condition of each disease; the second symptom characteristic is any symptom characteristic in the medical knowledge map that excludes the first symptom characteristic.
The medical knowledge map is established in advance based on medical knowledge, and the medical knowledge can be derived from medical books, medical articles, medical webpage information, doctor experience and the like. The medical knowledge map has conditional probabilities for each symptom feature under each disease condition. The conditional probability of each symptom feature under each disease condition specifically indicates the probability of the symptom feature appearing when the disease appears.
When the next inquiry of symptom features is made, symptom features that have not yet been queried need to be selected. Any symptom feature in the medical knowledge map other than the first symptom features is taken as a second symptom feature.
Conditional probabilities of a first symptom feature under each disease and conditional probabilities of a second symptom feature under each disease are obtained from the medical knowledge map. The importance of the second symptom characteristic is calculated using the conditional probability of the first symptom characteristic for each disease and the conditional probability of the second symptom characteristic for each disease.
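As a rough sketch of S202, the medical knowledge map can be modeled as a nested mapping from disease to the conditional probability of each symptom feature under that disease. The structure, the names `KNOWLEDGE_MAP`, `second_symptom_features`, and `conditional_probability`, and all probability values below are assumptions made purely for illustration; the application does not specify a storage format.

```python
# Illustrative values only; not taken from any real medical source.
KNOWLEDGE_MAP = {
    "common cold": {"cough": 0.8, "fever": 0.5, "sore throat": 0.6},
    "viral enteritis": {"diarrhea": 0.9, "vomiting": 0.6, "fever": 0.4},
}


def second_symptom_features(knowledge_map, current_set):
    """All symptom features in the map that have not been queried yet."""
    all_features = set()
    for probs in knowledge_map.values():
        all_features.update(probs)
    return sorted(all_features - set(current_set))


def conditional_probability(knowledge_map, feature, disease):
    """P(feature | disease); assumed to be 0.0 when the map has no entry."""
    return knowledge_map[disease].get(feature, 0.0)


current_set = {"cough": 1, "fever": 0}  # first symptom features and feature values
print(second_symptom_features(KNOWLEDGE_MAP, current_set))
# ['diarrhea', 'sore throat', 'vomiting']
```

Treating a missing entry as probability 0.0 is one possible convention; a real knowledge map might instead store an explicit value for every disease and feature pair.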
S203: calculating a global importance of the second symptom feature based on a conditional probability of the second symptom feature under the condition of each disease; and/or calculating the discriminatory importance of the second symptom characteristic according to the conditional probability of the first symptom characteristic under the condition of each disease, the characteristic value corresponding to each first symptom characteristic, and the conditional probability of the second symptom characteristic under the condition of each disease.
The global importance is used for expressing the degree of correlation between the second symptom characteristic and the disease, and represents the importance of the second symptom characteristic to the disease. The global importance of the second symptom feature is calculated using the conditional probability of the second symptom feature under the condition of each disease.
The present application provides three specific embodiments for calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease, which are described in detail below.
The identification importance (also referred to herein as the discriminatory importance) represents how discriminative the second symptom feature is, that is, how important the second symptom feature is for distinguishing between diseases. The identification importance is calculated from the conditional probability of the first symptom feature under the condition of each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under the condition of each disease.
The present application provides a specific implementation manner for calculating the identification importance of the second symptom feature according to the conditional probability of the first symptom feature under the condition of each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under the condition of each disease, which is described in detail below.
The global importance and the identification importance measure the importance of the second symptom feature from different aspects. The embodiment of the application does not limit which type of importance is calculated: only the global importance of the second symptom feature may be calculated, only the identification importance may be calculated, or both may be calculated.
S204: the importance of the second symptom characteristic is determined based on the global importance and/or the discriminatory importance of the second symptom characteristic.
The importance of the second symptom characteristic may be determined using the calculated global importance and/or discriminatory importance of the second symptom characteristic.
In one possible implementation, if the global importance of the second symptom feature is calculated, the importance of the second symptom feature is determined using the global importance of the second symptom feature.
In another possible implementation, if the discriminatory importance of the second symptom characteristic is calculated, the importance of the second symptom characteristic is determined using the discriminatory importance of the second symptom characteristic.
In another possible implementation, if both the global importance and the discriminatory importance of the second symptom feature are calculated, the importance of the second symptom feature is determined using the global importance and/or the discriminatory importance of the second symptom feature.
The embodiments of the present application provide a specific implementation for determining the importance of the second symptom feature according to its global importance and/or discriminatory importance; please refer to the following.
S205: sorting all the second symptom features according to their importance, and determining the second symptom features whose ranking meets a preset condition as candidate symptom features.
The second symptom features are ranked according to the importance obtained for each of them. In one possible implementation, the second symptom features may be ranked by the magnitude of their importance.
The second symptom features whose ranking meets the preset condition are determined as candidate symptom features. The preset condition may be a preset number of candidate symptom features, or may be a threshold on the importance of the second symptom feature.
The determined candidate symptom feature is a candidate symptom feature for the next query to the user. The number of candidate symptom features is not limited in the embodiment of the application, and the number of candidate symptom features determined according to the preset condition may be one or more. When the number of the candidate symptom features is multiple, the query can be performed by using the multiple candidate symptom features, so that the query frequency is reduced, and the query efficiency is improved.
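The two kinds of preset condition mentioned for S205 can be sketched as below. Descending order of importance, the function name `select_candidates`, and the parameters `top_k` and `threshold` are assumptions for illustration; the application only states that the condition may be a preset number or an importance threshold.

```python
def select_candidates(importance, top_k=None, threshold=None):
    """Rank second symptom features by importance and keep those that
    satisfy the preset condition(s).

    importance: mapping of second symptom feature -> importance score.
    top_k:      keep at most this many features (preset number), if given.
    threshold:  keep only features scoring at least this value, if given.
    """
    # Assumed ordering: most important feature first.
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(f, s) for f, s in ranked if s >= threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [feature for feature, _ in ranked]


scores = {"sore throat": 1.4, "vomiting": 0.6, "diarrhea": 0.9}  # made-up scores
print(select_candidates(scores, top_k=2))        # ['sore throat', 'diarrhea']
print(select_candidates(scores, threshold=1.0))  # ['sore throat']
```

Returning more than one candidate corresponds to querying several symptom features in a single round.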
Based on S201-S205 above, calculating the global importance of the second symptom feature takes into account the correlation between the second symptom feature and the diseases, so that the selected candidate symptom features are strongly associated semantically with the symptom features queried previously, which reduces semantic jumps between the queried symptom features and improves the user experience. Calculating the identification importance of the second symptom feature in the manner of the embodiments of the present application reduces the computational complexity and improves the efficiency of calculating the identification importance. As a result, the predicted disease can be determined from the symptom features more quickly and in fewer rounds of inquiry, which improves the efficiency of determining the predicted disease.
The calculation of the global importance of the second symptom feature is explained below.
In one possible implementation, in S203, calculating a global importance of the second symptom feature according to a conditional probability of the second symptom feature under the condition of each disease includes:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
the conditional probabilities of the second symptom features under the conditions of the respective associated diseases are added to obtain the global importance of the second symptom features.
Some diseases in the medical knowledge map are not related to the second symptom characteristic, and when the global importance of the second symptom characteristic is calculated, the disease associated with the second symptom characteristic needs to be determined.
The disease associated with the second symptom characteristic may be determined based on a conditional probability of the second symptom characteristic under the condition of the respective disease. For example, if the probability value of the conditional probability of the second symptom characteristic is greater under a condition of a disease, it indicates that the occurrence of the second symptom characteristic under the condition of the disease is a more probable event, and the association between the disease and the second symptom characteristic is greater. In one possible implementation, a threshold value of the conditional probability may be set, and if the conditional probability of the second symptom feature is greater than or equal to the threshold value under the condition of a disease, the disease is determined as the associated disease. In another possible implementation, the conditional probabilities of the second symptom features under the condition of each disease can be used for sorting, and the associated diseases are determined according to the sorting result.
By adding the conditional probabilities of the second symptom characteristic under the condition of each associated disease, a global importance for representing the degree of correlation between the second symptom characteristic and each associated disease can be obtained. If the global importance is greater, it indicates that the second symptom characteristic is more relevant to each associated disease. If the global significance is low, it indicates that the degree of correlation between the second symptom characteristic and each associated disease is low. When the second symptom features are ranked, the second symptom features related to the related diseases are ranked earlier, the selected candidate symptom features have stronger semantic consistency, and the relevance degree between the queried symptom features is improved.
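The two ways of determining associated diseases described above (a probability threshold, or a ranking of the conditional probabilities) might look like the following sketch. The function names, the threshold of 0.1, and keeping the top `top_n` diseases of the ranking are illustrative assumptions; the application does not fix these values.

```python
def associated_by_threshold(probs_by_disease, threshold):
    """Diseases whose P(feature | disease) is at least `threshold`."""
    return [d for d, p in probs_by_disease.items() if p >= threshold]


def associated_by_rank(probs_by_disease, top_n):
    """The `top_n` diseases with the highest P(feature | disease)."""
    ranked = sorted(probs_by_disease, key=probs_by_disease.get, reverse=True)
    return ranked[:top_n]


# Made-up conditional probabilities of one second symptom feature.
fever_given_disease = {"common cold": 0.5, "viral enteritis": 0.4, "sprain": 0.01}
print(associated_by_threshold(fever_given_disease, 0.1))
# ['common cold', 'viral enteritis']
print(associated_by_rank(fever_given_disease, 1))
# ['common cold']
```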
For example, let the second symptom feature be $f_i$, where $1 \le i \le a$, $i$ is a positive integer, and $a$ is the number of second symptom features. Let the associated disease in the medical knowledge map be $d_j$, where $1 \le j \le b$, $j$ is a positive integer, and $b$ is the number of all associated diseases in the medical knowledge map. $P(f_i \mid d_j)$ represents the conditional probability of the $i$-th second symptom feature under the condition of the $j$-th associated disease. The global importance $S(f_i)$ may be as shown in equation (1):
$$S(f_i) = \sum_j P(f_i \mid d_j) \tag{1}$$
Based on the above, the global importance obtained by adding the conditional probabilities of the second symptom feature under the conditions of the respective associated diseases represents the correlation between the second symptom feature and those associated diseases. Candidate symptom features determined based on the global importance are highly correlated with the symptom features queried previously, so the queries are continuous and the user experience is improved.
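Formula (1) reduces to a plain sum, as the following sketch shows. It assumes the conditional probabilities under the associated diseases have already been collected into a list; the function name and the numbers are illustrative only.

```python
def global_importance(cond_probs):
    """Formula (1): sum of P(f_i | d_j) over the associated diseases d_j."""
    return sum(cond_probs)


# Made-up probabilities: "fever" is associated with two diseases,
# "diarrhea" with only one.
s_fever = global_importance([0.5, 0.4])
s_diarrhea = global_importance([0.7])
print(s_fever > s_diarrhea)  # True
```

In this made-up case the feature shared by more associated diseases receives the larger score, even though each of its individual probabilities is smaller.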
In another possible implementation, the objective function may be used to further optimize the conditional probability value, so as to obtain the global importance more suitable for the data analysis.
Specifically, in S203, the calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease includes:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
respectively taking the conditional probability of the second symptom characteristic under the condition of each associated disease as a variable of the objective function to obtain a function value of the objective function; the objective function is monotonically increasing on the interval where the variable is greater than zero, and the increment of the function value grows as the variable increases;
and adding the function values to obtain the global importance of the second symptom characteristic.
The disease associated with the second symptom characteristic may be determined based on a conditional probability of the second symptom characteristic under the condition of the respective disease. In one possible implementation, a threshold value of the conditional probability may be set, and if the conditional probability of the second symptom feature is greater than or equal to the threshold value under the condition of a disease, the disease is determined as the associated disease. In another possible implementation, the conditional probabilities of the second symptom features under the condition of each disease can be used for sorting, and the associated diseases are determined according to the sorting result.
The objective function is a function that is monotonically increasing on the interval where the variable is greater than zero and whose increment grows as the variable increases. The objective function converts the conditional probability of the second symptom feature under each associated disease into a function value better suited to data analysis, so that the resulting global importance can be conveniently used for data analysis.
The conditional probability of the second symptom feature under the condition of each associated disease is taken as the variable of the objective function to obtain the corresponding function value, and the function values are added to obtain the global importance of the second symptom feature.
As an example, if the objective function is $g(x)$ and the conditional probability of the second symptom feature under the condition of each associated disease is taken as the variable of the objective function, the resulting function value is $g(P(f_i \mid d_j))$. Correspondingly, the global importance $S(f_i)$ of the second symptom feature is as shown in equation (2):
$$S(f_i) = \sum_j g(P(f_i \mid d_j)) \tag{2}$$
In the embodiment of the application, the objective function is used to convert the values of the conditional probabilities, so that a global importance better suited to the data analysis requirements is obtained, which makes it convenient to rank the second symptom features by global importance afterwards.
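To illustrate formula (2), the sketch below uses g(x) = x² as the objective function. This choice is only an assumption: the application does not name a specific function, and x² is simply one function that is monotonically increasing for x > 0 with increments that grow as x grows. The function names are likewise hypothetical.

```python
def square(x):
    """Assumed example objective function g(x) = x**2."""
    return x * x


def global_importance_with_objective(cond_probs, g=square):
    """Formula (2): sum of g(P(f_i | d_j)) over the associated diseases."""
    return sum(g(p) for p in cond_probs)


# Made-up probabilities: one strong association versus two weak ones.
strong = [0.9]
weak = [0.45, 0.45]
print(sum(strong) == sum(weak))  # True: formula (1) cannot separate them
print(global_importance_with_objective(strong)
      > global_importance_with_objective(weak))  # True
```

With such an objective function, a feature that is strongly tied to one disease is weighted more heavily than a feature spread thinly over several, which the plain sum of formula (1) would score identically.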
The global importance calculated using the conditional probability of the second symptom characteristic under the condition of each disease can reflect only the correlation between the second symptom characteristic and the disease. In addition, the medical knowledge map also has disease prediction rules for determining diseases corresponding to the symptom characteristics.
Further, an embodiment of the present application also provides a method for updating the global importance of the second symptom feature. After the global importance of the second symptom feature is calculated by any one of the above methods, the method may further include the following three steps A1-A3:
A1: acquiring a disease prediction rule from the medical knowledge map, wherein the disease prediction rule comprises a corresponding relation between a condition and an action, the condition is a target feature value corresponding to a target symptom feature, and the action is a way of processing the evaluation value of the corresponding disease.
Disease prediction rules are acquired from the medical knowledge map; they capture diagnostic knowledge that cannot be expressed in terms of probability. A disease prediction rule includes a correspondence between a condition and an action. The condition is a target feature value corresponding to a target symptom feature: the condition of the rule is satisfied when the feature value of the target symptom feature equals the target feature value. The action is a way of processing the evaluation value of the corresponding disease and is applied to that evaluation value when the corresponding condition is satisfied.
For example, when the condition in a disease prediction rule is "diarrhea = 0", the condition is satisfied when the current symptom feature set of the user contains the target symptom feature "diarrhea" with the corresponding target feature value "0". Correspondingly, the action "viral enteritis <-5>" associated with that condition is applied, that is, 5 is subtracted from the evaluation value of viral enteritis.
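A condition/action rule of this kind could be represented roughly as follows. The class `DiseasePredictionRule`, its field names, the additive `delta` encoding of the action, and the starting evaluation value of 10 are all hypothetical; the application describes the rule only in terms of a condition and an action on a disease's evaluation value.

```python
from dataclasses import dataclass


@dataclass
class DiseasePredictionRule:
    condition: dict  # target symptom feature -> target feature value (0 or 1)
    disease: str     # disease whose evaluation value the action processes
    delta: float     # assumed action encoding: amount added to the evaluation value


def apply_rules(rules, current_set, evaluation):
    """Apply the action of every rule whose condition is fully satisfied."""
    updated = dict(evaluation)
    for rule in rules:
        satisfied = all(current_set.get(f) == v for f, v in rule.condition.items())
        if satisfied:
            updated[rule.disease] = updated.get(rule.disease, 0.0) + rule.delta
    return updated


# The example from the text: "diarrhea = 0" subtracts 5 for viral enteritis.
rule = DiseasePredictionRule({"diarrhea": 0}, "viral enteritis", -5.0)
scores = apply_rules([rule], {"diarrhea": 0}, {"viral enteritis": 10.0})
print(scores)  # {'viral enteritis': 5.0}
```

A feature that has not been queried is absent from `current_set`, so a rule that depends on it is not triggered.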
A2: determining the disease prediction rules that include the second symptom feature as target disease prediction rules, calculating the reciprocal of the number of target symptom features included in each target disease prediction rule, and adding the reciprocals to obtain a rule evaluation value of the second symptom feature.
And if the acquired disease prediction rule has the second symptom characteristic, namely the target symptom characteristic in the condition of the disease prediction rule is the second symptom characteristic, determining the disease prediction rule as the target disease prediction rule.
It is understood that the target disease prediction rules may include only the second symptom characteristic, and may include other symptom characteristics in addition to the second symptom characteristic. A reciprocal value of the number of target symptom features included in the target disease prediction rule is calculated, and the evaluation value of the second symptom feature for the target disease prediction rule is expressed by the calculated reciprocal value.
Specifically, for example, when the second symptom feature is "cough" and the target symptom feature of the condition in the target disease prediction rule is "cough", the number of target symptom features is 1, and the calculated reciprocal is 1. If the target symptom features of the condition in the target disease prediction rule are "cough", "fever" and "chills", the number of target symptom features is 3, and the calculated reciprocal is $\frac{1}{3}$.
The reciprocals corresponding to the target disease prediction rules are added to obtain the rule evaluation value of the second symptom feature over all target disease prediction rules.
A3: adding the obtained global importance of the second symptom feature to the rule evaluation value of the second symptom feature to obtain the updated global importance of the second symptom feature.
The obtained global importance of the second symptom feature and the calculated rule evaluation value of the second symptom feature are added to obtain the updated global importance of the second symptom feature.
As an example, let $R_k$ denote the reciprocal of the number of target symptom features included in the $k$-th target disease prediction rule, where $1 \le k \le c$, $k$ is a positive integer, and $c$ is the number of target disease prediction rules.
When the global importance of the second symptom feature is calculated according to formula (1), the updated global importance of the second symptom feature is as shown in formula (3):
$$S(f_i) = \sum_j P(f_i \mid d_j) + \sum_k R_k \tag{3}$$
When the global importance of the second symptom feature is calculated according to formula (2), the updated global importance of the second symptom feature is as shown in formula (4):
$$S(f_i) = \sum_j g(P(f_i \mid d_j)) + \sum_k R_k \tag{4}$$
As can be seen from the above, by calculating the rule evaluation value of the second symptom feature, the updated global importance contains both a score related to the probabilities of the diseases and a score related to the disease prediction rules. A global importance determined from these two aspects is more accurate, so that candidate symptom features obtained later are more strongly associated with the symptom features of historical queries while the correlation with the diagnosis rules is also taken into account.
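Steps A2 and A3 can be sketched as below, under the assumption that each disease prediction rule is reduced to the set of target symptom features appearing in its condition. The function names and the base global importance of 0.9 are illustrative only.

```python
def rule_evaluation_value(feature, rule_conditions):
    """Step A2: sum of 1 / (number of target symptom features) over the
    rules whose condition mentions `feature` (the target rules)."""
    return sum(1.0 / len(cond) for cond in rule_conditions if feature in cond)


def updated_global_importance(base_importance, feature, rule_conditions):
    """Step A3 / formulas (3) and (4): base global importance plus the
    rule evaluation value (the sum of R_k)."""
    return base_importance + rule_evaluation_value(feature, rule_conditions)


# Each rule is represented only by the target symptom features of its condition.
rules = [
    {"cough"},                     # reciprocal 1
    {"cough", "fever", "chills"},  # reciprocal 1/3
    {"diarrhea"},                  # does not mention "cough", so it is ignored
]
r_cough = rule_evaluation_value("cough", rules)          # 1 + 1/3
s_cough = updated_global_importance(0.9, "cough", rules)
print(round(r_cough, 4), round(s_cough, 4))
```

A rule that depends on the feature alone contributes a full point, while a rule that also needs other features contributes proportionally less.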
The calculation of the identification importance of the second symptom feature is explained below.
In one possible implementation manner, in S203, the identification importance of the second symptom feature is calculated according to the conditional probability of the first symptom feature under the condition of each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under the condition of each disease, and the method includes the following four steps B1-B4:
B1: determining a disease associated with the second symptom feature as an associated disease based on the conditional probability of the second symptom feature under the condition of each disease.
Some diseases in the medical knowledge map are not associated with the second symptom characteristic, and in calculating the discriminatory significance of the second symptom characteristic, the disease associated with the second symptom characteristic needs to be determined.
The disease associated with the second symptom characteristic may be determined based on a conditional probability of the second symptom characteristic under the condition of the respective disease. In one possible implementation, a threshold value of the conditional probability may be set, and if the conditional probability of the second symptom feature is greater than or equal to the threshold value under the condition of a disease, the disease is determined as the associated disease. In another possible implementation, the conditional probabilities of the second symptom features under the condition of each disease can be used for sorting, and the associated diseases are determined according to the sorting result.
B2: calculating a probability value of a target associated disease under the condition of the current symptom feature set of the user according to the conditional probability of the first symptom feature under the condition of each disease and the feature value corresponding to each first symptom feature; the target associated disease is each of the associated diseases.
Each of the associated diseases is taken in turn as the target associated disease, and the probability value of the target associated disease under the condition of the current symptom feature set of the user is calculated using the conditional probability of the first symptom feature under the condition of each disease and the feature value corresponding to each first symptom feature.
Specifically, an embodiment of the present invention provides a specific implementation manner of calculating a probability value of a target associated disease under the condition of a current symptom feature set of a user according to a conditional probability of a first symptom feature under the condition of each disease and a feature value corresponding to each first symptom feature, please refer to the following.
B3: multiplying the probability value of the target associated disease under the condition of the current symptom feature set of the user by the uncertainty of the second symptom feature to the target associated disease to obtain an identification importance coefficient of the second symptom feature to the target associated disease; the uncertainty of the second symptom characteristic for the target associated disease is calculated based on a conditional probability of the second symptom characteristic under the conditions of the target associated disease.
The uncertainty of the second symptom characteristic for the target associated disease is a calculation coefficient required for calculating the identification significance, and the uncertainty of the second symptom characteristic for the target associated disease can be calculated according to the conditional probability of the second symptom characteristic under the condition of the target associated disease.
Specifically, the conditional probability of the second symptom feature under the condition of the target associated disease can be denoted $P(f_i \mid d_n)$, where $d_n$ represents the target associated disease, $1 \le n \le e$, $n$ is a positive integer, and $e$ is the number of associated diseases. The uncertainty of the second symptom feature for the target associated disease can be defined as $1 - [P(f_i \mid d_n)]^2 - [1 - P(f_i \mid d_n)]^2$.
And multiplying the obtained probability value of the target associated disease under the condition of the current symptom feature set of the user with the uncertainty of the second symptom feature to the target associated disease to obtain the identification importance coefficient of the second feature to the target associated disease.
If $P'(d_n \mid A)$ represents the probability value of the target associated disease under the condition of the current symptom feature set of the user, the resulting identification importance coefficient $\mathrm{Gini}_n(f_i \mid D)$ of the second symptom feature for the target associated disease is shown in formula (5):
$$\mathrm{Gini}_n(f_i \mid D) = P'(d_n \mid A)\left\{1 - [P(f_i \mid d_n)]^2 - [1 - P(f_i \mid d_n)]^2\right\} \tag{5}$$
where D represents the patient data set and the size of D may default to 1.
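Formula (5) can be illustrated with a minimal Python sketch. The function names `uncertainty` and `discrimination_coefficient` are hypothetical and chosen only for this illustration; they do not appear in the original text.

```python
def uncertainty(p_feature_given_disease: float) -> float:
    """Uncertainty 1 - p^2 - (1 - p)^2 of a binary symptom feature for one disease."""
    p = p_feature_given_disease
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def discrimination_coefficient(p_disease_given_set: float,
                               p_feature_given_disease: float) -> float:
    """Formula (5): P'(d_n|A) multiplied by the uncertainty of f_i for d_n."""
    return p_disease_given_set * uncertainty(p_feature_given_disease)


# The uncertainty is largest when the feature appears in half of the cases
# and vanishes when the feature always (or never) appears with the disease.
print(uncertainty(0.5), uncertainty(1.0))
```

Under this sketch, a feature whose conditional probability is close to 0 or 1 for a likely disease contributes almost nothing to the sum in step B4.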
B4: and adding the identification importance coefficients of the second symptom characteristics to the related diseases to obtain a first summation result, and calculating the difference between 1 and the first summation result to obtain the identification importance of the second symptom characteristics.
The obtained identification importance coefficients of the second symptom feature for the respective associated diseases are added to obtain a first summation result. The first summation result $\mathrm{Gini}(f_i \mid D)$ may be as shown in formula (6):
$$\mathrm{Gini}(f_i \mid D) = \sum_{n=1}^{e} P'(d_n \mid A)\left\{1 - [P(f_i \mid d_n)]^2 - [1 - P(f_i \mid d_n)]^2\right\} \tag{6}$$
The calculated first summation result $\mathrm{Gini}(f_i \mid D)$ lies in the range $[0, 1]$; a smaller value of the first summation result indicates a more discriminative second symptom feature. So that it can be fused with the global importance when calculating the importance of the second symptom feature, the first summation result is transformed: the difference between 1 and the first summation result $\mathrm{Gini}(f_i \mid D)$ is calculated to obtain the identification importance of the second symptom feature.
The identification importance $\mathrm{Gini}(f_i)$ of the second symptom feature can be shown as formula (7):
$$\mathrm{Gini}(f_i) = 1 - \mathrm{Gini}(f_i \mid D) \tag{7}$$
After the transformation, a larger value of the identification importance indicates a more discriminative second symptom feature. This is consistent with the global importance, for which a larger value indicates a stronger association between the second symptom feature and the disease, and therefore makes it convenient to calculate the importance of the second symptom feature.
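As an illustration of steps B3 and B4, the following Python sketch evaluates formulas (6) and (7) for one second symptom feature. The function name `identification_importance` and the dictionary-based inputs keyed by disease name are assumptions made for this example only.

```python
from typing import Mapping


def identification_importance(posterior: Mapping[str, float],
                              p_feature: Mapping[str, float]) -> float:
    """Formulas (6) and (7) for a single second symptom feature f_i.

    posterior: normalized P'(d_n|A) for each associated disease.
    p_feature: P(f_i|d_n) for the same associated diseases.
    """
    first_sum = 0.0
    for disease, p_disease in posterior.items():
        p = p_feature[disease]
        # Formula (5): coefficient of f_i for this associated disease.
        first_sum += p_disease * (1.0 - p ** 2 - (1.0 - p) ** 2)
    # Formula (7): larger values mean a more discriminative feature.
    return 1.0 - first_sum


# Hypothetical numbers: the feature is certain under "flu" and uninformative under "cold".
print(identification_importance({"flu": 0.75, "cold": 0.25},
                                {"flu": 1.0, "cold": 0.5}))
```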
In the embodiment of the application, compared with the existing method, the calculation amount for calculating the identification importance of the second symptom characteristic is less, the calculation complexity is reduced, the calculation speed for calculating the identification importance of the second symptom characteristic can be increased, and the efficiency for determining and predicting the disease is improved.
Further, the embodiment of the present application provides that the step B2 calculates the probability value of the target associated disease under the condition of the current symptom feature set of the user according to the conditional probability of the first symptom feature under the condition of each disease and the feature value corresponding to each first symptom feature, and specifically includes the following five steps C1-C5:
c1: when the feature value corresponding to the target first symptom feature is 1, determining the conditional probability of the target first symptom feature under the condition of the target associated disease as the probability value of the target first symptom feature; the target first symptom characteristic is each of the first symptom characteristics.
And taking each symptom characteristic in the first symptom characteristics as a target first symptom characteristic. If the characteristic value corresponding to the target first symptom characteristic is 1, the first symptom characteristic is indicated to appear. The conditional probability of the target first symptom feature, under the condition of the target associated disease, represents the probability of the target first symptom feature appearing when having the target associated disease. Determining a conditional probability of the target first symptom feature under the condition of the target associated disease as a probability value of the target first symptom feature.
C2: when the feature value corresponding to the target first symptom feature is 0, determining the difference between 1 and the conditional probability of the target first symptom feature under the condition of the target associated disease as the probability value of the target first symptom feature.
If the characteristic value corresponding to the target first symptom characteristic is 0, the first symptom characteristic is not shown. Since the conditional probability of the target first symptom feature under the condition of the target-associated disease, which represents the probability of the target first symptom feature appearing when the target-associated disease is suffered, the difference between 1 and the conditional probability of the target first symptom feature under the condition of the target-associated disease is calculated, resulting in the probability that the target first symptom feature does not appear under the condition of the target-associated disease. And taking the obtained probability value as the probability value of the target first symptom characteristic.
In one possible implementation, the probability value $P_m(d_n \mid A)$ of the target first symptom feature can be calculated by using formula (8):
$$P_m(d_n \mid A) = A(f_m)\,P(f_m \mid d_n) + [1 - A(f_m)]\,[1 - P(f_m \mid d_n)] \tag{8}$$
where $f_m$ denotes the target first symptom feature, $1 \le m \le z$, $m$ is a positive integer, and $z$ is the number of first symptom features. $A(f_m)$ denotes the feature value corresponding to $f_m$, and $A(f_m)$ is 0 or 1. $P(f_m \mid d_n)$ represents the conditional probability of the target first symptom feature under the condition of the target associated disease.
When $A(f_m)$ is 1, the factor $[1 - A(f_m)]$ is 0, so the term $[1 - A(f_m)][1 - P(f_m \mid d_n)]$ is 0. The remaining term $A(f_m)P(f_m \mid d_n)$, i.e. $P(f_m \mid d_n)$, is taken as the probability value of the target first symptom feature.
When $A(f_m)$ is 0, the term $A(f_m)P(f_m \mid d_n)$ is 0. The remaining term $[1 - A(f_m)][1 - P(f_m \mid d_n)]$, i.e. $1 - P(f_m \mid d_n)$, is taken as the probability value of the target first symptom feature.
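The two cases of steps C1 and C2 can be checked with a short Python sketch of formula (8). The function name `feature_probability` is hypothetical.

```python
def feature_probability(feature_value: int, p_feature_given_disease: float) -> float:
    """Formula (8): A(f_m) P(f_m|d_n) + [1 - A(f_m)] [1 - P(f_m|d_n)]."""
    if feature_value not in (0, 1):
        raise ValueError("feature_value must be 0 (absent) or 1 (present)")
    a = feature_value
    p = p_feature_given_disease
    return a * p + (1 - a) * (1.0 - p)


# Symptom present: the conditional probability itself is returned.
print(feature_probability(1, 0.8))
# Symptom absent: the complement of the conditional probability is returned.
print(feature_probability(0, 0.8))
```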
C3: and multiplying the probability values of the first symptom characteristics to obtain the pseudo posterior probability value of the target associated disease under the condition of the current symptom characteristic set of the user.
The obtained probability values of the respective first symptom features are multiplied. Taking formula (8) above as an example, the product is used as the pseudo posterior probability value $P(d_n \mid A)$ of the target associated disease under the condition of the current symptom feature set of the user, which can be expressed by formula (9):
$$P(d_n \mid A) = \prod_{m=1}^{z} P_m(d_n \mid A) = \prod_{m=1}^{z}\left\{A(f_m)\,P(f_m \mid d_n) + [1 - A(f_m)]\,[1 - P(f_m \mid d_n)]\right\} \tag{9}$$
Because formula (9) is a simplification of the Bayes formula, the calculated value is not the true posterior probability of the target associated disease under the condition of the current symptom feature set of the user; it is therefore used as a pseudo posterior probability value. In a specific implementation, either only the symptom features that belong to the target associated disease are included in the calculation (symptom features that do not belong to the target associated disease are not considered), or the probability value of a symptom feature that does not belong to the target associated disease is set to a small positive value, such as 0.01, so that the calculation is not affected.
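The product of step C3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the function name `pseudo_posterior` is hypothetical, and the small positive value 0.01 is interpreted here as a default conditional probability for symptom features that are not recorded for the target associated disease.

```python
from typing import Mapping


def pseudo_posterior(observed: Mapping[str, int],
                     cond_prob: Mapping[str, float],
                     default_prob: float = 0.01) -> float:
    """Product over the first symptom features of the per-feature probability values.

    observed: feature name -> feature value (1 = present, 0 = absent).
    cond_prob: feature name -> P(f_m|d_n) for one target associated disease.
    default_prob: assumed fallback for features not recorded for this disease.
    """
    result = 1.0
    for feature, value in observed.items():
        p = cond_prob.get(feature, default_prob)
        # Formula (8): p when the feature is present, 1 - p when it is absent.
        result *= p if value == 1 else 1.0 - p
    return result


print(pseudo_posterior({"fever": 1, "cough": 0}, {"fever": 0.8, "cough": 0.5}))
```

The fallback keeps the product strictly positive when a reported symptom is not linked to the disease, so the disease is down-weighted rather than eliminated outright.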
In one possible implementation, the importance of different target associated diseases may also be set. In that case, the formula for calculating the pseudo posterior probability value $P(d_n \mid A)$ of the target associated disease under the condition of the current symptom feature set of the user is as follows:
$$P(d_n \mid A) = h(d_n)\prod_{m=1}^{z}\left\{A(f_m)\,P(f_m \mid d_n) + [1 - A(f_m)]\,[1 - P(f_m \mid d_n)]\right\} \tag{10}$$
where $h(d_n)$ is the importance measure of the target associated disease $d_n$. $h(d_n)$ can be a numerical value, a function value, or the prior probability $P(d_n)$ of the disease. When all target associated diseases are equally important, $h(d_n)$ may be 1.
C4: and calculating the sum of the pseudo posterior probability values of the associated diseases under the condition of the current symptom feature set of the user to obtain a second summation result.
The calculated pseudo posterior probability value of the target associated disease under the condition of the current symptom feature set of the user needs to be further normalized.
The pseudo posterior probability values $P(d_n \mid A)$ of the respective associated diseases under the condition of the current symptom feature set of the user are summed to obtain the second summation result. The second summation result may be denoted $\sum_{n} P(d_n \mid A)$.
C5: and dividing the pseudo posterior probability value of the target associated disease under the condition of the current symptom feature set of the user by the second summation result to obtain the probability value of the target associated disease under the condition of the current symptom feature set of the user.
With the second summation result $\sum_{n} P(d_n \mid A)$ as the denominator and the pseudo posterior probability value $P(d_n \mid A)$ of the target associated disease under the condition of the current symptom feature set of the user as the numerator, the normalized probability value $P'(d_n \mid A)$ of the target associated disease under the condition of the current symptom feature set of the user is obtained.
The probability value $P'(d_n \mid A)$ of the target associated disease under the condition of the current symptom feature set of the user is shown in formula (11):
$$P'(d_n \mid A) = \frac{P(d_n \mid A)}{\sum_{n} P(d_n \mid A)} \tag{11}$$
Based on the above, calculating the pseudo posterior probability value of the target associated disease under the condition of the current symptom feature set of the user simplifies the calculation, and normalizing the obtained pseudo posterior probability value facilitates the subsequent calculation of the identification importance of the second symptom feature.
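The normalization of steps C4 and C5 can be illustrated with a minimal Python sketch. The function name `normalize` and the dictionary keyed by disease name are assumptions made for this example.

```python
from typing import Dict, Mapping


def normalize(pseudo: Mapping[str, float]) -> Dict[str, float]:
    """Divide each pseudo posterior P(d_n|A) by the sum over all associated diseases."""
    second_sum = sum(pseudo.values())  # the "second summation result" of step C4
    if second_sum <= 0.0:
        raise ValueError("pseudo posterior values must have a positive sum")
    return {disease: value / second_sum for disease, value in pseudo.items()}


# Hypothetical pseudo posterior values for two associated diseases.
print(normalize({"flu": 0.3, "cold": 0.1}))
```

After this step the values sum to 1, so they can be used directly as the weights $P'(d_n \mid A)$ in the summation of step B4.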
Additionally, in one possible implementation, the degree to which the global importance and the identification importance of the second symptom feature affect the importance of the second symptom feature may be adjusted.
The embodiment of the present application provides a specific implementation manner of determining the importance of the second symptom feature according to the global importance and/or the differential importance of the second symptom feature in step S204, including:
and weighting and summing the global importance of the second symptom characteristic and the identification importance to determine the importance of the second symptom characteristic.
And respectively setting the weight of the global importance of the second symptom characteristic and the weight of the identification importance of the second symptom characteristic, and carrying out weighted summation on the global importance and the identification importance of the second symptom characteristic to obtain the importance of the second symptom characteristic.
The formula for weighting the global importance $S(f_i)$ and the identification importance $\mathrm{Gini}(f_i)$ of the second symptom feature can be as shown in formula (12):
$$V_f = \alpha\,S(f_i) + \beta\,\mathrm{Gini}(f_i) \tag{12}$$
where the value ranges of $\alpha$ and $\beta$ are both $[0, 1]$.
By adjusting the weight of the global importance and the weight of the discrimination importance, the adjustment of the weight of the influence of the global importance and the discrimination importance on the importance of the second symptom feature can be achieved. When the weight of the global importance of the second symptom feature is 0, the importance of the second symptom feature is determined using only the discriminative importance of the second symptom feature. When the weight of the discriminatory importance of the second symptom feature is 0, the importance of the second symptom feature is determined using only the global importance of the second symptom feature.
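The weighted summation of formula (12) can be sketched as follows. The function name `combined_importance` and the default weights of 0.5 are assumptions for illustration; the text only constrains both weights to the range [0, 1].

```python
def combined_importance(global_importance: float,
                        identification_importance: float,
                        alpha: float = 0.5,
                        beta: float = 0.5) -> float:
    """Formula (12): V_f = alpha * S(f_i) + beta * Gini(f_i)."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must both lie in [0, 1]")
    return alpha * global_importance + beta * identification_importance


# alpha = 0 uses only the identification importance; beta = 0 uses only the global importance.
print(combined_importance(2.0, 0.8, alpha=0.0, beta=1.0))
print(combined_importance(2.0, 0.8, alpha=1.0, beta=0.0))
```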
In the embodiment of the application, weighting the global importance and the identification importance of the second symptom feature allows their influence on the importance of the second symptom feature to be adjusted through the weights.
Based on the method for implementing feature selection provided by the above method embodiment, an embodiment of the present application further provides a device for implementing feature selection, which is described below with reference to the accompanying drawings.
Referring to fig. 3, the drawing is a schematic structural diagram of an apparatus for implementing feature selection according to an embodiment of the present application. The device for realizing feature selection comprises:
a first obtaining unit 301, configured to obtain a current symptom feature set of a user, where the current symptom feature set of the user includes at least one first symptom feature and a feature value corresponding to each of the first symptom features; the characteristic value corresponding to the first symptom characteristic is used for indicating whether the first symptom characteristic appears or not;
a second acquisition unit 302 for acquiring, from the medical knowledge map, a conditional probability of the first symptom feature under the condition of each disease and a conditional probability of the second symptom feature under the condition of each disease; the second symptom characteristic is any symptom characteristic in the medical knowledge-map that excludes the first symptom characteristic;
a first calculating unit 303, configured to calculate a global importance of the second symptom feature according to a conditional probability of the second symptom feature under the condition of each disease; and/or calculating the identification importance of the second symptom characteristic according to the conditional probability of the first symptom characteristic under each disease condition, the characteristic value corresponding to each first symptom characteristic and the conditional probability of the second symptom characteristic under each disease condition;
a first determining unit 304 for determining the importance of the second symptom feature according to the global importance and/or the differential importance of the second symptom feature;
the sorting unit 305 is configured to sort each second symptom feature with the importance according to the importance, and determine the second symptom feature with the ranking meeting a preset condition as a candidate symptom feature.
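The behavior of the sorting unit can be illustrated with a short Python sketch. The text does not fix the preset condition, so this example assumes it is "rank within the top k"; the function name `select_candidates` and the parameter `top_k` are hypothetical.

```python
from typing import List, Mapping


def select_candidates(importance: Mapping[str, float], top_k: int) -> List[str]:
    """Sort second symptom features by importance (descending) and keep the first top_k."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [feature for feature, _ in ranked[:top_k]]


# Hypothetical importance values for three second symptom features.
print(select_candidates({"cough": 0.4, "fever": 0.9, "rash": 0.1}, top_k=2))
```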
In a possible implementation manner, the first calculating unit 303 is specifically configured to determine, according to the conditional probability of the second symptom feature under the condition of each disease, a disease associated with the second symptom feature as an associated disease;
adding the conditional probabilities of the second symptom features under the conditions of the respective associated diseases to obtain the global importance of the second symptom features.
In a possible implementation manner, the first calculating unit 303 is specifically configured to determine, according to the conditional probability of the second symptom feature under the condition of each disease, a disease associated with the second symptom feature as an associated disease;
respectively taking the conditional probability of the second symptom characteristic under each condition of the associated diseases as a variable of an objective function to obtain a function value of the objective function; the objective function is monotonically increased in an interval with the variable larger than zero, and the amplitude of the function value is increased along with the increase of the variable;
and adding the function values to obtain the global importance of the second symptom characteristic.
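Both variants of the global importance calculation can be sketched with one Python function. This is an illustration under assumptions: the function name `global_importance` is hypothetical, a disease is treated as associated when its conditional probability is greater than zero, and the square function is only one example of an objective function that increases on the positive interval.

```python
from typing import Callable, Mapping


def global_importance(cond_probs: Mapping[str, float],
                      objective: Callable[[float], float] = lambda x: x) -> float:
    """Sum objective(P(f_i|d)) over the diseases associated with feature f_i.

    With the default identity objective this is the plain sum of conditional probabilities.
    """
    return sum(objective(p) for p in cond_probs.values() if p > 0.0)


probs = {"flu": 0.5, "cold": 0.25, "gout": 0.0}
print(global_importance(probs))                    # plain sum of conditional probabilities
print(global_importance(probs, lambda x: x * x))   # assumed example objective function
```

An objective function of this kind gives more weight to a few strong associations than to many weak ones with the same total.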
In one possible implementation, the apparatus further includes:
a third obtaining unit, configured to obtain a disease prediction rule from the medical knowledge graph, where the disease prediction rule includes a correspondence between a condition and an action, where the condition is a target feature value corresponding to a target symptom feature, and the action is a processing manner of an evaluation value for a corresponding disease;
a second calculation unit configured to determine a disease prediction rule including the second symptom feature as a target disease prediction rule, calculate a reciprocal value of the number of target symptom features included in each target disease prediction rule, and add the respective reciprocal values to obtain a rule evaluation value of the second symptom feature;
and the third calculating unit is used for adding the obtained global importance of the second symptom characteristic and the rule evaluation value of the second symptom characteristic to obtain the updated global importance of the second symptom characteristic.
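The rule evaluation value computed by the second calculation unit can be sketched as follows. Representing each disease prediction rule by the set of target symptom features in its condition is an assumption made for this example, as is the function name `rule_evaluation`.

```python
from typing import Sequence, Set


def rule_evaluation(feature: str, rules: Sequence[Set[str]]) -> float:
    """Sum of 1 / (number of target symptom features) over the rules that contain the feature."""
    return sum(1.0 / len(rule) for rule in rules if feature in rule)


# Hypothetical rules, each given as the set of symptom features in its condition.
rules = [{"fever", "cough"}, {"fever"}, {"rash", "itch"}]
print(rule_evaluation("fever", rules))

# The updated global importance adds the rule evaluation value to the global importance.
updated_global_importance = 0.75 + rule_evaluation("fever", rules)
```

A rule that depends on fewer symptom features contributes a larger reciprocal, so a feature that alone decides a rule is weighted most heavily.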
In a possible implementation manner, the first calculating unit 303 includes:
a first determining subunit configured to determine, as an associated disease, a disease associated with the second symptom feature, based on the conditional probability of the second symptom feature under the condition of each disease;
the first calculating subunit is used for calculating a probability value of a target associated disease under the condition of the current symptom feature set of the user according to the conditional probability of the first symptom feature under the condition of each disease and the feature value corresponding to each first symptom feature; the target associated disease is each of the associated diseases;
the second calculating subunit is used for multiplying the probability value of the target associated disease under the condition of the current symptom feature set of the user by the uncertainty of the second symptom feature to the target associated disease to obtain the identification importance coefficient of the second symptom feature to the target associated disease; the uncertainty of the second symptom characteristic for the target associated disease is calculated according to the conditional probability of the second symptom characteristic under the condition of the target associated disease;
and the third calculating subunit is used for adding the identification importance coefficients of the second symptom features to the associated diseases to obtain a first summation result, and calculating the difference between 1 and the first summation result to obtain the identification importance of the second symptom features.
In one possible implementation manner, the first computing subunit includes:
the second determining subunit is used for determining the conditional probability of the target first symptom characteristic under the condition of the target associated disease as the probability value of the target first symptom characteristic when the characteristic value corresponding to the target first symptom characteristic is 1; the target first symptom characteristic is each of the first symptom characteristics;
a third determining subunit, configured to determine, when the feature value corresponding to the target first symptom feature is 0, a difference between 1 and a conditional probability of the target first symptom feature under the condition of the target-related disease as a probability value of the target first symptom feature;
the fourth calculating subunit is used for multiplying the probability values of the first symptom features to obtain a pseudo posterior probability value of the target associated disease under the condition of the current symptom feature set of the user;
the fifth calculating subunit is used for calculating the sum of the pseudo posterior probability values of the associated diseases under the condition of the current symptom feature set of the user to obtain a second summation result;
and the sixth calculating subunit is used for dividing the pseudo posterior probability value of the target associated disease under the condition of the current symptom feature set of the user by the second summation result to obtain the probability value of the target associated disease under the condition of the current symptom feature set of the user.
In a possible implementation manner, the first determining unit 304 is specifically configured to perform weighted summation on the global importance and the discrimination importance of the second symptom feature to determine the importance of the second symptom feature.
Fig. 4 shows a block diagram of a client 400. For example, client 400 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, and the like.
Referring to fig. 4, client 400 may include one or more of the following components: processing component 402, memory 404, power component 406, multimedia component 408, audio component 410, input/output (I/O) interface 412, sensor component 414, and communication component 416.
Processing component 402 generally controls the overall operation of client 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 can include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.
Memory 404 is configured to store various types of data to support operations at client 400. Examples of such data include instructions for any application or method operating on client 400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 404 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 406 provide power to the various components of client 400. Power components 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for client 400.
The multimedia component 408 includes a screen providing an output interface between the client 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the client 400 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 410 is configured to output and/or input audio signals. For example, audio component 410 includes a Microphone (MIC) configured to receive external audio signals when client 400 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 also includes a speaker for outputting audio signals.
The I/O interface provides an interface between the processing component 402 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
Sensor component 414 includes one or more sensors for providing status assessments of various aspects of client 400. For example, sensor component 414 may detect an open/closed state of client 400 and the relative positioning of components, such as a display and keypad of client 400. Sensor component 414 may also detect a change in location of client 400 or a component of client 400, the presence or absence of user contact with client 400, the orientation or acceleration/deceleration of client 400, and a change in temperature of client 400. The sensor component 414 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
Communication component 416 is configured to facilitate communications between client 400 and other devices in a wired or wireless manner. Client 400 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the client 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the following methods:
acquiring a current symptom feature set of a user, wherein the current symptom feature set of the user comprises at least one first symptom feature and feature values corresponding to the first symptom features; the characteristic value corresponding to the first symptom characteristic is used for indicating whether the first symptom characteristic appears or not;
obtaining from the medical knowledge map a conditional probability of a first symptom feature under the condition of each disease and a conditional probability of a second symptom feature under the condition of each disease; the second symptom characteristic is any symptom characteristic in the medical knowledge-map that excludes the first symptom characteristic;
calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease; and/or calculating the identification importance of the second symptom characteristic according to the conditional probability of the first symptom characteristic under each disease condition, the characteristic value corresponding to each first symptom characteristic and the conditional probability of the second symptom characteristic under each disease condition;
determining the importance of the second symptom feature based on the global importance and/or the discriminatory importance of the second symptom feature;
and sequencing all the second symptom characteristics with the importance according to the importance, and determining the second symptom characteristics with the sequencing meeting the preset conditions as candidate symptom characteristics.
In one possible implementation, the calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease includes:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
adding the conditional probabilities of the second symptom features under the conditions of the respective associated diseases to obtain the global importance of the second symptom features.
In one possible implementation, the calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease includes:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
respectively taking the conditional probability of the second symptom characteristic under each condition of the associated diseases as a variable of an objective function to obtain a function value of the objective function; the objective function is monotonically increased in an interval with the variable larger than zero, and the amplitude of the function value is increased along with the increase of the variable;
and adding the function values to obtain the global importance of the second symptom characteristic.
In one possible implementation, the method further includes:
acquiring a disease prediction rule from the medical knowledge graph, wherein the disease prediction rule comprises a corresponding relation between a condition and an action, the condition is a target characteristic value corresponding to a target symptom characteristic, and the action is used as a processing mode of an evaluation value of a corresponding disease;
determining disease prediction rules including the second symptom features as target disease prediction rules, calculating reciprocal values of the number of the target symptom features included in each target disease prediction rule, and adding the reciprocal values to obtain a rule evaluation value of the second symptom features;
and adding the obtained global importance of the second symptom characteristic to the rule evaluation value of the second symptom characteristic to obtain the updated global importance of the second symptom characteristic.
In one possible implementation manner, the calculating the discriminatory importance of the second symptom feature according to the conditional probability of the first symptom feature under the condition of each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under the condition of each disease includes:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
calculating a probability value of a target associated disease under the condition of the current symptom feature set of the user according to the conditional probability of the first symptom feature under the condition of each disease and the feature value corresponding to each first symptom feature; the target associated disease is each of the associated diseases;
multiplying the probability value of the target associated disease under the condition of the current symptom feature set of the user by the uncertainty of the second symptom feature to the target associated disease to obtain a discrimination importance coefficient of the second symptom feature to the target associated disease; the uncertainty of the second symptom characteristic for the target associated disease is calculated according to the conditional probability of the second symptom characteristic under the condition of the target associated disease;
and adding the discrimination importance coefficients of the second symptom feature for the respective associated diseases to obtain a first summation result, and calculating the difference between 1 and the first summation result to obtain the discriminatory importance of the second symptom feature.
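The discriminatory-importance computation above can be sketched as follows. The text only says the uncertainty is calculated from the conditional probability of the second symptom feature under the target associated disease; using binary entropy (in bits) for it is an assumption of this sketch, as are the function names and sample values.

```python
import math
from typing import Dict


def binary_entropy(p: float) -> float:
    """Assumed uncertainty measure: entropy in bits of a yes/no outcome
    with probability p. It is 0 at p = 0 or 1 and peaks at 1 for p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def discriminatory_importance(
    disease_prob: Dict[str, float], cond_prob: Dict[str, float]
) -> float:
    """Return 1 minus the sum of P(disease | current symptoms) * uncertainty.

    disease_prob: probability of each associated disease given the
                  current symptom feature set of the user.
    cond_prob:    conditional probability of the second symptom feature
                  under each associated disease.
    """
    first_sum = sum(
        disease_prob.get(disease, 0.0) * binary_entropy(p)
        for disease, p in cond_prob.items()
    )
    return 1.0 - first_sum


# Hypothetical values for illustration only.
disease_prob = {"influenza": 0.7, "common cold": 0.3}
cough_given_disease = {"influenza": 0.5, "common cold": 1.0}
cough_score = discriminatory_importance(disease_prob, cough_given_disease)
```

Under this assumption, a symptom whose presence is close to a coin flip for the currently likely diseases scores low, while a symptom that is almost always present or almost always absent for them scores high.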
In a possible implementation manner, the calculating, according to the conditional probability of the first symptom feature under the condition of each disease and the feature value corresponding to each first symptom feature, a probability value of a target associated disease under the condition of a current symptom feature set of a user includes:
when the feature value corresponding to the target first symptom feature is 1, determining the conditional probability of the target first symptom feature under the condition of the target associated disease as the probability value of the target first symptom feature; the target first symptom characteristic is each of the first symptom characteristics;
when the feature value corresponding to the target first symptom feature is 0, determining the difference between 1 and the conditional probability of the target first symptom feature under the condition of the target associated disease as the probability value of the target first symptom feature;
multiplying the probability values of the first symptom characteristics to obtain a pseudo posterior probability value of the target associated disease under the condition of the current symptom characteristic set of the user;
calculating the sum of the pseudo posterior probability values of the associated diseases under the condition of the current symptom feature set of the user to obtain a second summation result;
and dividing the pseudo posterior probability value of the target associated disease under the condition of the current symptom feature set of the user by the second summation result to obtain the probability value of the target associated disease under the condition of the current symptom feature set of the user.
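The steps above (pick p or 1 - p per observed feature value, multiply, then normalize over the associated diseases) can be sketched as follows. The function name, the dictionary layout, and the sample probabilities are assumptions; the zero-total fallback is also an addition of this sketch, since the text does not say what happens when every pseudo posterior value is zero.

```python
from typing import Dict


def disease_probabilities(
    feature_values: Dict[str, int],
    cond_prob: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Probability of each associated disease given the current symptom set.

    feature_values: first symptom feature -> 1 (present) or 0 (absent).
    cond_prob:      disease -> {first symptom feature -> P(symptom | disease)}.
    """
    pseudo_posterior: Dict[str, float] = {}
    for disease, probs in cond_prob.items():
        value = 1.0
        for symptom, observed in feature_values.items():
            p = probs[symptom]
            # Feature value 1 contributes p, feature value 0 contributes 1 - p.
            value *= p if observed == 1 else 1.0 - p
        pseudo_posterior[disease] = value

    total = sum(pseudo_posterior.values())  # the "second summation result"
    if total == 0.0:
        # Assumed fallback: no disease is compatible with the observations.
        return {disease: 0.0 for disease in pseudo_posterior}
    return {disease: v / total for disease, v in pseudo_posterior.items()}


# Hypothetical example: fever reported as present, cough as absent.
observed = {"fever": 1, "cough": 0}
table = {
    "influenza": {"fever": 0.8, "cough": 0.5},
    "common cold": {"fever": 0.2, "cough": 0.5},
}
probs = disease_probabilities(observed, table)
```

In this example the pseudo posterior values are 0.8 × 0.5 = 0.4 and 0.2 × 0.5 = 0.1, so normalization gives 0.8 for influenza and 0.2 for the common cold. No disease prior enters the product, which is presumably why the text calls the value a pseudo posterior.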
In one possible implementation, the determining the importance of the second symptom feature according to the global importance and/or the discriminatory importance of the second symptom feature includes:
computing a weighted sum of the global importance and the discriminatory importance of the second symptom feature to determine the importance of the second symptom feature.
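The weighted combination, followed by the sorting step of the method, can be sketched as follows. The weights of 0.5, the use of a top-k cut-off as the preset condition, and all sample scores are assumptions chosen for illustration; the text leaves both the weights and the preset condition open.

```python
from typing import Dict, List


def combined_importance(
    global_score: float,
    discriminatory_score: float,
    w_global: float = 0.5,           # assumed weight
    w_discriminatory: float = 0.5,   # assumed weight
) -> float:
    """Weighted sum of the two importance scores."""
    return w_global * global_score + w_discriminatory * discriminatory_score


def candidate_symptoms(importance: Dict[str, float], top_k: int) -> List[str]:
    """Sort symptoms by importance, highest first, and keep the first top_k.

    Keeping the top_k entries is one possible preset condition.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:top_k]


# Hypothetical per-symptom scores for illustration only.
global_scores = {"cough": 1.1, "rash": 0.4, "headache": 0.7}
discriminatory_scores = {"cough": 0.3, "rash": 0.9, "headache": 0.5}

importance = {
    symptom: combined_importance(global_scores[symptom], discriminatory_scores[symptom])
    for symptom in global_scores
}
candidates = candidate_symptoms(importance, top_k=2)
```

Raising the weight on the discriminatory score shifts the ranking toward symptoms that best separate the diseases currently considered likely for this user; raising the weight on the global score favors symptoms that matter across the knowledge graph regardless of the current user.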
Fig. 5 is a schematic structural diagram of a server in an embodiment of the present application. The server 500 may vary widely in configuration or performance and may include one or more Central Processing Units (CPUs) 522 (e.g., one or more processors) and memory 532, one or more storage media 530 (e.g., one or more mass storage devices) storing applications 542 or data 544. Memory 532 and storage media 530 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 522 may be configured to communicate with the storage medium 530, and execute a series of instruction operations in the storage medium 530 on the server 500.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input-output interfaces 556, one or more keyboards 556, and/or one or more operating systems 541, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
In addition, embodiments of the present application also provide a computer-readable medium having instructions stored thereon, which, when executed by one or more processors, cause an apparatus to perform the above-described method for implementing feature selection.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. The system or device disclosed in the embodiments is described relatively briefly because it corresponds to the method disclosed in the embodiments; for the relevant details, refer to the description of the method.
It should be understood that, in the present application, "at least one" means one or more, and "a plurality" means two or more. The term "and/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate: only A is present, only B is present, or both A and B are present, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for implementing feature selection, the method comprising:
acquiring a current symptom feature set of a user, wherein the current symptom feature set of the user comprises at least one first symptom feature and feature values corresponding to the first symptom features; the characteristic value corresponding to the first symptom characteristic is used for indicating whether the first symptom characteristic appears or not;
obtaining, from the medical knowledge graph, a conditional probability of the first symptom feature under the condition of each disease and a conditional probability of a second symptom feature under the condition of each disease; the second symptom feature is any symptom feature in the medical knowledge graph other than the first symptom feature;
calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease; and/or calculating the discriminatory importance of the second symptom feature according to the conditional probability of the first symptom feature under the condition of each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under the condition of each disease;
determining the importance of the second symptom feature based on the global importance and/or the discriminatory importance of the second symptom feature;
and sorting all second symptom features for which an importance has been determined according to their importance, and determining the second symptom features whose position in the sorted order satisfies a preset condition as candidate symptom features.
2. The method of claim 1, wherein said calculating the global importance of the second symptom feature based on the conditional probability of the second symptom feature under the condition of each disease comprises:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
adding the conditional probabilities of the second symptom features under the conditions of the respective associated diseases to obtain the global importance of the second symptom features.
3. The method of claim 1, wherein said calculating the global importance of the second symptom feature based on the conditional probability of the second symptom feature under the condition of each disease comprises:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
taking the conditional probability of the second symptom feature under the condition of each associated disease as the variable of an objective function to obtain a function value of the objective function; the objective function is monotonically increasing on the interval where the variable is greater than zero, and the increment of the function value grows as the variable increases;
and adding the function values to obtain the global importance of the second symptom characteristic.
4. A method according to claim 2 or 3, characterized in that the method further comprises:
acquiring disease prediction rules from the medical knowledge graph, wherein each disease prediction rule comprises a correspondence between a condition and an action, the condition is a target feature value corresponding to a target symptom feature, and the action is a manner of processing the evaluation value of the corresponding disease;
determining the disease prediction rules that include the second symptom feature as target disease prediction rules, calculating the reciprocal of the number of target symptom features included in each target disease prediction rule, and adding the reciprocals to obtain a rule evaluation value of the second symptom feature;
and adding the obtained global importance of the second symptom characteristic to the rule evaluation value of the second symptom characteristic to obtain the updated global importance of the second symptom characteristic.
5. The method of claim 1, wherein said calculating the discriminatory importance of the second symptom feature based on the conditional probability of the first symptom feature under the condition of each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under the condition of each disease comprises:
determining a disease associated with the second symptom characteristic as an associated disease according to the conditional probability of the second symptom characteristic under the condition of each disease;
calculating a probability value of a target associated disease under the condition of the current symptom feature set of the user according to the conditional probability of the first symptom feature under the condition of each disease and the feature value corresponding to each first symptom feature; the target associated disease is each of the associated diseases;
multiplying the probability value of the target associated disease under the condition of the current symptom feature set of the user by the uncertainty of the second symptom feature to the target associated disease to obtain a discrimination importance coefficient of the second symptom feature to the target associated disease; the uncertainty of the second symptom characteristic for the target associated disease is calculated according to the conditional probability of the second symptom characteristic under the condition of the target associated disease;
and adding the discrimination importance coefficients of the second symptom feature for the respective associated diseases to obtain a first summation result, and calculating the difference between 1 and the first summation result to obtain the discriminatory importance of the second symptom feature.
6. The method of claim 5, wherein calculating a probability value of the target associated disease for the current symptom feature set of the user according to the conditional probability of the first symptom feature for each disease and the feature value corresponding to each first symptom feature comprises:
when the feature value corresponding to the target first symptom feature is 1, determining the conditional probability of the target first symptom feature under the condition of the target associated disease as the probability value of the target first symptom feature; the target first symptom characteristic is each of the first symptom characteristics;
when the feature value corresponding to the target first symptom feature is 0, determining the difference between 1 and the conditional probability of the target first symptom feature under the condition of the target associated disease as the probability value of the target first symptom feature;
multiplying the probability values of the first symptom characteristics to obtain a pseudo posterior probability value of the target associated disease under the condition of the current symptom characteristic set of the user;
calculating the sum of the pseudo posterior probability values of the associated diseases under the condition of the current symptom feature set of the user to obtain a second summation result;
and dividing the pseudo posterior probability value of the target associated disease under the condition of the current symptom feature set of the user by the second summation result to obtain the probability value of the target associated disease under the condition of the current symptom feature set of the user.
7. The method of claim 1, wherein determining the importance of the second symptom feature based on the global importance and/or the discriminatory importance of the second symptom feature comprises:
computing a weighted sum of the global importance and the discriminatory importance of the second symptom feature to determine the importance of the second symptom feature.
8. An apparatus for enabling feature selection, the apparatus comprising:
a first acquisition unit configured to acquire a current symptom feature set of a user, wherein the current symptom feature set of the user comprises at least one first symptom feature and feature values corresponding to the first symptom features; the feature value corresponding to the first symptom feature is used for indicating whether the first symptom feature appears or not;
a second acquisition unit configured to acquire, from the medical knowledge graph, a conditional probability of the first symptom feature under the condition of each disease and a conditional probability of a second symptom feature under the condition of each disease; the second symptom feature is any symptom feature in the medical knowledge graph other than the first symptom feature;
a first calculation unit configured to calculate the global importance of the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease; and/or to calculate the discriminatory importance of the second symptom feature according to the conditional probability of the first symptom feature under the condition of each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under the condition of each disease;
a first determination unit configured to determine the importance of the second symptom feature according to the global importance and/or the discriminatory importance of the second symptom feature;
and a sorting unit configured to sort all second symptom features for which an importance has been determined according to their importance, and to determine the second symptom features whose position in the sorted order satisfies a preset condition as candidate symptom features.
9. An apparatus for implementing feature selection comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by one or more processors, the one or more programs including instructions for:
acquiring a current symptom feature set of a user, wherein the current symptom feature set of the user comprises at least one first symptom feature and feature values corresponding to the first symptom features; the characteristic value corresponding to the first symptom characteristic is used for indicating whether the first symptom characteristic appears or not;
obtaining, from the medical knowledge graph, a conditional probability of the first symptom feature under the condition of each disease and a conditional probability of a second symptom feature under the condition of each disease; the second symptom feature is any symptom feature in the medical knowledge graph other than the first symptom feature;
calculating the global importance of the second symptom feature according to the conditional probability of the second symptom feature under the condition of each disease; and/or calculating the discriminatory importance of the second symptom feature according to the conditional probability of the first symptom feature under the condition of each disease, the feature value corresponding to each first symptom feature, and the conditional probability of the second symptom feature under the condition of each disease;
determining the importance of the second symptom feature based on the global importance and/or the discriminatory importance of the second symptom feature;
and sorting all second symptom features for which an importance has been determined according to their importance, and determining the second symptom features whose position in the sorted order satisfies a preset condition as candidate symptom features.
10. A computer-readable medium having stored thereon instructions which, when executed by one or more processors, cause an apparatus to perform a method of implementing feature selection as recited in one or more of claims 1-7.
CN202110104867.1A 2021-01-26 2021-01-26 Method, device and equipment for realizing feature ordering Active CN112951405B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110104867.1A CN112951405B (en) 2021-01-26 2021-01-26 Method, device and equipment for realizing feature ordering

Publications (2)

Publication Number Publication Date
CN112951405A (en) 2021-06-11
CN112951405B (en) 2024-05-28

Family

ID=76237108

Country Status (1)

Country Link
CN (1) CN112951405B (en)

Also Published As

Publication number Publication date
CN112951405B (en) 2024-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant