CN116483987A - Method and device for selecting target crowd, computer equipment and readable storage medium - Google Patents
- Publication number
- CN116483987A (application number CN202310472230.7A)
- Authority
- CN
- China
- Prior art keywords
- candidate
- target
- preset
- user
- crowd
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The application discloses a method and a device for circle selection of a target crowd, computer equipment and a readable storage medium, and relates to the Internet field and the digital medical field. Candidate users are scored in different dimensions and the scores are then aggregated by weighting, which improves circle selection accuracy and recommendation hit rate; changing the scoring model and the weights of the scores allows different business scenarios to be accommodated more flexibly. The method comprises the following steps: responding to a target crowd circling instruction, determining a search text indicated by the target crowd circling instruction, and determining keywords of the search text; extracting a plurality of target feature labels from the knowledge graph, and obtaining candidate target groups in the health archive database; reading data information of each candidate user in a plurality of preset items, and using the data information to calculate a plurality of confidence scores in a plurality of preset dimensions; and performing weighted aggregation on the confidence scores of each candidate user to obtain a relevance score, and selecting a target crowd according to the relevance score.
Description
Technical Field
The present invention relates to the internet field and the digital medical field, and in particular, to a method, an apparatus, a computer device and a readable storage medium for selecting target groups.
Background
As electronic health files become more widely used, mining the demographic information, diagnosis and treatment information, and medication information they contain, and combining it with health-related behaviors such as inquiries in online applications and online information browsing, makes it possible to construct richer user portraits, to provide more accurate target crowd recommendations for different medical service scenarios, and thereby to improve the quality of medical and health services.
In the related art, target users are selected by building a wide table of user features, and the whole selection process relies on manual development to customize the feature labels that distinguish the target crowd from the general population. However, the applicant has realized that, across the application scenarios of different products, manually developing a set of distinguishing target user feature labels leads to long requirement response times and poor generality. In addition, because the feature labels built for different users come from widely differing data sources, the wide table of user features is sparse and has low saturation, so the accuracy of crowd selection is low.
Disclosure of Invention
In view of this, the present application provides a method, an apparatus, a computer device and a readable storage medium for selecting target groups. The main aim is to solve two problems: across the application scenarios of different products, relying on manual development to build a set of distinguishing target user feature labels leads to long requirement response times and poor generality; and because the feature labels built for different users come from widely differing data sources, the wide table of user features is sparse and has low saturation, so the accuracy of crowd selection is low.
According to a first aspect of the present application, there is provided a method for selecting a target group, the method comprising:
responding to a target crowd circling instruction, determining a search text indicated by the target crowd circling instruction, and determining keywords matched with a medical knowledge base in the search text;
extracting a plurality of target feature labels associated with the keywords from the knowledge graph corresponding to the keywords, and acquiring candidate target groups indicated by the target feature labels in a health archive database;
reading data information of each candidate user in the candidate target group in a plurality of preset items included in the health record database, and calculating a plurality of confidence scores of a plurality of preset dimensions for each candidate user by adopting the data information;
and carrying out weighted summarization on the confidence scores of each candidate user to obtain the relevance score of each candidate user, and selecting the candidate user from the candidate target crowd as the target crowd to output according to the relevance score of each candidate user.
Optionally, the determining the search text indicated by the target crowd circling instruction and determining the keywords matched with the medical knowledge base in the search text includes:
obtaining a search text indicated by the target crowd circling instruction, dividing the search text to obtain a plurality of character strings, and identifying at least one verb character string in the plurality of character strings;
matching the plurality of character strings with entities in a medical knowledge base, and identifying at least one entity character string in the plurality of character strings, wherein the entity character string comprises the entities in the medical knowledge base, and the entities are any one of disease names, medical product names, prescription treatment names and medical resource names;
for each entity character string in the at least one entity character string, determining an associated verb character string with an associated relation with the entity character string in the at least one verb character string, and combining the entity character string with the associated verb character string to obtain a candidate keyword;
generating candidate keywords for each entity character string in the at least one entity character string respectively, obtaining at least one candidate keyword, and selecting one candidate keyword from the at least one candidate keyword as the keyword of the search text, wherein the selected candidate keyword is the candidate keyword with the highest correlation with the medical knowledge base.
Optionally, the method further comprises:
and when detecting that the plurality of character strings do not comprise the entity in the medical knowledge base, generating a circle selection failure prompt, and pushing the circle selection failure prompt to a user initiating the circle selection instruction of the target crowd.
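The keyword determination steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the entity table, the verb lexicon, the whitespace segmentation, the "nearest verb" association rule, the fallback when no verb is present, and the numeric relevance values are all assumptions introduced for the example.

```python
from typing import Optional

# Hypothetical stand-ins for the medical knowledge base and a verb lexicon.
# Each entity maps to an assumed relevance value with respect to the knowledge base.
KNOWLEDGE_BASE_ENTITIES = {"diabetes": 0.9, "metformin": 0.7}
VERBS = {"diagnosed", "purchased", "consulted"}


def extract_keyword(search_text: str) -> Optional[str]:
    """Return the candidate keyword most relevant to the knowledge base, or None."""
    # Segment the search text into character strings. A whitespace split
    # stands in for a real word segmenter here.
    tokens = search_text.lower().split()
    verb_positions = [i for i, t in enumerate(tokens) if t in VERBS]
    entity_positions = [i for i, t in enumerate(tokens) if t in KNOWLEDGE_BASE_ENTITIES]

    if not entity_positions:
        # No knowledge-base entity found: the caller would push a
        # circle selection failure prompt to the initiating user.
        return None

    candidates = []
    for e in entity_positions:
        entity = tokens[e]
        if verb_positions:
            # Assumption: the associated verb is the nearest verb in the text.
            v = min(verb_positions, key=lambda i: abs(i - e))
            keyword = f"{tokens[v]} {entity}"
        else:
            # Assumption: with no verb available, the entity alone is used.
            keyword = entity
        candidates.append((KNOWLEDGE_BASE_ENTITIES[entity], keyword))

    # Keep the candidate keyword with the highest relevance.
    return max(candidates)[1]
```

For the text "users diagnosed with diabetes who purchased metformin", the sketch pairs "diabetes" with "diagnosed" and "metformin" with "purchased", then keeps the pair whose entity has the higher assumed relevance.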
Optionally, the extracting a plurality of target feature labels associated with the keywords in the knowledge graph corresponding to the keywords, and obtaining candidate target groups indicated by the plurality of target feature labels in the health record database, includes:
determining target nodes corresponding to the keywords in the knowledge graph;
determining a plurality of secondary nodes taking the target node as a center in the knowledge graph, acquiring a plurality of associated keywords indicated by the plurality of secondary nodes, and taking each associated keyword in the plurality of associated keywords as a target feature tag to acquire a plurality of target feature tags;
extracting a user group marked with the target feature tag from the health record database for each target feature tag in the plurality of target feature tags to obtain a plurality of user groups;
the plurality of user groups are aggregated as the candidate target group.
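A minimal sketch of this candidate extraction, assuming the knowledge graph is available as an adjacency mapping and the health record database as a tag-to-users index. Both structures, and all node names and user IDs in them, are hypothetical.

```python
# Hypothetical knowledge graph: target node -> secondary nodes centered on it.
KNOWLEDGE_GRAPH = {
    "diabetes": ["polyuria", "metformin", "high fasting blood glucose"],
}

# Hypothetical health record index: feature tag -> IDs of users marked with that tag.
HEALTH_RECORD_TAGS = {
    "polyuria": {"u1", "u2"},
    "metformin": {"u2", "u3"},
    "high fasting blood glucose": {"u4"},
}


def candidate_target_crowd(keyword: str) -> set[str]:
    """Aggregate the user groups of every feature tag linked to the keyword."""
    # Each associated keyword indicated by a secondary node becomes a target feature tag.
    target_tags = KNOWLEDGE_GRAPH.get(keyword, [])
    crowd: set[str] = set()
    for tag in target_tags:
        # Union the user group marked with this tag into the candidate crowd.
        crowd |= HEALTH_RECORD_TAGS.get(tag, set())
    return crowd
```

Using a set union means a user marked with several tags appears only once in the candidate target crowd.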
Optionally, the reading the data information of each candidate user in the candidate target group in a plurality of preset items included in the health record database, and calculating a plurality of confidence scores of a plurality of preset dimensions for each candidate user by adopting the data information includes:
for each candidate user in the candidate target group, reading data information corresponding to the candidate user on a plurality of preset items in the health record database to obtain a plurality of data information of the candidate user;
acquiring the preset scoring model that the target crowd circling instruction indicates is to be applied, and determining a plurality of preset dimensions corresponding to the preset scoring model, wherein the preset scoring model is used for scoring information of the plurality of preset dimensions;
the following processing is carried out on each preset dimension of the preset dimensions: determining at least one preset item associated with the preset dimension, taking at least one piece of data information corresponding to the associated at least one preset item as associated data information of the preset dimension, and calculating confidence scores of the preset dimension for the candidate users by adopting the associated data information;
obtaining the confidence score of each preset dimension for the candidate user, thereby obtaining a plurality of confidence scores of the plurality of preset dimensions for the candidate user, and in this way obtaining a plurality of confidence scores for each candidate user.
Optionally, the calculating the confidence score of the preset dimension for the candidate user using the associated data information includes:
acquiring disease data information corresponding to the associated data information from the medical knowledge base;
and comparing the associated data information with the disease data information, determining a probability value of the associated data information belonging to the disease data information by adopting a preset evaluation index, and taking the probability value as the confidence score of the preset dimension.
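The per-dimension scoring can be illustrated as below. The text does not specify the preset evaluation index, so this sketch assumes a simple one: the share of the disease's reference items for a dimension that also appear in the user's associated data. The dimension names and reference items are likewise hypothetical.

```python
# Hypothetical disease data from the medical knowledge base, grouped by preset dimension.
DISEASE_DATA = {
    "symptoms": {"polyuria", "polydipsia", "polyphagia"},
    "medication": {"metformin", "insulin"},
}


def confidence_scores(user_data: dict[str, set[str]]) -> dict[str, float]:
    """Compute one confidence score per preset dimension for a single candidate user."""
    scores = {}
    for dimension, disease_items in DISEASE_DATA.items():
        # Associated data information of this dimension for the user.
        associated = user_data.get(dimension, set())
        # Assumed evaluation index: fraction of the disease's items found in the user's data.
        scores[dimension] = len(associated & disease_items) / len(disease_items)
    return scores
```

Any index that maps the comparison to a value in [0, 1] could be substituted for the overlap fraction without changing the surrounding flow.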
Optionally, the weighting and summarizing the multiple confidence scores of each candidate user to obtain a relevance score of each candidate user, and selecting the candidate user from the candidate target crowd as the target crowd to output according to the relevance score of each candidate user, including:
acquiring the preset scoring model that the target crowd circling instruction indicates is to be applied, and determining an aggregation model corresponding to the preset scoring model, wherein the aggregation model indicates the weight corresponding to each preset dimension in the plurality of preset dimensions;
for each candidate user in the candidate target group, acquiring a plurality of confidence scores of the candidate user, and carrying out aggregation calculation on the plurality of confidence scores by adopting the aggregation model to acquire a relevance score of the candidate user;
acquiring the relevance score of each candidate user in the candidate target group, sequencing the candidate users included in the candidate target group according to the sequence from high to low of the relevance score, and determining a preset circle selection mode;
when the preset circle selection mode indicates circle selection according to a preset score threshold, selecting candidate users with relevance scores exceeding the preset score threshold from the sorted candidate users as the target crowd, generating circle selection success reminding comprising the target crowd, and pushing the circle selection success reminding to a user initiating the circle selection instruction of the target crowd;
when the preset circle selection mode indicates circle selection according to a preset quantity threshold, selecting the candidate users which are ranked first and the quantity of which meets the preset quantity threshold from the ranked candidate users as the target crowd, generating circle selection success reminding comprising the target crowd, and pushing the circle selection success reminding to the user initiating the circle selection instruction of the target crowd.
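The aggregation and the two circle selection modes can be sketched as follows, assuming the aggregation model is a plain weighted sum over dimensions. The function names, parameter names and weight values are illustrative and do not come from the application.

```python
from typing import Optional


def relevance_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of a user's confidence scores over the preset dimensions."""
    return sum(weights[d] * scores.get(d, 0.0) for d in weights)


def select_target_crowd(
    all_scores: dict[str, dict[str, float]],
    weights: dict[str, float],
    score_threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> list[str]:
    """Rank candidate users by relevance score and select by threshold or by count."""
    # Sort candidate users from highest to lowest relevance score.
    ranked = sorted(
        ((relevance_score(s, weights), uid) for uid, s in all_scores.items()),
        reverse=True,
    )
    if score_threshold is not None:
        # Mode 1: keep users whose relevance score exceeds the preset score threshold.
        return [uid for score, uid in ranked if score > score_threshold]
    if top_n is not None:
        # Mode 2: keep the top-ranked users up to the preset quantity threshold.
        return [uid for _, uid in ranked[:top_n]]
    raise ValueError("a score threshold or a quantity threshold is required")
```

Because the weights are passed in separately from the confidence scores, a different business scenario can be served by swapping the weight dictionary without recomputing the per-dimension scores.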
According to a second aspect of the present application, there is provided an apparatus for selecting a target crowd, the apparatus comprising:
the processing module is used for responding to the target crowd circling instruction, determining a search text indicated by the target crowd circling instruction and determining keywords matched with a medical knowledge base in the search text;
the extraction module is used for extracting a plurality of target feature labels associated with the keywords from the knowledge graph corresponding to the keywords and obtaining candidate target groups indicated by the target feature labels in the health record database;
the calculation module is used for reading the data information of each candidate user in the candidate target group from a plurality of preset items included in the health record database, and calculating a plurality of confidence scores of a plurality of preset dimensions for each candidate user by adopting the data information;
and the aggregation module is used for carrying out weighted aggregation on the confidence scores of each candidate user to obtain the relevance score of each candidate user, and selecting the candidate user from the candidate target crowd as the target crowd to output according to the relevance score of each candidate user.
Optionally, the processing module is configured to obtain a search text indicated by the target crowd circling instruction, perform segmentation processing on the search text to obtain a plurality of character strings, and identify at least one verb character string in the plurality of character strings; matching the plurality of character strings with entities in a medical knowledge base, and identifying at least one entity character string in the plurality of character strings, wherein the entity character string comprises the entities in the medical knowledge base, and the entities are any one of disease names, medical product names, prescription treatment names and medical resource names; for each entity character string in the at least one entity character string, determining an associated verb character string with an associated relation with the entity character string in the at least one verb character string, and combining the entity character string with the associated verb character string to obtain a candidate keyword; generating candidate keywords for each entity character string in the at least one entity character string respectively, obtaining at least one candidate keyword, and selecting one candidate keyword from the at least one candidate keyword as the keyword of the search text, wherein the selected candidate keyword is the candidate keyword with the highest correlation with the medical knowledge base.
Optionally, the processing module is further configured to generate a circle selection failure prompt when detecting that the plurality of character strings do not include any entity in the medical knowledge base, and push the circle selection failure prompt to the user initiating the target crowd circling instruction.
Optionally, the extracting module is configured to determine a target node corresponding to the keyword in the knowledge graph; determining a plurality of secondary nodes taking the target node as a center in the knowledge graph, acquiring a plurality of associated keywords indicated by the plurality of secondary nodes, and taking each associated keyword in the plurality of associated keywords as a target feature tag to acquire a plurality of target feature tags; extracting a user group marked with the target feature tag from the health record database for each target feature tag in the plurality of target feature tags to obtain a plurality of user groups; the plurality of user groups are aggregated as the candidate target group.
Optionally, the computing module is configured to, for each candidate user in the candidate target crowd, read, in the health record database, data information corresponding to the candidate user on a plurality of preset items, and obtain a plurality of data information of the candidate user; acquire the preset scoring model that the target crowd circling instruction indicates is to be applied, and determine a plurality of preset dimensions corresponding to the preset scoring model, wherein the preset scoring model is used for scoring information of the plurality of preset dimensions; perform the following processing on each preset dimension of the plurality of preset dimensions: determining at least one preset item associated with the preset dimension, taking at least one piece of data information corresponding to the associated at least one preset item as associated data information of the preset dimension, and calculating the confidence score of the preset dimension for the candidate user by adopting the associated data information; and obtain the confidence score of each preset dimension for the candidate user, thereby obtaining a plurality of confidence scores of the plurality of preset dimensions for the candidate user, and in this way obtaining a plurality of confidence scores for each candidate user.
Optionally, the computing module is further configured to obtain disease data information corresponding to the associated data information from the medical knowledge base; and comparing the associated data information with the disease data information, determining a probability value of the associated data information belonging to the disease data information by adopting a preset evaluation index, and taking the probability value as the confidence score of the preset dimension.
Optionally, the aggregation module is configured to acquire the preset scoring model that the target crowd circling instruction indicates is to be applied, and determine an aggregation model corresponding to the preset scoring model, where the aggregation model indicates a weight corresponding to each preset dimension in the plurality of preset dimensions; for each candidate user in the candidate target group, acquire a plurality of confidence scores of the candidate user, and carry out aggregation calculation on the plurality of confidence scores by adopting the aggregation model to acquire a relevance score of the candidate user; acquire the relevance score of each candidate user in the candidate target group, sort the candidate users included in the candidate target group in descending order of relevance score, and determine a preset circle selection mode; when the preset circle selection mode indicates circle selection according to a preset score threshold, select candidate users with relevance scores exceeding the preset score threshold from the sorted candidate users as the target crowd, generate a circle selection success reminder comprising the target crowd, and push the circle selection success reminder to the user initiating the target crowd circling instruction; when the preset circle selection mode indicates circle selection according to a preset quantity threshold, select the top-ranked candidate users whose quantity meets the preset quantity threshold from the sorted candidate users as the target crowd, generate a circle selection success reminder comprising the target crowd, and push the circle selection success reminder to the user initiating the target crowd circling instruction.
According to a third aspect of the present application there is provided a computer device comprising a memory storing a computer program and a processor implementing the steps of the method of any of the first aspects described above when the computer program is executed by the processor.
According to a fourth aspect of the present application there is provided a readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the first aspects described above.
By means of the above technical scheme, the method, the device, the computer equipment and the readable storage medium for selecting the target crowd provided by the present application respond to the target crowd circling instruction, determine the search text indicated by the target crowd circling instruction, and determine the keywords matched with the medical knowledge base in the search text. A plurality of target feature labels associated with the keywords are extracted from the knowledge graph corresponding to the keywords, and the candidate target groups indicated by the plurality of target feature labels are acquired in the health archive database. Data information of each candidate user in the candidate target groups is read from a plurality of preset items included in the health archive database, and a plurality of confidence scores of a plurality of preset dimensions are calculated for each candidate user by adopting the data information. The confidence scores of each candidate user are weighted and summarized to obtain a relevance score of each candidate user, and candidate users are selected from the candidate target groups as the target crowd according to the relevance score of each candidate user. Because the candidate users are evaluated and scored from different dimensions and the evaluation scores of the different dimensions are then weighted and aggregated, the accuracy of circle selection and the recommendation hit rate can be improved; moreover, by changing the scoring model and the weights of the scores, different business scenarios can be accommodated more flexibly and the requirement response time is shortened.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the present application are more readily apparent, specific embodiments of the present application are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
Fig. 1 is a schematic flow chart of a method for selecting a target crowd according to an embodiment of the present application;
Fig. 2A is a schematic flow chart of another method for selecting a target crowd according to an embodiment of the present application;
Fig. 2B is a schematic diagram of a diabetes knowledge graph according to an embodiment of the present application;
Fig. 2C is a schematic diagram of relevance score calculation according to an embodiment of the present application;
Fig. 2D is a schematic architecture diagram of target crowd circle selection according to an embodiment of the present application;
Fig. 2E is a schematic flow chart of a method for selecting a diabetic crowd according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a device for selecting a target crowd according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the application provides a method for selecting target groups, as shown in fig. 1, the method comprises the following steps:
101. In response to a target crowd circle-selection instruction, determine the search text indicated by the instruction, and determine the keyword in the search text that matches the medical knowledge base.
With the rapid development of the Internet, users' personalized needs have become increasingly prominent and the corresponding medical services increasingly specialized. To recommend target crowds more accurately for different medical service scenarios, the present application provides a method for selecting a target crowd: the input search terms are analyzed and combined with a knowledge graph to form target feature labels automatically, candidate users are scored from multiple dimensions, and the target crowd is finally circled according to relevance scores obtained by weighted aggregation. The method may run on an independent server, or on a server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms. The user can thus circle the target crowd more accurately, which improves the service recommendation hit rate and the operational conversion rate per unit time.
In the embodiment of the application, a user inputs search content at a client and initiates a target crowd circle-selection instruction. The server responds to the instruction and determines the search text it indicates, that is, it acquires the search content input by the user. The server then processes and analyzes the search text to obtain the keyword in the search text that matches the medical knowledge base; this keyword corresponds to the target crowd of interest to the user. Extracting the keyword from the search text thus allows the characteristics of the target crowd of interest to the user to be determined more accurately.
102. Extract a plurality of target feature labels associated with the keyword from the knowledge graph corresponding to the keyword, and acquire the candidate target crowd indicated by the plurality of target feature labels in the health record database.
The related art relies on manually developed and customized feature labels that distinguish the target crowd from the overall crowd. The saturation of such feature labels is low, and as medical products iterate continuously and medical scenarios grow richer, manually customized feature labels generalize poorly in application. Therefore, in the embodiment of the application, after determining the keyword corresponding to the target crowd of interest to the user, the server obtains the knowledge graph corresponding to the keyword. The knowledge graph contains a large amount of data, supports search, and can extend a search to display associated information. The server then extracts from that knowledge graph a plurality of target feature labels associated with the keyword, where the target feature labels are the feature labels that distinguish the target crowd from the overall crowd. In this way, multiple pieces of key information associated with the keyword are obtained through the knowledge graph, and the feature labels that distinguish the target crowd from the overall crowd are determined more accurately, which improves the saturation of the feature labels.
And then, the server acquires candidate target groups indicated by a plurality of target feature labels in a health record database, wherein the health record database comprises a large number of users and information related to medical health of the users, and the candidate users indicated by the target feature labels can be determined in the health record database through the target feature labels. In this way, a large number of candidate users can be initially selected as candidate target groups by taking a plurality of associated information associated with the keywords as target feature labels, and a population basis is provided for scoring the candidate users from different dimensions and determining the target groups.
103. Read, from a plurality of preset items included in the health record database, the data information of each candidate user in the candidate target crowd, and use the data information to calculate a plurality of confidence scores in a plurality of preset dimensions for each candidate user.
If a large, wide user-label table is built from the target feature labels alone and the crowd is circled through that table, the data is sparse and its saturation low, so the accuracy of circle selection is low. Therefore, in the embodiment of the application, the server reads, from a plurality of preset items included in the health record database, the data information of each candidate user in the candidate target crowd. The preset items include, but are not limited to, demographic information, diagnosis and treatment information, treatment medication information, online inquiry information and behavior data information, which provides a large and accurate data basis for subsequently scoring the candidate users. The data information is then used to calculate, for each candidate user, a plurality of confidence scores in a plurality of preset dimensions. The preset dimensions may be at least one of disease risk, disease symptoms, inspection and examination, disease diagnosis, treatment medication and health behaviors, and a confidence score represents the probability, given the candidate user's data information, that the candidate user belongs to the target crowd. Scoring the candidate users' data information in different dimensions allows the method to be applied flexibly to different medical scenarios, and evaluating from multiple dimensions improves the accuracy of crowd circle selection.
104. Perform weighted summarization on the confidence scores of each candidate user to obtain the relevance score of each candidate user, and select candidate users from the candidate target crowd according to their relevance scores to output as the target crowd.
To better reflect the confidence that a candidate user belongs to the target crowd, and to circle the target crowd more accurately and recommend relevant medical services to it, the candidate users with a higher degree of matching need to be further determined within the candidate target crowd. Therefore, in the embodiment of the application, the server performs weighted aggregation on the plurality of confidence scores of a candidate user to obtain the relevance score of that candidate user, and the relevance score reflects the confidence that the candidate user belongs to the target crowd. The weights of the confidence scores can be adjusted for different medical service scenarios, which improves the generality of target crowd circle selection. The relevance score of each candidate user in the candidate target crowd is calculated in this way, and candidate users are selected from the candidate target crowd according to their relevance scores and output as the target crowd. Changing the weights of the evaluation scores thus allows the target crowd to be circled more flexibly in practical applications while also improving the accuracy of circle selection.
According to the method provided by the embodiment of the application, in response to a target crowd circle-selection instruction, the search text indicated by the instruction is determined, and the keyword in the search text that matches the medical knowledge base is determined. A plurality of target feature labels associated with the keyword are extracted from the knowledge graph corresponding to the keyword, and the candidate target crowd indicated by the plurality of target feature labels is acquired from the health record database. The data information of each candidate user in the candidate target crowd is read from a plurality of preset items included in the health record database, and the data information is used to calculate, for each candidate user, a plurality of confidence scores in a plurality of preset dimensions. The confidence scores of each candidate user are weighted and summarized to obtain a relevance score for that user, and candidate users are selected from the candidate target crowd according to their relevance scores and output as the target crowd. Because candidate users are evaluated and scored from different dimensions and the evaluation scores of the different dimensions are weighted and aggregated, the accuracy of crowd circle selection and the hit rate of service recommendation can be improved; and because the dimensions of the scoring model and the weights of the evaluation scores can be changed, business requirements can be responded to flexibly and the response time can be reduced.
Further, as a refinement and extension of the foregoing embodiment, in order to fully describe a specific implementation procedure of the embodiment, the embodiment of the present application provides another method for selecting a target crowd, as shown in fig. 2A, where the method includes:
201. In response to a target crowd circle-selection instruction, determine the search text indicated by the instruction, and segment the search text.
Users' needs for medical care and health are unlimited, while the medical and health resources that society can provide are limited. To avoid wasting medical resources, each medical service needs to define its own service capabilities and characteristics, locate its target crowd according to those characteristics, and recommend the service to that crowd; the target crowd therefore has to be circled from the overall crowd for each medical service scenario. Thus, in the embodiment of the application, the user inputs search content at the client and initiates a target crowd circle-selection instruction, where the search content may be a word, a phrase, or a sentence, such as "how does a person suffering from diabetes manage blood glucose". The server receives and responds to the instruction and obtains the search text it indicates, namely the search content input by the user. The server then segments the search text into a plurality of character strings, such as "suffering from", "diabetes", "person", "how to", "blood glucose" and "management". The segmentation may be performed by a word segmenter, so that the medically relevant keywords in the search text can subsequently be identified quickly and accurately.
202. Determine the keyword in the search text that matches the medical knowledge base.
To determine the keyword corresponding to the target crowd of interest to the user and improve the accuracy of circle selection, in the embodiment of the application the server identifies at least one verb character string among the plurality of character strings, such as "suffering from" and "management"; this identification may be performed by a verb structure analyzer. The server then matches the plurality of character strings against the entities in a medical knowledge base. The medical knowledge base comprises a disease library, a medical product library, a prescription treatment library and a medical resource library, and an entity is any one of a disease name, a medical product name, a prescription treatment name and a medical resource name. The medical knowledge base can also correct wrongly written characters, colloquial expressions and the like in the character strings, so that more accurate feature labels of the target crowd can be obtained later. Through this matching, the server identifies at least one entity character string, that is, a character string containing an entity in the medical knowledge base. For example, when the character strings are "suffering from", "diabetes", "person", "how to", "blood glucose" and "management", the entity character strings obtained after matching are "diabetes" and "blood glucose". The matching may be performed by a medical entity analyzer in order to identify entities with specific medical meaning.
Next, to determine the keyword in the search text more accurately, for each entity character string the server determines, among the at least one verb character string, the associated verb character string that is related to that entity character string, and combines the two to obtain a candidate keyword. The relations among the character strings may be analyzed by a relationship analyzer, for example by determining the grammatical or syntactic subject or object of each verb. Meaningless connecting words and function words among the character strings, such as "how to", are then removed, and each entity character string is combined with its associated verb character string, giving for example "suffering from diabetes" and "blood glucose management"; this completes the standardization of the search content. A candidate keyword is generated in this way for each entity character string, yielding at least one candidate keyword, and one of them is selected as the keyword of the search text, namely the keyword of the target crowd of interest to the user, such as "suffering from diabetes". The selected candidate keyword is the one with the highest relevance to the medical knowledge base. Determining the keyword with the highest medical relevance in the search text allows the characteristics of the crowd of interest to the user to be determined accurately and without deviation, so that the plurality of target feature labels can subsequently be determined from the keyword.
In an alternative embodiment, when it is detected that none of the character strings contains an entity in the medical knowledge base, it is determined that the search text input by the user contains no medically relevant words. The server then generates a circle-selection failure reminder and pushes it to the user who initiated the target crowd circle-selection instruction, notifying the user that the target keyword is empty and that no medically relevant concept was found in the search content. In this way, crowd search in medical scenarios is realized, and services can be delivered accurately to the target crowd.
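As a minimal sketch of the keyword determination in step 202, the following Python fragment assumes that segmentation has already produced the character strings, and it replaces the verb structure analyzer, the medical entity analyzer and the relationship analyzer with small lookup tables. The names MEDICAL_ENTITIES, VERBS, RELATIONS and select_keyword, and the relevance values, are hypothetical and serve only to illustrate the flow; they are not taken from the application.

```python
from typing import List, Optional

# Hypothetical miniature knowledge base: entity -> assumed relevance value.
MEDICAL_ENTITIES = {"diabetes": 1.0, "blood glucose": 0.6}
# Hypothetical verb lexicon standing in for the verb structure analyzer.
VERBS = {"suffering from", "management"}
# Hypothetical relationship-analyzer output:
# entity -> (associated verb, combined candidate keyword).
RELATIONS = {
    "diabetes": ("suffering from", "suffering from diabetes"),
    "blood glucose": ("management", "blood glucose management"),
}


def select_keyword(strings: List[str]) -> Optional[str]:
    """Return the most relevant candidate keyword, or None if no entity matches."""
    verbs = {s for s in strings if s in VERBS}
    entities = [s for s in strings if s in MEDICAL_ENTITIES]
    if not entities:
        # No medical entity found: the caller would push a failure reminder.
        return None
    candidates = []
    for entity in entities:
        verb, phrase = RELATIONS.get(entity, (None, entity))
        # Use the combined phrase only when the associated verb is present.
        keyword = phrase if verb in verbs else entity
        candidates.append((MEDICAL_ENTITIES[entity], keyword))
    # Keep the candidate whose entity has the highest assumed relevance.
    return max(candidates, key=lambda c: c[0])[1]
```

With the six character strings of the diabetes example, the function returns "suffering from diabetes"; a None result corresponds to the case in which the circle-selection failure reminder is pushed.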
203. Extract a plurality of target feature labels associated with the keyword from the knowledge graph corresponding to the keyword.
To avoid the limitations of manually developed target feature labels, in the embodiment of the application the server determines the target node corresponding to the keyword in the knowledge graph and determines a plurality of sub-nodes centered on the target node, where the sub-nodes may be all first-level and second-level nodes around the target node. For example, the diabetes node is located in the diabetes knowledge graph, and all first-level and second-level nodes are determined with the diabetes node as the central node. The server then obtains the associated keywords indicated by these sub-nodes and takes each associated keyword as a target feature label, obtaining a plurality of target feature labels. For example, the keywords corresponding to all first-level and second-level nodes are used as the target feature labels for circling the target crowd; such labels may be polyphagia, fatigue and polydipsia. A plurality of associated feature labels can thus be constructed from the knowledge graph, which improves the saturation of the feature labels and avoids the scattered label data sources that arise when labels are customized manually.
In summary, the diabetes knowledge graph provided in the embodiment of the present application is illustrated as follows:
As shown in fig. 2B, the diabetes knowledge graph is centered on (1) the blood glucose management requirement and provides information on (2) related diseases and symptoms, (3) related treatment plans, (4) related literature and popular science, (5) related health factors, and (6) related doctors and institutions. A plurality of pieces of associated information can therefore be determined as target feature labels according to the keyword "blood glucose management requirement"; for example, all first-level and second-level nodes are determined as target feature labels with node (1), the blood glucose management requirement, as the central node. The risk factors in (5), such as fasting blood pressure, systolic pressure, postprandial blood glucose, body weight, age, glycosylated hemoglobin, waistline, and diet and exercise, may be used as target feature labels. The disease types in (2), such as type 1 diabetes, insulin and type 2 diabetes, may be used as target feature labels. The doctors and institutions in (6), such as diabetes specialty hospitals and endocrinologists of the 301 Hospital, may be used as target feature labels. The drugs in (3), such as metformin hydrochloride and metformin hydrochloride enteric controlled-release capsules, may also be used as target feature labels.
The target feature labels may further be symptoms in (2) such as polydipsia, polyphagia, polyuria and fatigue; complications in (2) such as diabetic nephropathy and diabetic neuropathic ulcers; patient questions and answers in (2) such as what diabetes is, what fruit people with diabetes can eat, and why blood pressure rises in diabetes; and popular science articles in (4), such as an article on the need to lower blood glucose to protect the kidneys in diabetes and hypertension, and an article on low vitamin D levels possibly increasing the risk of liver cancer.
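The extraction of first-level and second-level nodes around the target node can be sketched as a breadth-first traversal limited to two hops. The adjacency list GRAPH below is a hypothetical fragment invented for illustration, and target_feature_labels is an assumed helper name; the application does not specify how the knowledge graph is stored or traversed.

```python
from collections import deque
from typing import Dict, List

# Hypothetical fragment of a knowledge graph stored as an adjacency list.
GRAPH: Dict[str, List[str]] = {
    "diabetes": ["related diseases and symptoms", "related health factors"],
    "related diseases and symptoms": ["polydipsia", "polyuria"],
    "related health factors": ["glycosylated hemoglobin"],
    "polydipsia": ["third-level detail"],
}


def target_feature_labels(graph: Dict[str, List[str]], target: str,
                          max_depth: int = 2) -> List[str]:
    """Collect the labels of all nodes within max_depth hops of the target node."""
    seen = {target}
    labels: List[str] = []
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the second-level nodes
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                labels.append(neighbor)
                queue.append((neighbor, depth + 1))
    return labels
```

In this sketch the node "third-level detail" is three hops from the center and is therefore not returned as a target feature label.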
204. Acquire the candidate target crowd indicated by the plurality of target feature labels in the health record database.
In order to preliminarily obtain a large number of crowd samples so that target crowd can be further accurately determined in the crowd samples, in the embodiment of the application, a server side extracts user groups marked with target feature tags from a health archive database for each target feature tag in a plurality of target feature tags, obtains a plurality of user groups, and integrates the plurality of user groups as candidate target crowd. Therefore, a large number of candidate users can be selected as candidate target groups according to the target feature labels with high saturation, a basis is provided for further determining the target groups for the candidate users in the follow-up scoring, and the circling speed of the target groups can be improved.
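Integrating the user groups of the individual target feature labels amounts to a set union. The fragment below is a sketch under the assumption that the health record database exposes an index from feature label to the identifiers of the users marked with it; TAG_INDEX, the user identifiers and candidate_crowd are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical index: feature label -> IDs of users marked with that label.
TAG_INDEX: Dict[str, Set[str]] = {
    "polydipsia": {"u1", "u2"},
    "polyuria": {"u2", "u3"},
    "type 2 diabetes": {"u4"},
}


def candidate_crowd(tag_index: Dict[str, Set[str]],
                    labels: Iterable[str]) -> Set[str]:
    """Union the user groups of all target feature labels into one candidate crowd."""
    groups = [tag_index.get(label, set()) for label in labels]
    # A user marked with several labels appears only once in the union.
    return set().union(*groups)
```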
205. Read, from a plurality of preset items included in the health record database, the data information of each candidate user in the candidate target crowd.
In the embodiment of the application, for each candidate user in the candidate target crowd, the server reads the data information corresponding to the candidate user on a plurality of preset items in the health record database to obtain a plurality of data information of the candidate user, wherein the plurality of preset items include, but are not limited to, demographic information, past medical treatment records, physical examination records, online inquiry records, insurance guarantee information and the like, so that whether the candidate user belongs to the target crowd can be comprehensively and accurately judged through the information of the plurality of items, and the target crowd circle selection precision can be further improved. In an alternative embodiment, the data information for a plurality of preset items may be formatted so that the confidence score can be calculated more quickly thereafter.
206. Use the data information to calculate a plurality of confidence scores in a plurality of preset dimensions for each candidate user.
In order to circle the target crowd from multiple angles and at multiple levels, the data information of the candidate users is evaluated in different dimensions. Therefore, in the embodiment of the application, the server obtains the preset scoring model that the target crowd circle-selection instruction indicates is to be applied, and determines the plurality of preset dimensions corresponding to that model. The preset scoring model is used to score information in the plurality of preset dimensions and can adjust those dimensions for different medical service scenarios; the preset dimensions may be disease risk, disease symptoms, inspection and examination, disease diagnosis, treatment medication and health behaviors.
The server then processes each of the plurality of preset dimensions as follows. At least one preset item associated with the preset dimension is determined, and the data information corresponding to the associated preset item or items is taken as the associated data information of that dimension. For example, for the inspection and examination dimension, the data information identified for the candidate user is a fasting blood glucose of 6.7 in the physical examination report. Disease data information corresponding to the associated data information is then obtained from the medical knowledge base; for example, the disease library records that the normal fasting blood glucose of a diabetic patient does not exceed 7. The associated data information is then compared with the disease data information, a preset evaluation index is used to determine the probability value that the associated data information belongs to the disease data information, and that probability value is taken as the confidence score of the preset dimension. For example, according to the fasting blood glucose standard for diabetic patients in the disease library, the probability that the candidate user belongs to the diabetes target crowd is judged to be 0.8, that is, the confidence score of the candidate user in the inspection and examination dimension is 0.8.
Similarly, for the disease symptoms dimension, the symptoms recorded in the candidate user's online inquiries and offline diagnoses are obtained, and their similarity to the symptoms of diabetes in the disease library is evaluated to obtain the probability, based on those symptoms, that the candidate user belongs to the diabetes target crowd. It should be noted that the probability value of a dimension may be obtained by combining the probability values of the individual pieces of data information in the associated data information.
The confidence score of each preset dimension is obtained for the candidate user in this way, giving a plurality of confidence scores in the plurality of preset dimensions for that candidate user, and the process is repeated to obtain the plurality of confidence scores of every candidate user. It should be noted that the user can set the number of preset dimensions used for circling the target crowd according to the medical service scenario, so that different business scenarios can be accommodated more flexibly by adjusting the evaluation criteria for the target crowd.
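The similarity evaluation for the disease symptoms dimension is not specified further in the application. Purely as an illustration, the sketch below uses the Jaccard index between the candidate user's symptoms and the disease's symptoms as a stand-in for the preset evaluation index; the function name symptom_confidence and the choice of Jaccard similarity are assumptions, and any other similarity measure could take its place.

```python
from typing import Iterable


def symptom_confidence(user_symptoms: Iterable[str],
                       disease_symptoms: Iterable[str]) -> float:
    """Jaccard similarity of two symptom sets, used as an assumed confidence score."""
    user, disease = set(user_symptoms), set(disease_symptoms)
    if not user or not disease:
        return 0.0  # no evidence in this dimension
    # Shared symptoms divided by all distinct symptoms in either set.
    return len(user & disease) / len(user | disease)
```

For instance, a candidate user reporting polydipsia, polyuria and headache, compared with a disease entry listing polydipsia, polyuria, polyphagia and fatigue, shares two of five distinct symptoms and receives a score of 0.4 under this assumed measure.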
In summary, the relevance score calculation provided in the embodiment of the present application is illustrated as follows:
As shown in fig. 2C, with the target demand disease as the center, the crowd that can be matched by all fields within the second-level nodes is taken as the largest target crowd, and for each potential demand customer in the largest target crowd, evidence is collected from six dimensions: high-incidence crowd, disease symptoms, physiological and biochemical detection indexes, disease diagnosis, treatment medication and behavior data. It should be noted that the evidence for the high-incidence crowd may be case information from offline hospital visits, demographic information in diagnosis records, past medical history, basic information in physical examination reports, basic information in health assessments, basic user information acquired from the platform, and the like. The evidence for disease symptoms may be the consultation text of online consultations and case information from offline visits. The evidence for physiological and biochemical detection indexes may be the abnormal results in physical and chemical test reports. The evidence for disease diagnosis may be diagnosis records and disease claim records. The evidence for treatment medication may be the treatment records, operation records, medication records and the like of the potential demand customer. The evidence for behavior data may be the medical information that the potential demand customer has browsed, followed or subscribed to. Next, for each dimension, each potential demand customer is given a confidence score indicating whether the customer belongs to the target demand customer group. Each dimension may contain similar data from different times and different sources, and these data are weighted to obtain the customer's final confidence score for that dimension.
Finally, the confidence scores of the six dimensions are weighted and summarized to determine the final confidence score of each potential demand customer. Different evidence carries different confidence as to whether the potential demand customer belongs to the target demand customer group, and the older an evidence item is, the lower its confidence in judging the customer. For example, a hospital diagnosis record carries higher confidence that a customer belongs to the diabetes target crowd than the customer's online inquiry record.
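One way to combine several evidence items of the same dimension, so that older and less reliable items count for less, is a weighted average with a time-decay factor. The sketch below is only an assumption about how such weighting could look: the Evidence fields, the exponential half-life decay, the default of 365 days and the source weights are all hypothetical and are not given in the application.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Evidence:
    probability: float    # probability that this item indicates membership
    age_days: float       # time elapsed since the evidence was recorded
    source_weight: float  # assumed reliability of the source


def dimension_confidence(items: Iterable[Evidence],
                         half_life_days: float = 365.0) -> float:
    """Weighted average of evidence probabilities with exponential time decay."""
    weighted_sum = 0.0
    total_weight = 0.0
    for item in items:
        # The weight halves every half_life_days, so older evidence counts less.
        weight = item.source_weight * 0.5 ** (item.age_days / half_life_days)
        weighted_sum += weight * item.probability
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0
```

Under these assumptions, a current item with probability 0.9 and a one-year-old item with probability 0.3 from equally reliable sources combine to 0.7 rather than the unweighted mean of 0.6, because the older item receives half the weight.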
207. Perform weighted summarization on the confidence scores of each candidate user to obtain the relevance score of each candidate user.
In the embodiment of the application, the server obtains the preset scoring model that the target crowd circle-selection instruction indicates is to be applied and determines the aggregation model corresponding to it, where the aggregation model indicates the weight corresponding to each of the plurality of preset dimensions. When confidence scores are calculated for candidate users over different numbers of dimensions, the way the confidence scores are weighted and summarized differs, and in different medical service scenarios the weights of confidence scores calculated in the same dimension may also be adjusted. The aggregation model corresponding to the preset scoring model therefore needs to be determined, so that the relevance scores of candidate users can be calculated more accurately with different aggregation models, which improves both the accuracy of circle selection and the generality of the method in practical applications. Then, for each candidate user in the candidate target crowd, the plurality of confidence scores of the candidate user are obtained and aggregated with the aggregation model to obtain the relevance score of the candidate user. In this way, medically relevant data about the candidate user is collected from multiple aspects as evidence, and that evidence is used to judge from multiple dimensions whether the candidate user belongs to the target crowd. The relevance score can thus be calculated more accurately, the target crowd can be determined more accurately, and different business scenarios can be accommodated more flexibly by changing the score weights of different models.
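The weighted aggregation can be sketched as a normalized weighted sum in which each aggregation model is simply a table of per-dimension weights. The two models and their weights below are hypothetical, and both the normalization by the sum of the weights and the treatment of a missing dimension as 0.0 are assumptions made for this illustration.

```python
from typing import Dict

DIMENSIONS = (
    "disease risk", "disease symptoms", "inspection and examination",
    "disease diagnosis", "treatment medication", "health behaviors",
)

# Hypothetical aggregation models: one weight per preset dimension.
AGGREGATION_MODELS: Dict[str, Dict[str, float]] = {
    "uniform": {dim: 1.0 for dim in DIMENSIONS},
    "diagnosis_led": {
        "disease risk": 1.0, "disease symptoms": 1.0,
        "inspection and examination": 2.0, "disease diagnosis": 3.0,
        "treatment medication": 2.0, "health behaviors": 1.0,
    },
}


def relevance_score(confidences: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted average of per-dimension confidence scores."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # A dimension without a confidence score contributes 0.0 (an assumption).
    weighted = sum(w * confidences.get(dim, 0.0) for dim, w in weights.items())
    return weighted / total_weight
```

Switching from the "uniform" model to the "diagnosis_led" model changes the relevance score of the same candidate user without recomputing any confidence score, which is the kind of per-scenario adjustment the paragraph above describes.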
208. Select candidate users from the candidate target crowd according to the relevance score of each candidate user and output them as the target crowd.
In order to accurately select the target crowd, the relevance scores of all candidate users in the candidate target crowd may be ranked, and then the target crowd is selected according to the ranked candidate users, so in the embodiment of the application, the server obtains the relevance score of each candidate user in the candidate target crowd, ranks the candidate users included in the candidate target crowd according to the order of the relevance scores from high to low, and determines a preset selecting mode.
When the preset circle selection mode indicates that circle selection is performed according to a preset score threshold, candidate users whose relevance score exceeds the preset score threshold are selected from the ranked candidate users as the target crowd, a circle selection success reminder including the target crowd is generated, and the reminder is pushed to the user who initiated the target crowd circle selection instruction. For example, all candidate users whose relevance score exceeds 85 points are selected from the ranked candidate users as the target crowd, and the information of these candidate users is then displayed to the user to facilitate service recommendation.
When the preset circle selection mode indicates that circle selection is performed according to a preset quantity threshold, the top-ranked candidate users, in a number that meets the preset quantity threshold, are selected from the ranked candidate users as the target crowd, a circle selection success reminder including the target crowd is generated, and the reminder is pushed to the user who initiated the target crowd circle selection instruction. For example, the first 90 candidate users are selected from the ranked candidate users as the target crowd, and the information of the 90 candidate users included in the target crowd is then displayed to the user. Selecting an appropriate circle selection mode according to the characteristics of different medical scenarios can thus improve the accuracy of target crowd circle selection.
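The two circle selection modes can be sketched as follows. The function name, the mode strings and the sample scores are assumptions made for illustration; only the ranking in descending order, the score threshold and the quantity threshold come from the description.

```python
from typing import Dict, List, Optional, Tuple


def select_target_crowd(
    relevance: Dict[str, float],
    mode: str,
    score_threshold: Optional[float] = None,
    count_threshold: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Rank candidates by relevance score (high to low) and apply one mode."""
    ranked = sorted(relevance.items(), key=lambda item: item[1], reverse=True)
    if mode == "score":
        if score_threshold is None:
            raise ValueError("score mode requires score_threshold")
        # Keep only candidates whose score strictly exceeds the threshold.
        return [(user, score) for user, score in ranked if score > score_threshold]
    if mode == "count":
        if count_threshold is None or count_threshold < 0:
            raise ValueError("count mode requires a non-negative count_threshold")
        # Keep the top-ranked candidates up to the requested number.
        return ranked[:count_threshold]
    raise ValueError(f"unknown circle selection mode: {mode}")


scores = {"user_a": 91.0, "user_b": 85.0, "user_c": 88.5, "user_d": 60.0}
by_score = select_target_crowd(scores, "score", score_threshold=85.0)
by_count = select_target_crowd(scores, "count", count_threshold=3)
print(by_score)  # user_a and user_c; user_b does not exceed 85
print(by_count)  # user_a, user_c, user_b
```

Note that in this sketch a candidate scoring exactly at the threshold is excluded, following the wording "exceeds the preset score threshold".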
In summary, the architecture of the target crowd circle selection provided in the embodiment of the present application is as follows:
As shown in fig. 2D, after the user inputs the search content, the input word frequency preprocessing module processes it: the word segmenter segments the search content; the medical entity analyzer then matches the segmented search content against a medical knowledge base, which includes a disease base, a medical product base, a prescription treatment base and a medical resource base; the verb structure analyzer and the relationship analyzer then determine a plurality of candidate keywords, which are put into a target keyword pool; and finally one keyword is selected. Next, in the graph search module, the largest candidate target group is obtained from the health record database according to the knowledge graph corresponding to the keyword. Then, in the evidence acquisition module, the data information of each candidate user in the largest candidate target group is obtained from the personal health database of the health record database. Then, in the evidence scoring module, the data information of each candidate user is evaluated by a plurality of models, such as disease risk, disease symptoms, inspection and examination, disease diagnosis, treatment and medication, and health behaviors, to obtain a plurality of confidence scores. Finally, in the aggregation module, the plurality of confidence scores of each candidate user are weighted and summarized to determine a relevance score, the relevance scores are ranked, and the candidate users exceeding a score threshold are determined as the target crowd.
In summary, the flow of the method for selecting a diabetic population provided by the embodiment of the present application is as follows:
As shown in fig. 2E, the business requirement input by the user at the client is a blood glucose management requirement; the server performs keyword analysis on the business requirement using the medical knowledge base and determines that the keyword is diabetes. The server then looks up a plurality of main associated keywords in the diabetes knowledge graph to obtain a plurality of target feature labels, reads candidate users from the health archive database according to the plurality of target feature labels, gathers the candidate users to form the candidate target group, and completes the marking of the related crowd, such as user A and user B indicated by the target feature labels. Next, the server retrieves evidence from the health record database to acquire the health evidence of user A and user B for evidence scoring. For example, for user A, the offline diagnosis is diabetes, and the monitoring records show a postprandial blood glucose of 8.1 measured with a household glucometer; for user B, the online consultation indicates diabetes, the fasting blood glucose in a physical examination report is 6.7, and the behavior records include a diabetes risk self-test taken with the golden manager APP (Application). The relevant evidence is then scored by N different models, such as model 1 and model 2. For example, user A scores 1 on model 1, 0 on model 2 and 0.7 on model N, while user B scores 0 on model 1, 0.8 on model 2 and 0.9 on model N.
Finally, the scores of the multiple models are weighted and aggregated: the confidence of user A is determined to be 0.9 and the confidence of user B is determined to be 0.6, and user A is determined to be the matched target crowd according to the confidence.
According to the method provided by the embodiment of the application, in response to a target crowd circle selection instruction, the search text indicated by the instruction is determined, and the keyword in the search text that matches the medical knowledge base is determined. A plurality of target feature labels associated with the keyword are extracted from the knowledge graph corresponding to the keyword, and the candidate target crowd indicated by the plurality of target feature labels in the health archive database is obtained. The data information of each candidate user in the candidate target crowd is read from the plurality of preset items included in the health archive database, and a plurality of confidence scores over a plurality of preset dimensions are calculated for each candidate user using the data information. The plurality of confidence scores of each candidate user are weighted and summarized to obtain the relevance score of each candidate user, and candidate users are selected from the candidate target crowd according to their relevance scores and output as the target crowd. Because candidate users are evaluated and scored from different dimensions and the evaluation scores of the different dimensions are weighted and aggregated, the accuracy of crowd circle selection and the hit rate of service recommendation can be improved; and because the scoring model and the dimensions of the evaluation scores can be changed, different service requirement scenarios can be handled flexibly and the response time for service requirements can be reduced.
Further, as a specific implementation of the method shown in fig. 1, an embodiment of the present application provides a device for selecting a target crowd, as shown in fig. 3, where the device includes: a processing module 301, an extraction module 302, a calculation module 303 and an aggregation module 304.
The processing module 301 is configured to determine, in response to a target crowd circle selection instruction, a search text indicated by the target crowd circle selection instruction, and determine keywords in the search text that match a medical knowledge base;
the extraction module 302 is configured to extract a plurality of target feature labels associated with the keywords from a knowledge graph corresponding to the keywords, and obtain the candidate target crowd indicated by the plurality of target feature labels in a health record database;
the calculation module 303 is configured to read data information of each candidate user in the candidate target crowd from a plurality of preset items included in the health record database, and calculate a plurality of confidence scores of a plurality of preset dimensions for each candidate user by using the data information;
and the aggregation module 304 is configured to perform weighted summarization on the plurality of confidence scores of each candidate user to obtain a relevance score of each candidate user, and select candidate users from the candidate target crowd according to the relevance score of each candidate user for output as the target crowd.
In a specific application scenario, the processing module 301 is configured to obtain a search text indicated by the target crowd circling instruction, segment the search text to obtain a plurality of character strings, and identify at least one verb character string in the plurality of character strings; matching the plurality of character strings with entities in a medical knowledge base, and identifying at least one entity character string in the plurality of character strings, wherein the entity character string comprises the entities in the medical knowledge base, and the entities are any one of disease names, medical product names, prescription treatment names and medical resource names; for each entity character string in the at least one entity character string, determining an associated verb character string with an associated relation with the entity character string in the at least one verb character string, and combining the entity character string with the associated verb character string to obtain a candidate keyword; generating candidate keywords for each entity character string in the at least one entity character string respectively, obtaining at least one candidate keyword, and selecting one candidate keyword from the at least one candidate keyword as the keyword of the search text, wherein the selected candidate keyword is the candidate keyword with the highest correlation with the medical knowledge base.
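The keyword determination performed by the processing module can be sketched as below. This is a simplified illustration under several assumptions that are not in the application: whitespace tokenization stands in for the word segmenter, the verb lexicon and entity table are hypothetical, an entity is associated with the nearest verb by token distance, and the "correlation with the medical knowledge base" is modelled as a fixed weight per entity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical miniature knowledge base: entity -> (entity type, correlation weight).
MEDICAL_ENTITIES: Dict[str, Tuple[str, float]] = {
    "diabetes": ("disease", 0.9),
    "insulin": ("medical_product", 0.7),
}
# Hypothetical verb lexicon used to identify verb character strings.
VERBS = {"manage", "screen", "buy"}


def extract_keyword(search_text: str) -> Optional[str]:
    """Return the best candidate keyword, or None if no entity matches."""
    tokens = search_text.lower().split()  # stand-in for a real word segmenter
    verb_positions = [i for i, tok in enumerate(tokens) if tok in VERBS]
    candidates: List[Tuple[float, str]] = []
    for i, token in enumerate(tokens):
        if token not in MEDICAL_ENTITIES:
            continue
        weight = MEDICAL_ENTITIES[token][1]
        if verb_positions:
            # Assumption: the associated verb is the nearest verb in the text.
            nearest = min(verb_positions, key=lambda pos: abs(pos - i))
            keyword = f"{tokens[nearest]} {token}"
        else:
            keyword = token  # assumption: fall back to the bare entity
        candidates.append((weight, keyword))
    if not candidates:
        return None  # no entity found: the circle selection fails
    # Select the candidate with the highest correlation weight.
    return max(candidates)[1]


print(extract_keyword("manage diabetes and buy insulin"))
print(extract_keyword("weather today"))
```

Returning `None` when no entity is matched corresponds to the failure case in which a reminder is pushed to the user who initiated the instruction.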
In a specific application scenario, the processing module 301 is further configured to generate a circle selection failure reminder when detecting that the plurality of character strings do not include any entity in the medical knowledge base, and push the circle selection failure reminder to the user who initiated the target crowd circle selection instruction.
In a specific application scenario, the extraction module 302 is configured to: determine a target node corresponding to the keyword in the knowledge graph; determine a plurality of secondary nodes centered on the target node in the knowledge graph, acquire a plurality of associated keywords indicated by the plurality of secondary nodes, and take each associated keyword in the plurality of associated keywords as a target feature tag to obtain a plurality of target feature tags; for each target feature tag in the plurality of target feature tags, extract the user group marked with the target feature tag from the health record database to obtain a plurality of user groups; and aggregate the plurality of user groups as the candidate target crowd.
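As a minimal sketch of this extraction step, the knowledge graph can be represented as an adjacency list and the health record database as an index from feature tag to user IDs. Both data structures and all of their contents are hypothetical; the application does not specify a storage format.

```python
from typing import Dict, List, Set

# Hypothetical knowledge graph: target node -> secondary nodes linked to it.
KNOWLEDGE_GRAPH: Dict[str, List[str]] = {
    "diabetes": ["high blood glucose", "insulin", "metformin"],
    "hypertension": ["high blood pressure"],
}
# Hypothetical health record index: feature tag -> users marked with that tag.
TAGGED_USERS: Dict[str, Set[str]] = {
    "high blood glucose": {"user_a", "user_b"},
    "insulin": {"user_a"},
    "metformin": {"user_c"},
    "high blood pressure": {"user_d"},
}


def candidate_target_crowd(keyword: str) -> Set[str]:
    """Union of the user groups marked with any tag adjacent to the keyword."""
    # Each secondary node's associated keyword serves as a target feature tag.
    tags = KNOWLEDGE_GRAPH.get(keyword, [])
    crowd: Set[str] = set()
    for tag in tags:
        crowd |= TAGGED_USERS.get(tag, set())
    return crowd


print(sorted(candidate_target_crowd("diabetes")))
```

Taking the union rather than the intersection keeps the candidate crowd as large as possible at this stage; the later scoring and aggregation steps narrow it down.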
In a specific application scenario, the calculation module 303 is configured to: for each candidate user in the candidate target crowd, read, in the health record database, the data information corresponding to the candidate user on the plurality of preset items to obtain a plurality of pieces of data information of the candidate user; acquire the preset scoring model that the target crowd circle selection instruction indicates to apply, and determine a plurality of preset dimensions corresponding to the preset scoring model, wherein the preset scoring model is used for scoring information of the plurality of preset dimensions; perform the following processing for each preset dimension of the plurality of preset dimensions: determine at least one preset item associated with the preset dimension, take the at least one piece of data information corresponding to the associated at least one preset item as the associated data information of the preset dimension, and calculate the confidence score of the preset dimension for the candidate user using the associated data information; and obtain the confidence score of each preset dimension for the candidate user, thereby obtaining a plurality of confidence scores of the plurality of preset dimensions for the candidate user and, in turn, a plurality of confidence scores for each candidate user.
In a specific application scenario, the calculation module 303 is further configured to acquire disease data information corresponding to the associated data information from the medical knowledge base; and compare the associated data information with the disease data information, determine, using a preset evaluation index, a probability value that the associated data information belongs to the disease data information, and take the probability value as the confidence score of the preset dimension.
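The per-dimension scoring can be illustrated with the sketch below. The application does not define the preset evaluation index, so the index used here (the fraction of the disease's expected findings that appear in the user's associated data) is purely an assumption, as are the dimension names, item names and findings.

```python
from typing import Dict, List, Set

# Hypothetical mapping: preset dimension -> preset record items associated with it.
DIMENSION_ITEMS: Dict[str, List[str]] = {
    "disease_diagnosis": ["offline_visits", "online_consultations"],
    "inspection": ["physical_exam", "home_monitoring"],
}
# Hypothetical disease data from the knowledge base: dimension -> expected findings.
DISEASE_PROFILE: Dict[str, Set[str]] = {
    "disease_diagnosis": {"diagnosed_diabetes"},
    "inspection": {"elevated_fasting_glucose", "elevated_postprandial_glucose"},
}


def confidence_scores(user_records: Dict[str, Set[str]]) -> Dict[str, float]:
    """One confidence score in [0, 1] per preset dimension."""
    scores: Dict[str, float] = {}
    for dimension, items in DIMENSION_ITEMS.items():
        # Collect the data of every preset item associated with this dimension.
        associated: Set[str] = set()
        for item in items:
            associated |= user_records.get(item, set())
        expected = DISEASE_PROFILE[dimension]
        # Assumed evaluation index: share of expected findings that are present.
        scores[dimension] = len(associated & expected) / len(expected)
    return scores


user_a = {
    "offline_visits": {"diagnosed_diabetes"},
    "home_monitoring": {"elevated_postprandial_glucose"},
}
print(confidence_scores(user_a))
```

In practice each dimension could be scored by a separate trained model; the overlap ratio is used here only because it yields a value that can be read as a probability-like score without further assumptions.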
In a specific application scenario, the aggregation module 304 is configured to: acquire the preset scoring model that the target crowd circle selection instruction indicates to apply, and determine the aggregation model corresponding to the preset scoring model, where the aggregation model indicates the weight corresponding to each preset dimension in the plurality of preset dimensions; for each candidate user in the candidate target crowd, acquire the plurality of confidence scores of the candidate user, and perform an aggregation calculation on the plurality of confidence scores using the aggregation model to obtain the relevance score of the candidate user; acquire the relevance score of each candidate user in the candidate target crowd, rank the candidate users included in the candidate target crowd in descending order of relevance score, and determine a preset circle selection mode; when the preset circle selection mode indicates circle selection according to a preset score threshold, select candidate users whose relevance scores exceed the preset score threshold from the ranked candidate users as the target crowd, generate a circle selection success reminder including the target crowd, and push the circle selection success reminder to the user who initiated the target crowd circle selection instruction; and when the preset circle selection mode indicates circle selection according to a preset quantity threshold, select the top-ranked candidate users, in a number that meets the preset quantity threshold, from the ranked candidate users as the target crowd, generate a circle selection success reminder including the target crowd, and push the circle selection success reminder to the user who initiated the target crowd circle selection instruction.
According to the device provided by the embodiment of the application, in response to a target crowd circle selection instruction, the search text indicated by the instruction is determined, and the keyword in the search text that matches the medical knowledge base is determined. A plurality of target feature labels associated with the keyword are extracted from the knowledge graph corresponding to the keyword, and the candidate target crowd indicated by the plurality of target feature labels in the health archive database is obtained. The data information of each candidate user in the candidate target crowd is read from the plurality of preset items included in the health archive database, and a plurality of confidence scores over a plurality of preset dimensions are calculated for each candidate user using the data information. The plurality of confidence scores of each candidate user are weighted and summarized to obtain the relevance score of each candidate user, and candidate users are selected from the candidate target crowd according to their relevance scores and output as the target crowd. Because candidate users are evaluated and scored from different dimensions and the evaluation scores of the different dimensions are weighted and aggregated, the accuracy of crowd circle selection and the hit rate of service recommendation can be improved; and because the scoring model and the dimensions of the evaluation scores can be changed, different service requirement scenarios can be handled flexibly and the response time for service requirements can be reduced.
It should be noted that, other corresponding descriptions of each functional unit related to the circling device for the target crowd provided in the embodiment of the present application may refer to corresponding descriptions in fig. 1 and fig. 2A to 2E, and are not repeated herein.
In an exemplary embodiment, referring to fig. 4, there is also provided a computer device, which includes a bus, a processor, a memory, and a communication interface, and may further include an input-output interface and a display device, where the functional units communicate with one another through the bus. The memory stores a computer program, and the processor executes the program stored in the memory to perform the method for selecting the target crowd in the above embodiments.
There is also provided a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for selecting a target crowd.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented in hardware, or may be implemented by means of software plus necessary general hardware platforms. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to perform the methods described in various implementation scenarios of the present application.
Those skilled in the art will appreciate that the drawings are merely schematic illustrations of one preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required to practice the present application.
Those skilled in the art will appreciate that the modules of an apparatus in an implementation scenario may be distributed within that apparatus as described for the implementation scenario, or may, with corresponding changes, be located in one or more apparatuses different from that of the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The serial numbers used in the present application are for description only and do not represent the merits of the implementation scenarios.
The foregoing discloses merely a few specific implementations of the present application; the present application is not limited thereto, and any variation that a person skilled in the art can conceive shall fall within the protection scope of the present application.
Claims (10)
1. A method for selecting a target group, comprising:
responding to a target crowd circling instruction, determining a search text indicated by the target crowd circling instruction, and determining keywords matched with a medical knowledge base in the search text;
extracting a plurality of target feature labels associated with the keywords from the knowledge graph corresponding to the keywords, and acquiring candidate target groups indicated by the target feature labels in a health archive database;
reading data information of each candidate user in the candidate target group in a plurality of preset items included in the health record database, and calculating a plurality of confidence scores of a plurality of preset dimensions for each candidate user by adopting the data information;
and carrying out weighted summarization on the confidence scores of each candidate user to obtain the relevance score of each candidate user, and selecting the candidate user from the candidate target crowd as the target crowd to output according to the relevance score of each candidate user.
2. The method of claim 1, wherein the determining the search text indicated by the target crowd circling instruction, and determining keywords in the search text that match a medical knowledge base, comprises:
obtaining a search text indicated by the target crowd circling instruction, dividing the search text to obtain a plurality of character strings, and identifying at least one verb character string in the plurality of character strings;
matching the plurality of character strings with entities in a medical knowledge base, and identifying at least one entity character string in the plurality of character strings, wherein the entity character string comprises the entities in the medical knowledge base, and the entities are any one of disease names, medical product names, prescription treatment names and medical resource names;
for each entity character string in the at least one entity character string, determining an associated verb character string with an associated relation with the entity character string in the at least one verb character string, and combining the entity character string with the associated verb character string to obtain a candidate keyword;
generating candidate keywords for each entity character string in the at least one entity character string respectively, obtaining at least one candidate keyword, and selecting one candidate keyword from the at least one candidate keyword as the keyword of the search text, wherein the selected candidate keyword is the candidate keyword with the highest correlation with the medical knowledge base.
3. The method according to claim 2, wherein the method further comprises:
and when detecting that the plurality of character strings do not comprise the entity in the medical knowledge base, generating a circle selection failure prompt, and pushing the circle selection failure prompt to a user initiating the circle selection instruction of the target crowd.
4. The method according to claim 1, wherein the extracting a plurality of target feature labels associated with the keywords in the knowledge graph corresponding to the keywords, and obtaining candidate target groups indicated by the plurality of target feature labels in the health record database, includes:
determining target nodes corresponding to the keywords in the knowledge graph;
determining a plurality of secondary nodes taking the target node as a center in the knowledge graph, acquiring a plurality of associated keywords indicated by the plurality of secondary nodes, and taking each associated keyword in the plurality of associated keywords as a target feature tag to acquire a plurality of target feature tags;
extracting a user group marked with the target feature tag from the health record database for each target feature tag in the plurality of target feature tags to obtain a plurality of user groups;
the plurality of user groups are aggregated as the candidate target group.
5. The method of claim 1, wherein the reading the data information of each candidate user in the candidate target group from a plurality of preset items included in the health profile database, and calculating a plurality of confidence scores of a plurality of preset dimensions for each candidate user using the data information, comprises:
for each candidate user in the candidate target group, reading data information corresponding to the candidate user on a plurality of preset items in the health record database to obtain a plurality of data information of the candidate user;
acquiring the preset scoring model that the target crowd circle selection instruction indicates to apply, and determining a plurality of preset dimensions corresponding to the preset scoring model, wherein the preset scoring model is used for scoring information of the plurality of preset dimensions;
the following processing is carried out on each preset dimension of the preset dimensions: determining at least one preset item associated with the preset dimension, taking at least one piece of data information corresponding to the associated at least one preset item as associated data information of the preset dimension, and calculating confidence scores of the preset dimension for the candidate users by adopting the associated data information;
obtaining the confidence score of each preset dimension of the candidate user, obtaining a plurality of confidence scores of a plurality of preset dimensions of the candidate user, and obtaining a plurality of confidence scores of each candidate user.
6. The method of claim 5, wherein calculating the confidence score for the predetermined dimension for the candidate user using the associated data information comprises:
acquiring disease data information corresponding to the associated data information from the medical knowledge base;
and comparing the associated data information with the disease data information, determining a probability value of the associated data information belonging to the disease data information by adopting a preset evaluation index, and taking the probability value as the confidence score of the preset dimension.
7. The method of claim 1, wherein the weighting and summarizing the confidence scores of each candidate user to obtain a relevance score of each candidate user, and selecting a candidate user from the candidate target crowd as a target crowd output according to the relevance score of each candidate user, comprises:
acquiring a preset scoring model applied by the target crowd circle selection instruction indication, and determining an aggregation model corresponding to the preset scoring model, wherein the aggregation model indicates the weight corresponding to each preset dimension in the plurality of preset dimensions;
for each candidate user in the candidate target group, acquiring a plurality of confidence scores of the candidate user, and carrying out aggregation calculation on the plurality of confidence scores by adopting the aggregation model to acquire a relevance score of the candidate user;
acquiring the relevance score of each candidate user in the candidate target group, sequencing the candidate users included in the candidate target group according to the sequence from high to low of the relevance score, and determining a preset circle selection mode;
when the preset circle selection mode indicates circle selection according to a preset score threshold, selecting candidate users with relevance scores exceeding the preset score threshold from the sorted candidate users as the target crowd, generating a circle selection success reminder comprising the target crowd, and pushing the circle selection success reminder to the user initiating the circle selection instruction of the target crowd;
when the preset circle selection mode indicates circle selection according to a preset quantity threshold, selecting the candidate users which are ranked first and the quantity of which meets the preset quantity threshold from the ranked candidate users as the target crowd, generating a circle selection success reminder comprising the target crowd, and pushing the circle selection success reminder to the user initiating the circle selection instruction of the target crowd.
8. A device for selecting a target crowd, characterized by comprising:
the processing module is used for responding to the target crowd circling instruction, determining a search text indicated by the target crowd circling instruction and determining keywords matched with a medical knowledge base in the search text;
the extraction module is used for extracting a plurality of target feature labels associated with the keywords from the knowledge graph corresponding to the keywords and obtaining candidate target groups indicated by the target feature labels in the health record database;
the calculation module is used for reading the data information of each candidate user in the candidate target group from a plurality of preset items included in the health record database, and calculating a plurality of confidence scores of a plurality of preset dimensions for each candidate user by adopting the data information;
and the aggregation module is used for carrying out weighted aggregation on the confidence scores of each candidate user to obtain the relevance score of each candidate user, and selecting the candidate user from the candidate target crowd as the target crowd to output according to the relevance score of each candidate user.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A readable storage medium having stored thereon a computer program, which when executed by a processor realizes the steps of the method according to any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310472230.7A CN116483987A (en) | 2023-04-23 | 2023-04-23 | Method and device for selecting target crowd, computer equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310472230.7A CN116483987A (en) | 2023-04-23 | 2023-04-23 | Method and device for selecting target crowd, computer equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116483987A true CN116483987A (en) | 2023-07-25 |
Family
ID=87226592
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310472230.7A Pending CN116483987A (en) | 2023-04-23 | 2023-04-23 | Method and device for selecting target crowd, computer equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116483987A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118551769A (en) * | 2024-07-25 | 2024-08-27 | 浙江鸟潮供应链管理有限公司 | Label circling method and device based on large language model and computer equipment |
- 2023-04-23 CN CN202310472230.7A patent/CN116483987A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111710420B (en) | Complication onset risk prediction method, system, terminal and storage medium based on electronic medical record big data | |
KR102075788B1 (en) | Healthy content recommendation service system using big datas | |
CN112786194A (en) | Medical image diagnosis guide inspection system, method and equipment based on artificial intelligence | |
US11468363B2 (en) | Methods and systems for classification to prognostic labels using expert inputs | |
US11157822B2 (en) | Methods and systems for classification using expert data | |
CN108320798A (en) | Illness result generation method and device | |
CN111415760B (en) | Doctor recommendation method, doctor recommendation system, computer equipment and storage medium | |
CN114400062A (en) | Interpretation method and device of inspection report, computer equipment and storage medium | |
CN116483987A (en) | Method and device for selecting target crowd, computer equipment and readable storage medium | |
Mishra et al. | An enhanced approach for analyzing the performance of heart stroke prediction with machine learning techniques | |
Hom et al. | Facilitating clinical research through automation: Combining optical character recognition with natural language processing | |
CN113241193B (en) | Drug recommendation model training method, recommendation method, device, equipment and medium | |
CN109036506A (en) | Monitoring and managing method, electronic device and the readable storage medium storing program for executing of internet medical treatment interrogation | |
Emakhu et al. | A hybrid machine learning and natural language processing model for early detection of acute coronary syndrome | |
EP3948760A1 (en) | Methods and systems for utilizing diagnostics for informed vibrant constitutional guidance | |
CN116153496A (en) | Neural network model training method and depression emotion detection method | |
Rammal et al. | Heart failure prediction models using big data techniques | |
CN111681776B (en) | Medical object relation analysis method and system based on medical big data | |
Omkari et al. | An integrated Two-Layered Voting (TLV) framework for coronary artery disease prediction using machine learning classifiers | |
US11810669B2 (en) | Methods and systems for generating a descriptor trail using artificial intelligence | |
Shobha et al. | Analysis of importance of pre-processing in prediction of hypertension | |
Sathya et al. | Fake and Unproven Medical Remedy Detector using LSTM Model with Tensorflow Framework | |
US20220285035A1 (en) | Device and method of predicting disease by using elderly cohort data | |
Shabbeer et al. | Prediction of Sudden Health Crises Owing to Congestive Heart Failure with Deep Learning Models. | |
CN114708965B (en) | Diagnosis recommendation method and device, electronic equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |