CN115620886B - Data auditing method and device - Google Patents

Data auditing method and device

Info

Publication number
CN115620886B
Authority
CN
China
Prior art keywords
information
attribute
diagnosis
sample
medicine
Prior art date
Legal status
Active
Application number
CN202211628930.2A
Other languages
Chinese (zh)
Other versions
CN115620886A (en)
Inventor
闫盈盈
徐晓涵
翟所迪
杨帅
张亚
周谦
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Peking University Third Hospital Peking University Third Clinical Medical College
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Peking University Third Hospital Peking University Third Clinical Medical College
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Peking University Third Hospital Peking University Third Clinical Medical College
Priority to CN202211628930.2A
Publication of CN115620886A
Application granted
Publication of CN115620886B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a data auditing method and device, and relates to the technical field of intelligent medical treatment. One embodiment of the method comprises the following steps: acquiring prescription data to be audited, extracting drug information and diagnosis information from the prescription data, and determining, in a sample diagnosis information base, sample diagnosis information similar to the diagnosis information; querying a drug vector corresponding to the drug information and a diagnosis vector corresponding to the sample diagnosis information; and calculating the similarity between the drug vector and the diagnosis vector, and determining that the prescription data passes the audit in response to the similarity being greater than or equal to a preset similarity threshold. In this embodiment, the relationships between drug information and diagnosis information are mined based on a knowledge graph, which addresses the diversity of diagnosis descriptions in prescription data.

Description

Data auditing method and device
Technical Field
The invention relates to the technical field of intelligent medical treatment, in particular to a data auditing method and device.
Background
Because medical resources are unevenly distributed, some hospital pharmacists lack specialized domain knowledge, and the auditing of prescriptions issued by their hospitals is neither complete nor accurate. To address this, prescription auditing systems have been introduced that audit the indication portion of a prescription based on objective pharmaceutical rules.
At present, auditing is mainly based on the indication descriptions in drug specifications. However, the diagnosis descriptions in most prescriptions differ from those indication descriptions, and with the development of medical technology many drugs are also used in ways beyond their specifications. In addition, the current indication checking function must be maintained manually by pharmacists, so when a prescription involves a brand-new diagnosis or a synonym of an indication in a drug specification, existing prescription auditing systems cannot handle it effectively.
Disclosure of Invention
In view of this, the present invention provides a data auditing method and apparatus, which can at least solve the prior-art problem that prescription indications are difficult to audit accurately because diagnosis descriptions are diverse.
To achieve the above object, according to an aspect of the embodiments of the present invention, there is provided a data auditing method, including:
acquiring prescription data to be audited, extracting medicine information and diagnosis information from the prescription data, and determining sample diagnosis information similar to the diagnosis information in a sample diagnosis information base;
querying a medicine vector corresponding to the medicine information, and querying a diagnosis vector corresponding to the sample diagnosis information; the medicine vector and the diagnosis vector are obtained through meta-path training, and the meta-path represents a path from medicine information to diagnosis information in the knowledge graph;
calculating the similarity between the drug vector and the diagnosis vector, and determining that the prescription data passes the audit in response to the similarity being greater than or equal to a preset similarity threshold.
Optionally, before the prescription data is acquired, the method further includes:
receiving input sample drug-diagnosis pairs, each pair comprising matched sample drug information and sample diagnosis information;
querying drug specification information corresponding to the sample drug information, extracting drug feature information from the drug specification information, and extracting diagnostic feature information from the sample diagnosis information;
constructing a knowledge graph based on the drug feature information and the diagnostic feature information, and training a machine learning model on the knowledge graph to obtain a word vector model, from which the drug vectors and diagnosis vectors are obtained.
Optionally, extracting the drug feature information from the drug specification information includes: invoking a drug attribute extraction model to extract a drug generic name, a first attribute, a second attribute and a third attribute from the drug specification information, which together form the drug feature information;
extracting the diagnostic feature information from the sample diagnosis information includes: invoking a diagnosis attribute extraction model to extract a diagnosis name, a fourth attribute, a fifth attribute and a sixth attribute from the sample diagnosis information, and querying a seventh attribute corresponding to the fourth attribute, which together form the diagnostic feature information.
Optionally, constructing the knowledge graph based on the drug feature information and the diagnostic feature information includes:
constructing a first meta-path based on the drug generic name, the first attribute, the seventh attribute, the fourth attribute and the diagnosis name;
constructing a second meta-path based on the drug generic name, the second attribute, the fifth attribute and the diagnosis name;
constructing a third meta-path based on the drug generic name, the third attribute, the sixth attribute and the diagnosis name; and
constructing the knowledge graph based on the first meta-path, the second meta-path and the third meta-path.
Optionally, determining the sample diagnosis information similar to the diagnosis information in the sample diagnosis information base includes:
calculating the similarity between the diagnosis information and each piece of sample diagnosis information in the sample diagnosis information base, and taking the sample diagnosis information with the highest similarity as the target sample diagnosis information similar to the diagnosis information.
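The highest-similarity selection above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the patent does not name the text-similarity measure, so `difflib.SequenceMatcher.ratio()` is used as a stand-in, and `text_similarity` and `most_similar_sample` are hypothetical helper names.

```python
from difflib import SequenceMatcher
from typing import List


def text_similarity(a: str, b: str) -> float:
    """Assumed stand-in measure: character-level match ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def most_similar_sample(diagnosis: str, sample_base: List[str]) -> str:
    """Return the sample diagnosis with the highest similarity to `diagnosis`."""
    if not sample_base:
        raise ValueError("sample diagnosis base is empty")
    return max(sample_base, key=lambda sample: text_similarity(diagnosis, sample))


# Illustrative sample base; the strings are invented examples.
samples = ["left scapulohumeral periarthritis", "cerebral stroke", "common cold"]
target = most_similar_sample("scapulohumeral periarthritis (left)", samples)
```

A character-level ratio cannot recognize true synonyms such as "cerebral stroke" and "apoplexy"; it only tolerates variation in wording, which is why the scheme relies on knowledge-graph vectors for the final drug-diagnosis match.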
Optionally, determining the sample diagnosis information similar to the diagnosis information in the sample diagnosis information base includes:
invoking a diagnosis attribute extraction model to extract a fourth attribute, a fifth attribute and a sixth attribute from the diagnosis information;
determining, in the sample diagnosis information base, a first sample diagnosis information set whose fourth attribute is similar to that of the diagnosis information, a second set whose fifth attribute is similar, and a third set whose sixth attribute is similar;
intersecting the first, second and third sample diagnosis information sets and, in response to the intersection containing exactly one piece of sample diagnosis information, taking it as the target sample diagnosis information similar to the diagnosis information; or
in response to the intersection containing multiple pieces of sample diagnosis information, calculating the similarity between the diagnosis information and each of them, and taking the one with the highest similarity as the target sample diagnosis information similar to the diagnosis information.
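The attribute-intersection procedure can be sketched as follows. This is a minimal sketch, not the patented implementation: the rule deciding when two attribute values are "similar" (a `SequenceMatcher` ratio of at least 0.8) is a hypothetical assumption, as are the names `SampleDiagnosis`, `attribute_similar` and `match_target_sample`. The excerpt does not say what happens when the intersection is empty, so the sketch returns `None` in that case.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass(frozen=True)
class SampleDiagnosis:
    name: str
    pathological_process: str  # fourth attribute
    lesion_site: str           # fifth attribute
    sub_symptom: str           # sixth attribute


def ratio(a: str, b: str) -> float:
    # Assumed stand-in similarity measure (not specified in the source).
    return SequenceMatcher(None, a, b).ratio()


def attribute_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    # Hypothetical rule: two attribute values are "similar" at or above a fixed threshold.
    return ratio(a, b) >= threshold


def match_target_sample(query: SampleDiagnosis,
                        base: List[SampleDiagnosis]) -> Optional[SampleDiagnosis]:
    set4 = {s for s in base if attribute_similar(query.pathological_process, s.pathological_process)}
    set5 = {s for s in base if attribute_similar(query.lesion_site, s.lesion_site)}
    set6 = {s for s in base if attribute_similar(query.sub_symptom, s.sub_symptom)}
    candidates = set4 & set5 & set6
    if len(candidates) == 1:
        # Exactly one sample in the intersection: it is the target.
        return next(iter(candidates))
    if candidates:
        # Several samples: pick the one whose full diagnosis text is most similar.
        # Sorting first makes tie-breaking deterministic.
        ordered = sorted(candidates, key=lambda s: s.name)
        return max(ordered, key=lambda s: ratio(query.name, s.name))
    # Empty intersection: behavior not specified in this excerpt.
    return None


# Invented sample base for illustration.
base = [
    SampleDiagnosis("left scapulohumeral periarthritis", "pain", "shoulder", "shoulder and neck pain"),
    SampleDiagnosis("right scapulohumeral periarthritis", "pain", "shoulder", "shoulder and neck pain"),
    SampleDiagnosis("cerebral stroke", "ischemia", "brain", "hemiplegia"),
]
query = SampleDiagnosis("left scapulohumeral periarthritis (suspected)", "pain", "shoulder", "shoulder neck pain")
```

Filtering on the three structured attributes first keeps the more expensive full-text comparison limited to a small candidate set.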
To achieve the above object, according to another aspect of the embodiments of the present invention, there is provided a data auditing apparatus, including:
The extraction module is used for acquiring prescription data to be audited, extracting medicine information and diagnosis information from the prescription data and determining sample diagnosis information similar to the diagnosis information in the sample diagnosis information base;
the query module is used for querying the medicine vector corresponding to the medicine information and querying the diagnosis vector corresponding to the sample diagnosis information; the medicine vector and the diagnosis vector are obtained through meta-path training, and the meta-path represents a path from medicine information to diagnosis information in the knowledge graph;
and the calculation module is used for calculating the similarity of the medicine vector and the diagnosis vector, and determining that the prescription data passes the audit in response to the similarity being greater than or equal to a preset similarity threshold.
Optionally, the apparatus further comprises a training module configured for:
receiving input sample drug-diagnosis pairs, each pair comprising matched sample drug information and sample diagnosis information;
querying drug specification information corresponding to the sample drug information, extracting drug feature information from the drug specification information, and extracting diagnostic feature information from the sample diagnosis information;
constructing a knowledge graph based on the drug feature information and the diagnostic feature information, and training a machine learning model on the knowledge graph to obtain a word vector model, from which the drug vectors and diagnosis vectors are obtained.
Optionally, the training module is configured to invoke a drug attribute extraction model to extract a drug generic name, a first attribute, a second attribute and a third attribute from the drug specification information, which together form the drug feature information;
the training module is further configured to invoke a diagnosis attribute extraction model to extract a diagnosis name, a fourth attribute, a fifth attribute and a sixth attribute from the sample diagnosis information, and to query a seventh attribute corresponding to the fourth attribute, which together form the diagnostic feature information.
Optionally, the training module is configured to:
construct a first meta-path based on the drug generic name, the first attribute, the seventh attribute, the fourth attribute and the diagnosis name;
construct a second meta-path based on the drug generic name, the second attribute, the fifth attribute and the diagnosis name;
construct a third meta-path based on the drug generic name, the third attribute, the sixth attribute and the diagnosis name; and
construct the knowledge graph based on the first meta-path, the second meta-path and the third meta-path.
Optionally, the extracting module is configured to:
calculate the similarity between the diagnosis information and each piece of sample diagnosis information in the sample diagnosis information base, and take the sample diagnosis information with the highest similarity as the target sample diagnosis information similar to the diagnosis information.
Optionally, the extracting module is configured to:
invoke a diagnosis attribute extraction model to extract a fourth attribute, a fifth attribute and a sixth attribute from the diagnosis information;
determine, in the sample diagnosis information base, a first sample diagnosis information set whose fourth attribute is similar to that of the diagnosis information, a second set whose fifth attribute is similar, and a third set whose sixth attribute is similar;
intersect the first, second and third sample diagnosis information sets and, in response to the intersection containing exactly one piece of sample diagnosis information, take it as the target sample diagnosis information similar to the diagnosis information; or
in response to the intersection containing multiple pieces of sample diagnosis information, calculate the similarity between the diagnosis information and each of them, and take the one with the highest similarity as the target sample diagnosis information similar to the diagnosis information.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a data auditing electronic device.
The electronic device according to an embodiment of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data auditing method described above.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements any of the above-described data auditing methods.
The scheme provided by the invention has the following advantages or beneficial effects: a knowledge-graph-based recommendation algorithm is introduced into the prescription indication auditing scenario to mine the relationships between drug generic names and diagnoses, and drug generic name vectors and diagnosis vectors are built in advance. In subsequent use, the vector for a drug can be queried directly because its generic name is unique. For diagnosis information, similar sample diagnosis information is first determined through attribute similarity, text similarity and keyword similarity calculations, and the diagnosis vector of that sample is then queried. This makes diagnosis vector lookup at audit time efficient and accurate, entirely abandons the existing practice of still relying on a model to compute vectors at application time, and addresses the diversity of diagnosis descriptions.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. In the drawings:
FIG. 1 is a schematic flow chart of a data auditing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the analogy between prescription indication auditing and a knowledge-graph-based recommendation algorithm;
FIG. 3 is a flow chart of an alternative data auditing method according to an embodiment of the present invention;
FIG. 4(a) is a structural diagram of a drug knowledge graph and a disease knowledge graph;
FIG. 4(b) is a schematic diagram of a knowledge graph sample;
FIG. 5 is a flow chart of another alternative data auditing method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the main modules of a data auditing apparatus according to an embodiment of the present invention;
FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 8 is a schematic diagram of a computer system suitable for implementing an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the features in them may be combined with one another where they do not conflict. The acquisition, storage, use and processing of data (such as users' personal information) in this technical solution comply with the relevant national laws and regulations and do not violate public order and good morals.
Referring to fig. 1, a main flowchart of a data auditing method provided by an embodiment of the present invention is shown, including the following steps:
S101: acquiring prescription data to be audited, extracting drug information and diagnosis information from the prescription data, and determining, in a sample diagnosis information base, sample diagnosis information similar to the diagnosis information;
S102: querying a drug vector corresponding to the drug information and a diagnosis vector corresponding to the sample diagnosis information, where the drug vector and the diagnosis vector are obtained through meta-path training and a meta-path represents a path from drug information to diagnosis information in the knowledge graph;
S103: calculating the similarity between the drug vector and the diagnosis vector, and determining that the prescription data passes the audit in response to the similarity being greater than or equal to a preset similarity threshold.
In the above embodiment, consider synonyms of indications in a drug specification, such as cerebral stroke and apoplexy: the specification may describe the indication as "cerebral stroke" while the prescription diagnosis reads "stroke". Ordinary string matching cannot recognize such synonyms, so one must either maintain a synonym table or rely on a knowledge-graph-based recommendation algorithm.
The purpose of prescription indication auditing is to determine whether the drug in a prescription matches the diagnosis. Referring to FIG. 2, if a drug is likened to a consumer, the attributes of the drug correspond to the consumer's preferences; if a diagnosis is likened to an item, the attributes of the disease correspond to the attributes of the item. Prescription indication auditing can therefore be translated into an item recommendation problem, i.e., determining whether a recommended item is of interest to the consumer. In item recommendation, an item can be any content object, such as a video, article, product or news story.
For step S101, the user's medical prescription data is collected during the user's visit and includes drug information and diagnosis information. The diagnosis information represents diagnosis results, such as cold, fever and sore throat, and includes at least consultation information, diagnosis results and medication notes. Medication notes are the physician's adjustments and usage remarks on drugs for a given clinical scenario; based on these notes, a drug whose labeled applicability matches the current diagnosis only loosely may still be used, which improves the adaptive use of drugs and the utilization of medical resources.
In this scheme, drug information mainly refers to the drug generic name. A generic name is the name listed in the national drug standards; generic names such as aspirin are used worldwide, and every drug package insert must state the generic name, so its use is mandatory and binding. A trade name is a product name chosen by the manufacturer and approved by the drug regulatory authority; it is proprietary and may not be imitated. Because different manufacturers produce the same drug, one generic name may correspond to several trade names. For example, the cefoperazone sodium package insert is labeled: [Generic name] cefoperazone sodium; [Trade name] first resistance; [Other names] cephalosporin, cefoperazone, pioneer piprazole, oxypiperazine, sodium cephalosporin, pioneer piprazole, pioneer pine, cefpodoxime proxetil, and oxypiperazine cephalosporin. Prescriptions generally use the generic name, so trade names are not considered here.
For steps S102 to S103, this scheme builds a drug generic name vector table and a diagnosis vector table in advance: the former stores drug generic name vectors and the latter stores diagnosis vectors. This is an offline operation, described later with reference to FIG. 3, FIG. 4(a) and FIG. 4(b).
The vector for the current drug generic name is queried from the drug generic name vector table, and the diagnosis vector corresponding to the current diagnosis information is queried from the diagnosis vector table. The similarity of the two vectors is then calculated; in some possible embodiments it may be based on the Pearson correlation coefficient, the Euclidean distance or similar measures, with cosine similarity preferred in this scheme. When the similarity reaches a threshold (for example, 0.7), the drug information and the diagnosis information in the user's prescription data are considered to match and the prescription passes the audit; otherwise it fails. In this embodiment, the input is the drug generic name and the diagnosis information from the user's prescription data, obtained as described above, and the output is whether the prescription data passes the audit.
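The audit decision in S102 and S103 can be sketched in Python. This is a minimal sketch assuming the two vector tables have already been loaded into dictionaries and that the matching sample diagnosis was found in S101; the function names and the toy 3-dimensional vectors are invented for illustration, and only the 0.7 threshold comes from the text above.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def audit_prescription(drug_generic_name: str,
                       sample_diagnosis: str,
                       drug_vectors: Dict[str, Sequence[float]],
                       diagnosis_vectors: Dict[str, Sequence[float]],
                       threshold: float = 0.7) -> bool:
    """Pass the audit when cosine(drug vector, diagnosis vector) >= threshold."""
    drug_vec = drug_vectors[drug_generic_name]      # direct lookup: generic names are unique
    diag_vec = diagnosis_vectors[sample_diagnosis]  # vector of the matched sample diagnosis
    return cosine_similarity(drug_vec, diag_vec) >= threshold


# Toy vectors, invented for illustration only (real vectors come from training).
drug_vectors = {"Gentongping granule": [0.9, 0.1, 0.2]}
diagnosis_vectors = {
    "left scapulohumeral periarthritis": [0.8, 0.2, 0.1],
    "common cold": [0.0, 0.9, 0.4],
}
```

Here a missing generic name or sample diagnosis raises `KeyError`; how an unknown drug should be handled is not specified in this excerpt.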
The method of this embodiment abandons the prior practice of still relying on a model to compute vectors at application time. Instead, drug generic name vectors and diagnosis vectors are prepared in advance: the drug vector is queried directly, relying on the uniqueness of the drug generic name, and the diagnosis vector is queried via similarity calculation. This addresses the diversity of diagnosis descriptions and further improves the efficiency and accuracy of prescription auditing.
Referring to fig. 3, an optional data auditing method according to an embodiment of the present invention is shown, comprising the steps of:
S301: receiving input sample drug-diagnosis pairs, each pair comprising matched sample drug information and sample diagnosis information;
S302: querying drug specification information corresponding to the sample drug information, extracting drug feature information from the drug specification information, and extracting diagnostic feature information from the sample diagnosis information;
S303: constructing a knowledge graph based on the drug feature information and the diagnostic feature information, and training a machine learning model on the knowledge graph to obtain a word vector model, from which the drug vectors and diagnosis vectors are obtained.
In the above embodiment, for steps S301 to S302, structured drug specification information and diagnostic information must be obtained from the sample drug information and the sample diagnosis information before the machine learning model is trained. The sample drug-diagnosis pairs may be taken from prescriptions that pharmacists have reviewed and approved.
The corresponding drug specification is retrieved based on the drug generic name. A drug attribute extraction model then extracts structured drug specification information from it, including the drug generic name, an efficacy attribute (i.e., the first attribute), a treatment site attribute (i.e., the second attribute) and a treatment indication attribute (i.e., the third attribute).
A diagnosis attribute extraction model likewise extracts structured diagnostic information, including a pathological process attribute (i.e., the fourth attribute), a lesion site attribute (i.e., the fifth attribute) and a sub-symptom attribute (i.e., the sixth attribute). A treatment attribute table (with entries such as pain management) is preset, from which the treatment attribute (i.e., the seventh attribute) corresponding to each pathological process attribute can be queried.
Taking the drug generic name Gentongping granule and the diagnosis name left scapulohumeral periarthritis as an example, the resulting drug feature information is: drug generic name Gentongping granule, efficacy attribute pain relief, treatment site attribute shoulder, and treatment indication attribute shoulder and neck pain. The resulting diagnostic feature information is: diagnosis name left scapulohumeral periarthritis, pathological process attribute pain, lesion site attribute shoulder, and sub-symptom attribute shoulder and neck pain; querying the treatment attribute table gives pain relief as the treatment attribute for pain.
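The structured feature records in this example can be sketched as plain Python data classes. This is a sketch under assumptions: the attribute extraction models are not shown (their output is hard-coded from the example), and `TREATMENT_TABLE`, `DrugFeatures`, `DiagnosisFeatures` and `build_diagnosis_features` are hypothetical names. The single table entry mapping pain to pain relief is this document's reading of the example, not a published table.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical treatment attribute table: pathological process -> treatment attribute.
TREATMENT_TABLE: Dict[str, str] = {"pain": "pain relief"}


@dataclass(frozen=True)
class DrugFeatures:
    generic_name: str
    efficacy: str        # first attribute
    treatment_site: str  # second attribute
    indication: str      # third attribute


@dataclass(frozen=True)
class DiagnosisFeatures:
    name: str
    pathological_process: str  # fourth attribute
    lesion_site: str           # fifth attribute
    sub_symptom: str           # sixth attribute
    treatment: str             # seventh attribute, looked up from the fourth


def build_diagnosis_features(name: str, pathological_process: str, lesion_site: str,
                             sub_symptom: str,
                             table: Dict[str, str] = TREATMENT_TABLE) -> DiagnosisFeatures:
    # The seventh attribute is not extracted; it is queried via the fourth attribute.
    treatment = table[pathological_process]
    return DiagnosisFeatures(name, pathological_process, lesion_site, sub_symptom, treatment)


# Values taken from the Gentongping granule / left scapulohumeral periarthritis example.
drug = DrugFeatures("Gentongping granule", "pain relief", "shoulder", "shoulder and neck pain")
diagnosis = build_diagnosis_features("left scapulohumeral periarthritis", "pain",
                                     "shoulder", "shoulder and neck pain")
```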
For step S303, this scheme trains vectors for drug generic names and diagnoses on the constructed knowledge graph; the specific process of constructing the knowledge graph is outside the scope of this scheme and is not described in detail here. The recommendation algorithm used is preferably a meta-path-based recommendation algorithm, which further improves the interpretability of the model. Each meta-path can also be regarded as a piece of text. In natural language processing, the statistical language model (i.e., the machine learning model) is a foundation of the field and is widely applied to tasks such as text processing, speech recognition, machine translation, word segmentation, part-of-speech tagging and information retrieval. Briefly, a statistical language model is a probabilistic model, usually built from a corpus, that computes the probability of a sentence. When a statistical language model is built on a neural network, text is first converted into numerical tensors, i.e., word vectorization.
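For reference, the probability that a statistical language model assigns to a sentence of words w_1, ..., w_n is conventionally factorized with the chain rule of probability. This is the standard textbook formulation, not a formula stated in the patent:

```latex
P(w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})
```

When meta-paths are treated as sentences, each w_i is a graph node, such as a drug generic name, an attribute value or a diagnosis name.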
Referring to FIG. 4(a), the knowledge graph constructed in this scheme includes three meta-paths: 1) drug generic name - efficacy attribute - treatment attribute - pathological process attribute - diagnosis name; 2) drug generic name - treatment site attribute - lesion site attribute - diagnosis name; 3) drug generic name - treatment indication attribute - sub-symptom attribute - diagnosis name. Referring to FIG. 4(b), the first meta-path is constructed from Gentongping granule, pain relief, pain and left scapulohumeral periarthritis; the second meta-path from Gentongping granule, shoulder and left scapulohumeral periarthritis; and the third meta-path from Gentongping granule, "shoulder and neck pain" and left scapulohumeral periarthritis.
To train the model, a sliding window size is first determined, and training sample pairs are generated according to that window size; each training sample pair consists of an input sample and an output sample. The model is trained on these pairs to obtain the parameters of its hidden layer. The trained model is not subsequently used to process new tasks; what is actually needed is the parameters the model learns from the training data, which provide the vector representations of the text. The first, second and third meta-paths serve as the training data; once the word vector model has been built and trained, it is used to obtain the drug generic name vectors (i.e., drug vectors) and the diagnosis vectors. This process is performed offline.
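Generating (input, output) training pairs from a meta-path with a sliding window can be sketched as follows. The patent does not name the word vector model, so this sketch assumes a skip-gram-style scheme in which each node is paired with every neighbor within the window; `training_pairs` is a hypothetical name.

```python
from typing import List, Tuple


def training_pairs(path: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram-style (input, output) pairs from one meta-path for a given window size."""
    if window < 1:
        raise ValueError("window must be >= 1")
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(path):
        start = max(0, i - window)
        stop = min(len(path), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, path[j]))  # (input sample, output sample)
    return pairs


path = ["Gentongping granule", "shoulder", "left scapulohumeral periarthritis"]
pairs = training_pairs(path, window=1)
```

If an off-the-shelf trainer were used instead, gensim's `Word2Vec` accepts a list of token lists as `sentences` together with `vector_size`, `window` and `sg=1` (skip-gram), so the meta-paths could be passed directly with `vector_size=256`; whether the authors used gensim is not stated.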
Taking fig. 4 (b) as an example, the radiculine granules and left scapulohumeral periarthritis are each output as a 1×256 vector. One 1×256 vector is output for each drug generic name and each diagnosis entity in the graph. The resulting drug generic name vectors and diagnosis vectors may be stored in MySQL or in any other database; for example, a drug generic name vector table and a diagnosis vector table may be set up in advance. The matching relationships between drugs and diagnoses do not need to be stored, because they can be computed in real time as the cosine similarity of the two vectors. Cosine similarity takes values in [-1, 1]; the closer the value is to 1, the more likely it is that a therapeutic relationship exists between the drug and the diagnosis.
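The real-time matching described above reduces to a cosine similarity between two stored vectors. The sketch below uses plain Python lists, with two dictionaries as hypothetical in-memory stand-ins for the drug generic name vector table and the diagnosis vector table; in practice these would be read from MySQL or another database.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical in-memory stand-ins for the two preset vector tables.
drug_vector_table: Dict[str, List[float]] = {}
diagnosis_vector_table: Dict[str, List[float]] = {}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; the result lies in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def match_score(drug_name: str, diagnosis: str) -> float:
    """Look up both vectors and compute their similarity on demand (no stored pairing)."""
    return cosine_similarity(drug_vector_table[drug_name], diagnosis_vector_table[diagnosis])
```

Because the score is computed on demand, only one row per entity needs to be stored, rather than one row per drug-diagnosis pair.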
With the method provided by this embodiment, a knowledge-graph-based recommendation algorithm is introduced into the prescription indication review scenario. This further mines the relationship between drug generic names and diagnoses and addresses the problem of diverse diagnosis descriptions.
Referring to fig. 5, a flowchart of another optional data auditing method according to an embodiment of the present invention is shown, comprising the following steps:
S501: call the diagnosis attribute extraction model to extract a fourth attribute, a fifth attribute, and a sixth attribute from the diagnosis information;
S502: in the sample diagnosis information base, determine a first sample diagnosis information set similar to the diagnosis information in the fourth attribute, a second sample diagnosis information set similar in the fifth attribute, and a third sample diagnosis information set similar in the sixth attribute;
S503: take the intersection of the first, second, and third sample diagnosis information sets;
S504: in response to the intersection containing exactly one piece of sample diagnosis information, take it as the target sample diagnosis information similar to the diagnosis information;
S505: in response to the intersection containing multiple pieces of sample diagnosis information, calculate the similarity between the diagnosis information and each of them, and take the one with the highest similarity as the target sample diagnosis information similar to the diagnosis information.
In the above embodiment, regarding steps S501 to S505: a drug generic name is usually a single term which, as described above, is a universally used standard name that generally does not change, so the corresponding drug generic name vector can be queried directly from the drug generic name vector table. Diagnosis information, by contrast, usually covers much more content and is described in diverse ways; even the same condition may be described differently, so a dedicated matching step is needed to obtain its vector.
In practice, the text similarity between the current diagnosis information and each piece of sample diagnosis information in the sample diagnosis information base can be calculated. To improve computational efficiency, keywords can instead be extracted from both texts and the similarity calculated between the keywords. The sample diagnosis information with the highest similarity is taken as the target sample diagnosis information similar to the current diagnosis information.
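A minimal sketch of the keyword-based variant, assuming Jaccard similarity over keyword sets (the source does not specify the similarity measure) and a hypothetical whitespace-and-stopword keyword extractor. Chinese diagnosis text would need a word segmenter instead. The function names and the `STOPWORDS` list are illustrative only.

```python
from typing import Iterable, Optional, Set

# Hypothetical stopword list; a real system would use a domain-specific one.
STOPWORDS = {"of", "the", "and", "with", "left", "right"}


def extract_keywords(text: str) -> Set[str]:
    """Lower-case whitespace tokens minus stopwords (stand-in for a real keyword extractor)."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of keywords the two sets have in common; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(diagnosis: str, samples: Iterable[str]) -> Optional[str]:
    """Return the sample diagnosis with the highest keyword similarity, or None if there are no samples."""
    query = extract_keywords(diagnosis)
    best: Optional[str] = None
    best_score = -1.0
    for sample in samples:
        score = jaccard(query, extract_keywords(sample))
        if score > best_score:  # strict '>' keeps the first sample on ties
            best, best_score = sample, score
    return best
```

For example, `most_similar("left scapulohumeral periarthritis", ["lumbar disc herniation", "scapulohumeral periarthritis"])` returns `"scapulohumeral periarthritis"`, because "left" is treated as a stopword in this sketch.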
As described above, this scheme mainly considers the pathological process attribute, the onset site attribute, and the sub-symptom attribute of the diagnosis information. To further improve computational efficiency, the diagnosis attribute extraction model can therefore be called again to extract these three attributes from the current diagnosis information. Because diagnosis descriptions are diverse, diagnosis information with the same meaning may still be worded differently in these three attributes (for example, different wordings for pain, or for the shoulder). It is therefore preferable to calculate attribute similarity separately for each of the three attributes, yielding three sets: a first sample diagnosis information set for the pathological process attribute, a second for the onset site attribute, and a third for the sub-symptom attribute.
If the intersection of the three sets contains only one piece of sample diagnosis information, it is directly taken as the sample diagnosis information similar to the current diagnosis information, and the corresponding diagnosis vector is then looked up in the diagnosis vector table. If it contains two or more, the similarity between the current diagnosis information and each sample diagnosis information in the intersection is calculated using text similarity or keyword similarity, and the single sample diagnosis information with the highest similarity is selected.
With the method provided by this embodiment, sample diagnosis information similar to the current diagnosis information is determined through attribute similarity, text similarity, and keyword similarity calculations, and the diagnosis vector of that sample is queried directly. This further addresses the diversity of diagnosis descriptions and improves the efficiency and accuracy of diagnosis vector queries.
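Steps S501 to S505 can be sketched as follows. The attribute keys, the `attr_similar` predicate, and the `text_similarity` function are hypothetical parameters (for example, a synonym lookup and a keyword similarity measure). The source does not say what happens when the intersection is empty, so this sketch returns `None` in that case; a caller might then fall back to comparing against the whole sample base.

```python
from typing import Callable, Dict, List, Optional, Set

# Fourth, fifth and sixth attributes of a diagnosis (hypothetical key names).
ATTRIBUTES = ("pathological_process", "onset_site", "sub_symptom")


def match_by_attributes(
    query_text: str,
    query_attrs: Dict[str, str],
    sample_base: Dict[str, Dict[str, str]],
    attr_similar: Callable[[str, str], bool],
    text_similarity: Callable[[str, str], float],
) -> Optional[str]:
    """Return the sample diagnosis most similar to the query, or None if no sample qualifies.

    sample_base maps each sample diagnosis text to its extracted attributes.
    """
    candidate_sets: List[Set[str]] = []
    for attr in ATTRIBUTES:
        # S502: one candidate set per attribute.
        candidate_sets.append({
            sample
            for sample, attrs in sample_base.items()
            if attr in query_attrs and attr in attrs
            and attr_similar(query_attrs[attr], attrs[attr])
        })
    # S503: intersect the three sets.
    common = set.intersection(*candidate_sets)
    if not common:
        return None
    if len(common) == 1:
        # S504: a unique candidate is taken directly.
        return next(iter(common))
    # S505: rank by text similarity; sorting first makes ties deterministic.
    return max(sorted(common), key=lambda sample: text_similarity(query_text, sample))
```

Only when two samples agree on all three attributes does the more expensive text comparison in S505 run, which matches the efficiency motivation given above.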
Referring to fig. 6, a schematic diagram of main modules of a data auditing apparatus 600 according to an embodiment of the present invention is shown, including:
the extraction module 601 is configured to obtain prescription data to be audited, extract drug information and diagnostic information from the prescription data, and determine sample diagnostic information similar to the diagnostic information in a sample diagnostic information base;
a query module 602, configured to query a drug vector corresponding to the drug information, and query a diagnostic vector corresponding to the sample diagnostic information; the medicine vector and the diagnosis vector are obtained through meta-path training, and the meta-path represents a path from medicine information to diagnosis information in the knowledge graph;
a calculation module 603, configured to calculate a similarity between the medicine vector and the diagnostic vector, and determine that the prescription data passes the audit in response to the similarity being greater than or equal to a preset similarity threshold.
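A hedged sketch of how the extraction, query, and calculation modules could fit together. The class name `DataAuditor`, the prescription keys `"drug"` and `"diagnosis"`, and the choice to fail the audit when a vector is missing are assumptions; the source gives no threshold value, so it is left as a parameter.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


@dataclass
class DataAuditor:
    """Hypothetical sketch of modules 601-603 wired together."""
    drug_vectors: Dict[str, Vector]               # drug generic name -> drug vector
    diagnosis_vectors: Dict[str, Vector]          # sample diagnosis -> diagnosis vector
    match_sample: Callable[[str], Optional[str]]  # diagnosis text -> similar sample diagnosis
    threshold: float                              # preset similarity threshold

    def extract(self, prescription: Dict[str, str]) -> Tuple[str, Optional[str]]:
        """Extraction module: read drug and diagnosis, then find a similar sample diagnosis."""
        return prescription["drug"], self.match_sample(prescription["diagnosis"])

    def query(self, drug: str, sample: Optional[str]) -> Tuple[Optional[Vector], Optional[Vector]]:
        """Query module: look up the drug vector and the sample's diagnosis vector."""
        diag_vec = self.diagnosis_vectors.get(sample) if sample is not None else None
        return self.drug_vectors.get(drug), diag_vec

    @staticmethod
    def similarity(a: Vector, b: Vector) -> float:
        """Calculation module: cosine similarity (0.0 if either vector is all zeros)."""
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def audit(self, prescription: Dict[str, str]) -> bool:
        """True when the prescription passes: similarity >= threshold."""
        drug, sample = self.extract(prescription)
        drug_vec, diag_vec = self.query(drug, sample)
        if drug_vec is None or diag_vec is None:
            return False  # assumption: an unknown drug or diagnosis never passes automatically
        return self.similarity(drug_vec, diag_vec) >= self.threshold
```

Passing the diagnosis matcher in as a callable keeps the extraction module independent of whether full-base similarity or the attribute-intersection procedure is used.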
The data auditing device of the invention further comprises a training module for:
receiving an input sample drug-diagnosis pairing relationship, the sample drug-diagnosis pairing relationship comprising matched sample drug information and sample diagnosis information;
querying drug specification information corresponding to the sample drug information, extracting drug feature information from the drug specification information, and extracting diagnostic feature information from the sample diagnostic information;
and constructing a knowledge graph based on the drug characteristic information and the diagnosis characteristic information, and training a machine learning model by using the knowledge graph to obtain a word vector model so as to obtain a drug vector and a diagnosis vector based on the word vector model.
In the data auditing device of the present invention, the training module is configured to: invoke a drug attribute extraction model to extract a drug generic name, a first attribute, a second attribute, and a third attribute from the drug specification information, so as to construct drug feature information;
the training module is further configured to: call a diagnosis attribute extraction model to extract a diagnosis name, a fourth attribute, a fifth attribute, and a sixth attribute from the sample diagnosis information, and query a seventh attribute corresponding to the fourth attribute, so as to construct diagnosis feature information.
In the data auditing device of the present invention, the training module is configured to:
constructing a first meta-path based on the drug generic name, the first attribute, the seventh attribute, the fourth attribute, and the diagnosis name; and
constructing a second meta-path based on the drug generic name, the second attribute, the fifth attribute, and the diagnosis name; and
constructing a third meta-path based on the drug generic name, the third attribute, the sixth attribute, and the diagnosis name;
and constructing a knowledge graph based on the first meta-path, the second meta-path and the third meta-path.
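The training module's graph construction can be sketched as merging the three meta-paths of every sample drug-diagnosis pair into one undirected adjacency map, so that nodes shared across pairs (such as a common onset site) connect different drugs and diagnoses. The feature key names are hypothetical. The seventh (therapeutic) attribute is supplied directly rather than queried from the fourth, and merging identical attribute values into one node (skipping self-loops) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Features = Dict[str, str]


def meta_paths(drug: Features, diag: Features) -> List[List[str]]:
    """Build the first, second and third meta-paths for one drug-diagnosis pair."""
    return [
        # drug generic name, first, seventh and fourth attributes, diagnosis name
        [drug["name"], drug["efficacy"], diag["therapeutic"], diag["pathological_process"], diag["name"]],
        # drug generic name, second and fifth attributes, diagnosis name
        [drug["name"], drug["treatment_site"], diag["onset_site"], diag["name"]],
        # drug generic name, third and sixth attributes, diagnosis name
        [drug["name"], drug["treatment_indication"], diag["sub_symptom"], diag["name"]],
    ]


def build_knowledge_graph(pairs: Iterable[Tuple[Features, Features]]) -> Dict[str, Set[str]]:
    """Merge the meta-paths of all pairs into an undirected adjacency map."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for drug, diag in pairs:
        for path in meta_paths(drug, diag):
            for u, v in zip(path, path[1:]):
                if u != v:  # identical adjacent values share a node; skip the self-loop
                    graph[u].add(v)
                    graph[v].add(u)
    return dict(graph)
```

Walking this graph along the three schemas would regenerate the meta-path sequences used as training data for the word vector model.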
In the data auditing device of the present invention, the extracting module 601 is configured to:
calculating the similarity between the diagnosis information and each piece of sample diagnosis information in the sample diagnosis information base, and taking the sample diagnosis information with the highest similarity as the target sample diagnosis information similar to the diagnosis information.
In the data auditing device of the present invention, the extracting module 601 is configured to:
calling a diagnostic attribute extraction model, and extracting a fourth attribute, a fifth attribute and a sixth attribute from the diagnostic information;
determining a first set of sample diagnostic information in the sample diagnostic information library that is similar to a fourth attribute of the diagnostic information, a second set of sample diagnostic information that is similar to a fifth attribute, and a third set of sample diagnostic information that is similar to a sixth attribute;
intersecting the first sample diagnosis information set, the second sample diagnosis information set, and the third sample diagnosis information set, and, in response to the intersection containing exactly one piece of sample diagnosis information, taking it as the target sample diagnosis information similar to the diagnosis information; or
in response to the intersection containing multiple pieces of sample diagnosis information, calculating the similarity between the diagnosis information and each of them, and taking the one with the highest similarity as the target sample diagnosis information similar to the diagnosis information.
Since the implementation of the apparatus in the embodiments of the present invention has been described in detail in the method above, it is not repeated here.
Fig. 7 shows an exemplary system architecture 700, including terminal devices 701, 702, 703, a network 704, and a server 705 (by way of example only), to which embodiments of the invention may be applied.
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, are installed with various communication client applications, and a user may interact with the server 705 through the network 704 using the terminal devices 701, 702, 703 to receive or transmit messages, etc.
The network 704 is the medium used to provide communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The server 705 may be a server providing various services, and it should be noted that the method provided by the embodiment of the present invention is generally performed by the server 705, and accordingly, the apparatus is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, there is illustrated a schematic diagram of a computer system 800 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, etc.; an output portion 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 801.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as comprising an extraction module, a query module, and a calculation module. The names of these modules do not, in some cases, limit the modules themselves; for example, the query module may also be described as a "vector query module".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform any of the data auditing methods described above.
The computer program product of the present invention comprises a computer program which, when executed by a processor, implements the data auditing method of embodiments of the present invention.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (7)

1. A method of data auditing, comprising:
acquiring prescription data to be audited, extracting medicine information and diagnosis information from the prescription data, and determining sample diagnosis information similar to the diagnosis information in a sample diagnosis information base;
querying a medicine vector corresponding to the medicine information, and querying a diagnosis vector corresponding to the sample diagnosis information; the medicine vector and the diagnosis vector are obtained through meta-path training, and the meta-path represents a path from medicine information to diagnosis information in the knowledge graph;
calculating the similarity of the medicine vector and the diagnosis vector, and determining that the prescription data passes the audit in response to the similarity being greater than or equal to a preset similarity threshold;
the determining sample diagnostic information similar to the diagnostic information in the sample diagnostic information base comprises:
calculating the similarity of the diagnosis information and each sample diagnosis information in a sample diagnosis information base respectively, wherein the sample diagnosis information with the highest similarity is used as target sample diagnosis information similar to the diagnosis information;
calling a diagnostic attribute extraction model, and extracting a fourth attribute, a fifth attribute and a sixth attribute from the diagnostic information;
determining a first set of sample diagnostic information in the sample diagnostic information library that is similar to a fourth attribute of the diagnostic information, a second set of sample diagnostic information that is similar to a fifth attribute, and a third set of sample diagnostic information that is similar to a sixth attribute;
intersecting the first sample diagnosis information set, the second sample diagnosis information set, and the third sample diagnosis information set, and, in response to the intersection containing exactly one piece of sample diagnosis information, obtaining it as the target sample diagnosis information similar to the diagnosis information; or
in response to the intersection containing multiple pieces of sample diagnosis information, calculating the similarity between the diagnosis information and each of them, and using the one with the highest similarity as the target sample diagnosis information similar to the diagnosis information;
the fourth attribute is a pathological process attribute;
the fifth attribute is an onset site attribute;
the sixth attribute is a sub-symptom attribute.
2. The method of claim 1, further comprising, prior to acquiring the prescription data:
receiving an input sample drug-diagnosis pairing relationship, the sample drug-diagnosis pairing relationship comprising matched sample drug information and sample diagnosis information;
querying drug specification information corresponding to the sample drug information, extracting drug feature information from the drug specification information, and extracting diagnostic feature information from the sample diagnostic information;
constructing a knowledge graph based on the drug characteristic information and the diagnosis characteristic information, and training a machine learning model by using the knowledge graph to obtain a word vector model, so as to obtain a drug vector and a diagnosis vector based on the word vector model.
3. The method of claim 2, wherein the extracting drug characteristic information from the drug specification information comprises: invoking a drug attribute extraction model, and extracting a drug generic name, a first attribute, a second attribute, and a third attribute from the drug specification information to construct drug characteristic information;
the extracting diagnostic feature information from the sample diagnostic information comprises: calling a diagnosis attribute extraction model, extracting a diagnosis name, a fourth attribute, a fifth attribute, and a sixth attribute from the sample diagnosis information, and querying a seventh attribute corresponding to the fourth attribute to construct diagnostic feature information;
the first attribute is an efficacy attribute;
the second attribute is a treatment site attribute;
the third attribute is a treatment indication attribute;
the seventh attribute is a treatment attribute.
4. A method according to claim 3, wherein said constructing a knowledge-graph based on said drug characteristic information and said diagnostic characteristic information comprises:
constructing a first meta-path based on the drug generic name, the first attribute, the seventh attribute, the fourth attribute, and the diagnosis name; and
constructing a second meta-path based on the drug generic name, the second attribute, the fifth attribute, and the diagnosis name; and
constructing a third meta-path based on the drug generic name, the third attribute, the sixth attribute, and the diagnosis name;
and constructing a knowledge graph based on the first meta-path, the second meta-path and the third meta-path.
5. A data auditing apparatus, comprising:
the extraction module is used for acquiring prescription data to be audited, extracting medicine information and diagnosis information from the prescription data and determining sample diagnosis information similar to the diagnosis information in the sample diagnosis information base;
the query module is used for querying the medicine vector corresponding to the medicine information and querying the diagnosis vector corresponding to the sample diagnosis information; the medicine vector and the diagnosis vector are obtained through meta-path training, and the meta-path represents a path from medicine information to diagnosis information in the knowledge graph;
the calculating module is used for calculating the similarity of the medicine vector and the diagnosis vector, and determining that the prescription data passes the audit in response to the similarity being greater than or equal to a preset similarity threshold;
the determining sample diagnostic information similar to the diagnostic information in the sample diagnostic information base comprises:
calculating the similarity of the diagnosis information and each sample diagnosis information in a sample diagnosis information base respectively, wherein the sample diagnosis information with the highest similarity is used as target sample diagnosis information similar to the diagnosis information;
the determining sample diagnostic information similar to the diagnostic information in the sample diagnostic information base further comprises:
calling a diagnostic attribute extraction model, and extracting a fourth attribute, a fifth attribute and a sixth attribute from the diagnostic information;
determining a first set of sample diagnostic information in the sample diagnostic information library that is similar to a fourth attribute of the diagnostic information, a second set of sample diagnostic information that is similar to a fifth attribute, and a third set of sample diagnostic information that is similar to a sixth attribute;
intersecting the first sample diagnosis information set, the second sample diagnosis information set, and the third sample diagnosis information set, and, in response to the intersection containing exactly one piece of sample diagnosis information, obtaining it as the target sample diagnosis information similar to the diagnosis information; or
in response to the intersection containing multiple pieces of sample diagnosis information, calculating the similarity between the diagnosis information and each of them, and using the one with the highest similarity as the target sample diagnosis information similar to the diagnosis information;
the fourth attribute is a pathological process attribute;
the fifth attribute is an onset site attribute;
the sixth attribute is a sub-symptom attribute.
6. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.
7. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-4.
CN202211628930.2A 2022-12-19 2022-12-19 Data auditing method and device Active CN115620886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211628930.2A CN115620886B (en) 2022-12-19 2022-12-19 Data auditing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211628930.2A CN115620886B (en) 2022-12-19 2022-12-19 Data auditing method and device

Publications (2)

Publication Number Publication Date
CN115620886A CN115620886A (en) 2023-01-17
CN115620886B true CN115620886B (en) 2023-04-28

Family

ID=84880014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211628930.2A Active CN115620886B (en) 2022-12-19 2022-12-19 Data auditing method and device

Country Status (1)

Country Link
CN (1) CN115620886B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116936021A (en) * 2023-09-18 2023-10-24 万链指数(青岛)信息科技有限公司 Medical electronic medical record information management method and system based on blockchain

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110782998A (en) * 2019-10-12 2020-02-11 平安医疗健康管理股份有限公司 Data auditing method and device, computer equipment and storage medium
CN111191020A (en) * 2019-12-27 2020-05-22 江苏省人民医院(南京医科大学第一附属医院) Prescription recommendation method and system based on machine learning and knowledge graph

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8059001B2 (en) * 2009-05-22 2011-11-15 Bio-Rad Laboratories, Inc. System and method for automatic quality control of clinical diagnostic processes

Also Published As

Publication number Publication date
CN115620886A (en) 2023-01-17

Similar Documents

Publication Publication Date Title
US11334635B2 (en) Domain specific natural language understanding of customer intent in self-help
US10496748B2 (en) Method and apparatus for outputting information
CN108920453A (en) Data processing method, device, electronic equipment and computer-readable medium
US20200279147A1 (en) Method and apparatus for intelligently recommending object
WO2023015935A1 (en) Method and apparatus for recommending physical examination item, device and medium
CN110265099B (en) Method and device for outputting medical records
US20160188701A1 (en) File recognition system and method
Chou et al. Integrating XBRL data with textual information in Chinese: A semantic web approach
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN115620886B (en) Data auditing method and device
CN113782195A (en) Physical examination package customization method and device
CN112309565A (en) Method, apparatus, electronic device, and medium for matching drug information and disorder information
CN111523309A (en) Medicine information normalization method and device, storage medium and electronic equipment
CN109086438B (en) Method and device for inquiring information
US20220293253A1 (en) Systems and methods using natural language processing to improve computer-assisted coding
CN115762704A (en) Prescription auditing method, device, equipment and storage medium
CN115862840A (en) Intelligent auxiliary diagnosis method and device for arthralgia diseases
CN114882985A (en) Medicine multimedia management system and method based on database and AI algorithm identification
CN113220896A (en) Multi-source knowledge graph generation method and device and terminal equipment
CN113821641A (en) Method, device, equipment and storage medium for medicine classification based on weight distribution
CN114116838B (en) Data processing method, data processing device, electronic equipment and storage medium
CN113053522B (en) Method, apparatus, device, medium, and product for processing medical data
CN112786132B (en) Medical record text data segmentation method and device, readable storage medium and electronic equipment
CN112925876B (en) Method, device, medium and equipment for processing structured medical record migrated across sites
US11636933B2 (en) Summarization of clinical documents with end points thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant