WO2021114626A1 - Method and related device for quality detection of medical record data - Google Patents

Method and related device for quality detection of medical record data

Info

Publication number
WO2021114626A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature vector
sample
medical record
anchor
record data
Prior art date
Application number
PCT/CN2020/099270
Inventor
唐蕊
李彦轩
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021114626A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • This application relates to the field of computer technology, and in particular to a method and related devices for quality detection of medical record data.
  • The embodiments of the present application provide a method and related device for detecting the quality of medical record data.
  • Implementing the embodiments of the present application improves the efficiency of quality detection for medical record data and allows the quality of the medical record data to be reflected in a timely manner.
  • The first aspect of the present application provides a method for detecting the quality of medical record data, which includes: obtaining medical record data to be quality-tested; vectorizing the medical record data to be quality-tested to obtain a first feature vector; obtaining a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set; inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors; determining a first anchor sample average feature vector according to the plurality of second anchor sample feature vectors; performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and inputting the second feature vector into a trained discriminator to obtain a quality detection result.
  • The second aspect of the present application provides a device for detecting the quality of medical record data, which includes: an acquisition module for obtaining medical record data to be quality-tested; and a processing module for vectorizing the medical record data to be quality-tested to obtain a first feature vector. The acquisition module is also used to obtain a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set. The processing module is also used to input each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors; to determine a first anchor sample average feature vector according to the plurality of second anchor sample feature vectors; to perform a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and to input the second feature vector into a trained discriminator to obtain a quality detection result.
  • The third aspect of the present application provides an electronic device for quality detection of medical record data, including a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor to implement the following steps: obtaining medical record data to be quality-tested; vectorizing the medical record data to be quality-tested to obtain a first feature vector; obtaining a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set; inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors; determining a first anchor sample average feature vector according to the plurality of second anchor sample feature vectors; performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and inputting the second feature vector into a trained discriminator to obtain a quality detection result.
  • The fourth aspect of the present application provides a computer-readable storage medium used to store a computer program, and the stored computer program is executed by a processor to implement the following steps: obtaining medical record data to be quality-tested; vectorizing the medical record data to be quality-tested to obtain a first feature vector; obtaining a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set; inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors; determining a first anchor sample average feature vector according to the plurality of second anchor sample feature vectors; performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and inputting the second feature vector into a trained discriminator to obtain a quality detection result.
  • The fifth aspect of the present application provides a computer program product that includes computer instructions which, when run on the device described in the third aspect, cause the device to execute any of the methods for detecting the quality of medical record data.
  • In this solution, the second feature vector, obtained by performing a vector operation on the first feature vector corresponding to the medical record data to be quality-tested and the first anchor sample average feature vector, is input into the trained discriminator to obtain the quality detection result. This avoids the inaccurate quality detection results that arise when the trained discriminator processes the first feature vector directly.
  • In addition, using the discriminator to determine the quality detection result of the medical record data to be quality-tested improves the efficiency of quality detection and allows the quality of the medical record data to be reflected in a timely manner.
  • Fig. 1 is a schematic diagram of a quality inspection system for medical record data provided by an embodiment of the present application.
  • FIG. 2A is a schematic flowchart of a method for detecting the quality of medical record data according to an embodiment of the application.
  • FIG. 2B is a schematic diagram of an encoder provided by an embodiment of the application.
  • FIG. 3 is a schematic flowchart of another method for detecting the quality of medical record data according to an embodiment of the application.
  • FIG. 4 is a schematic diagram of a device for detecting the quality of medical record data provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of the structure of an electronic device in a hardware operating environment related to an embodiment of the application.
  • FIG. 1 is a schematic diagram of a quality inspection system for medical record data provided by an embodiment of the present application.
  • The quality inspection system 100 includes a quality inspection processing device 110.
  • The quality inspection processing device 110 is used to process the medical record data to be quality-tested.
  • The quality inspection system 100 may be a single integrated device or may comprise multiple devices.
  • For convenience, the quality inspection system 100 is collectively referred to as an electronic device in this application.
  • The electronic device can include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on.
  • an embodiment of the present application proposes a method for detecting the quality of medical record data to solve the above-mentioned problems.
  • FIG. 2A is a schematic flowchart of a method for detecting the quality of medical record data according to an embodiment of the application. As shown in Figure 2A, the method includes:
  • The medical record data to be quality-tested may include text, symbols, charts, graphs, data, images, and so on. It also includes fields such as gender, age, birth date, name, and drug name.
  • The medical record data to be quality-tested can be obtained from a blockchain.
  • A blockchain is a chained data structure that links data blocks in chronological order; it is a distributed ledger whose resistance to tampering and forgery is guaranteed by cryptography.
  • A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
  • The characteristics of a blockchain include openness, consensus, decentralization, trustlessness, transparency, anonymity of both parties, immutability, and traceability.
  • Openness and transparency mean that anyone can participate in the blockchain network, each device can act as a node, and each node is allowed to obtain a complete copy of the database.
  • Nodes rely on a set of consensus mechanisms and jointly maintain the entire blockchain through competitive computing. If any node fails, the remaining nodes can still work normally.
  • Decentralization and trustlessness mean that a blockchain is an end-to-end network formed jointly by many nodes, with no centralized equipment or management organization.
  • Data exchange between nodes is verified by digital signature technology and requires no mutual trust: as long as exchanges follow the established rules of the system, nodes cannot deceive one another.
  • Transparency and anonymity of both parties mean that the operating rules of the blockchain are open and all data is also open, so every transaction is visible to all nodes. Since nodes do not need to trust one another, they do not need to disclose their identities, and each participating node is anonymous.
  • Immutability and traceability mean that modifications to the database by a single node, or even by several nodes, cannot affect the databases of other nodes unless more than 51% of the nodes in the entire network are controlled and modified at the same time, which is almost impossible.
  • Each transaction in the blockchain is cryptographically linked to its two adjacent blocks, so any transaction record can be traced.
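  • The linking and tamper-evidence described above can be illustrated with a minimal hash-chain sketch. This is not the blockchain used in the application; the block layout, the use of SHA-256, and the helper names `block_hash`, `build_chain`, and `is_valid` are assumptions made only for illustration.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Link payloads into a list of blocks, each storing its predecessor's hash."""
    chain, prev = [], "0" * 64
    for i, payload in enumerate(payloads):
        h = block_hash(i, payload, prev)
        chain.append({"index": i, "payload": payload, "prev": prev, "hash": h})
        prev = h
    return chain


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev:
            return False
        expected = block_hash(block["index"], block["payload"], block["prev"])
        if block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


# Hypothetical payloads standing in for stored medical record entries.
chain = build_chain(["record-1", "record-2", "record-3"])
ok_before = is_valid(chain)

# Changing one stored payload breaks the recomputed hash for that block.
chain[1]["payload"] = "tampered"
ok_after = is_valid(chain)
```

  • In this sketch a local edit is detected because the stored hash no longer matches the block contents; a real network additionally compares copies across nodes.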
  • A blockchain can use the chained block data structure to verify and store data, use distributed node consensus algorithms to generate and update data, use cryptography to secure data transmission and access, and use automated script code in the form of smart contracts.
  • A smart contract expresses all of its terms as a program that can be executed automatically on the blockchain. When the conditions that trigger the smart contract are met, the blockchain enforces the content of the contract without obstruction by any external force, which ensures the validity and enforceability of the contract, greatly reduces costs, and improves efficiency.
  • Each node on the blockchain holds the same ledger, which ensures that the ledger recording process is open and transparent.
  • Blockchain technology enables point-to-point, open, and transparent direct interaction, making efficient, large-scale information exchange without a centralized agent a reality.
  • The data type corresponding to the medical record data to be quality-tested includes a continuous type or a categorical type.
  • Before the medical record data to be quality-tested is vectorized, the method further includes: encoding the categorical data in the medical record data to be quality-tested to obtain encoded medical record data to be quality-tested.
  • One-hot encoding may be used to encode the categorical data in the medical record data to be quality-tested. Data whose type is continuous does not need to be encoded.
  • For example, gender in the medical record data to be quality-tested is categorical, so gender needs to be encoded.
  • Age in the medical record data to be quality-tested is continuous, so age does not need to be encoded.
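  • The encoding rule above (one-hot for categorical fields, pass-through for continuous fields) can be sketched as follows. The field names, the category list, and the helpers `one_hot` and `encode_record` are hypothetical and chosen only for illustration.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode_record(record, categorical, continuous):
    """Concatenate one-hot codes for categorical fields with raw continuous values."""
    vector = []
    for field, categories in categorical.items():
        vector.extend(one_hot(record[field], categories))
    for field in continuous:
        # Continuous fields are copied through without encoding.
        vector.append(float(record[field]))
    return vector


# Hypothetical schema: the field names and category lists are illustrative only.
CATEGORICAL = {"gender": ["female", "male"]}
CONTINUOUS = ["age"]

encoded = encode_record({"gender": "male", "age": 42}, CATEGORICAL, CONTINUOUS)
# encoded is [0.0, 1.0, 42.0]
```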
  • Vectorizing the medical record data to be quality-tested to obtain the first feature vector includes: vectorizing the encoded medical record data to be quality-tested together with the continuous-type data in the medical record data to be quality-tested to obtain the first feature vector. That is, the encoded data and the continuous-type data can be input into the encoder to obtain the first feature vector.
  • FIG. 2B is a schematic diagram of an encoder provided by an embodiment of the application.
  • The encoder includes an input layer, at least one hidden layer, and an output layer.
  • The input layer is n-dimensional, the hidden layer is m-dimensional, and the output layer is n-dimensional, where n and m are both integers greater than 1 and m is much smaller than n.
  • The mapping to the hidden layer serves as the encoder and the mapping to the output layer serves as the decoder; the network structure is that of a multi-layer autoencoder.
  • The encoded medical record data to be quality-tested, together with the continuous-type data in the medical record data to be quality-tested, forms an n-dimensional high-dimensional vector.
  • The first feature vector is an m-dimensional low-dimensional vector; that is, the first feature vector is the output of the last hidden layer.
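  • A minimal sketch of the n → m → n structure described above, using untrained random weights and a single hidden layer. The class name `AutoEncoder`, the tanh activation, and the sizes n = 6, m = 2 are assumptions; the application does not specify them, and a real encoder would be trained to minimize reconstruction error.

```python
import math
import random


def dense(x, weights, biases, activation):
    """Apply one fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(rng, n_out, n_in):
    """Create small random weights and zero biases for a layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class AutoEncoder:
    """n -> m -> n network; the hidden activation is used as the feature vector."""

    def __init__(self, n, m, seed=0):
        rng = random.Random(seed)
        self.enc_w, self.enc_b = random_layer(rng, m, n)  # input -> hidden
        self.dec_w, self.dec_b = random_layer(rng, n, m)  # hidden -> output

    def encode(self, x):
        return dense(x, self.enc_w, self.enc_b, math.tanh)

    def decode(self, h):
        return dense(h, self.dec_w, self.dec_b, lambda v: v)


autoencoder = AutoEncoder(n=6, m=2)
x = [0.0, 1.0, 0.42, 0.0, 1.0, 0.3]            # n-dimensional input
feature = autoencoder.encode(x)                # m-dimensional feature vector
reconstruction = autoencoder.decode(feature)   # n-dimensional reconstruction
```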
  • Obtaining the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set includes: obtaining a sample medical record data set to be trained and the quality score corresponding to each piece of sample medical record data to be trained in that set; determining, according to those quality scores, the sample medical record data to be trained whose quality scores fall in a first preset scoring interval, to obtain the anchor sample medical record data set; and vectorizing each piece of anchor sample medical record data in the anchor sample medical record data set to obtain the first anchor sample feature vector set.
  • Likewise, according to the quality score corresponding to each piece of sample medical record data to be trained, the sample medical record data whose quality scores fall in a second preset scoring interval can be determined to obtain a positive sample medical record data set, and the sample medical record data whose quality scores fall in a third preset scoring interval can be determined to obtain a negative sample medical record data set. The first preset scoring interval is higher than the second preset scoring interval, and the second preset scoring interval is higher than the third preset scoring interval.
  • That is, the quality score of each piece of anchor sample medical record data in the anchor sample medical record data set is higher than the quality score of each piece of positive sample medical record data in the positive sample medical record data set, which in turn is higher than the quality score of each piece of negative sample medical record data in the negative sample medical record data set.
  • Each piece of negative sample medical record data in the negative sample medical record data set has quality problems.
  • Each piece of positive sample medical record data in the positive sample medical record data set and each piece of anchor sample medical record data in the anchor sample medical record data set have no quality problems.
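  • The partition into anchor, positive, and negative sets can be sketched as below. The numeric interval bounds are hypothetical (the application gives no values); only their ordering, first above second above third, comes from the text.

```python
# Hypothetical half-open score intervals [low, high); the source gives no numbers.
ANCHOR_INTERVAL = (90, 101)    # first preset scoring interval
POSITIVE_INTERVAL = (75, 90)   # second preset scoring interval
NEGATIVE_INTERVAL = (0, 75)    # third preset scoring interval


def in_interval(score, interval):
    """Check membership in a half-open interval [low, high)."""
    low, high = interval
    return low <= score < high


def split_by_score(scored_records):
    """Group (record_id, score) pairs into anchor, positive, and negative sets."""
    sets = {"anchor": [], "positive": [], "negative": []}
    for record_id, score in scored_records:
        if in_interval(score, ANCHOR_INTERVAL):
            sets["anchor"].append(record_id)
        elif in_interval(score, POSITIVE_INTERVAL):
            sets["positive"].append(record_id)
        elif in_interval(score, NEGATIVE_INTERVAL):
            sets["negative"].append(record_id)
    return sets


groups = split_by_score([("r1", 96), ("r2", 82), ("r3", 40), ("r4", 90)])
```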
  • For sample medical record data to be trained whose quality score falls in the first preset scoring interval, the completion time of the medical record, writing format and paragraphs, medical terminology, three-level ward rounds, informed consent, anesthesia visits, diagnosis and treatment, auxiliary examinations, nosocomial infections and/or use of antibacterial drugs comply with regulations.
  • For sample medical record data to be trained whose quality score falls in the second preset scoring interval, the medical terminology, three-level ward rounds, informed consent, anesthesia visits, diagnosis and treatment, auxiliary examinations, nosocomial infections and/or use of antibacterial drugs comply with regulations.
  • For sample medical record data to be trained whose quality score falls in the third preset scoring interval, the medical terminology, three-level ward rounds, anesthesia visits, diagnosis and treatment, nosocomial infections and/or use of antibacterial drugs comply with regulations.
  • the data type corresponding to each sample medical record data to be trained in the sample medical record data set to be trained includes continuous type or categorical type. That is, the data type corresponding to each anchor sample medical record data in the anchor sample medical record data set includes continuous type or categorical type, and the data type corresponding to each positive sample medical record data in the positive sample medical record data set includes continuous type or categorical type. The data type corresponding to each negative sample medical record data in the negative sample medical record data set includes continuous type or categorical type.
  • Let anchor sample medical record data P be any piece of data in the anchor sample medical record data set. Before each piece of anchor sample medical record data in the anchor sample medical record data set is vectorized to obtain the first anchor sample feature vector set, the method further includes: encoding the categorical data in the anchor sample medical record data P to obtain encoded anchor sample medical record data P.
  • One-hot encoding may be used to encode the categorical data in the anchor sample medical record data P. Data whose type is continuous in the anchor sample medical record data P does not need to be encoded.
  • Vectorizing each piece of anchor sample medical record data in the anchor sample medical record data set to obtain the first anchor sample feature vector set includes: vectorizing the encoded anchor sample medical record data P together with the continuous-type data in the anchor sample medical record data P to obtain the first anchor sample feature vector corresponding to the anchor sample medical record data P. That is, the encoded data and the continuous-type data can be input into the encoder to obtain that feature vector. The anchor sample medical record data set therefore corresponds one-to-one with the first anchor sample feature vector set.
  • The encoded anchor sample medical record data P, together with the continuous-type data in the anchor sample medical record data P, forms an n-dimensional high-dimensional vector.
  • The first anchor sample feature vector corresponding to the anchor sample medical record data P is an m-dimensional low-dimensional vector; that is, it is the output of the last hidden layer.
  • In the above technical solution, the sample medical record data set to be trained and the quality score corresponding to each piece of sample medical record data to be trained are obtained; the sample medical record data whose quality scores fall in the first preset scoring interval are determined according to those scores to obtain the anchor sample medical record data set; and each piece of anchor sample medical record data in the anchor sample medical record data set is vectorized to obtain the first anchor sample feature vector set.
  • Selecting the sample medical record data whose quality scores fall in the first preset scoring interval classifies the sample medical record data set to be trained and yields the anchor sample medical record data set.
  • Vectorizing that set yields the anchor sample feature vector set, which prepares for the subsequent determination of the anchor sample average feature vector.
  • Converting high-dimensional vectors to low-dimensional vectors makes it easier for the trained generator to learn the distribution of the anchor sample medical record data and improves learning efficiency.
  • The first anchor sample average feature vector is an m-dimensional low-dimensional vector.
  • The value in the i-th row and j-th column of the first anchor sample average feature vector is the average of the values in the i-th row and j-th column of each second anchor sample feature vector in the plurality of second anchor sample feature vectors.
  • Here i and j are integers greater than 0 whose ranges are determined by the dimensions of the first anchor sample average feature vector.
  • For example, suppose the plurality of second anchor sample feature vectors includes a second anchor sample feature vector N1 and a second anchor sample feature vector N2. The first anchor sample average feature vector is then the element-wise mean (N1 + N2) / 2. [Example matrices for N1, N2, and their average: not reproduced in this text.]
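  • The element-wise averaging rule can be sketched as follows. The values of `N1` and `N2` are made up for illustration and are not the example values from the application; `average_vectors` is a hypothetical helper name.

```python
def average_vectors(vectors):
    """Element-wise mean of equally shaped row-by-column matrices."""
    rows, cols = len(vectors[0]), len(vectors[0][0])
    for v in vectors:
        if len(v) != rows or any(len(r) != cols for r in v):
            raise ValueError("all vectors must share one shape")
    count = len(vectors)
    # Entry (i, j) of the result averages entry (i, j) across all inputs.
    return [
        [sum(v[i][j] for v in vectors) / count for j in range(cols)]
        for i in range(rows)
    ]


# Illustrative 1 x 3 vectors (not taken from the application).
N1 = [[1.0, 2.0, 3.0]]
N2 = [[3.0, 4.0, 5.0]]
mean = average_vectors([N1, N2])
# mean is [[2.0, 3.0, 4.0]]
```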
  • The vector operation can be, for example, vector addition, vector subtraction, or a vector product; it is not limited here.
  • In the above technical solution, the medical record data to be quality-tested is obtained and vectorized to obtain the first feature vector; the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set is obtained; each first anchor sample feature vector in that set is input into the trained generator to obtain a plurality of second anchor sample feature vectors; the first anchor sample average feature vector is determined according to the plurality of second anchor sample feature vectors; a vector operation is performed on the first feature vector and the first anchor sample average feature vector to obtain the second feature vector; and the second feature vector is input into the trained discriminator to obtain the quality detection result.
  • Because the discriminator receives the second feature vector rather than the first feature vector, this avoids the inaccurate quality detection results that arise when the trained discriminator processes the first feature vector directly.
  • In addition, using the discriminator to determine the quality detection result of the medical record data to be quality-tested improves the efficiency of quality detection and allows the quality of the medical record data to be reflected in a timely manner.
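  • The sequence of steps summarized above can be sketched end to end. The generator and discriminator here are toy stand-ins, not trained networks, and vector subtraction is chosen as the vector operation only because the text lists it as one permitted option; all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized flat vectors."""
    count = len(vectors)
    return [sum(v[k] for v in vectors) / count for k in range(len(vectors[0]))]


def detect_quality(first_feature, anchor_features, generator, discriminator):
    """Follow the described steps with caller-supplied generator and discriminator."""
    # Step 1: pass each first anchor sample feature vector through the generator.
    second_anchor = [generator(a) for a in anchor_features]
    # Step 2: average the generated vectors element-wise.
    anchor_mean = mean_vector(second_anchor)
    # Step 3: vector operation (subtraction is an assumption, one permitted option).
    second_feature = [f - a for f, a in zip(first_feature, anchor_mean)]
    # Step 4: the discriminator scores the second feature vector.
    return discriminator(second_feature)


def toy_generator(v):
    """Stand-in for a trained generator: returns its input unchanged."""
    return list(v)


def toy_discriminator(v):
    """Stand-in for a trained discriminator: maps distance from zero into (0, 1]."""
    distance = math.sqrt(sum(c * c for c in v))
    return 1.0 / (1.0 + distance)


anchors = [[1.0, 1.0], [3.0, 1.0]]
score_near = detect_quality([2.0, 1.0], anchors, toy_generator, toy_discriminator)
score_far = detect_quality([5.0, 5.0], anchors, toy_generator, toy_discriminator)
```

  • With these stand-ins, a record whose feature vector coincides with the anchor average receives the highest score, and the score falls as the record moves away from it.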
  • Inputting the second feature vector into the trained discriminator to obtain the quality detection result includes: inputting the second feature vector into the trained discriminator to obtain a quality detection value; when the quality detection value is higher than a threshold, determining that the quality detection result is that the medical record data to be quality-tested has no quality problem; and when the quality detection value is lower than the threshold, determining that the quality detection result is that the medical record data to be quality-tested has a quality problem.
  • The quality detection value is a floating-point number in a preset interval, and the preset interval is [0,1]. When the quality detection value is higher than the threshold, the label is 1, meaning the medical record data to be quality-tested has no quality problem; when the quality detection value is lower than the threshold, the label is 0, meaning the medical record data to be quality-tested has a quality problem.
  • a threshold adjustment interface may also be displayed, and the threshold adjustment interface includes a threshold input box and a confirmation button. The user can input the threshold in the threshold input box and operate the confirmation button to realize the dynamic adjustment of the threshold.
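  • The thresholding rule with an adjustable threshold can be sketched as below. The default threshold of 0.5 and the treatment of a value exactly equal to the threshold (labeled 0 here) are assumptions; the text specifies only the "higher than" and "lower than" cases.

```python
def quality_label(detection_value, threshold=0.5):
    """Map a detection value in [0, 1] to 1 (no problem) or 0 (quality problem)."""
    if not 0.0 <= detection_value <= 1.0:
        raise ValueError("detection value must lie in [0, 1]")
    # Assumption: a value equal to the threshold is labeled 0.
    return 1 if detection_value > threshold else 0


label_good = quality_label(0.83)                   # above the default threshold
label_bad = quality_label(0.21)                    # below the default threshold
label_strict = quality_label(0.83, threshold=0.9)  # user raises the threshold
```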
  • In the above technical solution, the second feature vector is input into the trained discriminator to obtain the quality detection value; when the quality detection value is higher than the threshold, the quality detection result is that the medical record data to be quality-tested has no quality problem; when the quality detection value is lower than the threshold, the quality detection result is that the medical record data to be quality-tested has a quality problem. Using a discriminator to determine the quality detection result in this way improves the efficiency of quality detection and allows the quality of the medical record data to be reflected in a timely manner.
  • In addition, because the threshold can be adjusted, the quality detection results of the medical record data can be dynamically controlled.
  • FIG. 3 is a schematic flowchart of another method for detecting the quality of medical record data according to an embodiment of the application. Wherein, as shown in FIG. 3, the method further includes:
  • The generator to be trained includes an input layer, multiple hidden layers, and an output layer. The input layer is m-dimensional, each hidden layer is k-dimensional, and the output layer is m-dimensional, where k is an integer greater than 1 and less than m. The network structure of the generator to be trained is a deep neural network.
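  • A minimal sketch of the m → k → … → k → m shape described above, with untrained random weights. The number of hidden layers, the tanh activation, and the name `make_generator` are assumptions; the application states only the layer dimensions and that the network is a deep neural network.

```python
import math
import random


def make_generator(m, k, hidden_layers=2, seed=0):
    """Build an untrained m -> k -> ... -> k -> m feed-forward network."""
    if not 1 < k < m:
        raise ValueError("k must be greater than 1 and less than m")
    rng = random.Random(seed)
    sizes = [m] + [k] * hidden_layers + [m]
    # One weight matrix per pair of consecutive layers.
    layers = [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

    def generator(x):
        for index, weights in enumerate(layers):
            x = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
            # Apply the nonlinearity on hidden layers only; the output is linear.
            if index < len(layers) - 1:
                x = [math.tanh(v) for v in x]
        return x

    return generator


generator = make_generator(m=4, k=2)
third_anchor = generator([0.1, 0.2, 0.3, 0.4])  # m-dimensional output
```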
  • Each first anchor sample feature vector in the first anchor sample feature vector set is an m-dimensional low-dimensional vector.
  • Each third anchor sample feature vector in the plurality of third anchor sample feature vectors is an m-dimensional low-dimensional vector; that is, each third anchor sample feature vector is the output of the output layer of the generator to be trained.
  • The second anchor sample average feature vector is an m-dimensional low-dimensional vector.
  • The value in the a-th row and b-th column of the second anchor sample average feature vector is the average of the values in the a-th row and b-th column of each third anchor sample feature vector in the plurality of third anchor sample feature vectors.
  • Here a and b are integers greater than 0 whose ranges are determined by the dimensions of the second anchor sample average feature vector.
  • For example, suppose the plurality of third anchor sample feature vectors includes a third anchor sample feature vector M1 and a third anchor sample feature vector M2. The second anchor sample average feature vector is then the element-wise mean (M1 + M2) / 2. [Example matrices for M1, M2, and their average: not reproduced in this text.]
  • Let positive sample medical record data X be any piece of data in the positive sample medical record data set.
  • Obtaining the positive sample feature vector set in one-to-one correspondence with the positive sample medical record data set includes: encoding the categorical data in the positive sample medical record data X to obtain encoded positive sample medical record data X; and vectorizing the encoded positive sample medical record data X together with the continuous-type data in the positive sample medical record data X to obtain the positive sample feature vector corresponding to the positive sample medical record data X.
  • One-hot encoding may be used to encode the categorical data in the positive sample medical record data X. Data whose type is continuous in the positive sample medical record data X does not need to be encoded.
  • The encoded positive sample medical record data X, together with the continuous-type data in the positive sample medical record data X, forms an n-dimensional high-dimensional vector.
  • The positive sample feature vector corresponding to the positive sample medical record data X is an m-dimensional low-dimensional vector; that is, it is the output of the last hidden layer.
  • Let negative sample medical record data Y be any piece of data in the negative sample medical record data set.
  • Obtaining the negative sample feature vector set in one-to-one correspondence with the negative sample medical record data set includes: encoding the categorical data in the negative sample medical record data Y to obtain encoded negative sample medical record data Y; and vectorizing the encoded negative sample medical record data Y together with the continuous-type data in the negative sample medical record data Y to obtain the negative sample feature vector corresponding to the negative sample medical record data Y.
  • One-hot encoding may be used to encode the categorical data in the negative sample medical record data Y. Data whose type is continuous in the negative sample medical record data Y does not need to be encoded.
  • The encoded negative sample medical record data Y, together with the continuous-type data in the negative sample medical record data Y, forms an n-dimensional high-dimensional vector.
  • The negative sample feature vector corresponding to the negative sample medical record data Y is an m-dimensional low-dimensional vector; that is, it is the output of the last hidden layer.
  • the first sample feature vector A is any vector in the first sample feature vector set, and the value of the first sample feature vector A indicates the distance between the positive sample feature vector corresponding to the first sample feature vector A and the second anchor sample average feature vector.
  • the larger the value of the first sample feature vector A, the closer the positive sample feature vector corresponding to the first sample feature vector A is to the second anchor sample average feature vector; the smaller the value of the first sample feature vector A, the farther the positive sample feature vector corresponding to the first sample feature vector A is from the second anchor sample average feature vector.
  • the second sample feature vector B is any vector in the second sample feature vector set, and the value of the second sample feature vector B indicates the distance between the negative sample feature vector corresponding to the second sample feature vector B and the second anchor sample average feature vector.
  • the larger the value of the second sample feature vector B, the closer the negative sample feature vector corresponding to the second sample feature vector B is to the second anchor sample average feature vector; the smaller the value of the second sample feature vector B, the farther the negative sample feature vector corresponding to the second sample feature vector B is from the second anchor sample average feature vector.
  • each first anchor sample feature vector in the first anchor sample feature vector set is input into the generator to be trained to obtain multiple third anchor sample feature vectors;
  • the second anchor sample average feature vector is determined according to the multiple third anchor sample feature vectors; the positive sample feature vector set in one-to-one correspondence with the positive sample medical record data set and the negative sample feature vector set in one-to-one correspondence with the negative sample medical record data set are obtained;
  • a vector operation is performed on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain a first sample feature vector set;
  • a vector operation is performed on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain a second sample feature vector set; the first sample feature vector set and the second sample feature vector set are respectively input into the discriminator to be trained to obtain the trained discriminator;
  • the anchor sample feature vectors generated by the generator to be trained are used to obtain the anchor sample average feature vector, and the anchor sample average feature vector is used to perform vector operations with the positive sample feature vectors and the negative sample feature vectors respectively
  • FIG. 4 is a schematic diagram of a device for detecting the quality of medical record data according to an embodiment of the application.
  • an apparatus 400 for detecting the quality of medical record data provided by an embodiment of the present application may include:
  • the obtaining module 401 is used to obtain medical record data to be tested for quality
  • the processing module 402 is configured to vectorize the medical record data to be tested for quality to obtain a first feature vector
  • the obtaining module 401 is further configured to obtain a first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set;
  • the obtaining module 401 is specifically configured to obtain the sample medical record data set to be trained and the quality score corresponding to each piece of sample medical record data to be trained in that set;
  • the processing module 402 is specifically configured to: determine, according to the quality score corresponding to each piece of sample medical record data to be trained in the sample medical record data set to be trained, the sample medical record data to be trained whose quality score falls in the first preset scoring interval, to obtain the anchor sample medical record data set; and vectorize each piece of anchor sample medical record data in the anchor sample medical record data set to obtain the first anchor sample feature vector set.
  • the processing module 402 is further configured to input each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain multiple second anchor sample feature vectors;
  • the processing module 402 is further configured to determine an average feature vector of the first anchor sample according to the multiple second anchor sample feature vectors;
  • the processing module 402 is further configured to perform vector operations on the first feature vector and the average feature vector of the first anchor sample to obtain a second feature vector;
  • the processing module 402 is further configured to input the second feature vector into the trained discriminator to obtain a quality detection result.
  • the processing module 402 is further configured to input each first anchor sample feature vector in the first anchor sample feature vector set into the generator to be trained to obtain multiple third anchor sample feature vectors; the processing module 402 is further configured to determine the second anchor sample average feature vector according to the multiple third anchor sample feature vectors; the obtaining module 401 is further configured to obtain the positive sample feature vector set in one-to-one correspondence with the positive sample medical record data set and the negative sample feature vector set in one-to-one correspondence with the negative sample medical record data set;
  • the processing module 402 is further configured to perform a vector operation on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain a first sample feature vector set; the processing module 402 is further configured to perform a vector operation on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain a second sample feature vector set; the processing module 402 is further configured to input the first sample feature vector set and the second sample feature vector set respectively into the discriminator to be trained to obtain the trained discriminator.
  • the first sample feature vector A is any vector in the first sample feature vector set, and the value of the first sample feature vector A indicates the distance between the positive sample feature vector corresponding to the first sample feature vector A and the second anchor sample average feature vector.
  • the larger the value of the first sample feature vector A, the closer the positive sample feature vector corresponding to the first sample feature vector A is to the second anchor sample average feature vector; the smaller the value of the first sample feature vector A, the farther the positive sample feature vector corresponding to the first sample feature vector A is from the second anchor sample average feature vector.
  • the second sample feature vector B is any vector in the second sample feature vector set, and the value of the second sample feature vector B indicates the distance between the negative sample feature vector corresponding to the second sample feature vector B and the second anchor sample average feature vector.
  • the larger the value of the second sample feature vector B, the closer the negative sample feature vector corresponding to the second sample feature vector B is to the second anchor sample average feature vector; the smaller the value of the second sample feature vector B, the farther the negative sample feature vector corresponding to the second sample feature vector B is from the second anchor sample average feature vector.
  • when inputting the second feature vector into the trained discriminator to obtain the quality detection result, the processing module 402 is specifically configured to: input the second feature vector into the trained discriminator to obtain a quality detection value; when the quality detection value is higher than a threshold, determine that the quality detection result is that the medical record data to be tested for quality has no quality problem; and when the quality detection value is lower than the threshold, determine that the quality detection result is that the medical record data to be tested for quality has a quality problem.
  • FIG. 5 is a schematic diagram of the structure of an electronic device in a hardware operating environment involved in an embodiment of the application.
  • the embodiment of the present application provides an electronic device for detecting the quality of medical record data, including a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and include instructions for executing any one of the steps in the method for detecting the quality of medical record data.
  • the electronic device of the hardware operating environment involved in the embodiment of the present application may include:
  • the processor 501 is, for example, a CPU.
  • the memory 502 may optionally be a high-speed RAM memory, or a non-volatile memory such as a disk memory.
  • the communication interface 503 is used to implement connection and communication between the processor 501 and the memory 502.
  • the structure of the electronic device shown in FIG. 5 does not constitute a limitation on it; the device may include more or fewer components than those shown in the figure, a combination of certain components, or a different arrangement of components.
  • the memory 502 may include an operating system, a network communication module, and a verification program issued by grayscale.
  • the operating system is a program that manages and controls the hardware and software resources of the server, and supports the operation of one or more programs.
  • the network communication module is used to implement communication between various components in the memory 502 and communication with other hardware and software in the electronic device.
  • the processor 501 is configured to execute one or more programs stored in the memory 502 to implement the following steps: obtain medical record data to be tested for quality; vectorize the medical record data to be tested for quality to obtain the first feature vector; obtain the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set; input each first anchor sample feature vector in the first anchor sample feature vector set into the trained generator to obtain multiple second anchor sample feature vectors; determine the first anchor sample average feature vector according to the multiple second anchor sample feature vectors; perform a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and input the second feature vector into a trained discriminator to obtain a quality detection result.
  • when obtaining the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set, the processor is configured to: obtain the sample medical record data set to be trained and the quality score corresponding to each piece of sample medical record data to be trained in that set; determine, according to those quality scores, the sample medical record data to be trained whose quality score falls in the first preset scoring interval, to obtain the anchor sample medical record data set; and vectorize each piece of anchor sample medical record data in the anchor sample medical record data set to obtain the first anchor sample feature vector set.
  • the processor is configured to: input each first anchor sample feature vector in the first anchor sample feature vector set into the generator to be trained to obtain multiple third anchor sample feature vectors; determine the second anchor sample average feature vector according to the multiple third anchor sample feature vectors; obtain the positive sample feature vector set in one-to-one correspondence with the positive sample medical record data set and the negative sample feature vector set in one-to-one correspondence with the negative sample medical record data set; perform a vector operation on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain the first sample feature vector set; perform a vector operation on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain the second sample feature vector set; and input the first sample feature vector set and the second sample feature vector set respectively into the discriminator to be trained to obtain the trained discriminator.
  • the first sample feature vector A is any vector in the first sample feature vector set, and the value of the first sample feature vector A indicates the distance between the positive sample feature vector corresponding to the first sample feature vector A and the second anchor sample average feature vector.
  • the larger the value of the first sample feature vector A, the closer the positive sample feature vector corresponding to the first sample feature vector A is to the second anchor sample average feature vector; the smaller the value of the first sample feature vector A, the farther the positive sample feature vector corresponding to the first sample feature vector A is from the second anchor sample average feature vector.
  • the second sample feature vector B is any vector in the second sample feature vector set, and the value of the second sample feature vector B indicates the distance between the negative sample feature vector corresponding to the second sample feature vector B and the second anchor sample average feature vector.
  • the larger the value of the second sample feature vector B, the closer the negative sample feature vector corresponding to the second sample feature vector B is to the second anchor sample average feature vector; the smaller the value of the second sample feature vector B, the farther the negative sample feature vector corresponding to the second sample feature vector B is from the second anchor sample average feature vector.
  • when inputting the second feature vector into the trained discriminator to obtain the quality detection result, the processor is further configured to: input the second feature vector into the trained discriminator to obtain a quality detection value; when the quality detection value is higher than a threshold, determine that the quality detection result is that the medical record data to be tested for quality has no quality problem; and when the quality detection value is lower than the threshold, determine that the quality detection result is that the medical record data to be tested for quality has a quality problem.
  • This application also provides a computer-readable storage medium for storing a computer program, and the stored computer program is executed by the processor to implement the following steps: obtain medical record data to be tested for quality; vectorize the medical record data to be tested for quality to obtain the first feature vector; obtain the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set; input each first anchor sample feature vector in the first anchor sample feature vector set into the trained generator to obtain multiple second anchor sample feature vectors; determine the first anchor sample average feature vector according to the multiple second anchor sample feature vectors; perform a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and input the second feature vector into a trained discriminator to obtain a quality detection result.
  • when obtaining the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set, the processor is configured to: obtain the sample medical record data set to be trained and the quality score corresponding to each piece of sample medical record data to be trained in that set; determine, according to those quality scores, the sample medical record data to be trained whose quality score falls in the first preset scoring interval, to obtain the anchor sample medical record data set; and vectorize each piece of anchor sample medical record data in the anchor sample medical record data set to obtain the first anchor sample feature vector set.
  • the processor is configured to: input each first anchor sample feature vector in the first anchor sample feature vector set into the generator to be trained to obtain multiple third anchor sample feature vectors; determine the second anchor sample average feature vector according to the multiple third anchor sample feature vectors; obtain the positive sample feature vector set in one-to-one correspondence with the positive sample medical record data set and the negative sample feature vector set in one-to-one correspondence with the negative sample medical record data set; perform a vector operation on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain the first sample feature vector set; perform a vector operation on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain the second sample feature vector set; and input the first sample feature vector set and the second sample feature vector set respectively into the discriminator to be trained to obtain the trained discriminator.
  • the first sample feature vector A is any vector in the first sample feature vector set, and the value of the first sample feature vector A indicates the distance between the positive sample feature vector corresponding to the first sample feature vector A and the second anchor sample average feature vector.
  • the larger the value of the first sample feature vector A, the closer the positive sample feature vector corresponding to the first sample feature vector A is to the second anchor sample average feature vector; the smaller the value of the first sample feature vector A, the farther the positive sample feature vector corresponding to the first sample feature vector A is from the second anchor sample average feature vector.
  • the second sample feature vector B is any vector in the second sample feature vector set, and the value of the second sample feature vector B indicates the distance between the negative sample feature vector corresponding to the second sample feature vector B and the second anchor sample average feature vector.
  • the larger the value of the second sample feature vector B, the closer the negative sample feature vector corresponding to the second sample feature vector B is to the second anchor sample average feature vector; the smaller the value of the second sample feature vector B, the farther the negative sample feature vector corresponding to the second sample feature vector B is from the second anchor sample average feature vector.
  • when inputting the second feature vector into the trained discriminator to obtain the quality detection result, the processor is further configured to: input the second feature vector into the trained discriminator to obtain a quality detection value; when the quality detection value is higher than a threshold, determine that the quality detection result is that the medical record data to be tested for quality has no quality problem; and when the quality detection value is lower than the threshold, determine that the quality detection result is that the medical record data to be tested for quality has a quality problem.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the modules is only a logical function division, and there may be other divisions in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or modules, and may be in electrical or other forms.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place, or they may be distributed on multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
  • if the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • a computer readable storage medium includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

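A method and related apparatus for detecting the quality of medical record data, relating to intelligent decision-making in artificial intelligence. The method includes: obtaining medical record data to be quality-tested (201); vectorizing the medical record data to be quality-tested to obtain a first feature vector (202); obtaining a first anchor sample feature vector set corresponding to an anchor sample medical record data set (203); inputting the first anchor sample feature vector set into a trained generator to obtain multiple second anchor sample feature vectors (204); determining a first anchor sample average feature vector according to the second anchor sample feature vectors (205); performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector (206); and inputting the second feature vector into a trained discriminator to obtain a quality detection result (207). The method also involves blockchain technology and can be applied in the field of smart healthcare, thereby promoting the construction of smart cities.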

Description

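Method and related apparatus for detecting the quality of medical record data
This application claims priority to Chinese patent application No. 2020104167979, entitled "Method and related apparatus for detecting the quality of medical record data", filed with the China Patent Office on May 15, 2020, the entire contents of which are incorporated herein by reference.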
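Technical Field
This application relates to the field of computer technology, and in particular to a method and related apparatus for detecting the quality of medical record data.
Background
With the rapid development of information technology, the medical industry has entered a new stage of development. Electronic medical record systems are now in use in most hospitals, and filling in electronic medical records on a computer has gradually replaced handwritten records.
Generally, whether medical records are written by hand or filled in through an electronic medical record system, the inventors realized that in both cases doctors must check the medical record data in order to control its quality in real time. That is, in the prior art, quality control of medical record data is performed manually. This approach is inefficient and cannot reflect the quality of medical record data in a timely manner.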
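Summary
The embodiments of this application provide a method and related apparatus for detecting the quality of medical record data. Implementing the embodiments improves the efficiency of quality detection for medical record data and allows the quality of the medical record data to be reflected in a timely manner.
A first aspect of this application provides a method for detecting the quality of medical record data, including: obtaining medical record data to be quality-tested; vectorizing the medical record data to be quality-tested to obtain a first feature vector; obtaining a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set; inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain multiple second anchor sample feature vectors; determining a first anchor sample average feature vector according to the multiple second anchor sample feature vectors; performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and inputting the second feature vector into a trained discriminator to obtain a quality detection result.
A second aspect of this application provides an apparatus for detecting the quality of medical record data, including: an obtaining module, configured to obtain medical record data to be quality-tested; and a processing module, configured to vectorize the medical record data to be quality-tested to obtain a first feature vector. The obtaining module is further configured to obtain a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set. The processing module is further configured to: input each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain multiple second anchor sample feature vectors; determine a first anchor sample average feature vector according to the multiple second anchor sample feature vectors; perform a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and input the second feature vector into a trained discriminator to obtain a quality detection result.
A third aspect of this application provides an electronic device for detecting the quality of medical record data, including a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor to implement the following steps: obtaining medical record data to be quality-tested; vectorizing the medical record data to be quality-tested to obtain a first feature vector; obtaining a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set; inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain multiple second anchor sample feature vectors; determining a first anchor sample average feature vector according to the multiple second anchor sample feature vectors; performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and inputting the second feature vector into a trained discriminator to obtain a quality detection result.
A fourth aspect of this application provides a computer-readable storage medium for storing a computer program, the stored computer program being executed by the processor to implement the following steps: obtaining medical record data to be quality-tested; vectorizing the medical record data to be quality-tested to obtain a first feature vector; obtaining a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set; inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain multiple second anchor sample feature vectors; determining a first anchor sample average feature vector according to the multiple second anchor sample feature vectors; performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and inputting the second feature vector into a trained discriminator to obtain a quality detection result.
A fifth aspect of this application provides a computer program product including computer instructions which, when run on the device described in the third aspect, cause the device to perform any one of the methods for detecting the quality of medical record data.
It can be seen that, in the above technical solution, the second feature vector, obtained by performing a vector operation on the first feature vector corresponding to the medical record data to be quality-tested and the first anchor sample average feature vector, is input into the trained discriminator to obtain the quality detection result. This avoids the problem of an inaccurate quality detection result that arises when the trained discriminator directly processes the first feature vector corresponding to the medical record data to be quality-tested. In addition, using the discriminator to determine the quality detection result improves the efficiency of quality detection for medical record data and allows the quality of the medical record data to be reflected in a timely manner.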
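Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a system for detecting the quality of medical record data according to an embodiment of this application.
FIG. 2A is a schematic flowchart of a method for detecting the quality of medical record data according to an embodiment of this application.
FIG. 2B is a schematic diagram of an encoder according to an embodiment of this application.
FIG. 3 is a schematic flowchart of another method for detecting the quality of medical record data according to an embodiment of this application.
FIG. 4 is a schematic diagram of an apparatus for detecting the quality of medical record data according to an embodiment of this application.
FIG. 5 is a schematic structural diagram of an electronic device in the hardware operating environment involved in an embodiment of this application.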
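Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
Detailed descriptions are given below.
The terms "first" and "second" in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a specific order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
First, refer to FIG. 1, which is a schematic diagram of a system for detecting the quality of medical record data according to an embodiment of this application. The quality detection system 100 includes a quality detection processing apparatus 110, which is used to process the medical record data to be quality-tested. The quality detection system 100 may include an integrated single device or multiple devices; for ease of description, this application refers to the quality detection system 100 collectively as an electronic device. The electronic device may include various handheld devices with wireless communication functions, in-vehicle devices, wearable devices, computing devices, or other processing devices connected to a wireless modem, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on.
Generally, whether medical records are written by hand or filled in through an electronic medical record system, doctors must check the medical record data in order to control its quality in real time. That is, in the prior art, quality control of medical record data is performed manually. This approach is inefficient and cannot reflect the quality of medical record data in a timely manner.
Based on this, the embodiments of this application propose a method for detecting the quality of medical record data to solve the above problem. The embodiments are described in detail below.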
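Refer to FIG. 2A, which is a schematic flowchart of a method for detecting the quality of medical record data according to an embodiment of this application. As shown in FIG. 2A, the method includes:
201. Obtain medical record data to be quality-tested;
The medical record data to be quality-tested may include text, symbols, charts, graphics, data, images, and the like. More specifically, it includes gender, age, date of birth, name, drug names, and the like.
In addition, the medical record data to be quality-tested may be obtained from a blockchain.
A blockchain is a chained data structure that links data blocks in chronological order, and a distributed ledger that is cryptographically guaranteed to be tamper-proof and unforgeable. The blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
Further, the characteristics of a blockchain include openness, consensus, decentralization, trustlessness, transparency, anonymity of both parties, immutability, and traceability. Openness and transparency mean that anyone can join the blockchain network, every device can act as a node, and every node is allowed to obtain a complete copy of the database. Based on a consensus mechanism, the nodes jointly maintain the whole blockchain through competitive computation. If any node fails, the remaining nodes can still work normally. Decentralization and trustlessness mean that the blockchain is an end-to-end network formed by many nodes, with no centralized device or management organization. Data exchange between nodes is verified by digital signature technology, so nodes need not trust one another; as long as the established rules of the system are followed, a node cannot deceive other nodes. Transparency and anonymity of both parties mean that the operating rules of the blockchain are public and all data is public, so every transaction is visible to all nodes. Because nodes are trustless with respect to one another, they need not disclose their identities, and every participating node is anonymous. Immutability and traceability mean that modifications to the database by one or even several nodes cannot affect the databases of other nodes, unless more than 51% of the nodes in the whole network are controlled and modified at the same time, which is almost impossible. In a blockchain, every transaction is cryptographically linked to the two adjacent blocks, so any transaction record can be traced.
Specifically, a blockchain is a new distributed infrastructure and computing paradigm that uses a chained block data structure to verify and store data, a distributed node consensus algorithm to generate and update data, cryptography to secure data transmission and access, and smart contracts composed of automated script code to program and operate on data. The immutability of blockchain technology therefore fundamentally changes the centralized way of establishing credit and effectively improves the immutability and security of data. Because smart contracts express all terms as programs that can be executed automatically on the blockchain, the blockchain enforces the contents of a smart contract whenever its triggering conditions are met, without obstruction by any external force. This guarantees the validity and enforceability of the contract, which can both greatly reduce costs and improve efficiency. Every node on the blockchain holds the same ledger, which ensures that the ledger recording process is open and transparent. Blockchain technology enables peer-to-peer, open, and transparent direct interaction, making efficient, large-scale information exchange without centralized agents a reality.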
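202. Vectorize the medical record data to be quality-tested to obtain a first feature vector;
It should be noted that the data types in the medical record data to be quality-tested include continuous and categorical. Before vectorizing the medical record data to be quality-tested to obtain the first feature vector, the method further includes: encoding the categorical data in the medical record data to be quality-tested to obtain encoded medical record data to be quality-tested. One-hot encoding may be used for this encoding. It is understandable that the continuous data in the medical record data to be quality-tested does not need to be encoded.
For example, gender in the medical record data to be quality-tested is categorical, so gender needs to be encoded. Age in the medical record data to be quality-tested is continuous, so age does not need to be encoded.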
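The encoding rule above (one-hot for categorical fields, continuous fields passed through) can be sketched as follows. This is a minimal illustration, not the applicant's implementation; the field names, the category list, and the helper names `one_hot` and `encode_record` are assumptions introduced here.

```python
def one_hot(value, categories):
    """Return a one-hot list for `value` over the ordered `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode_record(record, categorical_fields):
    """Concatenate one-hot categorical fields with untouched continuous fields."""
    vector = []
    for field, value in record.items():
        if field in categorical_fields:
            vector.extend(one_hot(value, categorical_fields[field]))
        else:
            vector.append(float(value))  # continuous data needs no encoding
    return vector


# Hypothetical field names and category list, for illustration only.
CATEGORICAL_FIELDS = {"gender": ["female", "male"]}
encoded = encode_record({"gender": "male", "age": 42}, CATEGORICAL_FIELDS)
```

The resulting list plays the role of the n-dimensional input that is later passed to the encoder.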
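Further, vectorizing the medical record data to be quality-tested to obtain the first feature vector includes: vectorizing the encoded medical record data to be quality-tested together with the continuous data in the medical record data to be quality-tested to obtain the first feature vector. It is understandable that the encoded medical record data to be quality-tested and the continuous data may be input into an encoder to obtain the first feature vector.
Refer to FIG. 2B, which is a schematic diagram of an encoder according to an embodiment of this application. As shown in FIG. 2B, the encoder includes an input layer, at least one hidden layer, and an output layer. It should be noted that the input layer is an n-dimensional input layer, the hidden layer is an m-dimensional hidden layer, and the output layer is an n-dimensional output layer, where n and m are both integers greater than 1 and m is much smaller than n. Further, the mapping to the hidden layer serves as the encoder, the mapping to the output layer serves as the decoder, and the network structure of the encoder is a multi-layer autoencoder network.
It should be noted that the encoded medical record data to be quality-tested together with the continuous data in the medical record data to be quality-tested form an n-dimensional high-dimensional vector. The first feature vector is an m-dimensional low-dimensional vector, that is, the first feature vector is the output of the last hidden layer.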
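The dimensionality reduction described above, from an n-dimensional input to the m-dimensional output of the last hidden layer, can be sketched as a plain forward pass. The layer sizes, the sigmoid activation, and the random untrained weights are assumptions made for illustration; the source specifies only the layer structure and that m is much smaller than n.

```python
import math
import random


def dense(x, weights, bias):
    """One fully connected layer with a sigmoid activation (activation is assumed)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(weights, bias)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights; a real encoder would learn these by training."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def encode(x, layers):
    """Pass x through the hidden layers; the last hidden output is the feature vector."""
    for weights, bias in layers:
        x = dense(x, weights, bias)
    return x


rng = random.Random(0)
n, m = 8, 3  # assumed sizes for the sketch
hidden_layers = [init_layer(n, 5, rng), init_layer(5, m, rng)]
first_feature_vector = encode([0.0, 1.0, 42.0, 0.0, 0.0, 1.0, 0.5, 0.2], hidden_layers)
```

Only the encoder half is shown; the decoder that maps back to n dimensions is needed during autoencoder training but not when extracting feature vectors.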
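203. Obtain a first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set;
Optionally, in a possible implementation, obtaining the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set includes: obtaining a sample medical record data set to be trained and the quality score corresponding to each piece of sample medical record data to be trained in that set; determining, according to those quality scores, the sample medical record data to be trained whose quality score falls in a first preset scoring interval, to obtain the anchor sample medical record data set; and vectorizing each piece of anchor sample medical record data in the anchor sample medical record data set to obtain the first anchor sample feature vector set.
In addition, according to the quality scores, the sample medical record data to be trained whose quality score falls in a second preset scoring interval may be determined to obtain the positive sample medical record data set, and the sample medical record data to be trained whose quality score falls in a third preset scoring interval may be determined to obtain the negative sample medical record data set. It is understandable that the first preset scoring interval is higher than the second preset scoring interval, and the second preset scoring interval is higher than the third preset scoring interval. That is, the quality score of every piece of anchor sample medical record data is higher than that of every piece of positive sample medical record data, which in turn is higher than that of every piece of negative sample medical record data. Moreover, every piece of negative sample medical record data has a quality problem, whereas every piece of positive sample medical record data and every piece of anchor sample medical record data has no quality problem.
For example, for sample medical record data to be trained whose quality score falls in the first preset scoring interval, the completion time, writing format and paragraphs, medical terminology, three-level ward rounds, informed consent, anesthesia visits, diagnosis and treatment, auxiliary examinations, nosocomial infection, and/or antimicrobial drug use all comply with regulations. For data whose quality score falls in the second preset scoring interval, the medical terminology, three-level ward rounds, informed consent, anesthesia visits, diagnosis and treatment, auxiliary examinations, nosocomial infection, and/or antimicrobial drug use all comply with regulations. For data whose quality score falls in the third preset scoring interval, the medical terminology, three-level ward rounds, anesthesia visits, diagnosis and treatment, nosocomial infection, and/or antimicrobial drug use all comply with regulations.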
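The partition of the training records into anchor, positive, and negative sets by scoring interval might look like the sketch below. The numeric interval bounds, the half-open interval convention, and the record identifiers are hypothetical; the source states only that the first interval is higher than the second and the second higher than the third.

```python
def split_by_score(records, anchor_interval, positive_interval, negative_interval):
    """Group (record, score) pairs into anchor, positive, and negative sets."""
    def in_interval(score, interval):
        low, high = interval
        return low <= score < high  # half-open intervals are an assumption

    groups = {"anchor": [], "positive": [], "negative": []}
    for record, score in records:
        if in_interval(score, anchor_interval):
            groups["anchor"].append(record)
        elif in_interval(score, positive_interval):
            groups["positive"].append(record)
        elif in_interval(score, negative_interval):
            groups["negative"].append(record)
    return groups


# Hypothetical scores and interval bounds; the source gives no numeric values.
scored = [("r1", 97), ("r2", 85), ("r3", 40), ("r4", 92)]
groups = split_by_score(scored, (90, 101), (70, 90), (0, 70))
```

Records whose score falls outside all three intervals are simply left out of every set in this sketch.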
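It should be noted that the data types in each piece of sample medical record data to be trained include continuous and categorical. That is, the data types in each piece of anchor sample medical record data, each piece of positive sample medical record data, and each piece of negative sample medical record data include continuous and categorical. Further, let anchor sample medical record data P be any piece of data in the anchor sample medical record data set. Before vectorizing each piece of anchor sample medical record data in the anchor sample medical record data set to obtain the first anchor sample feature vector set, the method further includes: encoding the categorical data in the anchor sample medical record data P to obtain encoded anchor sample medical record data P. One-hot encoding may be used for this encoding. It is understandable that the continuous data in the anchor sample medical record data P does not need to be encoded.
Further, vectorizing each piece of anchor sample medical record data in the anchor sample medical record data set to obtain the first anchor sample feature vector set includes: vectorizing the encoded anchor sample medical record data P together with the continuous data in the anchor sample medical record data P to obtain the first anchor sample feature vector corresponding to the anchor sample medical record data P. It is understandable that the encoded anchor sample medical record data P and the continuous data in the anchor sample medical record data P may be input into the encoder to obtain the first anchor sample feature vector corresponding to the anchor sample medical record data P. It is also understandable that the anchor sample medical record data set and the first anchor sample feature vector set are in one-to-one correspondence.
In addition, the encoded anchor sample medical record data P together with the continuous data in the anchor sample medical record data P form an n-dimensional high-dimensional vector. The first anchor sample feature vector corresponding to the anchor sample medical record data P is an m-dimensional low-dimensional vector, that is, it is the output of the last hidden layer.
It can be seen that, in the above technical solution, the sample medical record data set to be trained and the quality score corresponding to each piece of sample medical record data to be trained are obtained; the sample medical record data to be trained whose quality score falls in the first preset scoring interval is determined to obtain the anchor sample medical record data set; and each piece of anchor sample medical record data is vectorized to obtain the first anchor sample feature vector set. Selecting the data whose quality score falls in the first preset scoring interval classifies the sample medical record data set to be trained and yields the anchor sample medical record data set. Vectorizing each piece of anchor sample medical record data yields the anchor sample feature vector set, in preparation for later determining the anchor sample average feature vector. In addition, vectorization converts high-dimensional vectors into low-dimensional vectors, which makes it easier for the trained generator to learn the distribution of the anchor sample medical record data and improves learning efficiency.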
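204. Input each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain multiple second anchor sample feature vectors;
205. Determine a first anchor sample average feature vector according to the multiple second anchor sample feature vectors;
The first anchor sample average feature vector is an m-dimensional low-dimensional vector. The value in row i, column j of the first anchor sample average feature vector is the average of the values in row i, column j of each of the multiple second anchor sample feature vectors, where i and j are integers greater than 0 whose ranges depend on the first anchor sample average feature vector.
For example, the multiple second anchor sample feature vectors include a second anchor sample feature vector N1 and a second anchor sample feature vector N2, where the second anchor sample feature vector N1 is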
Figure PCTCN2020099270-appb-000001
第二锚样本特征向量N2为
Figure PCTCN2020099270-appb-000002
那么,第一锚样本平均特征向量为
Figure PCTCN2020099270-appb-000003
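The element-wise averaging of step 205 can be illustrated with a short sketch. The numeric values of N1 and N2 below are made up, because the matrices in the source are only available as figures, and the vectors are treated as flat lists rather than matrices for brevity.

```python
from typing import List


def average_vector(vectors: List[List[float]]) -> List[float]:
    """Return the element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Illustrative values only; not the values shown in the source figures.
n1 = [1.0, 2.0, 3.0]
n2 = [3.0, 4.0, 5.0]
mean_anchor = average_vector([n1, n2])
print(mean_anchor)  # [2.0, 3.0, 4.0]
```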
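206. Perform a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector;
The vector operation may be, for example, vector addition, vector subtraction, or a vector product; no limitation is imposed here.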
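As a sketch of step 206, the function below supports addition and subtraction. The source does not say which operation is preferred, and it does not define which product is meant by "vector product", so the product is left out here; the function name `vector_op` and the sample values are assumptions.

```python
from typing import List


def vector_op(first: List[float], anchor_mean: List[float], op: str = "subtract") -> List[float]:
    """Combine the first feature vector with the anchor mean element by element."""
    if len(first) != len(anchor_mean):
        raise ValueError("vectors must have the same dimension")
    if op == "add":
        return [a + b for a, b in zip(first, anchor_mean)]
    if op == "subtract":
        return [a - b for a, b in zip(first, anchor_mean)]
    raise ValueError(f"unsupported operation: {op}")


first_vector = [0.5, 1.0, 2.0]
anchor_mean = [2.0, 3.0, 4.0]
print(vector_op(first_vector, anchor_mean, "subtract"))  # [-1.5, -2.0, -2.0]
print(vector_op(first_vector, anchor_mean, "add"))       # [2.5, 4.0, 6.0]
```

Whatever operation is chosen, the same one would presumably have to be used when training the discriminator and when applying it.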
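207. Input the second feature vector into a trained discriminator to obtain a quality inspection result.
As can be seen, in the above technical solution, the medical record data to be inspected is acquired and vectorized to obtain a first feature vector; a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set is acquired; each first anchor sample feature vector in that set is input into a trained generator to obtain a plurality of second anchor sample feature vectors; a first anchor sample average feature vector is determined from the plurality of second anchor sample feature vectors; a vector operation is performed on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and the second feature vector is input into a trained discriminator to obtain a quality inspection result. Because the discriminator receives the second feature vector, which combines the first feature vector with the first anchor sample average feature vector, the scheme avoids the inaccurate results that arise when the trained discriminator processes the first feature vector directly. At the same time, using the discriminator to determine the quality inspection result improves the efficiency of quality inspection and allows the quality of medical record data to be reported promptly.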
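Steps 204 to 207 can be strung together as in the sketch below. The generator and discriminator are passed in as plain callables; `toy_generator` and `toy_discriminator` are stand-ins invented for this example (an identity map and a distance-based score), not the trained networks the source describes, and subtraction is assumed as the vector operation.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def inspect_record(first_vector: Vector,
                   anchor_vectors: List[Vector],
                   generator: Callable[[Vector], Vector],
                   discriminator: Callable[[Vector], float],
                   threshold: float = 0.5) -> Tuple[float, str]:
    """Run steps 204-207 for one vectorized medical record."""
    second_anchor_vectors = [generator(v) for v in anchor_vectors]        # step 204
    count = len(second_anchor_vectors)
    anchor_mean = [sum(c) / count for c in zip(*second_anchor_vectors)]   # step 205
    second_vector = [x - a for x, a in zip(first_vector, anchor_mean)]    # step 206
    value = discriminator(second_vector)                                  # step 207
    result = "no quality problem" if value > threshold else "quality problem"
    return value, result


def toy_generator(v: Vector) -> Vector:
    """Stand-in generator: returns its input unchanged."""
    return list(v)


def toy_discriminator(v: Vector) -> float:
    """Stand-in discriminator: score in (0, 1) that falls as the norm of v grows."""
    distance = math.sqrt(sum(x * x for x in v))
    return 1.0 / (1.0 + math.exp(4.0 * distance - 2.0))


anchors = [[1.0, 1.0], [3.0, 3.0]]
good = inspect_record([2.0, 2.1], anchors, toy_generator, toy_discriminator)
bad = inspect_record([0.0, 0.0], anchors, toy_generator, toy_discriminator)
print(good[1], "/", bad[1])  # no quality problem / quality problem
```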
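In a possible implementation, inputting the second feature vector into the trained discriminator to obtain the quality inspection result includes: inputting the second feature vector into the trained discriminator to obtain a quality inspection value; when the quality inspection value is higher than a threshold, determining that the quality inspection result is that the medical record data to be inspected has no quality problem; and when the quality inspection value is lower than the threshold, determining that the quality inspection result is that the medical record data to be inspected has a quality problem.
The quality inspection value is a floating-point number within a preset interval, and the preset interval is [0,1]. Further, when the quality inspection value is higher than the threshold, the label is 1, meaning the medical record data to be inspected has no quality problem; when the quality inspection value is lower than the threshold, the label is 0, meaning the medical record data to be inspected has a quality problem.
A threshold adjustment interface may also be displayed. The interface includes a threshold input box and a confirm button. The user can enter a threshold in the input box and operate the confirm button, which allows the threshold to be adjusted dynamically.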
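The thresholding rule with an adjustable threshold might look like the following sketch. The class name `ThresholdClassifier` is hypothetical, and because the source does not say what happens when the value equals the threshold, this example assumes that case maps to label 0.

```python
class ThresholdClassifier:
    """Map a quality inspection value in [0, 1] to a label of 1 or 0."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = 0.5
        self.set_threshold(threshold)

    def set_threshold(self, threshold: float) -> None:
        """Update the threshold, as the confirm button of the UI would."""
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must lie in [0, 1]")
        self.threshold = threshold

    def label(self, value: float) -> int:
        """Return 1 (no quality problem) or 0 (quality problem)."""
        if not 0.0 <= value <= 1.0:
            raise ValueError("quality inspection value must lie in [0, 1]")
        # Assumption: a value exactly equal to the threshold is labeled 0.
        return 1 if value > self.threshold else 0


classifier = ThresholdClassifier(0.5)
before = classifier.label(0.7)   # 1: above the threshold of 0.5
classifier.set_threshold(0.8)    # the user tightens the threshold
after = classifier.label(0.7)    # 0: below the new threshold of 0.8
print(before, after)  # 1 0
```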
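As can be seen, in the above technical solution, the second feature vector is input into the trained discriminator to obtain a quality inspection value; when the value is higher than the threshold, the medical record data to be inspected is determined to have no quality problem, and when the value is lower than the threshold, it is determined to have a quality problem. Using the discriminator to determine the quality inspection result improves the efficiency of quality inspection and allows the quality of medical record data to be reported promptly. Combined with the adjustable threshold, the quality inspection result can also be controlled dynamically.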
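Referring to FIG. 3, FIG. 3 is a schematic flowchart of yet another quality inspection method for medical record data provided by an embodiment of this application. As shown in FIG. 3, the method further includes:
301. Input each first anchor sample feature vector in the first anchor sample feature vector set into a generator to be trained to obtain a plurality of third anchor sample feature vectors;
The generator to be trained includes one input layer, multiple hidden layers, and one output layer. It should be noted that the input layer is an m-dimensional input layer, each hidden layer is a k-dimensional hidden layer, and the output layer is an m-dimensional output layer, where k is an integer greater than 1 and less than m. Further, the network structure of the generator to be trained is a deep neural network.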
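The m to k to m shape of the generator can be sketched with a tiny forward pass. This example uses a single hidden layer, a tanh activation, and hand-picked weights, all of which are assumptions made for brevity; the source specifies multiple hidden layers and does not name an activation function.

```python
import math
from typing import List

Matrix = List[List[float]]


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Compute weights @ x + bias for one fully connected layer."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def generator_forward(x: List[float],
                      w_hidden: Matrix, b_hidden: List[float],
                      w_out: Matrix, b_out: List[float]) -> List[float]:
    """Map an m-dimensional input through a k-dimensional hidden layer back to m dimensions."""
    hidden = [math.tanh(h) for h in dense(x, w_hidden, b_hidden)]
    return dense(hidden, w_out, b_out)


# m = 3 and k = 2, so 1 < k < m as the text requires. Weights are illustrative.
w_hidden = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]       # shape k x m
b_hidden = [0.0, 0.0]
w_out = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]         # shape m x k
b_out = [0.0, 0.0, 0.0]

third_anchor = generator_forward([1.0, 2.0, 3.0], w_hidden, b_hidden, w_out, b_out)
print(len(third_anchor))  # 3
```

The output has the same dimension m as the input, which is what allows the generated vectors to be averaged and combined with other m-dimensional feature vectors.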
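In addition, each first anchor sample feature vector in the first anchor sample feature vector set is an m-dimensional low-dimensional vector. Each of the plurality of third anchor sample feature vectors is also an m-dimensional low-dimensional vector; that is, each third anchor sample feature vector is the output of the output layer of the generator to be trained.
It can be understood that the generator to be trained and the trained generator differ greatly in their internal parameters. Therefore, when each first anchor sample feature vector in the first anchor sample feature vector set is input into the generator to be trained and into the trained generator, the two outputs differ greatly.
In addition, inputting each first anchor sample feature vector in the first anchor sample feature vector set into the generator to be trained completes the training of the generator and yields the trained generator.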
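302. Determine a second anchor sample average feature vector from the plurality of third anchor sample feature vectors;
The second anchor sample average feature vector is an m-dimensional low-dimensional vector. The value in row a, column b of the second anchor sample average feature vector is the average of the values in row a, column b of each of the plurality of third anchor sample feature vectors, where a and b are integers greater than 0 whose ranges depend on the dimensions of the second anchor sample average feature vector.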
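For example, the plurality of third anchor sample feature vectors includes a third anchor sample feature vector M1 and a third anchor sample feature vector M2, where M1 is
Figure PCTCN2020099270-appb-000004
M2 is
Figure PCTCN2020099270-appb-000005
and the second anchor sample average feature vector is then
Figure PCTCN2020099270-appb-000006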
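303. Acquire a positive sample feature vector set in one-to-one correspondence with a positive sample medical record data set, and a negative sample feature vector set in one-to-one correspondence with a negative sample medical record data set;
Optionally, in a possible implementation, let positive sample medical record data X be any record in the positive sample medical record data set. Acquiring the positive sample feature vector set in one-to-one correspondence with the positive sample medical record data set includes: encoding the categorical data in the positive sample medical record data X to obtain encoded positive sample medical record data X; and vectorizing the encoded positive sample medical record data X together with the continuous data in X, to obtain the positive sample feature vector corresponding to X.
One-hot encoding may be used to encode the categorical data in the positive sample medical record data X. It can be understood that the continuous data in X does not need to be encoded.
In addition, the encoded positive sample medical record data X together with the continuous data in X forms an n-dimensional high-dimensional vector. The positive sample feature vector corresponding to X is an m-dimensional low-dimensional vector; that is, it is the output of the last hidden layer.
Similarly, let negative sample medical record data Y be any record in the negative sample medical record data set. Acquiring the negative sample feature vector set in one-to-one correspondence with the negative sample medical record data set includes: encoding the categorical data in the negative sample medical record data Y to obtain encoded negative sample medical record data Y; and vectorizing the encoded negative sample medical record data Y together with the continuous data in Y, to obtain the negative sample feature vector corresponding to Y.
One-hot encoding may be used to encode the categorical data in the negative sample medical record data Y. It can be understood that the continuous data in Y does not need to be encoded.
In addition, the encoded negative sample medical record data Y together with the continuous data in Y forms an n-dimensional high-dimensional vector. The negative sample feature vector corresponding to Y is an m-dimensional low-dimensional vector; that is, it is the output of the last hidden layer.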
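304. Perform a vector operation on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain a first sample feature vector set;
A first sample feature vector A is any vector in the first sample feature vector set, and the value of A represents the distance between the positive sample feature vector corresponding to A and the second anchor sample average feature vector: the larger the value of A, the closer that positive sample feature vector is to the second anchor sample average feature vector; the smaller the value of A, the farther it is from the second anchor sample average feature vector.
305. Perform a vector operation on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain a second sample feature vector set;
A second sample feature vector B is any vector in the second sample feature vector set, and the value of B represents the distance between the negative sample feature vector corresponding to B and the second anchor sample average feature vector: the larger the value of B, the closer that negative sample feature vector is to the second anchor sample average feature vector; the smaller the value of B, the farther it is from the second anchor sample average feature vector.
306. Input the first sample feature vector set and the second sample feature vector set separately into a discriminator to be trained to obtain the trained discriminator.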
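Steps 304 to 306 can be sketched with a deliberately simple stand-in. The source does not describe the discriminator's architecture or loss, so this example substitutes a logistic-regression classifier trained by stochastic gradient descent, uses subtraction as the vector operation, and assigns label 1 to positive samples and label 0 to negative samples; every name and number below is an assumption for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def subtract(a: Vector, b: Vector) -> Vector:
    """Vector operation assumed here: element-wise subtraction."""
    return [x - y for x, y in zip(a, b)]


def predict(weights: Vector, bias: float, x: Vector) -> float:
    """Logistic score in (0, 1)."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_discriminator(samples: List[Tuple[Vector, int]],
                        epochs: int = 500, lr: float = 0.5) -> Tuple[Vector, float]:
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in samples:
            error = predict(weights, bias, x) - label
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Illustrative data: positives lie near the anchor mean, negatives far from it.
anchor_mean = [1.0, 1.0]
positives = [[0.9, 1.1], [1.1, 0.9]]
negatives = [[0.1, 0.2], [0.2, 0.0]]

first_set = [(subtract(p, anchor_mean), 1) for p in positives]    # step 304
second_set = [(subtract(n, anchor_mean), 0) for n in negatives]   # step 305
weights, bias = train_discriminator(first_set + second_set)       # step 306

positive_scores = [predict(weights, bias, x) for x, _ in first_set]
negative_scores = [predict(weights, bias, x) for x, _ in second_set]
print(all(s > 0.5 for s in positive_scores), all(s < 0.5 for s in negative_scores))
```

Because the toy data is linearly separable after subtracting the anchor mean, the fitted scores fall on opposite sides of 0.5, which mirrors the intent of separating records with and without quality problems.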
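As can be seen, in the above technical solution, each first anchor sample feature vector in the first anchor sample feature vector set is input into a generator to be trained to obtain a plurality of third anchor sample feature vectors; a second anchor sample average feature vector is determined from them; a positive sample feature vector set and a negative sample feature vector set are acquired; a vector operation is performed on the second anchor sample average feature vector and each positive sample feature vector to obtain a first sample feature vector set, and on the second anchor sample average feature vector and each negative sample feature vector to obtain a second sample feature vector set; and the two sets are input separately into a discriminator to be trained to obtain the trained discriminator. The anchor sample average feature vector is derived from the anchor sample feature vectors produced by the generator to be trained, and the discriminator is trained on the vectors obtained by combining that average with the positive and negative sample feature vectors. This pushes negative samples farther from the anchor samples and pulls positive samples closer to them, so the trained discriminator can better separate medical record data with quality problems from data without quality problems. The scheme can also be applied in the field of smart healthcare, where this improved separation contributes to the construction of smart cities.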
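Referring to FIG. 4, FIG. 4 is a schematic diagram of a quality inspection apparatus for medical record data provided by an embodiment of this application. As shown in FIG. 4, the quality inspection apparatus 400 for medical record data provided by the embodiment of this application may include:
an acquisition module 401, configured to acquire medical record data to be inspected;
a processing module 402, configured to vectorize the medical record data to be inspected to obtain a first feature vector;
the acquisition module 401 being further configured to acquire a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set;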
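Optionally, in a possible implementation, when acquiring the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set, the acquisition module 401 is specifically configured to acquire a training sample medical record data set and the quality score of each training sample medical record in it; and the processing module 402 is specifically configured to determine, according to those quality scores, the training sample medical records whose quality score falls within a first preset score interval, to obtain the anchor sample medical record data set, and to vectorize each anchor sample medical record in the anchor sample medical record data set to obtain the first anchor sample feature vector set.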
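the processing module 402 being further configured to input each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors;
the processing module 402 being further configured to determine a first anchor sample average feature vector from the plurality of second anchor sample feature vectors;
the processing module 402 being further configured to perform a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector;
the processing module 402 being further configured to input the second feature vector into a trained discriminator to obtain a quality inspection result.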
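Optionally, in a possible implementation, the processing module 402 is further configured to input each first anchor sample feature vector in the first anchor sample feature vector set into a generator to be trained to obtain a plurality of third anchor sample feature vectors; the processing module 402 is further configured to determine a second anchor sample average feature vector from the plurality of third anchor sample feature vectors; the acquisition module 401 is further configured to acquire a positive sample feature vector set in one-to-one correspondence with a positive sample medical record data set and a negative sample feature vector set in one-to-one correspondence with a negative sample medical record data set; the processing module 402 is further configured to perform a vector operation on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain a first sample feature vector set; the processing module 402 is further configured to perform a vector operation on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain a second sample feature vector set; and the processing module 402 is further configured to input the first sample feature vector set and the second sample feature vector set separately into a discriminator to be trained to obtain the trained discriminator.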
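A first sample feature vector A is any vector in the first sample feature vector set, and the value of A represents the distance between the positive sample feature vector corresponding to A and the second anchor sample average feature vector: the larger the value of A, the closer that positive sample feature vector is to the second anchor sample average feature vector; the smaller the value of A, the farther it is from the second anchor sample average feature vector.
A second sample feature vector B is any vector in the second sample feature vector set, and the value of B represents the distance between the negative sample feature vector corresponding to B and the second anchor sample average feature vector: the larger the value of B, the closer that negative sample feature vector is to the second anchor sample average feature vector; the smaller the value of B, the farther it is from the second anchor sample average feature vector.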
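Optionally, in a possible implementation, when inputting the second feature vector into the trained discriminator to obtain the quality inspection result, the processing module 402 is specifically configured to input the second feature vector into the trained discriminator to obtain a quality inspection value; when the quality inspection value is higher than a threshold, determine that the quality inspection result is that the medical record data to be inspected has no quality problem; and when the quality inspection value is lower than the threshold, determine that the quality inspection result is that the medical record data to be inspected has a quality problem.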
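Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an electronic device in the hardware operating environment involved in the embodiments of this application.
An embodiment of this application provides an electronic device for quality inspection of medical record data, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and include instructions for performing the steps of any of the quality inspection methods for medical record data. As shown in FIG. 5, the electronic device in the hardware operating environment involved in the embodiments of this application may include:
a processor 501, for example a CPU;
a memory 502, which may optionally be a high-speed RAM memory or a stable memory such as a disk memory;
a communication interface 503, configured to implement connection and communication between the processor 501 and the memory 502.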
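Those skilled in the art will understand that the structure of the electronic device shown in FIG. 5 does not limit it; the device may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 5, the memory 502 may include an operating system, a network communication module, and a program for quality inspection of medical record data. The operating system is a program that manages and controls the hardware and software resources of the server and supports the running of one or more programs. The network communication module is used for communication among the components inside the memory 502, and for communication with other hardware and software inside the electronic device.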
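In the electronic device shown in FIG. 5, the processor 501 is configured to execute the one or more programs stored in the memory 502 to implement the following steps: acquiring medical record data to be inspected; vectorizing the medical record data to be inspected to obtain a first feature vector; acquiring a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set; inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors; determining a first anchor sample average feature vector from the plurality of second anchor sample feature vectors; performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and inputting the second feature vector into a trained discriminator to obtain a quality inspection result.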
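In a possible implementation, when acquiring the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set, the processor is configured to acquire a training sample medical record data set and the quality score of each training sample medical record in it; determine, according to those quality scores, the training sample medical records whose quality score falls within a first preset score interval, to obtain the anchor sample medical record data set; and vectorize each anchor sample medical record in the anchor sample medical record data set to obtain the first anchor sample feature vector set.
In a possible implementation, the processor is configured to input each first anchor sample feature vector in the first anchor sample feature vector set into a generator to be trained to obtain a plurality of third anchor sample feature vectors; determine a second anchor sample average feature vector from the plurality of third anchor sample feature vectors; acquire a positive sample feature vector set in one-to-one correspondence with a positive sample medical record data set and a negative sample feature vector set in one-to-one correspondence with a negative sample medical record data set; perform a vector operation on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain a first sample feature vector set; perform a vector operation on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain a second sample feature vector set; and input the first sample feature vector set and the second sample feature vector set separately into a discriminator to be trained to obtain the trained discriminator.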
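In a possible implementation, a first sample feature vector A is any vector in the first sample feature vector set, and the value of A represents the distance between the positive sample feature vector corresponding to A and the second anchor sample average feature vector: the larger the value of A, the closer that positive sample feature vector is to the second anchor sample average feature vector; the smaller the value of A, the farther it is from the second anchor sample average feature vector.
In a possible implementation, a second sample feature vector B is any vector in the second sample feature vector set, and the value of B represents the distance between the negative sample feature vector corresponding to B and the second anchor sample average feature vector: the larger the value of B, the closer that negative sample feature vector is to the second anchor sample average feature vector; the smaller the value of B, the farther it is from the second anchor sample average feature vector.
In a possible implementation, when inputting the second feature vector into the trained discriminator to obtain the quality inspection result, the processor is further configured to input the second feature vector into the trained discriminator to obtain a quality inspection value; when the quality inspection value is higher than a threshold, determine that the quality inspection result is that the medical record data to be inspected has no quality problem; and when the quality inspection value is lower than the threshold, determine that the quality inspection result is that the medical record data to be inspected has a quality problem.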
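This application further provides a computer-readable storage medium configured to store a computer program, where the stored computer program is executed by a processor to implement the following steps: acquiring medical record data to be inspected; vectorizing the medical record data to be inspected to obtain a first feature vector; acquiring a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set; inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors; determining a first anchor sample average feature vector from the plurality of second anchor sample feature vectors; performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and inputting the second feature vector into a trained discriminator to obtain a quality inspection result.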
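In a possible implementation, when acquiring the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set, the processor is configured to acquire a training sample medical record data set and the quality score of each training sample medical record in it; determine, according to those quality scores, the training sample medical records whose quality score falls within a first preset score interval, to obtain the anchor sample medical record data set; and vectorize each anchor sample medical record in the anchor sample medical record data set to obtain the first anchor sample feature vector set.
In a possible implementation, the processor is configured to input each first anchor sample feature vector in the first anchor sample feature vector set into a generator to be trained to obtain a plurality of third anchor sample feature vectors; determine a second anchor sample average feature vector from the plurality of third anchor sample feature vectors; acquire a positive sample feature vector set in one-to-one correspondence with a positive sample medical record data set and a negative sample feature vector set in one-to-one correspondence with a negative sample medical record data set; perform a vector operation on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain a first sample feature vector set; perform a vector operation on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain a second sample feature vector set; and input the first sample feature vector set and the second sample feature vector set separately into a discriminator to be trained to obtain the trained discriminator.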
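In a possible implementation, a first sample feature vector A is any vector in the first sample feature vector set, and the value of A represents the distance between the positive sample feature vector corresponding to A and the second anchor sample average feature vector: the larger the value of A, the closer that positive sample feature vector is to the second anchor sample average feature vector; the smaller the value of A, the farther it is from the second anchor sample average feature vector.
In a possible implementation, a second sample feature vector B is any vector in the second sample feature vector set, and the value of B represents the distance between the negative sample feature vector corresponding to B and the second anchor sample average feature vector: the larger the value of B, the closer that negative sample feature vector is to the second anchor sample average feature vector; the smaller the value of B, the farther it is from the second anchor sample average feature vector.
In a possible implementation, when inputting the second feature vector into the trained discriminator to obtain the quality inspection result, the processor is further configured to input the second feature vector into the trained discriminator to obtain a quality inspection value; when the quality inspection value is higher than a threshold, determine that the quality inspection result is that the medical record data to be inspected has no quality problem; and when the quality inspection value is lower than the threshold, determine that the quality inspection result is that the medical record data to be inspected has a quality problem.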
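In addition, the computer-readable storage medium may be non-volatile or volatile.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should be aware that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. Those skilled in the art should also be aware that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.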
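In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules is only a division by logical function; in actual implementation there may be other divisions, for example multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or modules, and may be electrical or take other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of this application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.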

Claims (20)

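  1. A quality inspection method for medical record data, comprising:
    acquiring medical record data to be inspected;
    vectorizing the medical record data to be inspected to obtain a first feature vector;
    acquiring a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set;
    inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors;
    determining a first anchor sample average feature vector from the plurality of second anchor sample feature vectors;
    performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector;
    inputting the second feature vector into a trained discriminator to obtain a quality inspection result.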
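  2. The method according to claim 1, wherein acquiring the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set comprises:
    acquiring a training sample medical record data set and the quality score of each training sample medical record in the training sample medical record data set;
    determining, according to the quality score of each training sample medical record in the training sample medical record data set, the training sample medical records whose quality score falls within a first preset score interval, to obtain the anchor sample medical record data set;
    vectorizing each anchor sample medical record in the anchor sample medical record data set to obtain the first anchor sample feature vector set.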
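  3. The method according to claim 1 or 2, wherein the method further comprises:
    inputting each first anchor sample feature vector in the first anchor sample feature vector set into a generator to be trained to obtain a plurality of third anchor sample feature vectors;
    determining a second anchor sample average feature vector from the plurality of third anchor sample feature vectors;
    acquiring a positive sample feature vector set in one-to-one correspondence with a positive sample medical record data set and a negative sample feature vector set in one-to-one correspondence with a negative sample medical record data set;
    performing a vector operation on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain a first sample feature vector set;
    performing a vector operation on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain a second sample feature vector set;
    inputting the first sample feature vector set and the second sample feature vector set separately into a discriminator to be trained to obtain the trained discriminator.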
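  4. The method according to claim 3, wherein a first sample feature vector A is any vector in the first sample feature vector set, and the value of the first sample feature vector A represents the distance between the positive sample feature vector corresponding to the first sample feature vector A and the second anchor sample average feature vector; the larger the value of the first sample feature vector A, the closer the corresponding positive sample feature vector is to the second anchor sample average feature vector, and the smaller the value of the first sample feature vector A, the farther the corresponding positive sample feature vector is from the second anchor sample average feature vector.
  5. The method according to claim 3, wherein a second sample feature vector B is any vector in the second sample feature vector set, and the value of the second sample feature vector B represents the distance between the negative sample feature vector corresponding to the second sample feature vector B and the second anchor sample average feature vector; the larger the value of the second sample feature vector B, the closer the corresponding negative sample feature vector is to the second anchor sample average feature vector, and the smaller the value of the second sample feature vector B, the farther the corresponding negative sample feature vector is from the second anchor sample average feature vector.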
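  6. The method according to claim 1, wherein inputting the second feature vector into the trained discriminator to obtain the quality inspection result comprises:
    inputting the second feature vector into the trained discriminator to obtain a quality inspection value;
    when the quality inspection value is higher than a threshold, determining that the quality inspection result is that the medical record data to be inspected has no quality problem;
    when the quality inspection value is lower than the threshold, determining that the quality inspection result is that the medical record data to be inspected has a quality problem.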
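  7. A quality inspection apparatus for medical record data, comprising:
    an acquisition module, configured to acquire medical record data to be inspected;
    a processing module, configured to vectorize the medical record data to be inspected to obtain a first feature vector;
    the acquisition module being further configured to acquire a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set;
    the processing module being further configured to input each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors;
    the processing module being further configured to determine a first anchor sample average feature vector from the plurality of second anchor sample feature vectors;
    the processing module being further configured to perform a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector;
    the processing module being further configured to input the second feature vector into a trained discriminator to obtain a quality inspection result.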
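  8. An electronic device for quality inspection of medical record data, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and include instructions for performing the following steps:
    acquiring medical record data to be inspected; vectorizing the medical record data to be inspected to obtain a first feature vector; acquiring a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set; inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors; determining a first anchor sample average feature vector from the plurality of second anchor sample feature vectors; performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and inputting the second feature vector into a trained discriminator to obtain a quality inspection result.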
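  9. The device according to claim 8, wherein, when acquiring the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set, the processor is configured to acquire a training sample medical record data set and the quality score of each training sample medical record in the training sample medical record data set; determine, according to those quality scores, the training sample medical records whose quality score falls within a first preset score interval, to obtain the anchor sample medical record data set; and vectorize each anchor sample medical record in the anchor sample medical record data set to obtain the first anchor sample feature vector set.
  10. The device according to claim 8 or 9, wherein the processor is configured to input each first anchor sample feature vector in the first anchor sample feature vector set into a generator to be trained to obtain a plurality of third anchor sample feature vectors; determine a second anchor sample average feature vector from the plurality of third anchor sample feature vectors; acquire a positive sample feature vector set in one-to-one correspondence with a positive sample medical record data set and a negative sample feature vector set in one-to-one correspondence with a negative sample medical record data set; perform a vector operation on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain a first sample feature vector set; perform a vector operation on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain a second sample feature vector set; and input the first sample feature vector set and the second sample feature vector set separately into a discriminator to be trained to obtain the trained discriminator.
  11. The device according to claim 10, wherein a first sample feature vector A is any vector in the first sample feature vector set, and the value of the first sample feature vector A represents the distance between the positive sample feature vector corresponding to the first sample feature vector A and the second anchor sample average feature vector; the larger the value of the first sample feature vector A, the closer the corresponding positive sample feature vector is to the second anchor sample average feature vector, and the smaller the value of the first sample feature vector A, the farther the corresponding positive sample feature vector is from the second anchor sample average feature vector.
  12. The device according to claim 10, wherein a second sample feature vector B is any vector in the second sample feature vector set, and the value of the second sample feature vector B represents the distance between the negative sample feature vector corresponding to the second sample feature vector B and the second anchor sample average feature vector; the larger the value of the second sample feature vector B, the closer the corresponding negative sample feature vector is to the second anchor sample average feature vector, and the smaller the value of the second sample feature vector B, the farther the corresponding negative sample feature vector is from the second anchor sample average feature vector.
  13. The device according to claim 8, wherein, when inputting the second feature vector into the trained discriminator to obtain the quality inspection result, the processor is further configured to input the second feature vector into the trained discriminator to obtain a quality inspection value; when the quality inspection value is higher than a threshold, determine that the quality inspection result is that the medical record data to be inspected has no quality problem; and when the quality inspection value is lower than the threshold, determine that the quality inspection result is that the medical record data to be inspected has a quality problem.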
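  14. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store a computer program, and the stored computer program is executed by a processor to implement the following steps: acquiring medical record data to be inspected; vectorizing the medical record data to be inspected to obtain a first feature vector; acquiring a first anchor sample feature vector set in one-to-one correspondence with an anchor sample medical record data set; inputting each first anchor sample feature vector in the first anchor sample feature vector set into a trained generator to obtain a plurality of second anchor sample feature vectors; determining a first anchor sample average feature vector from the plurality of second anchor sample feature vectors; performing a vector operation on the first feature vector and the first anchor sample average feature vector to obtain a second feature vector; and inputting the second feature vector into a trained discriminator to obtain a quality inspection result.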
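  15. The medium according to claim 14, wherein, when acquiring the first anchor sample feature vector set in one-to-one correspondence with the anchor sample medical record data set, the processor is configured to acquire a training sample medical record data set and the quality score of each training sample medical record in the training sample medical record data set; determine, according to those quality scores, the training sample medical records whose quality score falls within a first preset score interval, to obtain the anchor sample medical record data set; and vectorize each anchor sample medical record in the anchor sample medical record data set to obtain the first anchor sample feature vector set.
  16. The medium according to claim 14 or 15, wherein the processor is configured to input each first anchor sample feature vector in the first anchor sample feature vector set into a generator to be trained to obtain a plurality of third anchor sample feature vectors; determine a second anchor sample average feature vector from the plurality of third anchor sample feature vectors; acquire a positive sample feature vector set in one-to-one correspondence with a positive sample medical record data set and a negative sample feature vector set in one-to-one correspondence with a negative sample medical record data set; perform a vector operation on the second anchor sample average feature vector and each positive sample feature vector in the positive sample feature vector set to obtain a first sample feature vector set; perform a vector operation on the second anchor sample average feature vector and each negative sample feature vector in the negative sample feature vector set to obtain a second sample feature vector set; and input the first sample feature vector set and the second sample feature vector set separately into a discriminator to be trained to obtain the trained discriminator.
  17. The medium according to claim 16, wherein a first sample feature vector A is any vector in the first sample feature vector set, and the value of the first sample feature vector A represents the distance between the positive sample feature vector corresponding to the first sample feature vector A and the second anchor sample average feature vector; the larger the value of the first sample feature vector A, the closer the corresponding positive sample feature vector is to the second anchor sample average feature vector, and the smaller the value of the first sample feature vector A, the farther the corresponding positive sample feature vector is from the second anchor sample average feature vector.
  18. The medium according to claim 16, wherein a second sample feature vector B is any vector in the second sample feature vector set, and the value of the second sample feature vector B represents the distance between the negative sample feature vector corresponding to the second sample feature vector B and the second anchor sample average feature vector; the larger the value of the second sample feature vector B, the closer the corresponding negative sample feature vector is to the second anchor sample average feature vector, and the smaller the value of the second sample feature vector B, the farther the corresponding negative sample feature vector is from the second anchor sample average feature vector.
  19. The medium according to claim 14, wherein, when inputting the second feature vector into the trained discriminator to obtain the quality inspection result, the processor is further configured to input the second feature vector into the trained discriminator to obtain a quality inspection value; when the quality inspection value is higher than a threshold, determine that the quality inspection result is that the medical record data to be inspected has no quality problem; and when the quality inspection value is lower than the threshold, determine that the quality inspection result is that the medical record data to be inspected has a quality problem.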
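  20. A computer program product, comprising computer instructions which, when run on the device according to claim 8, cause the device to perform the method according to any one of claims 1 to 6.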
PCT/CN2020/099270 2020-05-15 2020-06-30 Quality inspection method for medical record data and related apparatus WO2021114626A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010416797.9 2020-05-15
CN202010416797.9A CN111696637A (zh) 2020-05-15 2020-05-15 Quality inspection method for medical record data and related apparatus

Publications (1)

Publication Number Publication Date
WO2021114626A1 true WO2021114626A1 (zh) 2021-06-17

Family

ID=72477882

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099270 WO2021114626A1 (zh) 2020-05-15 2020-06-30 Quality inspection method for medical record data and related apparatus

Country Status (2)

Country Link
CN (1) CN111696637A (zh)
WO (1) WO2021114626A1 (zh)

Also Published As

Publication number Publication date
CN111696637A (zh) 2020-09-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20899985

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20899985

Country of ref document: EP

Kind code of ref document: A1