CN111696636A - Data processing method and device based on deep neural network - Google Patents

Data processing method and device based on deep neural network

Info

Publication number
CN111696636A
Authority
CN
China
Prior art keywords
vector
category
quality
medical record
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010412571.1A
Other languages
Chinese (zh)
Other versions
CN111696636B (en)
Inventor
李彦轩
唐蕊
孙行智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010412571.1A (patent CN111696636B)
Priority to PCT/CN2020/099539 (patent WO2021114637A1)
Publication of CN111696636A
Application granted
Publication of CN111696636B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiment discloses a data processing method and a device based on a deep neural network, relating to the technical field of artificial intelligence, wherein the method comprises the following steps: obtaining at least 2 training samples, sequentially inputting the at least 2 training samples into a constructed deep neural network DNN model for training, reducing a loss function of the DNN model after training to a preset fluctuation range, inputting a feature vector of medical record data to be predicted into the trained DNN model for processing to obtain a target embedded vector corresponding to the medical record data to be predicted, and determining the quality of the medical record data to be predicted according to the distance between the target embedded vector and the quality embedded vector and a preset quality abnormal distance. By adopting the embodiment of the application, the quality of medical record data can be screened from multiple aspects/angles, and the accuracy of quality screening is improved. In addition, the method and the device can be applied to the field of intelligent medical treatment, and therefore the construction of a smart city is promoted.

Description

Data processing method and device based on deep neural network
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a data processing method and device based on a deep neural network.
Background
Electronic medical records are digital medical records that are stored, managed, transmitted, and reproduced using electronic equipment, and they record the overall process of diagnosis and treatment of patients in hospitals. However, in the process of recording electronic medical records, quality problems such as unqualified or abnormal medical records are often caused by professional factors such as misdiagnosis or by non-professional factors such as recording errors.
With the development of computer technology, quality problems in electronic medical records can be screened by computer. At present, however, computer screening relies mainly on manually formulated objective rules, so its coverage is narrow and its accuracy is low.
Disclosure of Invention
The embodiment of the application provides a data processing method and device based on a deep neural network, which can screen the quality of medical record data from multiple aspects/multiple angles and improve the accuracy of quality screening.
In a first aspect, an embodiment of the present application provides a data processing method based on a deep neural network, where the method includes:
acquiring at least 2 training samples, wherein each training sample in the at least 2 training samples is a quadruple, the quadruple comprises a feature vector of an anchor point, a feature vector of a positive sample, a feature vector of a negative sample and a feature vector of a false sample, the anchor point is medical record data with qualified quality, the positive sample is medical record data with the same category as the anchor point and with qualified quality, the negative sample is medical record data with different category from the anchor point and with qualified quality, and the false sample is medical record data with unqualified quality;
sequentially inputting the at least 2 training samples into a constructed deep neural network DNN model for training, so that a loss function of the DNN model after training is reduced to a preset fluctuation range, wherein the loss function of the DNN model is a quadruple loss function, and the quadruple loss function is determined by the difference between an embedded vector obtained by inputting the feature vector of the anchor point into the DNN model and the embedded vectors obtained by inputting the feature vector of the positive sample, the feature vector of the negative sample and the feature vector of the false sample into the DNN model;
inputting the feature vector of the medical record data to be predicted into the trained DNN model for processing to obtain a target embedded vector corresponding to the medical record data to be predicted;
and determining the quality of the medical record data to be predicted according to the distance between the target embedded vector and the quality embedded vector and a preset quality abnormal distance.
With reference to the first aspect, in a possible implementation manner, the quadruple loss function is:
L=d(a,p)-d(a,n)-k*d(a,F);
wherein L represents the quadruple loss function, a represents the embedded vector obtained after the feature vector of the anchor point is input into the DNN model, p represents the embedded vector obtained after the feature vector of the positive sample is input into the DNN model, n represents the embedded vector obtained after the feature vector of the negative sample is input into the DNN model, F represents the embedded vector obtained after the feature vector of the false sample is input into the DNN model, k is a coefficient, d(a,p) represents the distance between a and p, d(a,n) represents the distance between a and n, and d(a,F) represents the distance between a and F.
With reference to the first aspect, in a possible implementation manner, determining the quality of the medical record data to be predicted according to a distance between the target embedded vector and the quality embedded vector and a preset quality anomaly distance includes:
if the distance between the target embedded vector and the quality embedded vector is larger than or equal to the preset quality abnormal distance, determining that the quality of the medical record data to be predicted is unqualified; and if the distance between the target embedded vector and the quality embedded vector is smaller than the quality abnormal distance, determining that the quality of the medical record data to be predicted is qualified.
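The threshold rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the claimed implementation: the function names `l2_distance` and `is_quality_qualified` are hypothetical, the L2 distance is assumed as the distance measure, and the comparison direction follows the text exactly as written.

```python
import math


def l2_distance(x, y):
    """Euclidean (L2) distance between two equal-length vectors (assumed metric)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def is_quality_qualified(target_embedding, quality_embedding, quality_abnormal_distance):
    """Return True for 'qualified' and False for 'unqualified'.

    Follows the rule as stated in the text:
    distance >= preset quality abnormal distance -> unqualified,
    distance <  preset quality abnormal distance -> qualified.
    """
    distance = l2_distance(target_embedding, quality_embedding)
    return distance < quality_abnormal_distance
```

The boundary case (distance equal to the threshold) is treated as unqualified, matching the "larger than or equal to" wording.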
With reference to the first aspect, in a possible implementation manner, before determining the quality of the medical record data to be predicted according to a distance between the target embedded vector and the quality embedded vector and a preset quality anomaly distance, the method further includes:
sequentially inputting the feature vectors of all the false samples in the at least 2 training samples into the trained DNN model for processing to obtain embedded vectors corresponding to all the false samples, wherein one false sample corresponds to one embedded vector; and determining a mean vector between the embedding vectors corresponding to all the false samples as a quality embedding vector.
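As a hedged sketch of this step, the quality embedding vector can be computed as the element-wise mean of the false-sample embeddings. The helper name `mean_vector` and the toy two-dimensional embeddings are assumptions made for illustration; in practice the embeddings would come from the trained DNN model.

```python
def mean_vector(embeddings):
    """Element-wise mean of a non-empty list of equal-length embedding vectors."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(embeddings)
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]


# Toy embeddings standing in for the DNN outputs of three false samples.
false_sample_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
quality_embedding = mean_vector(false_sample_embeddings)
print(quality_embedding)  # [3.0, 4.0]
```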
With reference to the first aspect, in a possible implementation manner, after determining that the quality of the medical record data to be predicted is qualified, the method further includes: and determining the category of the medical record data to be predicted according to the distance between the target embedded vector and each category embedded vector and the category distance corresponding to each category embedded vector.
With reference to the first aspect, in a possible implementation manner, determining a category of medical record data to be predicted according to a distance between the target embedding vector and each category embedding vector and a category distance corresponding to each category embedding vector includes: and if the distance between the target embedding vector and the category embedding vector w in each category embedding vector is less than or equal to the category distance corresponding to the category embedding vector w, determining that the category of the medical record data to be predicted is a first category, and the first category is the category corresponding to the category embedding vector w.
With reference to the first aspect, in one possible implementation, the method further includes: and if the distance between the target embedded vector and each category embedded vector is greater than the category distance corresponding to each category embedded vector, determining that the category of the medical record data to be predicted is a second category, wherein the second category is different from the category corresponding to each category embedded vector.
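The category rule in the two preceding implementations might be sketched as follows. The names `determine_category` and `other_label` are hypothetical, the L2 distance is assumed as the metric, and the tie-breaking choice (picking the nearest category when several thresholds are satisfied) is an assumption, since the text does not say how such a case is resolved.

```python
import math


def l2_distance(x, y):
    """Euclidean (L2) distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def determine_category(target_embedding, category_embeddings, category_distances,
                       other_label="other"):
    """Return the label of a category whose distance threshold is satisfied.

    category_embeddings: dict mapping label -> category embedding vector
    category_distances:  dict mapping label -> category distance (threshold)
    If no threshold is satisfied, return other_label (the "second category").
    """
    matches = []
    for label, embedding in category_embeddings.items():
        distance = l2_distance(target_embedding, embedding)
        if distance <= category_distances[label]:
            matches.append((distance, label))
    if not matches:
        return other_label
    # Assumption: when several categories match, prefer the nearest one.
    return min(matches)[1]
```

For example, with category embeddings `{"cardiology": [0.0, 0.0], "neurology": [10.0, 0.0]}` and a category distance of 2.0 for both, a target embedding of `[1.0, 0.0]` is assigned to "cardiology", while `[5.0, 0.0]` falls outside every threshold and is assigned to the second category.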
In a second aspect, an embodiment of the present application provides a data classification apparatus, including:
the acquisition unit is used for acquiring at least 2 training samples, each training sample in the at least 2 training samples is a quadruple, the quadruple comprises a feature vector of an anchor point, a feature vector of a positive sample, a feature vector of a negative sample and a feature vector of a false sample, the anchor point is medical record data with qualified quality, the positive sample is medical record data with the same category as the anchor point and with qualified quality, the negative sample is medical record data with different category from the anchor point and with qualified quality, and the false sample is medical record data with unqualified quality;
the training unit is used for sequentially inputting the at least 2 training samples into a constructed deep neural network DNN model for training, so that the loss function of the DNN model after training is reduced to a preset fluctuation range, the loss function of the DNN model is a quadruple loss function, and the quadruple loss function is determined by the difference between an embedded vector obtained by inputting the feature vector of the anchor point into the DNN model and an embedded vector obtained by inputting the feature vector of the positive sample, the feature vector of the negative sample and the feature vector of the false sample into the DNN model;
the processing unit is used for inputting the feature vector of the medical record data to be predicted into the trained DNN model for processing to obtain a target embedded vector corresponding to the medical record data to be predicted;
and the first determining unit is used for determining the quality of the medical record data to be predicted according to the distance between the target embedded vector and the quality embedded vector and a preset quality abnormal distance.
With reference to the second aspect, in one possible implementation manner, the quadruple loss function is:
L=d(a,p)-d(a,n)-k*d(a,F);
wherein L represents the quadruple loss function, a represents the embedded vector obtained after the feature vector of the anchor point is input into the DNN model, p represents the embedded vector obtained after the feature vector of the positive sample is input into the DNN model, n represents the embedded vector obtained after the feature vector of the negative sample is input into the DNN model, F represents the embedded vector obtained after the feature vector of the false sample is input into the DNN model, k is a coefficient, d(a,p) represents the distance between a and p, d(a,n) represents the distance between a and n, and d(a,F) represents the distance between a and F.
With reference to the second aspect, in a possible implementation manner, the first determining unit is specifically configured to: when the distance between the target embedded vector and the quality embedded vector is larger than or equal to a preset quality abnormal distance, determining that the quality of the medical record data to be predicted is unqualified; and when the distance between the target embedded vector and the quality embedded vector is smaller than the quality abnormal distance, determining the quality of the medical record data to be predicted as qualified.
With reference to the second aspect, in a possible implementation manner, the processing unit is further configured to sequentially input the feature vectors of all false samples in the at least 2 training samples into the trained DNN model for processing, so as to obtain embedded vectors corresponding to all the false samples, where one false sample corresponds to one embedded vector; the data classification apparatus further includes a second determination unit configured to determine, as the quality embedding vector, a mean vector between the embedding vectors corresponding to all the false samples.
With reference to the second aspect, in a possible implementation manner, the first determining unit is further configured to: and determining the category of the medical record data to be predicted according to the distance between the target embedded vector and each category embedded vector and the category distance corresponding to each category embedded vector.
With reference to the second aspect, in a possible implementation manner, the first determining unit is further specifically configured to: and when the distance between the target embedding vector and the category embedding vector w in each category embedding vector is less than or equal to the category distance corresponding to the category embedding vector w, determining that the category of the medical record data to be predicted is a first category, wherein the first category is the category corresponding to the category embedding vector w.
With reference to the second aspect, in a possible implementation manner, the first determining unit is further configured to: and when the distance between the target embedded vector and each category embedded vector is greater than the category distance corresponding to each category embedded vector, determining that the category of the medical record data to be predicted is a second category, wherein the second category is different from the category corresponding to each category embedded vector.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program that supports a terminal to execute the above method, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the deep neural network-based data processing method of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the deep neural network-based data processing method of the first aspect.
According to the embodiment of the application, at least 2 training samples are obtained and sequentially input into a constructed deep neural network DNN model for training, so that the loss function of the DNN model after training is reduced to a preset fluctuation range, the loss function of the DNN model being a quadruple loss function. The feature vector of medical record data to be predicted is then input into the trained DNN model for processing to obtain a target embedded vector corresponding to the medical record data to be predicted, and the quality of the medical record data to be predicted is determined according to the distance between the target embedded vector and the quality embedded vector and a preset quality abnormal distance. In this way, the quality of the medical record data to be predicted can be screened from multiple aspects or multiple angles, and the accuracy of quality screening is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is an architecture diagram of a DNN model provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a deep neural network-based data processing method provided by an embodiment of the present application;
FIG. 3 is another schematic flow chart of a deep neural network-based data processing method provided by an embodiment of the present application;
FIG. 4 is a schematic block diagram of a data processing apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It is to be understood that the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
It should also be appreciated that reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
In order to better understand the data processing method based on the Deep Neural Network provided in the embodiment of the present application, a brief description will be given below of an architecture of the Deep Neural Network (DNN) provided in the embodiment of the present application.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a DNN model provided in an embodiment of the present application. The neural network layers inside the DNN can be divided by position into three types: an input layer, hidden layers, and an output layer. As shown in FIG. 1, the first layer of the DNN model is the input layer, the last layer is the output layer, and the layers in between are hidden layers, such as hidden layer 1, hidden layer 2, and hidden layer 3 in FIG. 1. Adjacent layers of the DNN model are fully connected, that is, every neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. The output layer is not limited to a single output neuron and may have a plurality of outputs, so such a DNN model can be flexibly applied to classification and regression as well as to other machine learning tasks such as dimensionality reduction and clustering. Understandably, the output layer of the DNN model of the embodiment of the present application has multiple outputs and is mainly used for dimensionality reduction and clustering. It is further understood that FIG. 1 is only a schematic diagram, and the number of hidden layers of the DNN model is not limited in the embodiment of the present application.
The following describes a data processing method and apparatus based on a deep neural network according to an embodiment of the present application with reference to FIG. 2 to FIG. 5. The data processing method based on the deep neural network can be applied to the field of intelligent medical treatment. Applied to the screening of electronic medical records, the method addresses the low coverage, efficiency and accuracy of the traditional manual screening process: the quality of medical record data is screened from multiple aspects/angles, the accuracy of quality screening is improved, and the construction of a smart city is thereby promoted.
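To make the fully connected structure concrete, the following is a minimal pure-Python sketch of a forward pass that maps a feature vector to an embedding. Everything specific here is an assumption for illustration: the layer sizes (5 inputs, three hidden layers of 8 neurons, 3 outputs), the ReLU activation, the random weight initialization, and the names `build_dnn` and `embed`. The source does not specify any of these.

```python
import random


def make_layer(n_in, n_out, rng):
    """One fully connected layer: weights[j][i] links input neuron i to output neuron j."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(weights, biases, inputs):
    """Every output neuron receives every input neuron (full connectivity)."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Assumed hidden-layer activation; the source does not name one."""
    return [max(0.0, v) for v in values]


def build_dnn(layer_sizes, seed=0):
    """Build layers for consecutive size pairs, e.g. [5, 8, 8, 8, 3]."""
    rng = random.Random(seed)
    return [make_layer(n_in, n_out, rng)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def embed(model, feature_vector):
    """Forward pass: hidden layers use ReLU, the output layer is left linear."""
    activations = list(feature_vector)
    for index, (weights, biases) in enumerate(model):
        activations = dense(weights, biases, activations)
        if index < len(model) - 1:
            activations = relu(activations)
    return activations


model = build_dnn([5, 8, 8, 8, 3])   # input, 3 hidden layers, multi-output layer
embedding = embed(model, [1.0, 0.0, 2.0, 0.5, 1.5])
print(len(embedding))  # 3
```

The multi-output final layer is what lets the network act as a dimensionality-reducing embedding function rather than a single-score predictor.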
It can be understood that the medical record data with qualified quality mentioned in the embodiment of the present application refers to medical record data without quality problems such as misdiagnosis and recording errors, and the medical record data with unqualified quality refers to medical record data with quality problems such as misdiagnosis or recording errors.
Referring to fig. 2, fig. 2 is a schematic flow chart of a deep neural network-based data processing method provided by an embodiment of the present application. As shown in fig. 2, the data processing method based on the deep neural network may include:
S201, the electronic device obtains at least 2 training samples.
In some possible embodiments, each of the at least 2 training samples is a quadruple. Each quadruple includes 4 features, which are respectively a feature vector of an anchor point, a feature vector of a positive sample, a feature vector of a negative sample, and a feature vector of a false sample. The anchor point in the embodiment of the application refers to medical record data with qualified quality, the positive sample refers to medical record data with the same category as the anchor point and qualified quality, the negative sample refers to medical record data with different category from the anchor point and qualified quality, and the false sample refers to medical record data with unqualified quality.
In some possible implementations, the electronic device can randomly extract N pieces of medical record data from the medical record database. The developer can label the extracted N pieces of medical record data, marking whether the quality of each piece is qualified and marking the category of each piece whose quality is qualified. For convenience of description, medical record data with qualified quality will be referred to as a qualified medical record, and medical record data with unqualified quality will be referred to as an abnormal medical record. Optionally, the developer may classify the qualified medical records among the N pieces of medical record data according to the department or the disease site recorded in the medical record data. For example, departments can be classified into internal medicine, surgery, gynecology, pediatrics, ENT (ear, nose and throat), oncology, infectious diseases, and so on. It will be appreciated that the department classification can be more detailed: internal medicine can be further divided into respiratory medicine, gastroenterology, hematology, etc., and surgery can be further divided into general surgery, cardiothoracic surgery, cardiovascular surgery, breast surgery, hepatobiliary surgery, etc. As another example, disease sites can be classified into heart, liver, spleen, lung, kidney, ear, nose, throat, eye, etc.
The electronic device can use the N medical record data labeled with the quality and the category as a training data set. The electronic device can randomly select a qualified medical record from the K qualified medical records of the training data set as an anchor point and extract the feature vector of the anchor point. The electronic device can randomly select a qualified medical record with the same category as the anchor point from K-1 qualified medical records in the training data set as a positive sample, and extract the feature vector of the positive sample. The electronic device can randomly select a qualified medical record with a different category from the anchor point from the K-2 qualified medical records in the training data set as a negative sample, and extract a feature vector of the negative sample. The electronic device can randomly select an abnormal medical record from the N-K abnormal medical records in the training data set as a false sample, and extract a feature vector of the false sample. The electronic device may combine the feature vector of the anchor point, the feature vector of the positive sample, the feature vector of the negative sample, and the feature vector of the false sample into a quadruple, and may use the quadruple as a training sample. The electronic device determines at least 2 training samples from the training data set according to the method, wherein each training sample is a quadruple. The feature vector is used to describe feature information of medical record data, for example, the feature vector may include feature information of symptoms, examination results, diagnosis, and the like. The feature dimensions and the feature categories of the anchor point feature vector, the positive sample feature vector, the negative sample feature vector and the false sample feature vector are the same.
For example, assume that each piece of medical record data includes 5 features: feature A, feature B, feature C, feature D, and feature E. The feature vector of anchor point i is $X_i = (A_i, B_i, C_i, D_i, E_i)$, the feature vector of positive sample j is $X_j = (A_j, B_j, C_j, D_j, E_j)$, the feature vector of negative sample h is $X_h = (A_h, B_h, C_h, D_h, E_h)$, and the feature vector of false sample g is $X_g = (A_g, B_g, C_g, D_g, E_g)$. Therefore, the quadruple consisting of the feature vector of anchor point i, the feature vector of positive sample j, the feature vector of negative sample h and the feature vector of false sample g is $(X_i, X_j, X_h, X_g)$, i.e. the training sample is $(X_i, X_j, X_h, X_g)$.
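The sampling procedure described above could be sketched as follows. This is an illustrative sketch under stated assumptions: the record layout (dicts with hypothetical keys `features`, `qualified`, `category`) and the function name `sample_quadruple` are not from the source, and anchors that lack a same-category partner or a different-category record are skipped so that a complete quadruple can always be formed.

```python
import random


def sample_quadruple(records, rng):
    """Draw one (anchor, positive, negative, false) tuple of feature vectors.

    records: list of dicts with keys 'features', 'qualified' (bool) and
             'category' (label for qualified records, None for abnormal ones).
    """
    qualified = [r for r in records if r["qualified"]]
    abnormal = [r for r in records if not r["qualified"]]

    # Keep only anchors for which both a positive and a negative exist.
    candidates = []
    for anchor in qualified:
        positives = [r for r in qualified
                     if r is not anchor and r["category"] == anchor["category"]]
        negatives = [r for r in qualified if r["category"] != anchor["category"]]
        if positives and negatives:
            candidates.append((anchor, positives, negatives))
    if not candidates or not abnormal:
        raise ValueError("training data set cannot form a complete quadruple")

    anchor, positives, negatives = rng.choice(candidates)
    positive = rng.choice(positives)      # same category, qualified
    negative = rng.choice(negatives)      # different category, qualified
    false_sample = rng.choice(abnormal)   # unqualified record
    return (anchor["features"], positive["features"],
            negative["features"], false_sample["features"])


records = [
    {"features": (1, 0, 0, 0, 0), "qualified": True, "category": "internal medicine"},
    {"features": (2, 0, 0, 0, 0), "qualified": True, "category": "internal medicine"},
    {"features": (3, 0, 0, 0, 0), "qualified": True, "category": "surgery"},
    {"features": (4, 0, 0, 0, 0), "qualified": False, "category": None},
]
training_samples = [sample_quadruple(records, random.Random(seed)) for seed in range(2)]
```

Calling the function repeatedly yields the "at least 2 training samples" that the method requires; all four feature vectors in each quadruple share the same dimension and feature order.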
S202, the electronic device sequentially inputs the at least 2 training samples into the constructed deep neural network DNN model for training.
In some possible embodiments, the loss function of the DNN model may be a quadruple loss function. The quadruple loss function can be determined by the difference between the embedded vector obtained after the feature vector of the anchor point is input into the DNN model and the embedded vectors obtained after the feature vector of the positive sample, the feature vector of the negative sample and the feature vector of the false sample are input into the DNN model.
In some possible implementations, the electronic device may construct a DNN model including an input layer, one or more hidden layers, and an output layer with full connectivity between the layers according to the developer's settings (e.g., number of hidden layers, number of neurons in the input layer, number of neurons in the output layer, loss function, etc.). The electronic device may sequentially input the at least 2 training samples (i.e., the quadruples) into the constructed DNN model for training, so that a loss function of the DNN model after training is reduced to a preset fluctuation range. The loss function of the DNN model is a quadruple loss function, and in the training process, the quadruple loss function is used for constraining a quadruple embedded vector output by the DNN model. The quadruple embedding vector comprises an embedding vector corresponding to an anchor point, an embedding vector corresponding to a positive sample, an embedding vector corresponding to a negative sample and an embedding vector corresponding to a false sample.
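The stopping condition "reduced to a preset fluctuation range" is not defined further in the text. One plausible reading, shown here purely as an assumption, is that training stops once the most recent loss values vary by no more than a preset amount. The function name `loss_has_settled` and the parameters `window` and `fluctuation_range` are hypothetical.

```python
def loss_has_settled(loss_history, window=5, fluctuation_range=0.01):
    """Return True when the last `window` loss values stay within the range.

    Assumed interpretation: the spread (max - min) of the most recent
    `window` losses must not exceed `fluctuation_range`.
    """
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    return max(recent) - min(recent) <= fluctuation_range
```

A training loop would append the loss after each pass over the quadruples and stop once this check returns True.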
Optionally, the above quadruple loss function satisfies formula (1-1):
L=d(a,p)-d(a,n)-k*d(a,F), (1-1)
where L represents the quadruple loss function and d(x,y) represents the L2 distance between x and y in the sample space. Here, a represents the embedded vector obtained after the feature vector of the anchor point is input into the DNN model, p represents the embedded vector obtained after the feature vector of the positive sample is input into the DNN model, n represents the embedded vector obtained after the feature vector of the negative sample is input into the DNN model, F represents the embedded vector obtained after the feature vector of the false sample is input into the DNN model, and k is a coefficient. d(a,p) represents the L2 distance between a and p, d(a,n) represents the L2 distance between a and n, and d(a,F) represents the L2 distance between a and F.
Optionally, the L2 distance satisfies formula (1-2):
$d(x,y)=\sqrt{\sum_{i=1}^{Q}(x_i-y_i)^2}$, (1-2)
where Q represents the number of elements included in each of x and y, $x_i$ represents the i-th element in the vector x, and $y_i$ represents the i-th element in the vector y. For example, assuming that x is (1,2,3,4) and y is (5,6,7,8), then
$d(x,y)=\sqrt{(1-5)^2+(2-6)^2+(3-7)^2+(4-8)^2}=\sqrt{64}=8$.
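A minimal sketch of the L2 distance, assuming formula (1-2) is the standard Euclidean distance. The function name `l2_distance` is chosen here for illustration; the example vectors are the ones given in the text.

```python
import math


def l2_distance(x, y):
    """Square root of the sum of squared element-wise differences."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of elements")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


x = (1, 2, 3, 4)
y = (5, 6, 7, 8)
distance = l2_distance(x, y)
print(distance)  # 8.0
```

The standard library function `math.dist` (Python 3.8 and later) computes the same quantity.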
In the training process, the DNN model minimizes the value of the quadruple loss function L, so that the embedded vector corresponding to the anchor point and the embedded vector corresponding to the positive sample are as close as possible in the sample space, that is, the value of d(a,p) in the quadruple loss function L is made as small as possible (indicating that the embedded vectors corresponding to qualified medical records of the same category are close to each other in the sample space). Meanwhile, the distance between the embedded vector corresponding to the anchor point and the embedded vector corresponding to the negative sample, and the distance between the embedded vector corresponding to the anchor point and the embedded vector corresponding to the false sample, are made as large as possible in the sample space, that is, d(a,n) and k*d(a,F) in the quadruple loss function L are made as large as possible (indicating that the embedded vectors corresponding to qualified medical records of different categories are far apart in the sample space, and that the embedded vectors corresponding to qualified medical records are far from the embedded vectors corresponding to abnormal medical records). Minimizing the value of the quadruple loss function L can also make the distances between the anchor point and the false sample, between the positive sample and the false sample, and between the negative sample and the false sample in the sample space much larger than the distance between the embedded vectors corresponding to the anchor point and the negative sample, that is, d(a,F), d(p,F) and d(n,F) are made much larger than d(a,n).
It can be understood that, since the anchor point, the positive sample and the negative sample are qualified medical records and the false sample is an abnormal medical record, when the quadruple loss function is minimized, d(a,F), d(p,F) and d(n,F) become far greater than d(a,n), so that the DNN model can learn the difference between qualified medical records and abnormal medical records, thereby identifying abnormal medical records. Since the anchor point is a qualified medical record and the positive sample is a qualified medical record of the same category as the anchor point, the DNN model can learn the distribution (or characteristics) of the qualified medical records of the same category by minimizing the value of d(a,p) when minimizing the quadruple loss function. It can also be understood that, by minimizing the value of the quadruple loss function L in the training process, the DNN model learns the mapping relationship between the input feature vector and the output embedded vector, i.e., it adjusts the value of each dimension element in the embedded vector, thereby gradually constraining the output of the DNN model to the distribution corresponding to the model's learning target.
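The quadruple loss of formula (1-1) can be evaluated directly on four embedded vectors. The sketch below uses `math.dist` for the L2 distance; the function name `quadruple_loss`, the value k = 0.5 and the 2-dimensional embeddings are made-up illustrations, not values from the application.

```python
import math


def quadruple_loss(a, p, n, f, k=1.0):
    """Formula (1-1): L = d(a,p) - d(a,n) - k*d(a,F), with d the L2 distance."""
    return math.dist(a, p) - math.dist(a, n) - k * math.dist(a, f)


# Toy embedded vectors (illustrative values only).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # d(a,p) = 1
negative = [3.0, 4.0]       # d(a,n) = 5
false_sample = [6.0, 8.0]   # d(a,F) = 10

loss = quadruple_loss(anchor, positive, negative, false_sample, k=0.5)
print(loss)  # 1 - 5 - 0.5 * 10 = -9.0
```

Moving the positive sample closer to the anchor, or the negative and false samples farther away, lowers this value, which is the behaviour the training process relies on.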
The dimension of the feature vector input into the DNN model is larger than the dimension of the embedded vector output by the DNN model, and the features in the embedded vector are drawn from the features of the feature vector. For example, the feature vector is a 1000-dimensional vector, and the embedded vector is a fixed 100-dimensional vector. As another example, the feature vector input into the DNN model includes A, B, C, D and E, and the embedded vector output by the DNN model may include B, D and E.
It can be understood that, according to the training process, the qualified medical records of the same category are concentrated in the same sample cluster, the qualified medical records of different categories are distributed in different sample clusters within a certain range, and the abnormal medical records are distributed at a position far away from the qualified medical records.
In some possible embodiments, when the value of the quadruple loss function L is no longer reduced (or fluctuates within a certain preset fluctuation range) during the training process, it indicates that the DNN model tends to be stable at this time, and the constraint condition of the quadruple loss function L is satisfied, and then the DNN model training is completed. Understandably, the more training samples used in the training process, the better the performance of the trained DNN model.
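One way to express the stopping condition is to check whether the most recent loss values stay within a preset fluctuation range. This is a sketch only: the window length, the fluctuation threshold and the loss values below are hypothetical, and the application does not specify how the range is measured.

```python
def has_converged(loss_history, window=5, fluctuation=1e-3):
    """True when the last `window` loss values vary by at most `fluctuation`."""
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    return max(recent) - min(recent) <= fluctuation


# Made-up loss values recorded during training.
losses = [2.0, 1.2, 0.7, 0.5, 0.5004, 0.5001, 0.4999, 0.5002, 0.5]
converged = has_converged(losses)
print(converged)  # True
```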
S203, the electronic equipment inputs the feature vector of the medical record data to be predicted into the trained DNN model for processing to obtain a target embedded vector corresponding to the medical record data to be predicted.
In some possible embodiments, the electronic device may randomly acquire a piece of medical record data to be predicted from the medical record database, and may extract a feature vector of the medical record data to be predicted. The medical record data to be predicted refers to medical record data with unknown quality and/or unknown category. The feature vector of the medical record data to be predicted has the same features (the same feature types in the same order) and the same dimension as the feature vectors of the anchor point, the positive sample, the negative sample and the false sample used in the training process. For example, if the dimension of the feature vector used in the training process is 1000, then the dimension of the feature vector of the medical record data to be predicted is also 1000; assuming that the feature vector used in the training process includes A, B, C, D and E, the feature vector of the medical record data to be predicted also includes A, B, C, D and E.
The electronic device may input the feature vector of the medical record data to be predicted into the trained DNN model for processing, and the trained DNN model maps the input feature vector and outputs a target embedded vector corresponding to the medical record data to be predicted. As can be appreciated, the embedded vector has dimensions that are lower and denser than the dimensions of the feature vector, and the DNN model projects the feature vector into a feature space with a lower dimension, resulting in an embedded vector.
S204, the electronic equipment determines the quality of the medical record data to be predicted according to the distance between the target embedded vector and the quality embedded vector and the preset quality abnormal distance.
In some possible implementations, the electronic device may obtain the quality embedding vector and may obtain a preset quality anomaly distance. The electronic device may calculate a distance between the target embedding vector and the quality embedding vector, and may compare a magnitude relationship between the distance between the target embedding vector and the quality embedding vector and a preset quality anomaly distance. And if the distance between the target embedded vector and the quality embedded vector is greater than or equal to the preset quality abnormal distance, the electronic equipment determines that the quality of the medical record data to be predicted is unqualified, namely the medical record data to be predicted is an abnormal medical record. And if the distance between the target embedded vector and the quality embedded vector is smaller than the quality abnormal distance, the electronic equipment determines that the quality of the medical record data to be predicted is qualified, namely the medical record data to be predicted is a qualified medical record. Wherein the quality embedded vector can be used to reflect characteristics of an abnormal medical record. The distance in the embodiment of the present application may refer to an L2 distance. The preset quality anomaly distance can be determined based on medical record data in the training data set.
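The comparison in step S204 can be sketched as follows, applying the rule exactly as the text states it: a distance greater than or equal to the preset quality abnormal distance yields "unqualified", and a smaller distance yields "qualified". The function name, the vectors and the threshold value are hypothetical.

```python
import math


def assess_quality(target_embedding, quality_embedding, anomaly_distance):
    """Return 'unqualified' when the L2 distance reaches the preset threshold."""
    distance = math.dist(target_embedding, quality_embedding)
    return "unqualified" if distance >= anomaly_distance else "qualified"


quality_embedding = [0.0, 0.0]  # made-up quality embedded vector
anomaly_distance = 4.0          # made-up preset quality abnormal distance

far_result = assess_quality([3.0, 4.0], quality_embedding, anomaly_distance)
near_result = assess_quality([1.0, 1.0], quality_embedding, anomaly_distance)
print(far_result, near_result)  # unqualified qualified
```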
The embodiment of the application trains the deep learning model by using electronic medical record data, so that the model can learn the potential distribution of medical record data with qualified quality, and quality evaluation is performed according to whether the medical record data conforms to the distribution learned by the model, thereby enlarging the coverage of the quality evaluation, screening the quality of the medical record data from multiple aspects/angles, and improving the accuracy of the quality screening.
In some possible embodiments, the method for acquiring the quality embedding vector by the electronic device specifically includes: the electronic device may extract feature vectors of all the false samples (i.e., the N-K abnormal medical records) in the at least 2 training samples, and may sequentially input the feature vectors of all the false samples in the at least 2 training samples into the trained DNN model for processing, so as to obtain embedded vectors corresponding to all the false samples. One of the false samples corresponds to one of the embedded vectors (N-K abnormal medical records correspond to N-K embedded vectors). The electronic device can calculate a mean vector between the embedding vectors corresponding to all the false samples (i.e., N-K embedding vectors corresponding to N-K abnormal medical records), and can use the mean vector as a quality embedding vector.
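Computing the quality embedded vector as the element-wise mean of the false-sample embedded vectors can be sketched as below. The helper name `mean_vector` and the three 2-dimensional embeddings are illustrative assumptions; in practice the embeddings would come from the trained DNN model.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dimension = len(vectors[0])
    if any(len(v) != dimension for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dimension)]


# Made-up embedded vectors of three abnormal medical records (false samples).
false_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
quality_embedding = mean_vector(false_embeddings)
print(quality_embedding)  # [3.0, 4.0]
```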
Optionally, in order to ensure reliability and privacy of medical record data, the medical record data (including the training samples and the medical record data to be predicted) may be uploaded in advance to a blockchain node in a blockchain system. When the data processing method based on the deep neural network of the present application is executed, the related data of the training samples may be obtained from the blockchain node in the blockchain system to train the DNN model, the medical record data to be predicted may be obtained from the blockchain node and input into the DNN model to determine the target embedding vector, and then the quality of the medical record data to be predicted is determined according to the target embedding vector. In this way, the quality evaluation of the medical record data of the patient is realized accurately, safely and privately.
Optionally, the data processing method based on the deep neural network in the present application may also be executed based on a smart contract deployed in the blockchain system. For example, after the DNN model training is completed, the distance between the target embedded vector and the quality embedded vector may be determined by the smart contract, and the quality of the medical record data to be predicted is determined by the smart contract according to the distance and the preset quality abnormal distance. Further optionally, after the quality of the medical record data to be predicted is determined, the quality determined by the smart contract can be uploaded to the blockchain, so that the reliability and the privacy of medical record data are ensured.
It should be noted that the blockchain in the present invention is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. The blockchain is essentially a decentralized database, which is a string of data blocks associated by using cryptography, each data block contains information of a batch of network transactions, and the information is used for verifying the validity (anti-counterfeiting) of the information and generating the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In the embodiment of the application, the electronic device obtains at least 2 training samples, the at least 2 training samples are sequentially input into a constructed DNN model for training, so that a loss function of the DNN model after training is reduced to a preset fluctuation range, the loss function of the DNN model is a four-tuple loss function, a feature vector of medical record data to be predicted is input into the trained DNN model for processing, a target embedded vector corresponding to the medical record data to be predicted is obtained, the quality of the medical record data to be predicted is determined according to the distance between the target embedded vector and the quality embedded vector and a preset quality abnormal distance, the coverage of quality evaluation can be enlarged, the quality of the medical record data is screened from multiple aspects/multiple angles, and the accuracy of quality screening is improved.
Referring to fig. 3, fig. 3 is another schematic flow chart of a deep neural network-based data processing method provided in an embodiment of the present application. As shown in fig. 3, the data processing method based on the deep neural network may include:
S301, the electronic equipment obtains at least 2 training samples.
S302, the electronic equipment inputs at least 2 training samples into the constructed deep neural network DNN model in sequence for training.
S303, the electronic equipment inputs the feature vector of the medical record data to be predicted into the trained DNN model for processing to obtain a target embedded vector corresponding to the medical record data to be predicted.
In some possible implementations, the implementations of steps S301 to S303 in the embodiment of the present application may refer to the implementations of steps S201 to S203 in the embodiment shown in fig. 2, and are not described herein again.
S304, the electronic equipment sequentially inputs the feature vectors of all the false samples in at least 2 training samples into the trained DNN model for processing to obtain the embedded vectors corresponding to all the false samples, wherein one false sample corresponds to one embedded vector.
S305, the electronic device determines a vector mean between the embedded vectors corresponding to all the false samples as a quality embedded vector.
In some possible embodiments, each of the at least 2 training samples is a quadruple. Each quadruple includes 4 feature vectors, which are respectively a feature vector of an anchor point, a feature vector of a positive sample, a feature vector of a negative sample, and a feature vector of a false sample. The anchor point in the embodiment of the application refers to medical record data with qualified quality, the positive sample refers to medical record data with the same category as the anchor point and qualified quality, the negative sample refers to medical record data with a different category from the anchor point and qualified quality, and the false sample refers to medical record data with unqualified quality. For convenience of description, the medical record data with qualified quality will be referred to as a qualified medical record, and the medical record data with unqualified quality will be referred to as an abnormal medical record.
In some possible embodiments, the electronic device may extract feature vectors of all false samples in the at least 2 training samples, and may sequentially input the feature vectors of all false samples in the at least 2 training samples into the trained DNN model for processing, so as to obtain embedded vectors corresponding to all false samples. Where one false sample corresponds to one embedded vector. The electronic device may calculate a mean vector between the embedded vectors corresponding to all the false samples and may use the mean vector as the quality embedded vector.
S306, if the distance between the target embedded vector and the quality embedded vector is larger than or equal to the preset quality abnormal distance, the electronic equipment determines that the quality of the medical record data to be predicted is unqualified.
S307, if the distance between the target embedded vector and the quality embedded vector is smaller than the quality abnormal distance, the electronic equipment determines that the quality of the medical record data to be predicted is qualified.
In some possible embodiments, after obtaining the quality embedding vector, the electronic device may obtain a preset quality abnormal distance. The electronic device may calculate the distance between the target embedding vector and the quality embedding vector, and may compare this distance with the preset quality abnormal distance. If the distance between the target embedded vector and the quality embedded vector is greater than or equal to the preset quality abnormal distance, the electronic equipment determines that the quality of the medical record data to be predicted is unqualified, namely the medical record data to be predicted is an abnormal medical record. If the distance between the target embedded vector and the quality embedded vector is smaller than the quality abnormal distance, the electronic equipment determines that the quality of the medical record data to be predicted is qualified, namely the medical record data to be predicted is a qualified medical record. The quality embedded vector can be used to reflect characteristics of an abnormal medical record. The distance in the embodiment of the present application may refer to an L2 distance. The preset quality abnormal distance can be determined based on medical record data in the training data set.
S308, under the condition that the quality of the medical record data to be predicted is determined to be qualified, the electronic equipment determines the category of the medical record data to be predicted according to the distance between the target embedded vector and each category embedded vector and the category distance corresponding to each category embedded vector.
In some possible embodiments, when the quality of the medical record data to be predicted is determined to be qualified, the electronic device may obtain each category embedding vector, and may obtain the preset category distance corresponding to each category embedding vector. One category embedding vector corresponds to one category distance. The electronic device may calculate the distance between the target embedding vector and each category embedding vector, and may compare each such distance with the category distance corresponding to that category embedding vector. If the distance between the target embedding vector and a category embedding vector w among the category embedding vectors is less than or equal to the category distance corresponding to the category embedding vector w, the electronic device may determine that the category of the medical record data to be predicted is a first category, and the first category may be the category corresponding to the category embedding vector w. If the distance between the target embedding vector and the category embedding vector w is greater than the category distance corresponding to the category embedding vector w, it indicates that the category of the medical record data to be predicted is different from the category corresponding to the category embedding vector w.
On the one hand, the deep learning model is trained by using electronic medical record data, so that the model can learn the potential distribution of medical record data with qualified quality, and quality evaluation is performed according to whether the medical record data conforms to the distribution learned by the model; therefore, the coverage of quality evaluation can be enlarged, the quality of the medical record data is screened from multiple aspects/angles, and the accuracy of quality screening is improved. On the other hand, the embedded vector output by the model is constrained by the quadruple loss function, so that medical record data with qualified quality can be distinguished from medical record data with unqualified quality, and the medical record data with qualified quality can be classified according to medical record categories.
Optionally, if the distance between the target embedding vector and each category embedding vector is greater than the category distance corresponding to each category embedding vector, which indicates that the category of the medical record data to be predicted does not belong to any existing category, the electronic device regards the category of the medical record data to be predicted as a second category. The second category is different from the categories corresponding to the various category embedding vectors.
For example, assume that there are 4 category embedding vectors, namely category embedding vectors S1, S2, S3 and S4. Assume that there are 4 category distances, namely category distances 1, 2, 3 and 4; the category embedding vector S1 corresponds to category distance 1, the category embedding vector S2 corresponds to category distance 2, the category embedding vector S3 corresponds to category distance 3, and the category embedding vector S4 corresponds to category distance 4. Assume that the categories corresponding to the category embedding vectors S1, S2, S3 and S4 are category 1, category 2, category 3 and category 4, respectively. The electronic device calculates, in order, the distances D1, D2, D3 and D4 between the target embedding vector and the category embedding vectors S1, S2, S3 and S4. The electronic device then compares the distances D1, D2, D3 and D4 with the category distances 1, 2, 3 and 4, respectively. If D1 is less than or equal to the category distance 1, D2 is greater than the category distance 2, D3 is greater than the category distance 3, and D4 is greater than the category distance 4, the electronic device determines the category of the medical record data to be predicted as the category corresponding to the category embedding vector S1, namely category 1. If D1 is greater than the category distance 1, D2 is greater than the category distance 2, D3 is greater than the category distance 3, and D4 is greater than the category distance 4, it indicates that the category of the medical record data to be predicted is different from the categories corresponding to the category embedding vectors S1, S2, S3 and S4, that is, the medical record data to be predicted does not belong to any existing category, and the electronic device takes the category of the medical record data to be predicted as an independent category, i.e., the second category.
It is to be understood that if D1 is less than or equal to the category distance 1, D2 is greater than the category distance 2, D3 is also less than or equal to the category distance 3, and D4 is greater than the category distance 4, the electronic device determines that the category of the medical record data to be predicted is the category corresponding to the category embedding vector S1, i.e., category 1, and the category corresponding to the category embedding vector S3, i.e., category 3.
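The comparison of all four distances against their category distances can be sketched as follows. The category embedding vectors, the category distance values and the target vectors are made up for illustration, and the function name `matching_categories` is hypothetical. The sketch returns every matching category, so a record may receive more than one category, and falls back to the second category when nothing matches.

```python
import math

NEW_CATEGORY = "second category"  # used when no existing category matches


def matching_categories(target, class_embeddings, class_distances):
    """Return every category whose category distance covers the target embedding."""
    matches = [
        category
        for category, class_embedding in class_embeddings.items()
        if math.dist(target, class_embedding) <= class_distances[category]
    ]
    return matches if matches else [NEW_CATEGORY]


# Made-up category embedding vectors S1..S4 and category distances 1..4.
class_embeddings = {
    "category 1": [0.0, 0.0],
    "category 2": [10.0, 0.0],
    "category 3": [0.0, 3.0],
    "category 4": [10.0, 10.0],
}
class_distances = {"category 1": 1.0, "category 2": 2.0, "category 3": 3.0, "category 4": 4.0}

single = matching_categories([0.0, -0.5], class_embeddings, class_distances)
double = matching_categories([0.0, 0.5], class_embeddings, class_distances)
unmatched = matching_categories([50.0, 50.0], class_embeddings, class_distances)
print(single, double, unmatched)
```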
Alternatively, the electronic device may compare the magnitude relationship between the distance between the target embedding vector and a class embedding vector and the class distance corresponding to the class embedding vector every time the distance between the target embedding vector and the class embedding vector is calculated. For example, the electronic device calculates a distance D1 between the target embedding vector and the category embedding vector S1, and compares the magnitude relationship between the distance D1 and the category distance 1 corresponding to the category embedding vector S1. If D1 is less than or equal to the category distance 1, the electronic device determines the category of the medical record data to be predicted as the category corresponding to the category embedding vector S1, namely category 1. If D1 is greater than the category distance 1, the electronic device calculates a distance D2 between the target embedded vector and the category embedded vector S2 and compares the magnitude relationship between the distance D2 and the category distance 2 corresponding to the category embedded vector S2. If D2 is less than or equal to the category distance 2, the electronic device determines the category of the medical record data to be predicted as the category corresponding to the category embedding vector S2, namely category 2. If D2 is greater than the category distance 2, the electronic device calculates the distance D3 between the target embedding vector and the category embedding vector S3, compares the magnitude relationship between the distance D3 and the category distance 3 corresponding to the category embedding vector S3, and so on until the electronic device determines the category of the medical record data to be predicted.
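The one-at-a-time variant stops at the first category whose category distance is satisfied, so later distances are never computed. A sketch under the same kind of made-up embeddings and thresholds; `first_matching_category` is a hypothetical name, and returning `None` when nothing matches is an assumption of this sketch.

```python
import math


def first_matching_category(target, class_embeddings, class_distances):
    """Check categories in order and return the first one that matches."""
    for category, class_embedding in class_embeddings.items():
        distance = math.dist(target, class_embedding)
        if distance <= class_distances[category]:
            return category  # stop here; remaining distances are not computed
    return None  # no existing category matched


# Made-up category embedding vectors and category distances, checked in this order.
class_embeddings = {
    "category 1": [0.0, 0.0],
    "category 2": [10.0, 0.0],
    "category 3": [0.0, 3.0],
    "category 4": [10.0, 10.0],
}
class_distances = {"category 1": 1.0, "category 2": 2.0, "category 3": 3.0, "category 4": 4.0}

early = first_matching_category([0.0, 0.5], class_embeddings, class_distances)
later = first_matching_category([0.0, 2.0], class_embeddings, class_distances)
print(early, later)  # category 1 category 3
```

Unlike comparing all distances at once, this variant assigns at most one category, and the result depends on the order in which the categories are checked.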
In some possible embodiments, if the distance between the target embedding vector and each category embedding vector is greater than the category distance corresponding to that category embedding vector, which indicates that the category of the medical record data to be predicted does not belong to any existing category, the electronic device may calculate, for each category embedding vector, the absolute difference between the distance from the target embedding vector to that category embedding vector and the corresponding category distance. The electronic equipment then determines the category of the medical record data to be predicted as the category corresponding to the minimum absolute difference among all the absolute differences. For example, assume that the distance D1 between the target embedding vector and the category embedding vector S1 is greater than category distance 1, the distance D2 between the target embedding vector and the category embedding vector S2 is greater than category distance 2, the distance D3 between the target embedding vector and the category embedding vector S3 is greater than category distance 3, and the distance D4 between the target embedding vector and the category embedding vector S4 is greater than category distance 4. The electronic device calculates the absolute difference A1 between the distance D1 and the category distance 1, the absolute difference A2 between the distance D2 and the category distance 2, the absolute difference A3 between the distance D3 and the category distance 3, and the absolute difference A4 between the distance D4 and the category distance 4, respectively. The electronic equipment determines the minimum absolute difference from the absolute differences A1, A2, A3 and A4, and determines the category of the medical record data to be predicted as the category corresponding to the minimum absolute difference.
Assuming that the minimum absolute difference is A3 and the category corresponding to A3 is category 3, the category of the medical record data to be predicted is the category corresponding to A3, that is, category 3.
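The fallback by minimum absolute difference can be sketched with one-dimensional embeddings so the arithmetic is easy to follow. All values and the function name `closest_category_by_margin` are hypothetical; the numbers are chosen so that every distance exceeds its category distance and A3 is the smallest difference, as in the example above.

```python
import math


def closest_category_by_margin(target, class_embeddings, class_distances):
    """Pick the category with the smallest |distance - category distance|."""
    margins = {
        category: abs(math.dist(target, class_embedding) - class_distances[category])
        for category, class_embedding in class_embeddings.items()
    }
    return min(margins, key=margins.get), margins


# Made-up 1-dimensional category embedding vectors and category distances.
class_embeddings = {
    "category 1": [0.0],
    "category 2": [10.0],
    "category 3": [20.0],
    "category 4": [30.0],
}
class_distances = {"category 1": 1.0, "category 2": 2.0, "category 3": 3.0, "category 4": 4.0}

# D1..D4 are 16, 6, 4, 14, so A1..A4 are 15, 4, 1, 10.
chosen, margins = closest_category_by_margin([16.0], class_embeddings, class_distances)
print(chosen)  # category 3
```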
In some possible embodiments, the electronic device obtaining each class embedding vector specifically includes: the electronic device can extract the feature vectors of the qualified medical records belonging to the same category in the at least 2 training samples, and can sequentially input the feature vectors of the qualified medical records belonging to the same category into the trained DNN model for processing to obtain a plurality of embedded vectors corresponding to the qualified medical records of the same category. One of the qualified medical records corresponds to one of the embedded vectors. The electronic device can calculate a mean vector of a plurality of embedding vectors corresponding to a plurality of qualified medical records of the same category, and take the mean vector as the category embedding vector of the category. For example, assuming that at least 2 training samples include 4 classes, namely class 1, class 2, class 3, and class 4, the electronic device finally determines 4 class embedding vectors, namely, a class embedding vector of class 1, a class embedding vector of class 2, a class embedding vector of class 3, and a class embedding vector of class 4.
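Grouping the embedded vectors of qualified medical records by category and averaging each group can be sketched as below. The function name, the category labels and the embedding values are illustrative assumptions; the embeddings would normally be produced by the trained DNN model.

```python
from collections import defaultdict


def category_embedding_vectors(labeled_embeddings):
    """Map each category to the element-wise mean of its embedded vectors."""
    grouped = defaultdict(list)
    for category, embedding in labeled_embeddings:
        grouped[category].append(embedding)
    return {
        category: [sum(column) / len(vectors) for column in zip(*vectors)]
        for category, vectors in grouped.items()
    }


# Made-up (category, embedded vector) pairs for qualified medical records.
labeled_embeddings = [
    ("category 1", [0.0, 2.0]),
    ("category 1", [2.0, 4.0]),
    ("category 2", [10.0, 10.0]),
]
centers = category_embedding_vectors(labeled_embeddings)
print(centers)  # {'category 1': [1.0, 3.0], 'category 2': [10.0, 10.0]}
```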
In the embodiment of the application, the electronic device obtains at least 2 training samples, sequentially inputs the at least 2 training samples into a constructed DNN model for training, so that a loss function of the DNN model after training is reduced to a preset fluctuation range, the loss function of the DNN model is a quadruple loss function, characteristic vectors of medical record data to be predicted are input into the trained DNN model for processing, target embedded vectors corresponding to the medical record data to be predicted are obtained, then the characteristic vectors of all false samples in the at least 2 training samples are sequentially input into the trained DNN model for processing, embedded vectors corresponding to all the false samples are obtained, vector means between the embedded vectors corresponding to all the false samples are determined as quality embedded vectors, when the distance between the target embedded vectors and the quality embedded vectors is greater than or equal to a preset quality abnormal distance, the electronic device determines that the quality of the medical record data to be predicted is unqualified, when the distance between the target embedded vector and the quality embedded vector is smaller than the quality abnormal distance, the electronic equipment determines that the quality of the medical record data to be predicted is qualified, and under the condition that the quality of the medical record data to be predicted is determined to be qualified, the electronic equipment determines the category of the medical record data to be predicted according to the distance between the target embedded vector and each category embedded vector and the category distance corresponding to each category embedded vector. 
The method can enlarge the coverage of quality evaluation, screen the quality of medical record data from multiple aspects/angles, improve the accuracy of quality screening, and classify the medical record data with qualified quality according to the medical record category.
Referring to fig. 4, fig. 4 is a schematic block diagram of a data processing apparatus provided in an embodiment of the present application. As shown in fig. 4, the data processing apparatus according to the embodiment of the present application may include: an acquisition unit 10, a training unit 20, a processing unit 30 and a first determination unit 40.
The acquiring unit 10 is configured to acquire at least 2 training samples, where each training sample in the at least 2 training samples is a quadruple, the quadruple includes a feature vector of an anchor point, a feature vector of a positive sample, a feature vector of a negative sample, and a feature vector of a false sample, the anchor point is medical record data with qualified quality, the positive sample is medical record data with the same category as the anchor point and with qualified quality, the negative sample is medical record data with a different category from the anchor point and with qualified quality, and the false sample is medical record data with unqualified quality;
the training unit 20 is configured to sequentially input the at least 2 training samples into the constructed deep neural network DNN model for training, so that a loss function of the DNN model after training is reduced to a preset fluctuation range, where the loss function of the DNN model is a quadruple loss function, and the quadruple loss function is determined by a difference between an embedded vector obtained by inputting the feature vector of the anchor point into the DNN model and an embedded vector obtained by inputting the feature vector of the positive sample, the feature vector of the negative sample, and the feature vector of the false sample into the DNN model;
the processing unit 30 is configured to input the feature vector of the medical record data to be predicted into the trained DNN model for processing, so as to obtain a target embedded vector corresponding to the medical record data to be predicted;
the first determining unit 40 is configured to determine the quality of the medical record data to be predicted according to a distance between the target embedded vector and the quality embedded vector and a preset quality abnormal distance.
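The stopping condition used by the training unit 20 ("reduced to a preset fluctuation range") can be illustrated with a minimal sketch. It assumes that the condition means the most recent loss values differ from each other by no more than a preset amount; the function name `loss_has_stabilized` and the parameters `window` and `fluctuation` are hypothetical and do not appear in the application.

```python
from typing import Sequence


def loss_has_stabilized(
    losses: Sequence[float],
    window: int = 5,
    fluctuation: float = 1e-3,
) -> bool:
    """Return True when the last `window` losses vary by at most `fluctuation`.

    Assumption: "reduced to a preset fluctuation range" is read as the spread
    (max - min) of the most recent loss values staying within a preset bound.
    """
    if len(losses) < window:
        # Not enough history yet to judge stability.
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) <= fluctuation
```

Under this reading, training would stop at the first step for which the function returns True. Other readings, such as the loss value itself falling inside a preset interval, are equally compatible with the text.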
In some possible embodiments, the above-mentioned quadruple loss function is:
L=d(a,p)-d(a,n)-k*d(a,F);
wherein L represents the quadruple loss function, a represents the embedded vector obtained after the feature vector of the anchor point is input into the DNN model, p represents the embedded vector obtained after the feature vector of the positive sample is input into the DNN model, n represents the embedded vector obtained after the feature vector of the negative sample is input into the DNN model, F represents the embedded vector obtained after the feature vector of the false sample is input into the DNN model, k is a coefficient, d(a, p) represents the distance between a and p, d(a, n) represents the distance between a and n, and d(a, F) represents the distance between a and F.
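The quadruple loss function above can be evaluated for a single quadruple as in the following sketch. It assumes that d is the Euclidean distance, which this passage does not specify; the function name `quadruple_loss` and the default value of `k` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def quadruple_loss(a: Vector, p: Vector, n: Vector, f: Vector, k: float = 1.0) -> float:
    """Compute L = d(a, p) - d(a, n) - k * d(a, F) for one quadruple.

    a, p, n, f are the embedded vectors of the anchor point, positive sample,
    negative sample and false sample. Euclidean distance is an assumption.
    """
    d_ap = math.dist(a, p)  # distance anchor -> positive sample
    d_an = math.dist(a, n)  # distance anchor -> negative sample
    d_af = math.dist(a, f)  # distance anchor -> false sample
    return d_ap - d_an - k * d_af
```

Reducing L shortens the distance between the anchor point and the positive sample while lengthening the distances from the anchor point to the negative sample and to the false sample; the coefficient k weights the false-sample term.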
In some possible embodiments, the first determining unit 40 is specifically configured to: when the distance between the target embedded vector and the quality embedded vector is larger than or equal to a preset quality abnormal distance, determining that the quality of the medical record data to be predicted is unqualified; and when the distance between the target embedded vector and the quality embedded vector is smaller than the quality abnormal distance, determining the quality of the medical record data to be predicted as qualified.
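The threshold rule applied by the first determining unit 40 can be written as a short sketch. The comparison direction follows the text exactly as stated; the Euclidean distance and the function name `is_quality_qualified` are assumptions made for illustration.

```python
import math
from typing import Sequence


def is_quality_qualified(
    target_embedding: Sequence[float],
    quality_embedding: Sequence[float],
    quality_abnormal_distance: float,
) -> bool:
    """Apply the quality rule as stated in the text.

    distance >= quality_abnormal_distance -> unqualified (False)
    distance <  quality_abnormal_distance -> qualified (True)
    """
    distance = math.dist(target_embedding, quality_embedding)
    return distance < quality_abnormal_distance
```

A distance exactly equal to the preset quality abnormal distance is treated as unqualified, matching the "greater than or equal to" wording.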
In some possible embodiments, the data processing apparatus further includes a second determining unit 50. The processing unit 30 is further configured to sequentially input the feature vectors of all the false samples in the at least 2 training samples into the trained DNN model for processing, so as to obtain embedded vectors corresponding to all the false samples, where one false sample corresponds to one embedded vector; the second determining unit 50 is configured to determine the mean vector of the embedded vectors corresponding to all the false samples as the quality embedded vector.
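The quality embedded vector is the element-wise mean of the embedded vectors of all the false samples. A minimal sketch of that computation follows; the function name `mean_vector` is hypothetical, and the embedded vectors are assumed to be plain sequences of floats already produced by the trained DNN model.

```python
from typing import List, Sequence


def mean_vector(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    if not embeddings:
        raise ValueError("at least one embedded vector is required")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embedded vectors must have the same dimension")
    count = len(embeddings)
    # Average each coordinate across all false-sample embeddings.
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]
```

For example, `mean_vector([[1.0, 2.0], [3.0, 6.0]])` gives `[2.0, 4.0]`, which would then serve as the quality embedded vector.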
In some possible embodiments, the first determining unit 40 is further configured to: determine the category of the medical record data to be predicted according to the distance between the target embedded vector and each category embedded vector and the category distance corresponding to each category embedded vector.
In some possible embodiments, the first determining unit 40 is further specifically configured to: when the distance between the target embedded vector and a category embedded vector w among the category embedded vectors is less than or equal to the category distance corresponding to the category embedded vector w, determine that the category of the medical record data to be predicted is a first category, wherein the first category is the category corresponding to the category embedded vector w.
In some possible embodiments, the first determining unit 40 is further configured to: when the distance between the target embedded vector and each category embedded vector is greater than the category distance corresponding to that category embedded vector, determine that the category of the medical record data to be predicted is a second category, wherein the second category is different from the category corresponding to each category embedded vector.
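The category rule can be sketched as follows. Each known category is assumed to be represented by a category embedded vector and its category distance; the Euclidean distance, the function name `determine_category`, the label `"other"` for the second category, and the tie-breaking choice of the nearest matching category are all assumptions, since the text does not say what happens when several categories satisfy the condition.

```python
import math
from typing import Dict, Sequence, Tuple


def determine_category(
    target_embedding: Sequence[float],
    categories: Dict[str, Tuple[Sequence[float], float]],
    second_category: str = "other",
) -> str:
    """Return the label of a category whose category distance covers the target.

    `categories` maps a label to (category embedded vector, category distance).
    If no category matches, the second category label is returned.
    """
    best_label = second_category
    best_distance = math.inf
    for label, (category_embedding, category_distance) in categories.items():
        distance = math.dist(target_embedding, category_embedding)
        # First-category condition: distance <= category distance.
        # Assumption: among several matches, keep the nearest one.
        if distance <= category_distance and distance < best_distance:
            best_label = label
            best_distance = distance
    return best_label
```

With the hypothetical categories `{"A": ([0.0, 0.0], 1.0), "B": ([10.0, 0.0], 2.0)}`, a target at `[9.0, 0.0]` falls into "B", while a target at `[5.0, 0.0]` matches neither and is assigned the second category.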
The acquiring unit 10, the training unit 20, the processing unit 30, the first determining unit 40, and the second determining unit 50 may be integrated into a module, such as a processing module.
In a specific implementation, the data processing apparatus may use the above modules to execute the steps of the method embodiments shown in fig. 2 or fig. 3 and thereby implement the functions of those embodiments. For details, refer to the corresponding descriptions of the steps in the method embodiments shown in fig. 2 or fig. 3, which are not repeated here.
In the embodiment of the application, the data processing apparatus obtains at least 2 training samples and sequentially inputs them into the constructed deep neural network DNN model for training, so that the loss function of the trained DNN model is reduced to a preset fluctuation range, where the loss function of the DNN model is a quadruple loss function. The apparatus inputs the feature vector of the medical record data to be predicted into the trained DNN model for processing, obtains the target embedded vector corresponding to the medical record data to be predicted, and determines the quality of the medical record data to be predicted according to the distance between the target embedded vector and the quality embedded vector and the preset quality abnormal distance. In this way, the quality of the medical record data can be screened from multiple aspects/angles, and the accuracy of quality screening is improved.
Referring to fig. 5, fig. 5 is a schematic block diagram of an electronic device provided in an embodiment of the present application. As shown in fig. 5, the electronic device in the embodiment of the present application may include: one or more processors 501 and a memory 502. The processor 501 and the memory 502 are connected by a bus 503. The memory 502 is used to store a computer program comprising program instructions, and the processor 501 is used to execute the program instructions stored in the memory 502. The processor 501 is configured to call the program instructions to perform:
acquiring at least 2 training samples, wherein each training sample in the at least 2 training samples is a quadruple, the quadruple comprises a feature vector of an anchor point, a feature vector of a positive sample, a feature vector of a negative sample and a feature vector of a false sample, the anchor point is medical record data with qualified quality, the positive sample is medical record data with the same category as the anchor point and with qualified quality, the negative sample is medical record data with different category from the anchor point and with qualified quality, and the false sample is medical record data with unqualified quality;
sequentially inputting the at least 2 training samples into a constructed deep neural network DNN model for training, so that a loss function of the DNN model after training is reduced to a preset fluctuation range, wherein the loss function of the DNN model is a four-tuple loss function, and the four-tuple loss function is determined by the difference between an embedded vector obtained by inputting the feature vector of the anchor point into the DNN model and an embedded vector obtained by inputting the feature vector of the positive sample, the feature vector of the negative sample and the feature vector of the false sample into the DNN model;
inputting the characteristic vector of the medical record data to be predicted into a trained DNN model for processing to obtain a target embedded vector corresponding to the medical record data to be predicted;
and determining the quality of the medical record data to be predicted according to the distance between the target embedded vector and the quality embedded vector and a preset quality abnormal distance.
It should be understood that, in the embodiment of the present application, the processor 501 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 502 may include both read-only memory and random access memory, and provides instructions and data to the processor 501. A portion of the memory 502 may also include non-volatile random access memory. For example, the memory 502 may also store device type information.
In a specific implementation, the processor 501 described in this embodiment of the present application may execute the implementation manner of the data processing method based on the deep neural network provided in this embodiment of the present application, and may also execute the implementation manner of the data processing apparatus described in this embodiment of the present application, which is not described herein again.
An embodiment of the present application further provides a computer-readable storage medium in which a computer program is stored. The computer program includes program instructions, and when the program instructions are executed by a processor, the data processing method based on a deep neural network shown in fig. 2 or fig. 3 is implemented. For specific details, refer to the description of the embodiments shown in fig. 2 or fig. 3, which is not repeated here.
The computer-readable storage medium may be an internal storage unit of the data processing apparatus or the electronic device according to any of the foregoing embodiments, for example, a hard disk or a memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the electronic device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable medical data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable medical data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable medical data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable medical data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the present application has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the application. Accordingly, the specification and figures are merely exemplary of the present application as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the present application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A data processing method based on a deep neural network is characterized by comprising the following steps:
acquiring at least 2 training samples, wherein each training sample in the at least 2 training samples is a quadruple, the quadruple comprises a feature vector of an anchor point, a feature vector of a positive sample, a feature vector of a negative sample and a feature vector of a false sample, the anchor point is medical record data with qualified quality, the positive sample is medical record data with the same type as the anchor point and with qualified quality, the negative sample is medical record data with different type from the anchor point and with qualified quality, and the false sample is medical record data with unqualified quality;
sequentially inputting the at least 2 training samples into a constructed deep neural network DNN model for training, and reducing a loss function of the DNN model after training to a preset fluctuation range, wherein the loss function of the DNN model is a four-tuple loss function, and the four-tuple loss function is determined by the difference between an embedded vector obtained by inputting the feature vector of the anchor point into the DNN model and an embedded vector obtained by inputting the feature vector of the positive sample, the feature vector of the negative sample and the feature vector of the false sample into the DNN model;
inputting the characteristic vector of the medical record data to be predicted into a trained DNN model for processing to obtain a target embedded vector corresponding to the medical record data to be predicted;
and determining the quality of the medical record data to be predicted according to the distance between the target embedded vector and the quality embedded vector and a preset quality abnormal distance.
2. The method of claim 1, wherein the quadruple loss function is:
L=d(a,p)-d(a,n)-k*d(a,F);
wherein L represents the quadruple loss function, a represents the embedding vector obtained after the feature vector of the anchor point is input into the DNN model, p represents the embedding vector obtained after the feature vector of a positive sample is input into the DNN model, n represents the embedding vector obtained after the feature vector of a negative sample is input into the DNN model, F represents the embedding vector obtained after the feature vector of a false sample is input into the DNN model, k is a coefficient, d (a, p) represents the distance between a and p, d (a, n) represents the distance between a and n, and d (a, F) represents the distance between a and F.
3. The method according to claim 1 or 2, wherein the determining the quality of the medical record data to be predicted according to the distance between the target embedding vector and the quality embedding vector and a preset quality anomaly distance comprises:
if the distance between the target embedded vector and the quality embedded vector is larger than or equal to a preset quality abnormal distance, determining that the quality of the medical record data to be predicted is unqualified;
and if the distance between the target embedded vector and the quality embedded vector is smaller than the quality abnormal distance, determining that the quality of the medical record data to be predicted is qualified.
4. The method according to claim 1, wherein before determining the quality of the medical record data to be predicted according to the distance between the target embedding vector and the quality embedding vector and a preset quality anomaly distance, the method further comprises:
sequentially inputting the feature vectors of all the false samples in the at least 2 training samples into a trained DNN model for processing to obtain embedded vectors corresponding to all the false samples, wherein one false sample corresponds to one embedded vector;
and determining a mean vector among the embedding vectors corresponding to all the false samples as a quality embedding vector.
5. The method of claim 3, wherein after determining that the medical record data to be predicted is qualified in quality, the method further comprises:
and determining the category of the medical record data to be predicted according to the distance between the target embedded vector and each category embedded vector and the category distance corresponding to each category embedded vector.
6. The method of claim 5, wherein the determining the category of the medical record data to be predicted according to the distance between the target embedding vector and each category embedding vector and the category distance corresponding to each category embedding vector comprises:
and if the distance between the target embedded vector and the category embedded vector w in each category embedded vector is less than or equal to the category distance corresponding to the category embedded vector w, determining that the category of the medical record data to be predicted is a first category, wherein the first category is the category corresponding to the category embedded vector w.
7. The method of claim 6, further comprising:
and if the distance between the target embedded vector and each category embedded vector is greater than the category distance corresponding to each category embedded vector, determining that the category of the medical record data to be predicted is a second category, wherein the second category is different from the category corresponding to each category embedded vector.
8. A data processing apparatus, comprising:
the acquisition unit is used for acquiring at least 2 training samples, each training sample in the at least 2 training samples is a quadruple, the quadruple comprises a feature vector of an anchor point, a feature vector of a positive sample, a feature vector of a negative sample and a feature vector of a false sample, the anchor point is medical record data with qualified quality, the positive sample is medical record data with the same type as the anchor point and with qualified quality, the negative sample is medical record data with different type from the anchor point and with qualified quality, and the false sample is medical record data with unqualified quality;
the training unit is used for sequentially inputting the at least 2 training samples into a constructed deep neural network DNN model for training, so that a loss function of the DNN model after training is reduced to a preset fluctuation range, the loss function of the DNN model is a quadruple loss function, and the quadruple loss function is determined by the difference between an embedded vector obtained by inputting the feature vector of the anchor point into the DNN model and an embedded vector obtained by inputting the feature vector of the positive sample, the feature vector of the negative sample and the feature vector of the false sample into the DNN model;
the processing unit is used for inputting the feature vector of the medical record data to be predicted into the trained DNN model for processing to obtain a target embedded vector corresponding to the medical record data to be predicted;
and the first determining unit is used for determining the quality of the medical record data to be predicted according to the distance between the target embedded vector and the quality embedded vector and a preset quality abnormal distance.
9. An electronic device, comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any of claims 1-7.
CN202010412571.1A 2020-05-15 2020-05-15 Data processing method and device based on deep neural network Active CN111696636B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010412571.1A CN111696636B (en) 2020-05-15 2020-05-15 Data processing method and device based on deep neural network
PCT/CN2020/099539 WO2021114637A1 (en) 2020-05-15 2020-06-30 Deep neural network-based method and device for data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010412571.1A CN111696636B (en) 2020-05-15 2020-05-15 Data processing method and device based on deep neural network

Publications (2)

Publication Number Publication Date
CN111696636A CN111696636A (en) 2020-09-22
CN111696636B CN111696636B (en) 2023-09-22

Family

ID=72477848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010412571.1A Active CN111696636B (en) 2020-05-15 2020-05-15 Data processing method and device based on deep neural network

Country Status (2)

Country Link
CN (1) CN111696636B (en)
WO (1) WO2021114637A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111883222A (en) * 2020-09-28 2020-11-03 平安科技(深圳)有限公司 Text data error detection method and device, terminal equipment and storage medium
CN112099739A (en) * 2020-11-10 2020-12-18 大象慧云信息技术有限公司 Classified batch printing method and system for paper invoices

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359669A (en) * 2018-09-10 2019-02-19 平安科技(深圳)有限公司 Method for detecting abnormality, device, computer equipment and storage medium are submitted an expense account in medical insurance
US20190197429A1 (en) * 2016-12-12 2019-06-27 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training classification model, and method and apparatus for classifying data
CN110597878A (en) * 2019-09-16 2019-12-20 广东工业大学 Cross-modal retrieval method, device, equipment and medium for multi-modal data
CN110598006A (en) * 2019-09-17 2019-12-20 南京医渡云医学技术有限公司 Model training method, triplet embedding method, apparatus, medium, and device
WO2020073507A1 (en) * 2018-10-11 2020-04-16 平安科技(深圳)有限公司 Text classification method and terminal
CN111062495A (en) * 2019-11-28 2020-04-24 深圳市华尊科技股份有限公司 Machine learning method and related device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103076334B (en) * 2013-01-25 2014-12-17 上海理工大学 Method for quantitatively evaluating perceived quality of digital printed lines and texts
CN106484681B (en) * 2015-08-25 2019-07-09 阿里巴巴集团控股有限公司 A kind of method, apparatus and electronic equipment generating candidate translation
CN110232675B (en) * 2019-03-28 2022-11-11 昆明理工大学 Texture surface defect detection and segmentation device and method in industrial environment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111883222A (en) * 2020-09-28 2020-11-03 平安科技(深圳)有限公司 Text data error detection method and device, terminal equipment and storage medium
CN111883222B (en) * 2020-09-28 2020-12-22 平安科技(深圳)有限公司 Text data error detection method and device, terminal equipment and storage medium
CN112099739A (en) * 2020-11-10 2020-12-18 大象慧云信息技术有限公司 Classified batch printing method and system for paper invoices

Also Published As

Publication number Publication date
WO2021114637A1 (en) 2021-06-17
CN111696636B (en) 2023-09-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40030005; Country of ref document: HK)
GR01 Patent grant