CN111291131A - Data processing method, data processing device, storage medium and electronic equipment

Info

Publication number: CN111291131A
Application number: CN201911361092.5A
Authority: CN
Inventors: 闻英友; 尚红; 陈剑; 康辉; 何涛; 张宁宁; 满冬亮; 杨乐游
Original assignee: Neusoft Corp; First Hospital of China Medical University
Current assignee: Neusoft Corp; First Hospital of China Medical University
Priority date: 2019-12-25
Filing date: 2019-12-25
Publication date: 2020-06-16

Abstract

The disclosure relates to a data processing method, a data processing device, a storage medium and an electronic device, which are used for improving the accuracy of data anomaly identification. The method comprises the following steps: determining a target detection item corresponding to data to be processed; determining an association item which has a correlation with the target detection item; determining associated data content according to the data to be processed, the target detection item and the association item, wherein the associated data content comprises data content corresponding to the target detection item and data content corresponding to the association item in the data to be processed; constructing a target feature of the data to be processed according to the associated data content; and inputting the target feature into a target data classification model and obtaining a classification result output by the target data classification model, wherein the classification result indicates whether the data to be processed is abnormal.

Description

Data processing method, data processing device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, a storage medium, and an electronic device.

Background

In a detection scenario, a detection result is obtained after a target detection object is tested on a series of detection items, and an auditor judges from the detection result whether the target detection object is abnormal on those detection items. This is equivalent to a classification problem: determining whether the detection result belongs to the abnormal class or the normal class. Generally, a detection item is associated with a normal reference standard (for example, a numerical reference range within which a detection value is considered normal). If a detection value in the detection result meets the normal reference standard, it can be directly determined that the target detection object is not abnormal on the detection item. If the detection value does not meet the normal reference standard, it cannot be directly determined that the target detection object is abnormal on the detection item, because target detection objects differ from one another; the auditor must additionally consider the characteristics of the detection item (for example, the equipment that performs the detection item, the department, and the like) or the characteristics of the target detection object (for example, its performance on historical detection items, the degree to which the detection result deviates from the normal reference standard, and the like) to reach a final determination. Moreover, different target detection objects and different detection items behave differently in each detection, so auditors need a great deal of time and manpower to determine whether a detection result is abnormal, and efficiency is low.

Disclosure of Invention

The present disclosure provides a data processing method, an apparatus, a storage medium, and an electronic device, so as to improve the accuracy of data classification.

In order to achieve the above object, according to a first aspect of the present disclosure, there is provided a data processing method including:

determining a target detection item corresponding to data to be processed;

determining an association item which has a correlation relation with the target detection item;

determining associated data content according to the data to be processed, the target detection item and the association item, wherein the associated data content comprises data content corresponding to the target detection item and data content corresponding to the association item in the data to be processed;

constructing a target feature of the data to be processed according to the associated data content;

and inputting the target features into a target data classification model, and obtaining a classification result output by the target data classification model, wherein the classification result is used for indicating whether the data to be processed is abnormal or not.

Optionally, the association item having a correlation with the target detection item is determined by:

acquiring first historical data corresponding to the target detection item, which is stored in a database, wherein the first historical data comprises first historical data content corresponding to a preset data item, which is obtained by historical detection;

for each preset data item, determining a correlation coefficient between the preset data item and the target detection item according to the correlation coefficient between the first historical data content corresponding to the preset data item and the target detection item;

sorting the correlation coefficients between the preset data items and the target detection item in descending order;

and determining the association item according to the preset data items corresponding to the top N correlation coefficients.

Optionally, the determining the association item according to the preset data items corresponding to the top N correlation coefficients includes:

acquiring a preset association relationship corresponding to the target detection item, wherein the preset association relationship indicates a first data item which can be used as an association item of the target detection item and a second data item which cannot be used as an association item of the target detection item;

and determining the association item according to the preset data items corresponding to the top N correlation coefficients, the first data item and the second data item, wherein the association item is taken from the set obtained by removing the second data item from the union of the first data item and the preset data items corresponding to the top N correlation coefficients.
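By way of illustration only, the set relation described above can be sketched as follows. The sketch assumes that data items are identified by plain strings; the function name `select_association_items` and the item names are hypothetical and do not come from the disclosure. The disclosure states only that the association item is "taken from" the resulting set, so an implementation may select a subset of it.

```python
def select_association_items(top_n_items, allowed_items, forbidden_items):
    """Union of the top-N items and the preset allowed (first) items,
    with the preset forbidden (second) items removed."""
    candidates = set(top_n_items) | set(allowed_items)
    return candidates - set(forbidden_items)


# Hypothetical item names, for illustration only.
result = select_association_items(
    top_n_items=["sodium", "age", "creatinine"],
    allowed_items=["department"],
    forbidden_items=["age"],
)
print(sorted(result))  # ['creatinine', 'department', 'sodium']
```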

Optionally, the associated data content comprises at least one of: a detection value, discrete information, a historical detection value of the target detection item, and a detection value fluctuation upper limit value of the target detection item, wherein the discrete information is information corresponding to any one of age, sex, weight, and department of delivery;

the constructing the target feature of the data to be processed according to the associated data content comprises the following steps:

if the associated data content comprises detection values, for each detection value, determining a first difference between the detection value and a target upper limit value corresponding to the detection value and a second difference between the detection value and a target lower limit value corresponding to the detection value, and determining the feature values corresponding to the detection value, the first difference and the second difference as feature values related to the associated data content, wherein the target upper limit value represents the upper limit of the normal range of the detection item corresponding to the detection value, and the target lower limit value represents the lower limit of that normal range;

if the associated data content comprises discrete information, for each piece of discrete information, determining the feature value corresponding to the discrete information as a feature value related to the associated data content;

if the associated data content comprises the historical detection value of the target detection item, determining a target ratio, namely the ratio of the difference between the detection value corresponding to the target detection item and the historical detection value of the target detection item to that historical detection value, and determining the feature value corresponding to the target ratio as a feature value related to the associated data content;

if the associated data content comprises the detection value fluctuation upper limit value of the target detection item, determining a third difference between the detection value fluctuation upper limit value of the target detection item and the target ratio, and determining the feature value corresponding to the third difference as a feature value related to the associated data content;

determining a multi-dimensional feature vector formed by the feature values related to the associated data content as the target feature;

wherein each feature value is obtained by normalization.
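By way of illustration only, the numeric feature values described above can be sketched as follows. The disclosure does not fix the sign convention of the differences or the normalization mode; the sketch assumes "detection value minus limit" for the first and second differences, "fluctuation upper limit minus target ratio" for the third difference, and min-max scaling with hypothetical bounds. All function names and numbers are illustrative assumptions.

```python
def min_max(value, low, high):
    """Scale a value relative to assumed bounds (one possible normalization)."""
    if high == low:
        return 0.0
    return (value - low) / (high - low)


def raw_feature_values(value, upper, lower, history=None, fluctuation_cap=None):
    """Raw feature values for one detection value of the target detection item."""
    values = [
        value,
        value - upper,  # first difference (sign convention assumed)
        value - lower,  # second difference (sign convention assumed)
    ]
    if history is not None and history != 0:
        target_ratio = (value - history) / history
        values.append(target_ratio)
        if fluctuation_cap is not None:
            values.append(fluctuation_cap - target_ratio)  # third difference
    return values


# Illustrative numbers only: detection value 6.0, normal range 3.5 to 5.5,
# historical detection value 4.0, fluctuation upper limit 0.25.
raw = raw_feature_values(6.0, upper=5.5, lower=3.5, history=4.0, fluctuation_cap=0.25)
print(raw)  # [6.0, 0.5, 2.5, 0.5, -0.25]

# Hypothetical per-dimension bounds used for the normalization.
bounds = [(0.0, 10.0), (-5.0, 5.0), (-5.0, 5.0), (-1.0, 1.0), (-1.0, 1.0)]
target_feature = [min_max(v, lo, hi) for v, (lo, hi) in zip(raw, bounds)]
print(target_feature)
```

Discrete information such as sex or department would need its own encoding before normalization; that step is omitted here because the disclosure does not specify it at this point.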

Optionally, before the step of constructing the target feature of the data to be processed according to the associated data content, the method further includes at least one of:

if the data content corresponding to the target detection item or the associated item in the data to be processed is missing, taking a preset filling content corresponding to the missing item as the data content corresponding to the missing item in the data to be processed, wherein the missing item is the target detection item or the associated item corresponding to the missing data content in the data to be processed;

and for each data content in the associated data content, identifying abnormal data content by box-plot-based outlier detection, and deleting the abnormal data content from the associated data content.
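By way of illustration only, the two optional preprocessing steps can be sketched as follows. The sketch assumes that a record is a dictionary, that missing content is represented by `None` or an absent key, and that the box-plot rule uses the conventional fences at 1.5 times the interquartile range; the disclosure does not specify the multiplier or the quartile method, and all names are hypothetical.

```python
from statistics import quantiles


def fill_missing(record, required_items, default_fill):
    """Replace missing content of required items with preset filling content."""
    filled = dict(record)
    for item in required_items:
        if filled.get(item) is None:
            filled[item] = default_fill[item]
    return filled


def box_plot_fences(values, k=1.5):
    """Lower and upper fences of a box plot (k = 1.5 is an assumed convention)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that lie within the box-plot fences."""
    low, high = box_plot_fences(values, k)
    return [v for v in values if low <= v <= high]


# Illustrative data only.
record = fill_missing(
    {"potassium": 6.0, "age": None},
    required_items=["potassium", "age", "weight"],
    default_fill={"potassium": 4.2, "age": 40, "weight": 65.0},
)
print(record)

kept = drop_outliers([4.1, 4.3, 4.0, 4.2, 4.4, 9.8])
print(kept)  # the value 9.8 lies above the upper fence and is removed
```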

Optionally, the target data classification model is obtained by:

acquiring training data corresponding to the target detection item, wherein each piece of training data comprises a historical feature corresponding to second historical data and label information indicating whether the second historical data is abnormal, and the historical feature corresponding to the second historical data is determined according to the data content corresponding to the target detection item and the data content corresponding to the association item in the second historical data;

and training a deep learning model by taking the historical features as input data and the label information corresponding to the historical features as output data, to obtain the target data classification model.
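By way of illustration only, the training described above can be sketched with a very small neural network written in plain Python. The disclosure does not specify the architecture, loss, or optimizer; the single hidden layer, the cross-entropy loss, the per-sample gradient descent, the class name `TinyClassifier`, and the toy samples below are all assumptions standing in for a real deep learning model.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyClassifier:
    """One hidden tanh layer followed by a sigmoid output (illustrative stand-in)."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        prob = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, prob

    def predict_proba(self, x):
        return self._forward(x)[1]

    def train_step(self, x, y, lr):
        """One gradient-descent update on the cross-entropy loss of one sample."""
        hidden, prob = self._forward(x)
        d_out = prob - y  # gradient of the loss w.r.t. the output pre-activation
        for j, h in enumerate(hidden):
            d_hidden = d_out * self.w2[j] * (1.0 - h * h)  # uses w2 before its update
            self.w2[j] -= lr * d_out * h
            for k, v in enumerate(x):
                self.w1[j][k] -= lr * d_hidden * v
            self.b1[j] -= lr * d_hidden
        self.b2 -= lr * d_out


def log_loss(model, samples):
    eps = 1e-12
    total = 0.0
    for x, y in samples:
        p = min(max(model.predict_proba(x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


# Toy (historical feature, label) pairs; label 1 means "abnormal". Invented data.
samples = [([0.9, 0.4], 1), ([0.8, 0.3], 1), ([0.85, 0.35], 1),
           ([0.2, -0.3], 0), ([0.1, -0.4], 0), ([0.15, -0.35], 0)]

model = TinyClassifier(n_inputs=2)
loss_before = log_loss(model, samples)
for _ in range(300):
    for x, y in samples:
        model.train_step(x, y, lr=0.5)
loss_after = log_loss(model, samples)
print(loss_before, loss_after)
```

In practice an established deep learning framework and a held-out validation set would be used; the sketch only shows the roles of the historical features (input) and the label information (output).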

According to a second aspect of the present disclosure, there is provided a data processing apparatus, the apparatus comprising:

the first determining module is used for determining a target detection item corresponding to the data to be processed;

the second determination module is used for determining an association item which has a correlation relation with the target detection item;

a third determining module, configured to determine associated data content according to the to-be-processed data, the target detection item, and the associated item, where the associated data content includes a data content corresponding to the target detection item and a data content corresponding to the associated item in the to-be-processed data;

the feature construction module is used for constructing the target feature of the data to be processed according to the associated data content;

and the processing module is used for inputting the target feature into a target data classification model and obtaining a classification result output by the target data classification model, wherein the classification result is used for indicating whether the data to be processed is abnormal or not.

Optionally, the apparatus is configured to determine an association item having a correlation with the target detection item by:

the acquisition module is used for acquiring first historical data corresponding to the target detection item stored in a database, wherein the first historical data comprises first historical data content, obtained in each historical detection, corresponding to preset data items;

a fourth determining module, configured to determine, for each preset data item, a correlation coefficient between the preset data item and the target detection item according to the correlation coefficient between the first historical data content corresponding to the preset data item and the target detection item;

the sorting module is used for sorting the correlation coefficients between the preset data items and the target detection item in descending order;

and the fifth determining module is used for determining the association item according to the preset data items corresponding to the top N correlation coefficients.

Optionally, the fifth determining module includes:

the acquisition sub-module is used for acquiring a preset correlation corresponding to the target detection item, wherein the preset correlation is used for indicating a first data item which can be used as a correlation item of the target detection item and a second data item which cannot be used as a correlation item of the target detection item;

and the determining submodule is used for determining the association item according to the preset data items corresponding to the top N correlation coefficients, the first data item and the second data item, wherein the association item is taken from the set obtained by removing the second data item from the union of the first data item and the preset data items corresponding to the top N correlation coefficients.

Optionally, the feature construction module is used for:

if the associated data content comprises detection values, for each detection value, determining a first difference between the detection value and a target upper limit value corresponding to the detection value and a second difference between the detection value and a target lower limit value corresponding to the detection value, and determining the feature values corresponding to the detection value, the first difference and the second difference as feature values related to the associated data content, wherein the target upper limit value represents the upper limit of the normal range of the detection item corresponding to the detection value, and the target lower limit value represents the lower limit of that normal range;

if the associated data content comprises discrete information, for each piece of discrete information, determining the feature value corresponding to the discrete information as a feature value related to the associated data content;

if the associated data content comprises the historical detection value of the target detection item, determining a target ratio, namely the ratio of the difference between the detection value corresponding to the target detection item and the historical detection value of the target detection item to that historical detection value, and determining the feature value corresponding to the target ratio as a feature value related to the associated data content;

if the associated data content comprises the detection value fluctuation upper limit value of the target detection item, determining a third difference between the detection value fluctuation upper limit value of the target detection item and the target ratio, and determining the feature value corresponding to the third difference as a feature value related to the associated data content;

determining a multi-dimensional feature vector formed by the feature values related to the associated data content as the target feature;

wherein each feature value is obtained by normalization.

Optionally, the apparatus further comprises at least one of:

a missing processing module, configured to, before the feature construction module constructs the target feature of the to-be-processed data according to the associated data content, if a data content corresponding to the target detection item or the associated item in the to-be-processed data is missing, use a preset filler content corresponding to a missing item as a data content corresponding to the missing item in the to-be-processed data, where the missing item is a target detection item or an associated item corresponding to the missing data content in the to-be-processed data;

and the abnormality processing module is used for, before the feature construction module constructs the target feature of the data to be processed according to the associated data content, identifying abnormal data content by box-plot-based outlier detection for each data content in the associated data content, and deleting the abnormal data content from the associated data content.

Optionally, the target data classification model is obtained in the manner described above for the method of the first aspect.

According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect of the present disclosure.

According to a fourth aspect of the present disclosure, there is provided an electronic device comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to implement the steps of the method of the first aspect of the disclosure.

According to the above technical solution, a target detection item corresponding to the data to be processed is determined, an association item having a correlation with the target detection item is determined, associated data content is determined according to the data to be processed, the target detection item and the association item, a target feature of the data to be processed is constructed according to the associated data content, and the target feature is then input into the target data classification model to obtain the classification result output by the model. That is, the association item correlated with the target detection item is determined, the associated data content corresponding to it is determined, and the target feature is constructed from the associated data content for classification by the target data classification model. In this way, the data content related to the target detection item can be extracted from the data to be processed, content unrelated to the target detection item is discarded, and the classification result indicating whether the data to be processed is abnormal is obtained from the associated data content. In other words, the data to be processed is screened during processing, so the data content on which the classification is based is of higher quality and the classification result is more accurate, which improves the accuracy of data anomaly identification and, to a certain extent, the efficiency of data processing.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:

FIG. 1 is a flow diagram of a data processing method provided in accordance with one embodiment of the present disclosure;

FIG. 2 is a block diagram of a data processing apparatus provided in accordance with one embodiment of the present disclosure;

FIG. 3 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment;

FIG. 4 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.

In a detection scenario, a detection result is obtained after a target detection object is tested on a series of detection items, and an auditor judges from the detection result whether the target detection object is abnormal on those detection items. This is equivalent to a classification problem: determining whether the detection result belongs to the abnormal class or the normal class. Generally, a detection item is associated with a normal reference standard (for example, a numerical reference range within which a detection value is considered normal). If a detection value in the detection result meets the normal reference standard, it can be directly determined that the target detection object is not abnormal on the detection item. If the detection value does not meet the normal reference standard, it cannot be directly determined that the target detection object is abnormal on the detection item, because target detection objects differ from one another; the auditor must additionally consider the characteristics of the detection item (for example, the equipment that performs the detection item, the department, and the like) or the characteristics of the target detection object (for example, its performance on historical detection items, the degree to which the detection result deviates from the normal reference standard, and the like) to reach a final determination. Moreover, different target detection objects and different detection items behave differently in each detection, so auditors need a great deal of time and manpower to determine whether a detection result is abnormal, and efficiency is low.

For example, in a medical scenario, the above detection process may correspond to clinical testing, the target detection object corresponds to a patient, and the auditor corresponds to a doctor. Clinical testing examines specimens such as a patient's blood, body fluids, secretions, excretions, and casts by visual observation or by physical, chemical, instrumental, or molecular biological methods. It provides clinical medicine with a series of results for the test items (i.e., the detection results), which help doctors determine whether the results of a patient's test items are abnormal. During clinical testing, a doctor judges whether the patient is abnormal on each test item according to the specific result of that item. The doctor can judge the detection result against a standardized value range (namely the normal reference standard), which is generally set by a standard-setting department. If the patient's detection result is within the standardized value range, the doctor can directly determine that there is no abnormality. If the detection result is not within the standardized value range, the doctor makes a comprehensive judgment based on multiple factors, such as the patient's historical examination records, the degree to which the result exceeds the standardized value range, and the department visited, and finally determines whether the patient's detection result is abnormal. Judging from actual data processing in hospitals, even when a patient's detection result is outside the standardized value range, it is quite possible that no abnormality exists. Doctors therefore have to spend a great deal of time on manual review, which occupies both personnel and time.

Therefore, the present disclosure provides a data processing method, an apparatus, a storage medium, and an electronic device, so as to solve the problem that data classification in the related art takes a lot of time and labor, and improve the efficiency and accuracy of data classification results.

FIG. 1 is a flowchart of a data processing method provided according to an embodiment of the present disclosure. As shown in FIG. 1, the method may include the following steps.

In step 11, a target detection item corresponding to the data to be processed is determined.

The data to be processed is essentially a detection result obtained from a detection process and may include multiple pieces of data content, each of which is the actual value of a data item. A data item may be a detection item or a non-detection item. A detection item is an item that requires actual detection, and its data content is the detection value obtained for it. A non-detection item is an item that does not require actual detection, for example age, gender, weight, or delivery department; the corresponding data content is then the age, gender, weight, or delivery department of the target detection object.

The data to be processed in this scheme is data for which it must be judged whether an abnormality exists. In practical applications, a detection result is generated after detection; if it is necessary to judge whether that detection result is abnormal, it can be taken as the data to be processed and the methods provided by the present disclosure can be executed on it.

Judging whether the data to be processed is abnormal actually means judging whether it is abnormal with respect to the target detection item. The data to be processed may contain data content corresponding to multiple detection items, one of which is the target detection item.

The target detection item is the detection item at which the detection that produced the data to be processed was aimed; it is known at the detection stage and can be obtained directly. For example, the data to be processed may carry identification information about the target detection item, and the target detection item can be determined from that identification information. The target detection item can therefore be determined directly from the data to be processed. For example, if a piece of data to be processed is a detection result obtained by testing for potassium, it can be directly determined that the target detection item is potassium detection.
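By way of illustration only, the data to be processed and the determination of its target detection item can be sketched as follows. The record layout, the field names, and the function name `determine_target_item` are hypothetical; the disclosure states only that the data may carry identification information about the target detection item.

```python
from dataclasses import dataclass, field


@dataclass
class PendingRecord:
    """Data to be processed: one detection result (hypothetical layout)."""
    target_item_id: str  # identification information carried by the record
    detection_values: dict = field(default_factory=dict)      # detection item -> value
    non_detection_values: dict = field(default_factory=dict)  # e.g. age, sex, weight


def determine_target_item(record):
    """Step 11: read the target detection item from the identification information."""
    if record.target_item_id not in record.detection_values:
        raise KeyError(f"no detection value for target item {record.target_item_id!r}")
    return record.target_item_id


# Illustrative values only.
record = PendingRecord(
    target_item_id="potassium",
    detection_values={"potassium": 6.0, "sodium": 141.0},
    non_detection_values={"age": 52, "sex": "F", "weight": 61.5},
)
print(determine_target_item(record))  # potassium
```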

In step 12, an association item having a correlation with the target detection item is determined.

After the target detection item is determined, an association item having a correlation with it may be determined. There may be one or more association items. An association item may be taken from the data items corresponding to the data content of the data to be processed (for example, age, sex, weight, department of delivery, etc.), or from other data items that are closely related to the target detection item but are not among the data items of the data to be processed (for example, another detection item related to the target detection item, a reference standard of the target detection item, the historical detection results of the target detection item, a detection value fluctuation limit of the target detection item, etc.).

Because the association item and the target detection item have the correlation, determining the association item having the correlation with the target detection item can provide an auxiliary effect for subsequently judging whether the data to be processed has the abnormality.

In step 13, the associated data content is determined according to the data to be processed, the target detection item and the association item.

The associated data content comprises data content corresponding to the target detection item and data content corresponding to the associated item in the data to be processed.

The data content corresponding to the target detection item can be obtained directly from the data to be processed. The data content corresponding to an association item may be obtained either from the data to be processed or from other data. As described above, an association item may be a data item of the data to be processed, or another data item that is closely related to the target detection item but is not among the data items of the data to be processed. Accordingly, if the association item is a data item of the data to be processed, its data content can be obtained directly from the data to be processed. If it is not, its data content can be obtained from data other than the data to be processed, for example from data content related to the target detection item stored in a database (e.g., detection values of other detection items related to the target detection item, the reference range of the target detection item, a historical detection value of the target detection item, the detection value fluctuation upper limit value of the target detection item, etc.).
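By way of illustration only, the lookup order described above can be sketched as follows. The sketch assumes that the data to be processed is a flat dictionary and that the database is a dictionary keyed by (target detection item, data item); both layouts and the function name `collect_associated_content` are hypothetical.

```python
def collect_associated_content(pending, target_item, association_items, database):
    """Step 13: gather content for the target item and each association item.

    Content is read from the data to be processed when present there, and
    otherwise from the stored data related to the target detection item.
    Items found in neither place are returned as None (missing).
    """
    content = {target_item: pending[target_item]}
    for item in association_items:
        if item in pending:
            content[item] = pending[item]
        else:
            content[item] = database.get((target_item, item))
    return content


# Illustrative values only.
pending = {"potassium": 6.0, "age": 52}
database = {
    ("potassium", "upper_limit"): 5.5,
    ("potassium", "historical_value"): 4.0,
}
content = collect_associated_content(
    pending, "potassium", ["age", "upper_limit", "historical_value", "weight"], database
)
print(content)
```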

In step 14, the target features of the data to be processed are constructed according to the associated data content.

For example, the target feature of the data to be processed may be constructed according to the feature value corresponding to the content of the associated data, so that the target feature may be processed by using the target data classification model subsequently.

In step 15, the target features are input into the target data classification model, and the classification result output by the target data classification model is obtained.

The target data classification model is actually a binary classification model, and is used for outputting a classification result aiming at the input content, wherein the classification result is used for indicating whether the data to be processed is abnormal or not.
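By way of illustration only, step 15 can be sketched as follows. The sketch assumes that the model is a callable returning an abnormality score between 0 and 1 and that a threshold of 0.5 separates the two classes; neither assumption is stated in the disclosure, and `stand_in_model` is a hypothetical placeholder rather than a trained model.

```python
def classification_result(score, threshold=0.5):
    """Map the model's score to the binary classification result."""
    return "abnormal" if score >= threshold else "normal"


def classify(target_feature, model, threshold=0.5):
    """Step 15: feed the target feature to the model and read the result."""
    return classification_result(model(target_feature), threshold)


def stand_in_model(feature):
    """Placeholder: high score when the value exceeds its upper limit.

    feature[1] is assumed to hold 'detection value minus upper limit'.
    """
    return 0.9 if feature[1] > 0 else 0.1


print(classify([6.0, 0.5, 2.5], stand_in_model))   # abnormal
print(classify([4.2, -1.3, 0.7], stand_in_model))  # normal
```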

With the above method, a target detection item corresponding to the data to be processed is determined, an association item having a correlation with the target detection item is determined, associated data content is determined according to the data to be processed, the target detection item and the association item, a target feature of the data to be processed is constructed according to the associated data content, and the target feature is then input into the target data classification model to obtain the classification result output by the model. That is, the association item correlated with the target detection item is determined, the associated data content corresponding to it is determined, and the target feature is constructed from the associated data content for classification by the target data classification model. In this way, the data content related to the target detection item can be extracted from the data to be processed, content unrelated to the target detection item is discarded, and the classification result indicating whether the data to be processed is abnormal is obtained from the associated data content. In other words, the data to be processed is screened during processing, so the data content on which the classification is based is of higher quality and the classification result is more accurate, which improves the accuracy of data anomaly identification and, to a certain extent, the efficiency of data processing.

In order to make those skilled in the art understand the technical solutions provided by the embodiments of the present invention, the following detailed descriptions of the corresponding steps and related concepts are provided.

First, how to determine the association item having a correlation with the target detection item will be described in detail.

In one possible implementation, the association item having a correlation with the target detection item can be determined by the following 4 steps:

step 1, acquiring first historical data corresponding to the target detection item stored in a database;

step 2, for each preset data item, determining the correlation coefficient between the preset data item and the target detection item from the first historical data content corresponding to the preset data item and the first historical data content corresponding to the target detection item;

step 3, sorting the correlation coefficients between the preset data items and the target detection item in descending order;

step 4, determining the association item according to the preset data items corresponding to the top N correlation coefficients.

The first historical data comprises first historical data content that was obtained in historical detections and corresponds to the preset data items. The preset data items are the data items described above; the different name is used only for distinction, and the earlier description is not repeated here. For example, in a medical scenario, the first historical data may be taken from the HIS and LIS databases.

In step 1, the first historical data corresponding to the target detection item stored in the database is acquired, that is, the first historical data content corresponding to the preset data items obtained in past detections. The first historical data acquired here is everything in the database that relates to the target detection item, regardless of which detection object (person) the data belongs to; it is the entire data set related to the target detection item in the database.

In step 2, for each preset data item in the acquired first historical data, the correlation coefficient between the preset data item and the target detection item is determined from the first historical data content corresponding to the preset data item and the first historical data content corresponding to the target detection item. The preset data items here do not include the target detection item itself, because there is no need to calculate the correlation coefficient of an item with itself. In this way, the correlation coefficient between each preset data item and the target detection item is obtained. Methods for calculating a correlation coefficient belong to the prior art and are not described herein.

In step 3, the correlation coefficients between the preset data items and the target detection item are sorted in descending order. The higher the correlation coefficient between a preset data item and the target detection item, the stronger the correlation between them.

In step 4, N is a positive integer.

In one possible embodiment, in step 4, the preset data items corresponding to the top N correlation coefficients may be directly determined as the association items.
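Steps 1 to 4, with the top N items taken directly as association items, can be sketched as follows. The source does not say which correlation coefficient is used, so the Pearson coefficient here is an assumption, as are the function names `pearson` and `top_n_items` and the column-oriented `history` layout. The sketch sorts the coefficients as they are, in descending order, as the text states; ranking by absolute value would be a variation the source does not mention.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / sqrt(var_x * var_y)


def top_n_items(history: Dict[str, List[float]], target: str, n: int) -> List[str]:
    """Return the n preset data items most correlated with the target item."""
    # Step 2: one coefficient per preset data item (the target itself is excluded).
    coeffs = {
        item: pearson(column, history[target])
        for item, column in history.items()
        if item != target
    }
    # Step 3: sort in descending order. Step 4: keep the first n items.
    ranked = sorted(coeffs, key=coeffs.get, reverse=True)
    return ranked[:n]
```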

In another possible embodiment, in step 4, determining the association item according to the preset data items corresponding to the top N correlation coefficients may include the following steps:

acquiring a preset correlation relationship corresponding to the target detection item;

and determining the association items according to the preset data items corresponding to the top N correlation coefficients, the first data items and the second data items, wherein the association items are the items that remain after the second data items are removed from the union of the first data items and the preset data items corresponding to the top N correlation coefficients.

The preset correlation relationship indicates first data items, which can serve as association items of the target detection item, and second data items, which cannot serve as association items of the target detection item.

The preset correlation relationship may be compiled by experts in the field from long accumulated experience. A first data item can serve as an association item of the target detection item: even if its correlation coefficient with the target detection item is not high enough, expert experience indicates that it should still be treated as an association item. A second data item cannot serve as an association item of the target detection item: even if its correlation coefficient with the target detection item is high, expert experience indicates that it should not be treated as an association item.

In other words, the preset data items corresponding to the top N correlation coefficients are merged with the first data items, the second data items are removed from the merged result, and what remains is the set of association items.
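The merge-then-remove rule amounts to a set union followed by a set difference. A minimal sketch, in which the function name `final_association_items` and the example item names are hypothetical:

```python
from typing import Iterable, List


def final_association_items(
    top_n: Iterable[str],
    first_items: Iterable[str],
    second_items: Iterable[str],
) -> List[str]:
    """(top-N items UNION first data items) MINUS second data items."""
    merged = set(top_n) | set(first_items)  # add the expert-approved items
    kept = merged - set(second_items)       # drop the expert-excluded items
    return sorted(kept)                     # sorted only for a deterministic result
```

For instance, with top-N items `["urea", "sodium"]`, first data items `["age"]` and second data items `["sodium"]`, the result is `["age", "urea"]`.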

In this way, combining the computed correlations with the experience of experts in the field yields more comprehensive and accurate association items, which improves the accuracy of subsequent data processing.

In one possible case, the above process of determining the association items may be completed in advance: for each detection item, the association items having a correlation with that detection item are determined in the above manner and stored. In practical application, once the target detection item is determined, its association items can be obtained directly from the stored results, so that they are determined quickly and the data processing efficiency is improved.

In another possible case, the process of determining the association items may be performed after the target detection item is determined, that is, the association items corresponding to the target detection item are determined in the above manner at that time. This keeps the association items up to date.

In one possible implementation, the associated data content may include at least one of: a detection value, discrete information, a historical detection value of the target detection item, and a detection value fluctuation upper limit value of the target detection item. The discrete information is information corresponding to any one of age, sex, weight and the department that submitted the test, and can reflect the discrete characteristics of the target detection item. The historical detection value of the target detection item and the detection value fluctuation upper limit value of the target detection item can reflect the difference between the historical detection and the current detection.

In this embodiment, in step 14, constructing the target feature of the data to be processed according to the associated data content may include the following steps:

if the associated data content includes detection values, determining, for each detection value, a first difference between the detection value and the target upper limit value corresponding to the detection value and a second difference between the detection value and the target lower limit value corresponding to the detection value, and determining the feature values corresponding to the detection value, the first difference and the second difference as feature values related to the associated data content, wherein the target upper limit value corresponding to a detection value represents the upper limit of the normal range of the detection item corresponding to the detection value, and the target lower limit value corresponding to a detection value represents the lower limit of the normal range of the detection item corresponding to the detection value;

if the associated data content includes discrete information, determining, for each piece of discrete information, the feature value corresponding to the discrete information as a feature value related to the associated data content;

if the associated data content includes the historical detection value of the target detection item, determining a target ratio of the difference between the detection value corresponding to the target detection item and the historical detection value of the target detection item to the historical detection value of the target detection item, and determining the feature value corresponding to the target ratio as a feature value related to the associated data content;

if the associated data content includes the detection value fluctuation upper limit value of the target detection item, determining a third difference between the detection value fluctuation upper limit value of the target detection item and the target ratio, and determining the feature value corresponding to the third difference as a feature value related to the associated data content;

and determining a multi-dimensional feature vector formed by the feature values related to the associated data content as the target feature.

The feature values may be obtained by normalization.

The target ratio can be calculated as follows:

$$\delta = \frac{x - x_{\mathrm{hist}}}{x_{\mathrm{hist}}}$$

where $x$ is the detection value corresponding to the target detection item, $x_{\mathrm{hist}}$ is the historical detection value of the target detection item, and $\delta$ is the target ratio.

The multi-dimensional feature vector formed by the feature values related to the associated data content is determined as the target feature; that is, the number of feature values equals the number of dimensions of the feature vector. For example, if the above steps yield 5 feature values a1 to a5, the target feature can be represented as [a1, a2, a3, a4, a5].
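The individual quantities used in the feature construction steps can be sketched as small helper functions. The function names are hypothetical, and the sign convention for the two range differences (value minus lower limit, upper limit minus value) follows the potassium example in Table 1 and is not prescribed by the step descriptions themselves. Normalization is left out of this sketch.

```python
from typing import List


def range_features(value: float, lower: float, upper: float) -> List[float]:
    """Detection value plus its distances to the normal-range limits."""
    return [value, value - lower, upper - value]


def target_ratio(current: float, previous: float) -> float:
    """Relative change of the current value against the historical value.

    Assumes previous != 0; the source does not discuss that edge case.
    """
    return (current - previous) / previous


def fluctuation_margin(delta_check: float, ratio: float) -> float:
    """Third difference: fluctuation upper limit minus the target ratio."""
    return delta_check - ratio
```

A negative value from `range_features` or `fluctuation_margin` signals that the corresponding limit has been exceeded, which is the kind of information the classification model can learn from.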

The following takes potassium detection as the target detection item to show how the target feature is constructed. Potassium is the main cation for maintaining the physiological activity of cells; it plays an important role in maintaining the normal osmotic pressure and acid-base balance of the body, participating in sugar and protein metabolism, and ensuring normal neuromuscular function. Potassium detection is therefore of great significance. Through selection of the association items, six items are determined: age, the potassium detection item, the urea detection item, the creatinine detection item, the historical detection of potassium and the detection value fluctuation limit of potassium. Six associated data contents are then determined: the age, the potassium detection value, the urea detection value, the creatinine detection value, the historical detection value of potassium and the detection value fluctuation upper limit value of potassium. Based on the above steps, the resulting feature set is shown in Table 1.

TABLE 1 Sample construction based on feature extension

No.	Feature value related to the associated data content
1	Feature value corresponding to age
2	Feature value corresponding to the potassium detection value
3	Feature value corresponding to the urea detection value
4	Feature value corresponding to the creatinine detection value
5	Feature value corresponding to (potassium detection value - lower limit of potassium range)
6	Feature value corresponding to (upper limit of potassium range - potassium detection value)
7	Feature value corresponding to (urea detection value - lower limit of urea range)
8	Feature value corresponding to (upper limit of urea range - urea detection value)
9	Feature value corresponding to (creatinine detection value - lower limit of creatinine range)
10	Feature value corresponding to (upper limit of creatinine range - creatinine detection value)
11	Feature value corresponding to delta (delta value of the potassium historical detection)
12	Feature value corresponding to (deltaCheck - delta)

Here, delta (the delta value of the potassium historical detection) is the target ratio of the difference between the potassium detection value and the historical potassium detection value to the historical potassium detection value, and deltaCheck is the upper limit value of the potassium detection value fluctuation.

By combining the 12 feature values in Table 1, the target feature for potassium detection, that is, a 12-dimensional feature vector, can be constructed.
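As an illustration of Table 1, the sketch below assembles the 12 raw quantities in table order and then normalizes them. The reference ranges in `RANGES` are placeholder numbers chosen for the example, not values from the disclosure, and min-max scaling is only one possible reading of "normalization", which the source does not specify. The function names are likewise hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical (lower, upper) reference ranges; real limits come from the laboratory.
RANGES: Dict[str, Tuple[float, float]] = {
    "potassium": (3.5, 5.5),
    "urea": (2.5, 7.5),
    "creatinine": (40.0, 110.0),
}


def potassium_raw_features(
    age: float,
    values: Dict[str, float],
    previous_potassium: float,
    delta_check: float,
) -> List[float]:
    """Raw (not yet normalized) quantities in the order of Table 1."""
    raw = [age] + [values[item] for item in RANGES]         # rows 1 to 4
    for item, (lower, upper) in RANGES.items():             # rows 5 to 10
        raw += [values[item] - lower, upper - values[item]]
    delta = (values["potassium"] - previous_potassium) / previous_potassium
    raw += [delta, delta_check - delta]                     # rows 11 and 12
    return raw


def min_max(raw: Sequence[float], lows: Sequence[float], highs: Sequence[float]) -> List[float]:
    """Min-max scaling per dimension (an assumed normalization scheme)."""
    return [(x - lo) / (hi - lo) for x, lo, hi in zip(raw, lows, highs)]
```

The per-dimension bounds passed to `min_max` would typically be derived from the historical data used for training, so that training features and target features are scaled identically.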

In a possible embodiment, before step 14, the method provided by the present disclosure may further include at least one of:

if the data content corresponding to the target detection item or the associated item in the data to be processed is missing, taking the preset filling content corresponding to the missing item as the data content corresponding to the missing item in the data to be processed, wherein the missing item is the target detection item or the associated item corresponding to the missing data content in the data to be processed;

and, for each data content in the associated data content, identifying abnormal data content by box-plot-based outlier detection, and deleting the abnormal data content from the associated data content.

Each data item may correspond to a preset filling content, which may be determined from the existing data corresponding to that data item in the database. For example, if the data item is a detection item, its preset filling content may be the mean of all detection values corresponding to the detection item in the database. Therefore, if the content corresponding to the target detection item or an association item is missing from the data to be processed, the preset filling content corresponding to the missing item can be used as the data content corresponding to the missing item. In this way, the completeness of the associated data content is guaranteed.
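A minimal sketch of mean-based filling follows. The function names and the dictionary layouts are assumptions made for illustration; a missing item is represented here either by an absent key or by `None`.

```python
from statistics import mean
from typing import Dict, List, Optional


def preset_fill_values(database: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean of all stored detection values per data item."""
    return {item: mean(vals) for item, vals in database.items() if vals}


def fill_missing(
    record: Dict[str, Optional[float]],
    required: List[str],
    fill: Dict[str, float],
) -> Dict[str, float]:
    """Return the required items, substituting the preset fill for missing ones."""
    filled = {}
    for item in required:
        value = record.get(item)  # None when the key is absent
        filled[item] = fill[item] if value is None else value
    return filled
```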

For each data content in the associated data content, abnormal data content is identified by box-plot-based outlier detection and deleted from the associated data content. For example, for a given data content, the lower quartile Q1 and the upper quartile Q3 of the corresponding data item may be calculated from the existing data content of that data item in the database. The interquartile range is IQR = Q3 - Q1, the upper limit is defined as Q3 + 1.5 × IQR and the lower limit as Q1 - 1.5 × IQR, and data outside the range bounded by the lower and upper limits is determined to be abnormal data content. Box-plot-based outlier detection is common knowledge in the art; the above example is only one implementation, and other possible implementations are not described in detail here.
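The box-plot rule can be sketched with the Python standard library. The quartile method (`method="inclusive"`) is an assumption, since the source does not say how Q1 and Q3 are interpolated, and the function names are hypothetical.

```python
from statistics import quantiles
from typing import List, Tuple


def iqr_bounds(existing: List[float]) -> Tuple[float, float]:
    """Lower and upper limits Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(existing, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value: float, bounds: Tuple[float, float]) -> bool:
    """True when the value lies outside the closed interval [lower, upper]."""
    lower, upper = bounds
    return value < lower or value > upper
```

The bounds are computed once per data item from the existing database content and then applied to each new value of that item.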

After the abnormal data content is identified, it may be traced back using domain knowledge. For example, a domain expert checks whether the abnormal data content can be verified or corrected. If it can, the abnormal data content is corrected and replaced in the associated data content by the corrected data content, yielding the corrected associated data content. If it cannot, the abnormal data content is deleted from the associated data content.

In this way, the obtained associated data content undergoes missing value filling, outlier detection and outlier correction, so that the finally determined associated data is more complete and accurate, and the target feature constructed from it in the subsequent feature construction is of better quality and better suited to the model.

In one possible embodiment, the target data classification model may be obtained by:

acquiring training data corresponding to the target detection item, wherein each piece of training data includes a historical feature corresponding to second historical data and label information indicating whether the second historical data is abnormal, and the historical feature corresponding to the second historical data is determined according to the data content corresponding to the target detection item and the data content corresponding to the association items in the second historical data;

and training a deep learning model with the historical features as input data and the label information corresponding to the input historical features as output data, to obtain the target data classification model.

The historical feature is constructed in the same way as the target feature, which is not described again here. As described above, the target data classification model is a binary classification model. Before training, the number of neural network layers, the neuron settings and the activation function best suited to the deep learning model may first be determined, and training is then carried out on that basis to obtain the target data classification model. In a single training iteration, one historical feature is used as input data and its label information as output data to train the deep learning model; after multiple iterations, the target data classification model is obtained.

For example, when training the data classification model for potassium detection, the model may be designed as a fully connected neural network with 2 hidden layers of 512 neurons each, using ReLU as the activation function; to prevent overfitting, a Dropout layer that deactivates neurons with a probability of 0.5 is placed after each hidden layer.
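The architecture described for potassium detection can be illustrated with a forward pass written in NumPy. This is a sketch under several assumptions that the source does not state: a single sigmoid output unit for the binary decision, He-style weight initialization, and inverted dropout. The training loop is omitted, and the function names are hypothetical.

```python
import numpy as np


def init_params(n_features: int, hidden: int = 512, seed: int = 0):
    """Weights and biases for input -> 512 -> 512 -> 1."""
    rng = np.random.default_rng(seed)
    sizes = [n_features, hidden, hidden, 1]
    # He-style initialization (an assumption; the source does not specify one).
    return [
        (rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)), np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x, params, training: bool = False, drop_prob: float = 0.5, rng=None):
    """Return the predicted probability of 'abnormal' for each input row."""
    if rng is None:
        rng = np.random.default_rng()
    h = np.asarray(x, dtype=float)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)        # fully connected layer followed by ReLU
        if training:                          # Dropout after each hidden layer
            mask = rng.random(h.shape) >= drop_prob
            h = h * mask / (1.0 - drop_prob)  # inverted dropout keeps the expected scale
    w, b = params[-1]
    logits = h @ w + b
    return 1.0 / (1.0 + np.exp(-logits))      # sigmoid output (assumed output layer)
```

Dropout is active only when `training=True`; at inference time the full network is used, which is the usual convention for this regularization technique.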

For each detection item, data can be collected and trained in the above manner to obtain a data classification model corresponding to the detection item. In actual use, the target data classification model can be determined according to the target detection items and the data classification models corresponding to the detection items. In addition, the target data classification model is trained according to the historical characteristics which are the same as the target characteristics in the construction mode, so that the target characteristics can be well adapted, and the target characteristics can be conveniently processed.

In this way, a deep learning model is used to train the target classification model, so that the target classification model benefits from the strong classification performance of deep learning. For example, a deep learning model can learn complex and abstract data representations, does not rely entirely on feature engineering, produces stable results, and adapts well to changes of field and application scenario.

Fig. 2 is a block diagram of a data processing apparatus provided according to one embodiment of the present disclosure. As shown in fig. 2, the apparatus 20 includes:

the first determining module 21 is configured to determine a target detection item corresponding to data to be processed;

a second determining module 22, configured to determine an associated item that has a correlation with the target detection item;

a third determining module 23, configured to determine associated data content according to the to-be-processed data, the target detection item, and the associated item, where the associated data content includes a data content corresponding to the target detection item and a data content corresponding to the associated item in the to-be-processed data;

the feature construction module 24 is configured to construct a target feature of the to-be-processed data according to the associated data content;

and the processing module 25 is configured to input the target feature into a target data classification model, and obtain a classification result output by the target data classification model, where the classification result is used to indicate whether the data to be processed is abnormal.

Optionally, the apparatus 20 is configured to determine an association item having a correlation with the target detection item by:

Optionally, the fifth determining module includes:

the feature construction module 24 is configured to, if the associated data content includes detection values, determine, for each of the detection values, a first difference between the detection value and a target upper limit value corresponding to the detection value, and a second difference between the detection value and a target lower limit value corresponding to the detection value, and determine, as a feature value associated with the associated data content, a feature value corresponding to each of the detection value, the first difference, and the second difference, where the target upper limit value corresponding to a detection value is used to characterize a normal upper limit of a detection item corresponding to the detection value, and the target lower limit value corresponding to a detection value is used to characterize a normal lower limit of the detection item corresponding to the detection value; if the associated data content comprises discrete information, determining a characteristic value corresponding to the discrete information as a characteristic value related to the associated data content for each discrete information; if the associated data content comprises the historical detection value of the target detection item, determining a target ratio of the difference between the detection value corresponding to the target detection item and the historical detection value of the target detection item to the historical detection value of the target detection item, and determining the characteristic value corresponding to the target ratio as the characteristic value related to the associated data content; if the associated data content comprises a detection value fluctuation upper limit value of a target detection item, determining a third difference value between the detection value fluctuation upper limit value of the target detection item and the target ratio, and determining a characteristic value corresponding to the third difference value as a characteristic value related to the associated data content; determining a multi-dimensional feature vector formed by feature values related to the associated data content as the target feature; wherein the characteristic value is obtained by a normalization mode.

Optionally, the apparatus 20 further comprises at least one of:

Optionally, the target data classification model is obtained by:

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

FIG. 3 is a block diagram illustrating an electronic device in accordance with an example embodiment. As shown in fig. 3, the electronic device 700 may include: a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.

The processor 701 is configured to control the overall operation of the electronic device 700, so as to complete all or part of the steps in the data processing method. The memory 702 is used to store various types of data to support operation of the electronic device 700, such as instructions for any application or method operating on the electronic device 700 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and the like. The memory 702 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 702 or transmitted through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse, buttons, and the like. These buttons may be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, and the like, or a combination of one or more of them, which is not limited herein. Accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and the like.

In an exemplary embodiment, the electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described data processing method.

In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the data processing method described above. For example, the computer readable storage medium may be the memory 702 described above comprising program instructions that are executable by the processor 701 of the electronic device 700 to perform the data processing method described above.

FIG. 4 is a block diagram illustrating an electronic device in accordance with an example embodiment. For example, the electronic device 1900 may be provided as a server. Referring to fig. 4, an electronic device 1900 includes a processor 1922, which may be one or more in number, and a memory 1932 for storing computer programs executable by the processor 1922. The computer program stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processor 1922 may be configured to execute the computer program to perform the data processing method described above.

Additionally, the electronic device 1900 may also include a power component 1926 and a communication component 1950. The power component 1926 may be configured to perform power management of the electronic device 1900, and the communication component 1950 may be configured to enable communication, e.g., wired or wireless communication, of the electronic device 1900. In addition, the electronic device 1900 may also include input/output (I/O) interfaces 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, etc.

In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the data processing method described above. For example, the computer readable storage medium may be the memory 1932 described above that includes program instructions that are executable by the processor 1922 of the electronic device 1900 to perform the data processing method described above.

In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned data processing method when executed by the programmable apparatus.

The preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and these simple modifications all fall within the protection scope of the present disclosure.

It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.

In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims

1. A method of data processing, the method comprising:

determining a target detection item corresponding to data to be processed;

2. The method according to claim 1, wherein the association item having a correlation with the target detection item is determined by:

3. The method according to claim 2, wherein the determining the correlation term according to the preset data item corresponding to the correlation coefficient of the first N bits comprises:

4. The method of claim 1, wherein the associated data content comprises at least one of: the detection value, discrete information, historical detection value of target detection item, and detection value fluctuation upper limit value of target detection item, wherein the discrete information is information corresponding to any one of age, sex, weight, and department of delivery;

wherein the characteristic value is obtained by a normalization mode.

5. The method according to claim 1, wherein prior to the step of constructing the target feature of the data to be processed from the associated data content, the method further comprises at least one of:

6. The method of claim 1, wherein the target data classification model is obtained by:

7. A data processing apparatus, characterized in that the apparatus comprises:

8. The apparatus of claim 7, wherein the apparatus is configured to determine the association item having a correlation with the target detection item by:

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.

10. An electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 6.