CN115757430A - Data structured processing method and system for medical data - Google Patents


Info

Publication number
CN115757430A
Authority
CN
China
Prior art keywords
data
structured data
structured
multidimensional
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211536230.0A
Other languages
Chinese (zh)
Inventor
周校平
陈竹
章有智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Boke Guotai Information Technology Co ltd
Original Assignee
Wuhan Boke Guotai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Boke Guotai Information Technology Co ltd
Priority to CN202211536230.0A
Publication of CN115757430A
Legal status: Pending

Abstract

The embodiment of the specification provides a data structured processing method of medical data, which comprises the steps of acquiring the medical data; the medical data comprises at least one of personal information data, clinic data, examination data, daily data and payment data of the patient; determining multidimensional structured data based on the processing of the medical data; determining storage characteristics of the multidimensional structured data based on data characteristics of the multidimensional structured data; the storage characteristics include at least caching characteristics.

Description

Data structured processing method and system for medical data
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a method and system for processing medical data in a structured manner.
Background
In medical data storage and retrieval scenarios, users want to look up target data quickly. However, cache space is limited, so not all data can be read into the cache in advance; if the cache holds a large amount of data that users are not interested in, the efficiency of their data searches suffers.
Therefore, it is desirable to provide a data structured processing method and system for medical data that improve data search efficiency and the user experience by improving how the data are stored and retrieved.
Disclosure of Invention
One or more embodiments of the present specification provide a data structured processing method of medical data, the method including: acquiring medical data; the medical data comprises at least one of personal information data, clinic data, examination data, daily data and payment data of the patient; determining multidimensional structured data based on the processing of the medical data; determining storage characteristics of the multidimensional structured data based on data characteristics of the multidimensional structured data; the storage characteristics include at least caching characteristics.
One or more embodiments of the present specification provide a data structured processing system for medical data, the system comprising at least one processor, at least one memory, and an acquisition module, a first determination module, and a second determination module; at least one memory for storing computer instructions; at least one processor is configured to execute at least a portion of the computer instructions, comprising: acquiring medical data based on an acquisition module; processing the medical data based on a first determination module to determine multi-dimensional structured data; storage characteristics of the multidimensional structured data are determined based on a second determination module.
One or more embodiments of the present specification provide a data structured processing apparatus for medical data, including a processor for executing a data structured processing method for medical data.
One or more embodiments of the present specification provide a computer-readable storage medium storing computer instructions, which, when read by a computer, cause the computer to perform a method for data structured processing of medical data.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a block diagram of a data structured processing system for medical data, shown in accordance with some embodiments of the present description;
FIG. 2 is an exemplary flow diagram of a method for data structured processing of medical data, according to some embodiments of the present description;
FIG. 3 is an exemplary flow diagram illustrating the determination of multidimensional structured data in accordance with some embodiments of the present description;
FIG. 4 is a schematic diagram illustrating determining storage characteristics of multidimensional structured data in accordance with some embodiments of the present description;
FIG. 5 is a model structure diagram illustrating an access heat determination model according to some embodiments of the present description;
FIG. 6 is an exemplary flow diagram illustrating determining distributed storage characteristics of multidimensional structured data according to some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "apparatus", "unit" and/or "module" as used herein are a way of distinguishing different components, elements, parts, portions or assemblies at different levels. However, these words may be replaced by other expressions if those expressions accomplish the same purpose.
As used in this specification and the appended claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not constitute an exclusive list, and the method or apparatus may comprise further steps or elements.
Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the preceding or following operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Other operations may also be added to the processes, or one or more steps may be removed from them.
Fig. 1 is a block diagram of a data structured processing system for medical data, shown in accordance with some embodiments of the present description.
In some embodiments, as shown in fig. 1, the data structured processing system 100 of medical data may include an acquisition module 110, a first determination module 120, and a second determination module 130.
The acquisition module 110 may be used to acquire medical data and structured features of multiple dimensions. The relevant description of the medical data and dimensions can be found in relation to fig. 2. The relevant description of the structured features can be found in relation to fig. 3.
The first determination module 120 can be used to determine multidimensional structured data based on processing of medical data. For a related explanation about determining the multidimensional structured data, reference is made to the related description of fig. 3.
The second determination module 130 can be used to determine storage characteristics of the multidimensional structured data based on data characteristics of the multidimensional structured data. The storage characteristics may include at least caching characteristics. The storage features may also include distributed storage features. The relevant description of the data feature, the storage feature and the cache feature can be found in relation to fig. 2. The relevant description for determining the storage characteristics can be found in relation to fig. 4. For a description of determining the distributed storage characteristics, reference may be made to the description of fig. 6.
It should be understood that the system and its modules shown in FIG. 1 may be implemented in a variety of ways. For example, in some embodiments, the first determination module and the second determination module may be combined into a determination module.
It should be noted that the above description of the data structured processing system for medical data and its modules is only for convenience of description and does not limit the present specification to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, having understood the principle of the system, they may combine the modules arbitrarily or form a subsystem that connects to other modules without departing from that principle. In some embodiments, the acquisition module, the first determination module and the second determination module disclosed in FIG. 1 may be different modules in one system, or a single module may implement the functions of two or more of them. For example, the modules may share one memory module, or each module may have its own memory module. Such variations are within the scope of the present disclosure.
Fig. 2 is an exemplary flow diagram of a method for data structured processing of medical data, according to some embodiments of the present description. As shown in fig. 2, the process 200 includes the following steps. In some embodiments, the process 200 may be performed by the data structured processing system 100 of medical data.
Step 210, acquiring medical data. In some embodiments, step 210 may be performed by the acquisition module 110.
The medical data may refer to data information related to the patient collected during the treatment of the patient, and may include at least one of personal information data, clinic data, examination data, daily data, and payment data of the patient. The medical data can be acquired by various methods, for example, the inquiry log, treatment log, etc. of the patient can be called from a hospital background database for acquisition.
The personal information data may refer to data containing the patient's personal identification information, and may include the patient's name, certificate number, sex, age, and the like. For example, the personal information data may be "Zhang San, certificate number XXXX.. XXXX, gender male, age 40".
The outpatient data may refer to data about the patient's illness collected by the doctor during an outpatient visit, and may include the patient's name, certificate number, outpatient time, name of the attending doctor, disease type, disease duration, affected body part, disease severity, and the like. For example, the outpatient data may be "Zhang San, certificate number XXXX.. XXXX, outpatient time April 1, 2022, attending doctor Liquan, disease type rheumatism, disease duration 3 years, affected part left knee, severity relatively high".
The examination data may refer to a summary of the results of the various examinations the patient underwent in the hospital, and may include the patient's name, certificate number, routine blood test result, blood pressure result, urine test result, and the like. For example, the examination data may be "Zhang San, certificate number XXXX.. XXXX, routine blood test result normal, blood pressure normal, urine test result normal".
The daily data may refer to data on the patient's day-to-day pathological manifestations, and may include the patient's name, certificate number, mobility, type of pathological manifestation, severity of pathological manifestation, and the like. For example, the daily data may be "Zhang San, certificate number XXXX.. XXXX, mobility good, with diarrhea that is relatively severe". Pathological manifestations may refer to the patient's symptoms, such as dizziness, diarrhea, pain, etc.
The payment data may refer to a summary of each payment made during the patient's treatment, and may include the patient's name, certificate number, and the time and amount of each payment. For example, the payment data may be "Zhang San, certificate number XXXX.. XXXX, payment time April 1, 2022, payment amount 350.10 yuan; payment time April 5, 2022, payment amount 522.89 yuan; ......".
Step 220, determining multidimensional structured data based on the processing of the medical data. In some embodiments, step 220 may be performed by the first determination module 120.
Structured data may refer to a collection of data that is stored in the form of a data table. For example, the structured data may be a structured data table in a relational database.
The dimension of a structured data can refer to the full set of data types that the structured data contains. Two structured data can be considered to belong to different dimensions if the data types they contain are not identical. For example, five structured data can be constructed from the personal information data, the clinic data, the examination data, the daily data and the payment data in the medical data, respectively. The data types in the structured data corresponding to the personal information data then include the patient's name, certificate number, sex, age, and the like; the data types in the structured data corresponding to the clinic data include the patient's name, certificate number, clinic time, name of the attending doctor, disease type, disease duration, affected body part, disease severity, and the like; and so on. Because each structured data contains data types that differ from those of the others, the five structured data can be considered to lie in five different dimensions.
In some embodiments, multi-dimensional structured data can be generated based on structured features of multiple dimensions.
Structured features may refer to feature data that is made up of all types of data that the structured data for a dimension contains. In some embodiments, the structured features may be data in the form of vectors, where each element corresponds to a data type. For example, structured data constructed based on personal information data corresponds to structured features such as (name, certificate number, gender, age).
In some embodiments, the method for generating the multidimensional structured data may be: acquiring the structured features of each dimension; generating, for each dimension, corresponding empty structured data based on its structured features (containing only the data types given by the structured features, with no specific data under any type); and filling the specific data values for each data type into the corresponding empty structured data to obtain the multidimensional structured data. The content of the structured features of each dimension can be obtained from prior knowledge (such as industry experience), expert suggestions, and the like. For example, if the structured feature acquired for one dimension is (name, certificate number, gender, age), the corresponding empty structured data may be generated from it; this structured data has four columns, whose data types are the patient's name, certificate number, gender and age, respectively. Filling the patients' personal information contained in the personal information data into the empty structured data yields the structured data for this dimension. Structured data for the other dimensions can be obtained in the same way.
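The three steps above (acquire the structured feature, create empty structured data, fill in values) can be sketched as follows. This is a minimal illustration only: the dictionary-based table layout, the column identifiers and the function names are assumptions made for the example and do not come from the specification.

```python
from typing import Any

# Hypothetical structured feature for the personal-information dimension.
PERSONAL_INFO_FEATURE = ("name", "certificate_number", "gender", "age")


def make_empty_structured_data(feature: tuple[str, ...]) -> dict[str, Any]:
    """Create a table that declares its columns but holds no rows yet."""
    return {"columns": list(feature), "rows": []}


def fill_structured_data(table: dict[str, Any], records: list[dict[str, Any]]) -> dict[str, Any]:
    """Append one row per record, keeping only the columns the table declares."""
    for record in records:
        # A value missing from the record is stored as None rather than guessed.
        table["rows"].append([record.get(col) for col in table["columns"]])
    return table


# Illustrative record; the certificate number is a placeholder.
records = [{"name": "Zhang San", "certificate_number": "XXXX", "gender": "male", "age": 40}]
personal_info = fill_structured_data(make_empty_structured_data(PERSONAL_INFO_FEATURE), records)
```

Repeating the same two calls with a different feature tuple would give the structured data for another dimension.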
In some embodiments, initial multidimensional structured data may be generated based on structured features of multiple dimensions, and the initial multidimensional structured data may be appropriately adjusted to obtain multidimensional structured data, and further related descriptions may be referred to in relation to fig. 3.
Step 230, determining storage characteristics of the multidimensional structured data based on the data characteristics of the multidimensional structured data. The storage characteristics may include at least caching characteristics. In some embodiments, step 230 may be performed by the second determination module 130.
The data feature may refer to feature data that reflects both the structural characteristics of the structured data itself and the characteristics of the data it records, and may include intrinsic features and extension features.
Intrinsic features may refer to features related to the structure of the structured data itself. For example, the intrinsic features may include the number of records (e.g., number of rows) and the number of fields (e.g., number of columns) of the structured data. In some embodiments, the intrinsic features may be obtained by directly counting the structure of the structured data.
Extension features may refer to features related to the data that the structured data records.
The method for determining the extension characteristics can be as follows: for structured data with structured features containing patient pathology-related information (such as duration of illness, severity of illness, etc.), the average value of the patient pathology-related information can be used as an extension feature of the structured data. For example, if the structured feature of a certain structured data includes "duration of illness", the average value of the duration of illness of all patients in the structured data can be used as the extension feature of the structured data.
For structured data (for example, structured data constructed based on personal information data of patients) with structured features not containing patient pathology-related information, on the basis of identity information of all patients in the structured data, pathology information corresponding to patients can be queried from other structured data containing patient pathology-related information, and the result is averaged to serve as an extension feature of the structured data. For example, when structured data (abbreviated as structured data a) constructed based on personal information data of a patient does not contain information related to pathology of the patient, and structured data (abbreviated as structured data B) constructed based on outpatient service data of the patient contains information related to pathology of the patient, the duration of illness of each patient in the structured data a can be queried from the structured data B based on the patient certificate numbers in the two pieces of structured data, and the average value can be taken as an extension feature of the structured data a.
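The two cases above (averaging a pathology column directly, or looking it up in another structured data by certificate number) can be sketched as follows. The row layout, the `illness_years` field name and the sample values are hypothetical and serve only to illustrate the averaging.

```python
from statistics import mean

# Hypothetical rows; "illness_years" stands in for the "duration of illness" column.
outpatient_rows = [
    {"certificate_number": "A1", "illness_years": 3.0},
    {"certificate_number": "B2", "illness_years": 1.0},
    {"certificate_number": "C3", "illness_years": 5.0},
]
personal_rows = [
    {"certificate_number": "A1", "age": 40},
    {"certificate_number": "B2", "age": 55},
]


def extension_feature(rows, lookup_rows=None, key="certificate_number", field="illness_years"):
    """Average `field` over the patients listed in `rows`.

    When `lookup_rows` is given, the values are fetched from it by certificate
    number (the case where `rows` has no pathology column of its own).
    Returns None when no value can be found.
    """
    if lookup_rows is None:
        values = [r[field] for r in rows if field in r]
    else:
        by_key = {r[key]: r[field] for r in lookup_rows if field in r}
        values = [by_key[r[key]] for r in rows if r[key] in by_key]
    return mean(values) if values else None


outpatient_feature = extension_feature(outpatient_rows)               # averages 3, 1 and 5
personal_feature = extension_feature(personal_rows, outpatient_rows)  # averages 3 and 1
```

Patients with no matching record are simply skipped here; how such gaps should be handled is not stated in the text and is a choice made for this sketch.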
Storage characteristics may refer to characteristic data that reflects the location where structured data is stored. In some embodiments, the storage feature may include a caching feature.
A cache characteristic may refer to identifying data that reflects whether structured data is stored in a cache space. For example, the cache feature is 1, which represents that the corresponding structured data is stored in the cache space; the cache feature is 0, which means that the corresponding structured data is not stored in the cache space.
The storage characteristic may be determined based on the historical access condition of the structured data of each dimension, for example, the cache characteristic of the structured data with a higher historical access frequency may be set to 1, and the cache characteristic of the structured data with a lower historical access frequency may be set to 0. Further description of the storage characteristic determination method can be found in relation to fig. 4.
In some embodiments of the present description, the structured data is constructed by differentiating the dimensions, and the storage characteristics of the structured data are determined, so that the management and storage of the medical data can be simply and efficiently realized.
It should be noted that the above description of flow 200 is only for illustration and description, and does not limit the application scope of the present specification. Various modifications and changes to flow 200 will be apparent to those skilled in the art in light of this description. However, such modifications and variations are intended to be within the scope of the present description. For example, step 230 may be performed by other modules.
FIG. 3 is an exemplary flow diagram illustrating the determination of multidimensional structured data according to some embodiments of the present description. As shown in fig. 3, the process 300 includes the following steps. In some embodiments, the flow 300 may be performed by the first determination module 120.
Step 310, obtaining structural features of multiple dimensions.
The definition of the structural features and the related description of the acquisition method can be referred to the related description of fig. 2.
Step 320, generating initial multi-dimensional structured data based on the medical data and the structured features of the plurality of dimensions.
The initial multi-dimensional structured data may refer to a data set of multiple dimensions that is constructed based on medical data and structured features of the multiple dimensions for subsequent determination of the multi-dimensional structured data.
In some embodiments, the method of generating the initial multidimensional structured data may be the same as the method of generating the multidimensional structured data, and may include: generating empty initial multidimensional structured data based on the structured features of multiple dimensions; and filling the corresponding information in the medical data into the empty initial multidimensional structured data to obtain the initial multidimensional structured data. For more details, reference is made to the description of FIG. 2.
Step 330, in response to the initial multidimensional structured data satisfying a preset condition, transforming the initial multidimensional structured data and determining the transformed initial multidimensional structured data as the multidimensional structured data.
In some embodiments, the preset conditions may include a first preset condition and a second preset condition.
The first preset condition may be that, for the initial structured data of two dimensions in the initial multidimensional structured data, the difference between their numbers of user accesses is less than a number threshold and/or the average time interval between user accesses to them is less than a time threshold. The number threshold and the time threshold may be system default values, empirical values, manually preset values, or any combination thereof, and may be set according to actual requirements, which is not limited in this specification.
For example, if the number of user accesses to the initial structured data A is 120, the number of user accesses to the initial structured data B is 125, and the number threshold is preset to 20, the initial structured data A and the initial structured data B satisfy the first preset condition. For another example, if a user accesses the initial structured data C and the initial structured data D one after the other (in either order) several times, with time intervals of 1 minute, 2 minutes, 1.5 minutes and 2.2 minutes, the average time interval between accesses is (1 + 2 + 1.5 + 2.2) / 4 = 1.675 minutes; with the time threshold preset to 2 minutes, the initial structured data C and the initial structured data D satisfy the first preset condition.
In some embodiments, whether two initial structured data satisfy the first preset condition may also depend on their estimated access heat. For example, if the difference between the numbers of user accesses to two initial structured data is greater than the number threshold, but the estimated access heat of both initial structured data is higher than a first heat threshold, the two initial structured data may also be considered to satisfy the first preset condition. The first heat threshold may be a system default value, an empirical value, a manually preset value, or any combination thereof, and may be set according to actual requirements, which is not limited in this specification.
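A check of the first preset condition can be sketched as below, reusing the numbers from the examples above. Combining the three tests with a logical "or" is one reading of "and/or", and the first heat threshold of 80 is a purely illustrative value; neither is fixed by the specification.

```python
from statistics import mean


def meets_first_condition(count_a, count_b, intervals_min, heat_a, heat_b,
                          count_threshold=20, time_threshold_min=2.0,
                          heat_threshold=80.0):
    """Return True if two initial structured data qualify for the first transformation.

    count_a, count_b: numbers of user accesses to each structured data.
    intervals_min: time intervals (minutes) between accesses to the pair.
    heat_a, heat_b: estimated access heat of each, on the [0, 100] scale.
    All default thresholds are illustrative assumptions.
    """
    close_counts = abs(count_a - count_b) < count_threshold
    close_in_time = bool(intervals_min) and mean(intervals_min) < time_threshold_min
    both_hot = heat_a > heat_threshold and heat_b > heat_threshold
    return close_counts or close_in_time or both_hot


# Data A and B from the text: 120 vs 125 accesses, number threshold 20.
a_and_b = meets_first_condition(120, 125, [], 0, 0)
# Data C and D from the text: the average interval is below the 2-minute threshold.
c_and_d = meets_first_condition(300, 100, [1, 2, 1.5, 2.2], 0, 0)
```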
The estimated access heat can refer to how frequently users are expected to view the structured data, or search for information in it, in the future; the larger the frequency, the higher the estimated access heat of the structured data. The estimated access heat may be characterized by a value within [0, 100]. The method for determining the estimated access heat is described in relation to FIG. 4.
The second preset condition may refer to that the number of records (e.g., number of rows, etc.) of the initial structured data exceeds the number threshold. The number threshold may be a system default value, an empirical value, a manually preset value, or the like, or any combination thereof, and may be set according to actual requirements, which is not limited in this specification. For example, if the quantity threshold is preset to 100 and the number of records of the initial structured data E is 480, the initial structured data E satisfies the second preset condition.
In some embodiments, the second preset condition may also be related to the estimated access heat of sub-portions of the initial structured data. A sub-portion of a structured data may refer to a collection of data made up of some of the rows of that structured data. A structured data may have more than one sub-portion. The estimated access heat of a sub-portion may refer to the estimated access heat of the structured data formed by the data within that sub-portion. The method for determining the estimated access heat of structured data is described in relation to FIG. 4 and FIG. 5.
In some embodiments, if there is a difference between the estimated access heat for two sub-portions in the initial structured data that is greater than the second heat threshold, then the initial structured data also satisfies the second predetermined condition. The second heat threshold may be a system default value, an empirical value, a manually preset value, or the like, or any combination thereof, and may be set according to actual requirements, which is not limited in this specification.
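The second preset condition can be sketched as a record-count test combined with a pairwise comparison of sub-portion heats. The function name and signature are hypothetical; the default thresholds (100 records, heat gap of 40) are taken from the examples in this description and would be configurable in practice.

```python
from itertools import combinations


def meets_second_condition(record_count, subportion_heats=(),
                           count_threshold=100, heat_gap_threshold=40.0):
    """Return True if an initial structured data qualifies for the second transformation.

    record_count: number of rows in the structured data.
    subportion_heats: estimated access heat of each sub-portion, if known.
    """
    too_many_records = record_count > count_threshold
    # True if any two sub-portions differ in heat by more than the threshold.
    uneven_heat = any(abs(a - b) > heat_gap_threshold
                      for a, b in combinations(subportion_heats, 2))
    return too_many_records or uneven_heat


# Data E from the text: 480 records against a threshold of 100.
data_e = meets_second_condition(480)
```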
In some embodiments of the present description, tying the preset conditions to the estimated access heat effectively increases the amount of initial structured data that meets the preset conditions, effectively reduces the redundant data in the structured data produced by the subsequent transformation, and improves data storage and retrieval efficiency.
Transformation may refer to a process of performing structural adjustment on the initial structured data that satisfies a preset condition to obtain multidimensional structured data. In some embodiments, the transformation may include a first transformation and a second transformation. In some embodiments, the initial multidimensional structured data meeting the first preset condition may be subjected to a first transformation, and the initial multidimensional structured data meeting the second preset condition may be subjected to a second transformation, so as to obtain multidimensional structured data.
The first transformation may refer to merging of data type and data content for the initial multidimensional structured data that satisfies a first preset condition. An exemplary merging process may be: if the structural characteristics of the initial structured data F are (name, certificate number, hospitalization or not, total payment), and the structural characteristics of the initial structured data G are (name, certificate number, disease type, disease severity), the structural characteristics of the structured data H obtained by combining the initial structured data F and the initial structured data G are (name, certificate number, hospitalization or not, total payment, disease type, disease severity), and the original data contents in the initial structured data F and the initial structured data G are correspondingly copied to the structured data H (the data contents in the repeated data type are copied only once, for example, the name), so that the complete structured data H is obtained. The related description of the structural features can be found in relation to fig. 2.
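The merge of F and G into H can be sketched as follows. Matching rows by certificate number is an assumption made for this example (the text only says the contents are "correspondingly copied"), as are the column identifiers and sample values.

```python
def merge_structured_data(cols_f, rows_f, cols_g, rows_g, key="certificate_number"):
    """Merge two tables into one whose columns are the union of both.

    Rows are matched on `key`; a column shared by both tables (e.g. name)
    appears once, and its content is copied only once.
    """
    merged_cols = list(cols_f) + [c for c in cols_g if c not in cols_f]
    merged = {}
    for cols, rows in ((cols_f, rows_f), (cols_g, rows_g)):
        for row in rows:
            record = dict(zip(cols, row))
            # One output record per patient, initially all None.
            target = merged.setdefault(record[key], dict.fromkeys(merged_cols))
            for col, value in record.items():
                if target[col] is None:  # keep the first copy of shared columns
                    target[col] = value
    return merged_cols, [[rec[c] for c in merged_cols] for rec in merged.values()]


cols_f = ["name", "certificate_number", "hospitalized", "total_payment"]
rows_f = [["Zhang San", "A1", True, 872.99]]
cols_g = ["name", "certificate_number", "disease_type", "severity"]
rows_g = [["Zhang San", "A1", "rheumatism", "high"]]
cols_h, rows_h = merge_structured_data(cols_f, rows_f, cols_g, rows_g)
```

A patient present in only one of the two tables still gets a row in H under this sketch, with `None` in the columns the other table would have supplied.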
The second transformation may refer to logically splitting the initial multidimensional structured data that satisfies the second preset condition. Logical splitting may refer to extracting at least some of the rows of the initial structured data as a sub-portion; there may be multiple sub-portions. After logical splitting, the data contents of any two sub-portions do not overlap. For example, an initial structured data with 300 rows of data may be logically split into 3 sub-portions, each containing 100 rows of data. The method of logical splitting may be preset. For example, the ratio of the number of rows of the initial structured data to the number threshold is calculated, and the logical splitting is performed based on that ratio. An exemplary logical splitting process may be: if the initial structured data I contains 280 rows of data (corresponding to the data of 280 patients) and the number threshold is 100, the initial structured data I can be logically split into 280 / 100 = 2.8, rounded up to 3, sub-portions, containing 100, 100 and 80 rows respectively.
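The row-count-based split in the example above can be sketched as follows. Filling each sub-portion up to the threshold and leaving the remainder in the last one matches the 100/100/80 example; other ways of distributing the rows would also be consistent with the text.

```python
import math


def logical_split(rows, count_threshold=100):
    """Split rows into ceil(len(rows) / count_threshold) non-overlapping sub-portions."""
    n_parts = math.ceil(len(rows) / count_threshold)
    return [rows[i * count_threshold:(i + 1) * count_threshold] for i in range(n_parts)]


# 280 placeholder rows, as for the initial structured data I in the text.
parts = logical_split(list(range(280)))
sizes = [len(p) for p in parts]
```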
In some embodiments, when at least one pair of sub-portions in the initial structured data differs in estimated access heat by more than the second heat threshold, new sub-structured data may be generated from the sub-portions with the greater estimated access heat. For example, the initial structured data K may be logically split into four sub-portions K-1, K-2, K-3 and K-4 with estimated access heats of 88, 81, 25 and 19, respectively, and the second heat threshold is preset to 40. The estimated access heat of each of sub-portions K-1 and K-2 exceeds that of each of sub-portions K-3 and K-4 by more than the second heat threshold, so sub-structured data K-a and K-b may be generated from the data contained in sub-portions K-1 and K-2, respectively. If the sub-structured data K-a and K-b satisfy the first preset condition, the first transformation may be applied to them to obtain the sub-structured data K0.
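Selecting the sub-portions from which new sub-structured data are generated can be sketched as below, using the heats from the K example. Treating a sub-portion as "hot" when it is the hotter member of at least one pair whose gap exceeds the threshold is an interpretation of the text, not a rule it states explicitly.

```python
from itertools import combinations


def hot_subportions(heats, heat_gap_threshold=40.0):
    """Return the names of sub-portions that are the hotter side of a large heat gap.

    heats: mapping from sub-portion name to estimated access heat.
    """
    hot = set()
    for (name_a, heat_a), (name_b, heat_b) in combinations(heats.items(), 2):
        if abs(heat_a - heat_b) > heat_gap_threshold:
            hot.add(name_a if heat_a > heat_b else name_b)
    return sorted(hot)


selected = hot_subportions({"K-1": 88, "K-2": 81, "K-3": 25, "K-4": 19})
```

With these inputs, `selected` contains K-1 and K-2, the two sub-portions from which K-a and K-b are generated in the example.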
Step 340, in response to the initial multi-dimensional structured data not satisfying the preset condition, determining the initial multi-dimensional structured data as the multi-dimensional structured data.
In some embodiments of the present specification, the above method for determining multidimensional structured data can effectively reduce the degree of data redundancy, so that the search speed of a user is increased and the user experience is improved.
It should be noted that the above description of the process 300 is for illustration and description only and is not intended to limit the scope of the present specification. Various modifications and changes to the process 300 will be apparent to those skilled in the art in light of this description. However, such modifications and variations are still within the scope of the present specification. For example, the order of step 330 and step 340 may be interchanged.
FIG. 4 is a schematic diagram illustrating determining storage characteristics of multidimensional structured data in accordance with some embodiments of the present description.
In some embodiments, as shown in FIG. 4, the estimated access heat 420 of the multidimensional structured data can be determined based on data characteristics 410 of the multidimensional structured data. For a description of the data characteristics and the estimated access heat, see the related description of FIG. 2.
In some embodiments, the estimated access heat of the multidimensional structured data can be determined by processing the data characteristics of the multidimensional structured data with the access heat determination model. The access heat determination model may be a machine learning model. More details on the access heat determination model can be found in the related description of FIG. 5.
In some embodiments, as shown in FIG. 4, storage characteristics 430 of the multidimensional structured data can be determined based on the estimated access heat 420. An exemplary method of determining storage characteristics may be: acquiring the estimated access heat of the structured data of each dimension; sorting the structured data in descending order of estimated access heat; and storing the structured data of each dimension into the cache space in that order until the remaining cache space cannot hold the next structured data. The storage characteristic of structured data stored in the cache space is 1, and the storage characteristic of the remaining structured data not stored in the cache space is 0. For a description of the estimated access heat and the storage characteristics, see the related description of FIG. 2.
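The exemplary method above amounts to a greedy fill of the cache in descending order of heat. The sketch below assumes that each piece of structured data has a known size and that the cache has a fixed capacity in the same unit; these sizes, the tuple layout, and the function name `assign_cache_features` are illustrative assumptions, not part of the specification.

```python
def assign_cache_features(items: list, cache_capacity: float) -> dict:
    """items: list of (name, estimated_access_heat, size) tuples.
    Returns a mapping name -> storage characteristic (1 = cached, 0 = not cached)."""
    # Sort by estimated access heat, hottest first.
    ordered = sorted(items, key=lambda item: item[1], reverse=True)
    features = {name: 0 for name, _heat, _size in items}
    remaining = cache_capacity
    for name, _heat, size in ordered:
        if size > remaining:
            # Stop at the first item that does not fit, as the text describes;
            # colder items are not tried even if they would still fit.
            break
        features[name] = 1
        remaining -= size
    return features


data = [("A", 90, 40), ("B", 70, 50), ("C", 85, 30), ("D", 20, 10)]
print(assign_cache_features(data, 100))  # {'A': 1, 'B': 0, 'C': 1, 'D': 0}
```

In this example D is small enough to fit in the space left after A and C, but it is not cached because filling stops once B fails to fit. A variant that keeps scanning for smaller items would use the cache more fully, at the cost of departing from the stopping rule stated above.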
In some embodiments, the storage characteristics of the initial structured data are also related to the estimated access heat of the sub-portions of the initial structured data. The estimated access heat of a sub-portion may refer to the estimated access heat of structured data made up of the data within that sub-portion. For the method of determining the estimated access heat of structured data, see the related description of FIG. 4 and FIG. 5.
In some embodiments, if the difference between the estimated access heats of sub-portions of the initial structured data is greater than the second heat threshold, corresponding sub-structured data may be generated based on the sub-portions with the higher estimated access heat. The cache characteristics of the sub-structured data are then determined together with the structured data of other dimensions, and the cache characteristic of the initial structured data is set to 0. For example, if the initial structured data J is logically split into sub-portions J-1, J-2, J-3, J-4, and J-5, whose estimated access heats are 90, 85, 21, 17, and 13, respectively, and the second heat threshold is preset to 30, then corresponding sub-structured data J-a and J-b may be generated based on sub-portions J-1 and J-2, respectively. The cache characteristics of the sub-structured data J-a and J-b may be determined together with the structured data of other dimensions. If the sub-structured data J-a and J-b meet the first preset condition, they can also be subjected to the first transformation to obtain the sub-structured data J₀, and the cache characteristic of J₀, instead of those of J-a and J-b, may be determined together with the structured data of other dimensions. The storage characteristic of the initial structured data J is set to 0. For the method of determining the cache characteristics of the multidimensional structured data, see the foregoing description.
In some embodiments of the present description, determining cache characteristics only for the sub-structured data with higher heat prevents the lower-heat data in the structured data from occupying the cache space, so that the utilization efficiency of the cache space is improved.
In some embodiments of the present description, the above method for determining storage characteristics effectively reduces the likelihood of low-heat data being stored in the cache space and effectively increases the search speed of a user.
FIG. 5 is a model structure diagram illustrating an access heat determination model according to some embodiments of the present description.
The access heat determination model may refer to a model for determining the estimated access heat of structured data. In some embodiments, the access heat determination model may be a machine learning model. For example, the access heat determination model may include any one or a combination of various feasible models, such as a Recurrent Neural Network (RNN) model, a Deep Neural Network (DNN) model, a Convolutional Neural Network (CNN) model, and the like.
As shown in FIG. 5, the access heat determination model 530 can determine an estimated access heat 540 of the structured data of a dimension by processing the data characteristics 510 of the structured data of that dimension. The inputs to the access heat determination model 530 may include the data characteristics 510 of the structured data of a dimension, and the outputs may include the estimated access heat 540 of the structured data of that dimension. For a description of the data characteristics of the structured data, see the related description of FIG. 2. For a description of the estimated access heat, see the related description of FIG. 3.
In some embodiments, the input to the access heat determination model 530 may also include the clinical characteristics 520 of the doctor.
The doctor's clinical characteristics may refer to characteristic data of the attending doctors of all patients contained in the structured data of the dimension, and may include the busyness and the duty ratio of each doctor. The doctor's clinical characteristics can be obtained based on preset rules. For example, they can be determined based on the patients currently under the care of the attending doctors of all the patients included in the structured data of the dimension, the doctors' upcoming shift schedules, and the like.
The busyness of a doctor may refer to how busy the doctor is and can be determined by the number of patients for whom the doctor is responsible. For example, a busyness of 5 indicates that the doctor is responsible for treating 5 patients.
The duty ratio may refer to the proportion of time the doctor spends on each type of work in the near future. For example, if within the next week the doctor is responsible for outpatient work for 2 days, emergency work for 1 day, and surgery for 2 days, and rests for 2 days, then the outpatient work ratio of the doctor is 2/(2+1+2+2) = 2/7 ≈ 0.29 (rounded to two decimal places), and the ratios of emergency work, surgery, and rest are 0.14, 0.29, and 0.29, respectively, so the duty ratio of the doctor is (0.29, 0.14, 0.29, 0.29).
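The duty ratio arithmetic above can be checked with a short sketch. The function name `duty_ratio` and the dictionary keys are illustrative assumptions; the day counts are those of the example.

```python
def duty_ratio(days_by_type: dict) -> dict:
    """Return the share of days spent on each type of work,
    rounded to two decimal places."""
    total = sum(days_by_type.values())
    if total <= 0:
        raise ValueError("total number of days must be positive")
    return {kind: round(days / total, 2) for kind, days in days_by_type.items()}


schedule = {"outpatient": 2, "emergency": 1, "surgery": 2, "rest": 2}
print(duty_ratio(schedule))
# {'outpatient': 0.29, 'emergency': 0.14, 'surgery': 0.29, 'rest': 0.29}
```

Because each ratio is rounded independently, the rounded values can sum to slightly more or less than 1 (here 1.01).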
In some embodiments of the present description, by introducing relevant features of a doctor into the input of the model, the estimated access heat of the structured data determined by the model can be more accurate and more practical.
In some embodiments, the access heat determination model may be trained on a plurality of labeled training samples. For example, a plurality of labeled training samples may be input into an initial access heat determination model, a loss function is constructed from the labels and the outputs of the initial access heat determination model, and the parameters of the initial access heat determination model are iteratively updated based on the loss function. Training is complete when the loss function of the initial access heat determination model meets a preset condition, yielding the trained access heat determination model. The preset condition may be that the loss function converges, that the number of iterations reaches a threshold, and the like.
In some embodiments, the training samples may include at least data characteristics of historical structured data of multiple dimensions. The labels can be the access heat of the historical structured data of the multiple dimensions. The labels may be obtained by manual labeling. In some embodiments, if the inputs to the access heat determination model also include the clinical characteristics of the doctor, the training samples used in model training also include the historical clinical characteristics of the doctor.
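The training procedure described above (build a loss from labels and model outputs, update parameters iteratively, stop when the loss converges or an iteration limit is reached) can be sketched as follows. To keep the example self-contained, a plain linear model trained by gradient descent on mean squared error stands in for the RNN, DNN, or CNN named in the text; the model form, the learning rate, the tolerance, and the toy data are all assumptions made for illustration.

```python
def train(samples, labels, lr=0.05, max_iters=10000, tol=1e-12):
    """Fit heat ~ w . x + b by gradient descent on mean squared error.
    Stops when the loss change falls below `tol` (convergence) or after
    `max_iters` iterations (iteration threshold)."""
    n = len(samples)
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_iters):
        preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in samples]
        errors = [p - y for p, y in zip(preds, labels)]
        loss = sum(e * e for e in errors) / n
        if abs(prev_loss - loss) < tol:
            break  # preset condition: the loss has converged
        prev_loss = loss
        # Gradient step on every weight and the bias, using the same errors.
        for j in range(n_features):
            w[j] -= lr * 2 * sum(e * x[j] for e, x in zip(errors, samples)) / n
        b -= lr * 2 * sum(errors) / n
    return w, b, loss


# Toy data with one feature, generated from heat = 2 * x + 1.
features = [[0.0], [1.0], [2.0], [3.0]]
heat_labels = [1.0, 3.0, 5.0, 7.0]
weights, bias, final_loss = train(features, heat_labels)
print(weights, bias, final_loss)
```

In practice the feature vectors would be the data characteristics of historical structured data (optionally concatenated with the doctors' historical clinical characteristics), and the labels would be the manually labeled access heat.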
In some embodiments of the present description, the estimated access heat of the structured data is determined by the model, so that the efficiency of the determination process can be effectively improved, and the accuracy of the determined estimated access heat can also be improved.
FIG. 6 is an exemplary flow diagram illustrating determining distributed storage characteristics for multidimensional structured data according to some embodiments of the present description. As shown in fig. 6, the process 600 includes the following steps. In some embodiments, the flow 600 may be performed by the second determination module 130.
Step 610, determining the vulnerability of the multidimensional structured data.
The vulnerability may refer to the probability that the data contained in the structured data becomes erroneous during storage and retrieval. The greater the value of the vulnerability, the higher the probability that the data contained in the structured data will be corrupted during storage and retrieval. The vulnerability can be characterized by a positive integer within [1, 10].
In some embodiments, the vulnerability of the structured data may be determined based on its number of records (e.g., number of rows, etc.) and number of fields (e.g., number of columns, etc.). For example, the vulnerability of structured data may be positively correlated to the product of its number of records and number of fields.
In some embodiments, the vulnerability of the structured data may also be related to the number of sub-portions obtained by logically splitting the structured data. For example, the two may be positively correlated. For a description of logical splitting and sub-portions, see the related description of FIG. 3.
In some embodiments of the present description, linking the vulnerability of the structured data to the number of its sub-portions makes the determined vulnerability more accurate.
In some embodiments, the vulnerability of the structured data may also be related to the estimated access heat of the structured data. For example, it may be a positive correlation. The relevant description of the estimated access heat and the determination method thereof can be referred to the relevant description of fig. 2 and fig. 4.
In some embodiments of the present description, linking the vulnerability of the structured data to its estimated access heat further improves the accuracy of the determined vulnerability.
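The specification states only that the vulnerability is a positive integer in [1, 10] and that it may be positively correlated with the product of records and fields, the number of sub-portions, and the estimated access heat. The sketch below is one hypothetical scoring function with those properties; the logarithm, the weights 0.5 and 1/25, and the function name `vulnerability` are assumptions chosen purely for illustration.

```python
import math


def vulnerability(num_records: int, num_fields: int,
                  num_sub_portions: int = 1,
                  estimated_heat: float = 0.0) -> int:
    """Hypothetical vulnerability score: non-decreasing in each input and
    clamped to an integer in [1, 10]. The formula is not from the source."""
    raw = (
        math.log10(1 + num_records * num_fields)  # size of the table
        + 0.5 * num_sub_portions                  # assumed weight
        + estimated_heat / 25.0                   # assumed weight
    )
    return max(1, min(10, round(raw)))


print(vulnerability(280, 12, num_sub_portions=3, estimated_heat=88))
```

Any monotone function clamped to the same range would satisfy the stated correlations equally well; the actual relationship would need to be fixed by the implementer.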
Step 620, determining distributed storage characteristics of the multidimensional structured data based on the vulnerability.
The distributed storage characteristic can refer to the number of mirrors of the structured data stored across the distributed databases, and can be characterized by a positive integer. For example, a distributed storage characteristic of 3 indicates that the corresponding structured data has three mirrors, which are stored in 3 different distributed databases respectively.
In some embodiments, a relationship table between the vulnerability of the structured data and the distributed storage characteristics may be preset, and the distributed storage characteristics of the structured data may be determined based on the relationship table. For example, when the vulnerability of the structured data is 1 or 2, the distributed storage characteristic is 2; when the vulnerability of the structured data is 3 or 4, the distributed storage characteristic is 4; and so on.
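A preset relationship table of this kind is naturally expressed as a lookup. The sketch below encodes only the two rows given in the example; the entries for vulnerabilities 5 through 10 are not stated in the text, so the hypothetical function `distributed_storage_feature` returns `None` for them rather than guessing.

```python
from typing import Optional

# Only the rows stated in the example; rows for 5..10 are not specified.
RELATION_TABLE = {1: 2, 2: 2, 3: 4, 4: 4}


def distributed_storage_feature(vulnerability: int) -> Optional[int]:
    """Look up the number of mirrors for a vulnerability in [1, 10].
    Returns None when the relationship table has no row for the value."""
    if not 1 <= vulnerability <= 10:
        raise ValueError("vulnerability must be an integer in [1, 10]")
    return RELATION_TABLE.get(vulnerability)


print(distributed_storage_feature(2))  # 2
print(distributed_storage_feature(3))  # 4
print(distributed_storage_feature(7))  # None
```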
In some embodiments of the present specification, by determining the vulnerability of the structured data and determining the distributed storage characteristics based on the vulnerability, it is possible to effectively avoid a situation that data queried by a user has errors due to data errors in the structured data.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, though not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the specification. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, certain features, structures, or characteristics may be combined as suitable in one or more embodiments of the specification.
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the foregoing description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, does not imply that the claimed subject matter requires more features than are expressly recited in a claim. Indeed, an embodiment may have fewer than all of the features of a single embodiment disclosed above.
Where numerals describing the number of components, attributes or the like are used in some embodiments, it is to be understood that such numerals used in the description of the embodiments are modified in some instances by the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range in some embodiments of the specification are approximations, in specific embodiments, such numerical values are set forth as precisely as possible within the practical range.
For each patent, patent application publication, and other material cited in this specification, such as articles, books, specifications, publications, and documents, the entire contents are hereby incorporated by reference into this specification. Excluded are application history documents that are inconsistent with or conflict with the contents of this specification, and documents (currently or later attached to this specification) that limit the broadest scope of the claims of this specification. It is to be understood that if the descriptions, definitions, and/or uses of terms in the materials accompanying this specification are inconsistent with or contrary to those in this specification, the descriptions, definitions, and/or uses of terms in this specification shall control.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments described herein. Other variations are also possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (10)

1. A method of data structured processing of medical data, the method comprising:
acquiring medical data; the medical data comprises at least one of personal information data, clinic data, examination data, daily data and payment data of the patient;
determining multidimensional structured data based on the processing of the medical data;
determining storage characteristics of the multidimensional structured data based on data characteristics of the multidimensional structured data; the storage characteristics include at least caching characteristics.
2. The method of claim 1, wherein determining multidimensional structured data based on the processing of the medical data comprises:
acquiring structural features of multiple dimensions;
generating initial multi-dimensional structured data based on the medical data and the structured features of the multiple dimensions;
in response to the initial multi-dimensional structured data meeting a preset condition, transforming the initial multi-dimensional structured data, and determining the transformed initial multi-dimensional structured data as the multi-dimensional structured data;
and in response to the initial multi-dimensional structured data not meeting a preset condition, determining the initial multi-dimensional structured data as multi-dimensional structured data.
3. The method of claim 1, wherein determining storage characteristics of the multidimensional structured data based on data characteristics of the multidimensional structured data comprises:
determining the estimated access heat of the multidimensional structured data based on the data characteristics of the multidimensional structured data;
and determining the storage characteristics of the multidimensional structured data based on the estimated access heat.
4. The method of claim 1, wherein the storage characteristics further comprise distributed storage characteristics; the determining the storage characteristics of the multidimensional structured data based on the data characteristics of the multidimensional structured data further comprises:
determining the vulnerability of the multi-dimensional structured data;
determining the distributed storage characteristics of the multidimensional structured data based on the vulnerability.
5. A data structured processing system for medical data, the system comprising:
an acquisition module configured to acquire medical data;
a first determination module configured to determine multidimensional structured data based on processing of the medical data;
a second determination module configured to determine the storage characteristics of the multidimensional structured data based on the data characteristics of the multidimensional structured data; the storage characteristics include at least caching characteristics.
6. The data structured processing system of medical data of claim 5, wherein the first determination module is further configured to:
acquiring structural features of multiple dimensions;
generating initial multi-dimensional structured data based on the medical data and the structured features of the multiple dimensions;
in response to the fact that a preset condition is met, transforming the initial multi-dimensional structured data, and determining a transformed result as the multi-dimensional structured data;
and in response to the preset condition not being met, determining the initial multi-dimensional structured data as multi-dimensional structured data.
7. The data structured processing system of medical data of claim 5, wherein said second determination module is further configured to:
determining the estimated access heat of the multidimensional structured data based on the data characteristics of the multidimensional structured data;
and determining the storage characteristics of the multidimensional structured data based on the estimated access heat.
8. The data structured processing system of medical data of claim 5, wherein said second determination module is further configured to:
determining the vulnerability of the multidimensional structured data;
determining the distributed storage characteristics of the multidimensional structured data based on the vulnerability.
9. An apparatus for data structured processing of medical data, the apparatus comprising at least one processor and at least one memory;
the at least one memory is for storing computer instructions;
the at least one processor is configured to execute at least a part of the computer instructions to implement the data structured processing method of medical data according to any one of claims 1 to 4.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement a data structured processing method of medical data according to any one of claims 1 to 4.
CN202211536230.0A 2022-12-01 2022-12-01 Data structured processing method and system for medical data Pending CN115757430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211536230.0A CN115757430A (en) 2022-12-01 2022-12-01 Data structured processing method and system for medical data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211536230.0A CN115757430A (en) 2022-12-01 2022-12-01 Data structured processing method and system for medical data

Publications (1)

Publication Number Publication Date
CN115757430A true CN115757430A (en) 2023-03-07

Family

ID=85342597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211536230.0A Pending CN115757430A (en) 2022-12-01 2022-12-01 Data structured processing method and system for medical data

Country Status (1)

Country Link
CN (1) CN115757430A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109360637A (en) * 2018-11-21 2019-02-19 金色熊猫有限公司 Dimension combined method, system, equipment and the storage medium of medical information
CN109785927A (en) * 2019-02-01 2019-05-21 上海众恒信息产业股份有限公司 Clinical document structuring processing method based on internet integration medical platform
CN110888926A (en) * 2019-10-22 2020-03-17 北京百度网讯科技有限公司 Method and device for structuring medical text
CN110990372A (en) * 2019-11-06 2020-04-10 苏宁云计算有限公司 Dimensional data processing method and device and data query method and device
CN111190902A (en) * 2019-12-25 2020-05-22 南京医睿科技有限公司 Medical data structuring method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination