CN111046085A - Data source tracing processing method and device, medium and equipment - Google Patents

Data source tracing processing method and device, medium and equipment

Info

Publication number
CN111046085A
Authority
CN
China
Prior art keywords
data
tracing
target
address
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911320462.0A
Other languages
Chinese (zh)
Other versions
CN111046085B (en)
Inventor
韩朝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yidu Cloud Beijing Technology Co Ltd
Original Assignee
Nanjing Yiyi Yunda Data Technology Co Ltd
Nanjing Yirui Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Yiyi Yunda Data Technology Co Ltd, Nanjing Yirui Technology Co Ltd filed Critical Nanjing Yiyi Yunda Data Technology Co Ltd
Priority to CN201911320462.0A priority Critical patent/CN111046085B/en
Publication of CN111046085A publication Critical patent/CN111046085A/en
Application granted granted Critical
Publication of CN111046085B publication Critical patent/CN111046085B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The disclosure provides a data source tracing processing method and apparatus, a medium, and an electronic device, and relates to the technical field of data processing. The method comprises: performing inclusion/exclusion processing on medical data according to disease category identifiers to obtain and store inclusion/exclusion data for each disease category, wherein the address at which the inclusion/exclusion data is stored is a first address; aggregating the inclusion/exclusion data of each disease category according to patient identifiers to obtain source tracing data for each patient; mining the source tracing data to obtain structured data and storing the structured data, wherein the address at which the structured data is stored is a second address; and performing source tracing processing on target structured data according to the association relationship between the second address and the first address. The technical solution provides a data tracing scheme that traces new data produced by data mining back to its source data, so that the accuracy of the new data can be verified.

Description

Data source tracing processing method and device, medium and equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data source tracing processing method, apparatus, medium, and device.
Background
Data mining refers to a process of processing collected source data to convert the source data into more valuable information, and has been widely applied to the fields of medicine, business management, production control, market analysis, and the like.
In the actual application process, when new data after data mining processing is used for analysis, the accuracy of the new data needs to be verified before the analysis, so that the accuracy of further data analysis is guaranteed.
Therefore, a data tracing processing scheme is needed to trace the source of the new data after data mining processing to obtain source data, so as to check the accuracy of the new data.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a data source tracing processing method, a data source tracing processing apparatus, and a computer-readable medium and an electronic device for implementing the method, thereby providing a data tracing scheme that traces new data produced by data mining back to its source data so that the accuracy of the new data can be verified.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the embodiments of the present disclosure, a data source tracing processing method is provided, the method including:
performing inclusion/exclusion processing on medical data according to disease category identifiers to obtain and store inclusion/exclusion data for each disease category, wherein the address at which the inclusion/exclusion data is stored is a first address;
aggregating the inclusion/exclusion data of each disease category according to patient identifiers to obtain source tracing data for each patient;
mining the source tracing data to obtain structured data and storing the structured data, wherein the address at which the structured data is stored is a second address;
and performing source tracing processing on target structured data according to the association relationship between the second address and the first address.
In an embodiment of the present disclosure, based on the foregoing scheme, aggregating the inclusion/exclusion data of each disease category according to patient identifiers to obtain source tracing data for each patient includes:
acquiring the inclusion/exclusion data that contains a target patient identifier, to obtain target inclusion/exclusion data;
screening the target inclusion/exclusion data belonging to the same disease category according to inclusion/exclusion time, to obtain target source tracing data corresponding to the target patient identifier;
wherein the target source tracing data comprises inclusion/exclusion data for at least one disease category and a first path of that inclusion/exclusion data.
In an embodiment of the present disclosure, based on the foregoing scheme, screening the target inclusion/exclusion data belonging to the same disease category according to inclusion/exclusion time to obtain the target source tracing data corresponding to the target patient identifier includes:
selecting, from the target inclusion/exclusion data belonging to the same disease category, the data with the latest inclusion/exclusion time, to obtain the target source tracing data corresponding to the target patient identifier.
In an embodiment of the present disclosure, based on the foregoing scheme, mining the source tracing data to obtain structured data includes:
mapping the source tracing data to obtain the structured data; or
extracting from the source tracing data to obtain the structured data; or
performing a logic calculation, based on medical logic, on one or more fields in the source tracing data to obtain the structured data.
In an embodiment of the present disclosure, based on the foregoing scheme, performing source tracing processing on target structured data according to the association relationship between the second address and the first address includes:
determining a tracing path corresponding to each piece of structured data according to the association relationship between the second address and the first address;
and acquiring target structured data, and acquiring, according to the target tracing path corresponding to the target structured data, the first address at which the target source tracing data is stored, so as to acquire the target source tracing data from the first address.
In an embodiment of the present disclosure, based on the foregoing scheme, after determining the tracing path corresponding to each piece of structured data according to the association relationship between the second address and the first address, the method further includes:
hashing the tracing path to obtain a tracing hash value corresponding to each piece of structured data, so that source tracing processing is performed on the target structured data according to the tracing hash value.
In an embodiment of the present disclosure, based on the foregoing scheme, before the source tracing processing is performed on the target structured data, the method further includes:
determining first application data corresponding to first structured data, wherein the first application data is for display to a user;
associating a first tracing hash value corresponding to the first structured data with the first application data;
and the performing of source tracing processing on the target structured data includes:
acquiring target application data corresponding to the target structured data, and acquiring a target tracing hash value associated with the target application data;
and performing source tracing processing on the target structured data according to the target tracing path corresponding to the target tracing hash value.
According to a second aspect of the embodiments of the present disclosure, there is provided a data source tracing processing apparatus, including: an inclusion/exclusion module, an aggregation module, a mining module, and a tracing module.
The inclusion/exclusion module is configured to: perform inclusion/exclusion processing on medical data according to disease category identifiers to obtain and store inclusion/exclusion data for each disease category, wherein the address at which the inclusion/exclusion data is stored is a first address;
the aggregation module is configured to: aggregate the inclusion/exclusion data of each disease category according to patient identifiers to obtain source tracing data for each patient;
the mining module is configured to: mine the source tracing data to obtain structured data and store the structured data, wherein the address at which the structured data is stored is a second address; and
the tracing module is configured to: perform source tracing processing on target structured data according to the association relationship between the second address and the first address.
According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for processing data by tracing to a source as described in any one of the above-mentioned technical solutions of the first aspect of the embodiments.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: one or more processors; a storage device, configured to store one or more programs, and when the one or more programs are executed by the one or more processors, enable the one or more processors to implement the method for processing data by tracing a source as described in any of the technical solutions of the first aspect of the embodiments.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
In some embodiments of the present disclosure, medical data covering a plurality of disease categories is subjected to inclusion/exclusion processing according to disease category identifiers to obtain inclusion/exclusion data for each disease category, and the storage address of the inclusion/exclusion data of each disease category is taken as the first address. The inclusion/exclusion data of each disease category is then aggregated according to patient identifiers to obtain source tracing data for each patient. Further, the source tracing data is mined to obtain structured data, and the address at which the structured data is stored is the second address. Finally, source tracing processing of target structured data is performed according to the association relationship between the second address and the first address. The technical solution therefore provides a data tracing scheme in which structured data produced by data mining can be traced back to its source data, so that the accuracy of the structured data can be verified, which in turn helps improve the accuracy of the mined data.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 is a schematic diagram of a system architecture for implementing a data source tracing processing method and apparatus in an exemplary embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a data source tracing processing method according to an embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a data aggregation processing method according to an embodiment of the present disclosure;
FIG. 4 shows a flow diagram of a source tracing processing method according to an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a data source tracing processing method according to another embodiment of the present disclosure;
FIG. 6 shows a schematic structural diagram of a data source tracing processing apparatus according to an embodiment of the present disclosure;
FIG. 7 shows a schematic structural diagram of a computer storage medium in an exemplary embodiment of the present disclosure; and
FIG. 8 shows a schematic structural diagram of an electronic device in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The present exemplary embodiment first provides a system architecture for implementing a data tracing processing method, which can be applied to various data processing scenarios. Referring to fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send request instructions or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a photo processing application, a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services. For example, the server 105 acquires medical data input through the terminal devices 101, 102, 103. The server 105 performs inclusion/exclusion processing on the medical data according to disease category identifiers, and obtains and stores inclusion/exclusion data for each disease category, wherein the address at which the inclusion/exclusion data is stored is a first address (for example only). The server 105 may aggregate the inclusion/exclusion data of each disease category according to patient identifiers to obtain source tracing data for each patient (for example only). The server 105 mines the source tracing data to obtain structured data and stores the structured data, wherein the address at which the structured data is stored is a second address. Finally, the server 105 performs source tracing processing on target structured data according to the association relationship between the second address and the first address.
The technical scheme provides a data source tracing processing method and device, a computer storage medium and electronic equipment. The following description will first describe a data tracing method:
FIG. 2 is a flowchart illustrating a data source tracing processing method according to an embodiment of the present disclosure. The data source tracing processing method provided by this embodiment overcomes the above problems in the prior art at least to a certain extent.
Referring to fig. 2, the method for processing data source tracing provided by this embodiment includes:
step S210, performing inclusion/exclusion processing on medical data according to disease category identifiers to obtain and store inclusion/exclusion data for each disease category, wherein the address at which the inclusion/exclusion data is stored is a first address;
step S220, aggregating the inclusion/exclusion data of each disease category according to patient identifiers to obtain source tracing data for each patient;
step S230, mining the source tracing data to obtain structured data and storing the structured data, wherein the address at which the structured data is stored is a second address; and
step S240, performing source tracing processing on the target structured data according to the association relationship between the second address and the first address.
In the technical solution provided by the embodiment shown in FIG. 2, medical data covering a plurality of disease categories is subjected to inclusion/exclusion processing according to disease category identifiers to obtain inclusion/exclusion data for each disease category, and the storage address of the inclusion/exclusion data of each disease category is taken as the first address. The inclusion/exclusion data of each disease category is then aggregated according to patient identifiers to obtain source tracing data for each patient. Further, the source tracing data is mined to obtain structured data, and the address at which the structured data is stored is the second address. Finally, source tracing processing of target structured data is performed according to the association relationship between the second address and the first address. The technical solution therefore provides a data tracing scheme in which structured data produced by data mining can be traced back to its source data, so that the accuracy of the structured data can be verified, which in turn helps improve the accuracy of the mined data.
The following explains the specific embodiments of the steps of the solution shown in fig. 2:
In step S210, inclusion/exclusion processing is performed on the medical data according to the disease category identifiers, and inclusion/exclusion data for each disease category is obtained and stored, wherein the address at which the inclusion/exclusion data is stored is the first address.
In an exemplary embodiment, medical data covering multiple disease categories, such as visit data and hospitalization data of multiple patients, is first collected. Inclusion/exclusion data is then produced separately for each single disease category from the collected data. For example, suppose the multi-disease database of hospital XX contains medical data for three disease categories: liver cancer, gastric cancer, and bladder cancer. The medical data of each disease category is distinguished by its disease category identifier, and inclusion/exclusion data is produced for each single disease category. Illustratively, medical data whose diagnosis name (outpatient, emergency, admission, or discharge diagnosis) contains liver cancer, gastric cancer, or bladder cancer is stored under a separate data path for each disease category. In this embodiment, for convenience of description, the address storing the inclusion/exclusion data is referred to as the first address.
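Step S210 can be sketched as follows. This is a minimal illustration and not the implementation of the disclosure: the record fields (psn, disease_id, diagnosis), the base path /disease_db, and the function name include_by_disease are all assumptions introduced for the example.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions for illustration.
medical_data = [
    {"psn": "S", "disease_id": "liver_cancer", "diagnosis": "liver cancer"},
    {"psn": "T", "disease_id": "gastric_cancer", "diagnosis": "gastric cancer"},
    {"psn": "S", "disease_id": "bladder_cancer", "diagnosis": "bladder cancer"},
]


def include_by_disease(records, base_path="/disease_db"):
    """Group records by disease identifier and give each group a storage path.

    The per-disease path plays the role of the "first address".
    Returns {disease_id: {"first_address": path, "records": [...]}}.
    """
    store = defaultdict(lambda: {"first_address": None, "records": []})
    for record in records:
        disease_id = record["disease_id"]
        entry = store[disease_id]
        entry["first_address"] = f"{base_path}/{disease_id}"  # assumed path layout
        entry["records"].append(record)
    return dict(store)


inclusion_store = include_by_disease(medical_data)
```

Each disease category ends up with its own path, so later steps can refer back to the place where the source records were stored.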
In step S220, the inclusion/exclusion data of each disease category is aggregated according to the patient identifiers to obtain the source tracing data of each patient.
In an exemplary embodiment, fig. 3 shows a flowchart of a data aggregation processing method according to an embodiment of the present disclosure, which may be specifically used as a specific implementation manner of step S220. Referring to fig. 3, the embodiment shown in the figure includes step S310 and step S320.
In step S310, the inclusion/exclusion data containing the target patient identifier is acquired, resulting in target inclusion/exclusion data for at least one disease category.
Illustratively, the target patient identifier (psn) is S (i.e., the unique identifier of patient S). For the patient whose psn is S, the data containing the target patient identifier S is acquired from the inclusion/exclusion data of each disease category, yielding the inclusion/exclusion data of disease category A and its first path A', the inclusion/exclusion data of disease category B and its first path B', and the inclusion/exclusion data of disease category C and its first path C' (a first path being the storage address of the inclusion/exclusion data of that disease category).
In an exemplary embodiment, the production time of the inclusion/exclusion data is recorded as an attribute of the patient. The unique patient identifier psn is then taken from the patient basic information table as the key, and the inclusion/exclusion data sharing the same psn is aggregated into a structure such as: psn1: [inclusion/exclusion data whose unique patient identifier is psn1 in disease category 1, inclusion/exclusion data whose unique patient identifier is psn1 in disease category 2, inclusion/exclusion data whose unique patient identifier is psn1 in disease category 3, ...]. In this way, after inclusion/exclusion processing has been performed per disease category, the data is fused by patient identifier, so that the medical data is tied to the patient. The aggregated data follows the course of the patient's disease, makes the life cycle of a given disease easier to determine, and brings subsequent data mining results closer to the real world.
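The aggregation by psn described above can be sketched as follows. The sample rows, the produced_at field, and the function name aggregate_by_patient are hypothetical; the first paths A' and B' follow the naming used in the example above.

```python
from collections import defaultdict

# Hypothetical per-disease inclusion/exclusion data, keyed by its first path.
inclusion_data = {
    "A'": [{"psn": "psn1", "disease": "A", "produced_at": "2019-03-01"}],
    "B'": [
        {"psn": "psn1", "disease": "B", "produced_at": "2019-05-01"},
        {"psn": "psn2", "disease": "B", "produced_at": "2019-04-01"},
    ],
}


def aggregate_by_patient(data_by_path):
    """Collect every row with the same psn and keep the first path of each row."""
    aggregated = defaultdict(list)
    for first_path, rows in data_by_path.items():
        for row in rows:
            # Copy the row and remember where it was stored.
            aggregated[row["psn"]].append({**row, "first_path": first_path})
    return dict(aggregated)


by_patient = aggregate_by_patient(inclusion_data)
```

Carrying the first path along with each row is what later allows a mined field to be linked back to the stored source.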
In step S320, the target inclusion/exclusion data belonging to the same disease category is screened according to inclusion/exclusion time, so as to obtain the target source tracing data corresponding to the target patient identifier.
In an exemplary embodiment, the generation time of the inclusion/exclusion data of each disease category is used as the inclusion/exclusion time, and the data with the latest inclusion/exclusion time is then selected from the target inclusion/exclusion data belonging to the same disease category, so that the aggregated data reflects the patient's most recent condition and the data mining result is more accurate.
Illustratively, the disease model tags of each patient are collected by traversal and merged. The inclusion/exclusion data with the same psn is then sorted in descending order of production time, the first record (that is, the most recently produced inclusion/exclusion data) is taken as the patient's original source tracing data, and the merged disease model tags are stored as an attribute of the patient, for example disease_tags: [disease tag 1, disease tag 2, disease tag 3, ...]. The target source tracing data corresponding to the target patient identifier is thereby obtained.
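One way to read step S320 is sketched below, assuming that the latest record is kept per disease category and that the disease tags are simply the disease names. The row layout and the function name select_latest are hypothetical.

```python
from datetime import date

# Hypothetical aggregated rows for one patient (psn = "S").
rows_for_patient = [
    {"psn": "S", "disease": "A", "produced_at": date(2019, 3, 1), "first_path": "A'"},
    {"psn": "S", "disease": "A", "produced_at": date(2019, 6, 1), "first_path": "A'"},
    {"psn": "S", "disease": "B", "produced_at": date(2019, 5, 1), "first_path": "B'"},
]


def select_latest(rows):
    """Keep the most recently produced row per disease and merge the disease tags."""
    latest = {}
    # Sort newest first, so the first row seen for a disease is its latest one.
    for row in sorted(rows, key=lambda r: r["produced_at"], reverse=True):
        latest.setdefault(row["disease"], row)
    return {
        "tracing_rows": list(latest.values()),
        "disease_tags": sorted(latest),
    }


target_tracing_data = select_latest(rows_for_patient)
```

Sorting in descending order and taking the first hit mirrors the "sort in reverse order, take the first record" wording of the example.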
Continuing to refer to fig. 2, in step S230, mining is performed based on the source tracing data to obtain structured data and store the structured data, where an address storing the structured data is a second address.
In an exemplary embodiment, the manner of data mining is not limited. It may be a mapping operation on the source tracing data to obtain structured data X; an extraction from one or more pieces of source tracing data to obtain structured data Y; a logic calculation, based on medical logic, on one or more fields in the source tracing data to obtain structured data Z; and so on.
Illustratively, when a disease-specific database for medical research is produced, patient data is obtained from systems such as the Hospital Information System (HIS) and the Laboratory Information System (LIS), and corresponding business system tables are established. Patients with the relevant disease are included or excluded according to the requirements of the disease-specific database under study, and structured field production (data mining) based on natural language understanding is carried out: a large text field yields multiple structured fields that are meaningful for medical research, and new fields that are meaningful for medical research are additionally calculated from multiple field paths. For example, the fields of a disease-specific database fall into three categories: first, mapping fields, which are mapped from an original business table; second, structured fields, which are extracted from original text fields; third, calculated fields, which are computed from one or several fields according to certain medical logic.
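The three field categories can be illustrated with a small sketch. Everything here is an assumption made for the example: the source field names, the gender code mapping, the regular expression standing in for natural language understanding, and body mass index as the "medical logic". Each produced field records which source fields it came from, which is the information a tracing path needs.

```python
import re

# Hypothetical source tracing record; field names are assumptions.
tracing_record = {
    "gender_code": "1",
    "admission_note": "Patient reports pain. Tumor size: 3.2 cm.",
    "height_m": 1.70,
    "weight_kg": 65.0,
}


def mine_fields(record):
    """Produce one mapping field, one structured field and one calculated field."""
    # Mapping field: mapped directly from a source code value.
    gender = {"1": "male", "2": "female"}.get(record["gender_code"], "unknown")
    # Structured field: extracted from free text (a regex stands in for NLU).
    match = re.search(r"Tumor size:\s*([\d.]+)\s*cm", record["admission_note"])
    tumor_size_cm = float(match.group(1)) if match else None
    # Calculated field: computed from several fields (BMI as example logic).
    bmi = round(record["weight_kg"] / record["height_m"] ** 2, 1)
    return {
        "gender": {"value": gender, "sources": ["gender_code"]},
        "tumor_size_cm": {"value": tumor_size_cm, "sources": ["admission_note"]},
        "bmi": {"value": bmi, "sources": ["weight_kg", "height_m"]},
    }


mined = mine_fields(tracing_record)
```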
When a doctor carries out research on fields produced by such data mining, the doctor needs to know where a target field came from in order to check whether the field was processed correctly; research can be designed only once the source of the field path is known to be correct. The data tracing method provided by this technical solution can determine the source of structured data obtained after data mining/production processing.
Specifically:
In an exemplary embodiment, for the disease model of each single disease category, after structured data has been produced/mined from the inclusion/exclusion data, each new field is produced from one or more field paths of the inclusion/exclusion data, and the mapping from the new structured field to the inclusion/exclusion field is recorded as the data tracing path. In this embodiment, for convenience of description, the address storing the inclusion/exclusion data is referred to as the first address, and the address storing the structured data is referred to as the second address.
In step S240, source tracing processing is performed on the target structured data according to the association relationship between the second address and the first address.
In an exemplary embodiment, fig. 4 shows a flowchart of a data tracing processing method according to an embodiment of the present disclosure, which may be specifically used as a specific implementation manner of step S240. Referring to fig. 4, the embodiment shown in the figure includes step S410 and step S420.
In step S410, a tracing path corresponding to each structured data is determined according to the association relationship between the second address and the first address.
In an exemplary embodiment, suppose the structured data X is determined by source tracing data X1 and source tracing data X2, the address storing the structured data X (i.e., the second address) is "qwerty", the address storing the source tracing data X1 (i.e., a first address) is "yuoop", and the address storing the source tracing data X2 (i.e., a first address) is "asdfg". The tracing paths of the structured data X are then "qwerty-yuoop" and "qwerty-asdfg".
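The example above can be reproduced with a few lines of code. The addresses are the ones from the text; the association dictionary and the function name build_tracing_paths are hypothetical, and the hyphen separator simply follows the "qwerty-yuoop" notation.

```python
def build_tracing_paths(second_address, first_addresses):
    """Join the second address with each first address it was derived from."""
    return [f"{second_address}-{first}" for first in first_addresses]


# Association between the second address and the first addresses (from the example).
association = {"qwerty": ["yuoop", "asdfg"]}

tracing_paths = {
    second: build_tracing_paths(second, firsts)
    for second, firsts in association.items()
}
```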
Further, the tracing path may be hashed, for example with the MD5 message-digest algorithm (md5), to obtain a tracing hash value corresponding to each piece of structured data, so that source tracing processing can be performed on the target structured data according to the tracing hash value.
Illustratively, md5 processing is performed on the tracing path about the structured data G, and then the unique identification psn of the patient and the md5 of the tracing path are used as keys, and the data structure is as follows: md5 value of the traceback path, value being the specific path. For example, key is 0054879045 fe1267bdd0942704adc0e.06c6772b49a135f06e01e694418d2917; the value is: output _ medial _ order _ strategy.b5617237-dcfa-46df-9ff8-1b571d6d4629 description.order _ group _ info.f7116551-50d1-4854-b9ec-421b15133566. descriptor _ name.
Illustratively, in view of the huge amount of data (more than one hundred million pieces) of the data structure, the data structure can be stored in a distributed and column-oriented open source database hbase so as to fully utilize the fast read-write capability of the key-value structure of the hbase.
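The key-value structure described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: a plain Python dictionary stands in for the HBase table, the "." between the psn and the md5 value is inferred from the example key, and the names `tracing_hash`, `make_entry`, and the sample psn are hypothetical.

```python
import hashlib
from typing import Dict, Tuple


def tracing_hash(path: str) -> str:
    """Return the md5 hex digest (32 hex characters) of a tracing path."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()


def make_entry(psn: str, path: str) -> Tuple[str, str]:
    """Build one key-value pair: key = psn + "." + md5(path), value = path.
    The "." separator is an assumption based on the example key."""
    return f"{psn}.{tracing_hash(path)}", path


# In-memory stand-in for the key-value store (HBase in the embodiment).
path_store: Dict[str, str] = {}

key, value = make_entry("psn-001", "qwerty-yuoop")  # hypothetical psn
path_store[key] = value
```

Because md5 is deterministic, the same tracing path always maps to the same key for a given patient, so the path can later be recovered from the hash value alone.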
In an exemplary embodiment, fig. 5 shows a flowchart of a method for tracing data according to another embodiment of the present disclosure. Specifically, before performing the source tracing process on the target structured data, the technical solution may further include step S510 and step S520.
In step S510, first application data corresponding to first structured data is determined, where the first application data is used for display to a user; and in step S520, the first tracing hash value corresponding to the first structured data is associated with the first application data.
In an exemplary embodiment, the md5 values of the tracing paths are written for the application-layer patient data into a tracing-path md5 table that is stored as a separate table, and the tracing data is associated with the patient's data by psn + path + group_id + field name. This relieves the data-capacity limit of the main table (for example, MongoDB limits a single document to 16 MB) and at the same time decouples the tracing data from the application data.
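One way to picture the separate md5 table is sketched below. All names here (`FieldRef`, `md5_table`, `associate`, and the sample field values) are hypothetical, and an in-memory dictionary stands in for the real table; the only element taken from the text is the composite key of psn, path, group_id, and field name.

```python
from typing import Dict, NamedTuple


class FieldRef(NamedTuple):
    """Composite key identifying one application-layer field of one patient."""
    psn: str
    path: str
    group_id: str
    field_name: str


# The tracing-path md5 table, kept apart from the main application record.
md5_table: Dict[FieldRef, str] = {}


def associate(ref: FieldRef, tracing_md5: str) -> None:
    """Associate an application field with the md5 value of its tracing path."""
    md5_table[ref] = tracing_md5


# The main record holds only display data; no tracing information is embedded.
application_record = {"psn": "psn-001", "diagnosis_name": "example value"}

ref = FieldRef("psn-001", "visit.diagnosis", "g1", "diagnosis_name")
associate(ref, "06c6772b49a135f06e01e694418d2917")
```

Keeping the hash values out of the main record means the size of a patient's application document does not grow with the number of traced fields.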
In step S420, target structured data is obtained, and a second address storing the target tracing data is obtained according to the target tracing path corresponding to the target structured data, so as to obtain the target tracing data from the second address. When the application-layer data is produced, each field value carries its corresponding md5 value, so that source tracing can be performed on the application data according to that md5 value.
Illustratively, referring to fig. 5, steps S530 and S540 show a specific implementation of the tracing process for the target structured data, wherein:
in step S530, target application data corresponding to the target structured data is obtained, and a target tracing hash value associated with the target application data is obtained; and in step S540, tracing processing is performed on the target structured data according to the target tracing path corresponding to the target tracing hash value.
In an exemplary embodiment, when a certain application field (denoted as the target application data) of a certain record of a certain patient needs to be traced: for the field value displayed on the product side, the tracing-path md5 table is queried to obtain the target tracing hash value (md5 value) associated with the path of the target application data field; the real tracing path (i.e., the address information before hash processing) is then obtained from HBase according to that md5 value. Finally, the tracing data is determined from the real tracing path, and the original data is located in the post-inclusion/exclusion data, which completes the tracing of the target application data field.
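The lookup chain in this embodiment (application field, then md5 value, then real tracing path, then original data) can be sketched end to end as follows. This is a sketch under assumptions: three dictionaries stand in for the md5 table, the HBase path store, and the store of post-inclusion/exclusion data; the "second-first" path format follows the earlier "qwerty-yuoop" example; every name and sample value is hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the md5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


tracing_path = "qwerty-yuoop"  # second address, then first address
path_md5 = md5_hex(tracing_path)

# (psn, path, group_id, field name) -> md5 value of the tracing path
md5_table = {("psn-001", "visit.diagnosis", "g1", "diagnosis_name"): path_md5}
# psn + "." + md5 -> real tracing path (HBase in the embodiment)
path_store = {f"psn-001.{path_md5}": tracing_path}
# first address -> original data found after inclusion/exclusion
source_store = {"yuoop": "original record text"}


def trace_field(psn: str, path: str, group_id: str, field_name: str) -> str:
    """Resolve a displayed application field back to its original data."""
    hash_value = md5_table[(psn, path, group_id, field_name)]  # look up md5
    real_path = path_store[f"{psn}.{hash_value}"]              # md5 -> path
    _second, first_address = real_path.split("-", 1)           # split the path
    return source_store[first_address]                         # fetch source


origin = trace_field("psn-001", "visit.diagnosis", "g1", "diagnosis_name")
```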
The technical scheme provides a data tracing processing scheme, which can trace the structured data after data mining processing to obtain source data, so as to verify the accuracy of the structured data, and finally is beneficial to improving the accuracy of the data after data mining processing.
Those skilled in the art will appreciate that all or part of the steps for implementing the above embodiments are implemented as computer programs executed by processors, including Central Processing Units (CPUs) and Graphics Processing Units (GPUs). When the computer program is executed by a CPU or a GPU, the above-described functions defined by the above-described methods provided by the present disclosure are performed. The program may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic or optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following describes embodiments of a data tracing apparatus according to the present disclosure, which can be used to execute the data tracing method provided by the above embodiments of the present disclosure.
Fig. 6 shows a schematic structural diagram of a data tracing processing apparatus according to an embodiment of the present disclosure. Referring to fig. 6, the data tracing processing apparatus 600 provided in this embodiment includes: an inclusion/exclusion module 601, an aggregation module 602, a mining module 603, and a tracing module 604.
The inclusion/exclusion module 601 is configured to: perform inclusion/exclusion processing on the medical data according to disease category identifiers, to obtain and store inclusion/exclusion data for each disease category, wherein the address storing the inclusion/exclusion data is a first address;
the aggregation module 602 is configured to: aggregate the inclusion/exclusion data of each disease category according to patient identifiers, to obtain tracing data for each patient;
the mining module 603 is configured to: perform mining based on the tracing data to obtain and store structured data, wherein the address storing the structured data is a second address; and
the tracing module 604 is configured to: perform source tracing on the target structured data according to the association relationship between the second address and the first address.
In an embodiment of the present disclosure, based on the foregoing solution, the aggregation module 602 includes an obtaining unit and a screening unit.
The obtaining unit is configured to: obtain target inclusion/exclusion data containing a target patient identifier. The screening unit is configured to: screen the target inclusion/exclusion data belonging to the same disease category according to inclusion/exclusion time, to obtain target tracing data corresponding to the target patient identifier, wherein the target tracing data comprises inclusion/exclusion data for at least one disease category and a first path of the inclusion/exclusion data.
In an embodiment of the present disclosure, based on the foregoing scheme, the screening unit is specifically configured to:
select, from the target inclusion/exclusion data belonging to the same disease category, the data with the latest inclusion/exclusion time, to obtain the target tracing data corresponding to the target patient identifier.
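The screening performed by this unit, keeping only the most recent record per disease category for one patient, can be sketched as follows. The record layout (keys `patient_id`, `disease`, `time`, `path`) and the function name `latest_per_disease` are assumptions made for illustration; the disclosure does not specify a record format.

```python
from datetime import datetime
from typing import Dict, List


def latest_per_disease(records: List[dict], patient_id: str) -> Dict[str, dict]:
    """For one patient, keep the record with the latest time in each
    disease category."""
    latest: Dict[str, dict] = {}
    for rec in records:
        if rec["patient_id"] != patient_id:
            continue  # only records carrying the target patient identifier
        current = latest.get(rec["disease"])
        if current is None or rec["time"] > current["time"]:
            latest[rec["disease"]] = rec
    return latest


records = [
    {"patient_id": "p1", "disease": "A", "time": datetime(2019, 1, 5), "path": "a/1"},
    {"patient_id": "p1", "disease": "A", "time": datetime(2019, 6, 1), "path": "a/2"},
    {"patient_id": "p1", "disease": "B", "time": datetime(2019, 3, 9), "path": "b/1"},
    {"patient_id": "p2", "disease": "A", "time": datetime(2019, 9, 9), "path": "a/3"},
]
target = latest_per_disease(records, "p1")
```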
In an embodiment of the present disclosure, based on the foregoing scheme, the mining module 603 is specifically configured to:
mapping the tracing data to obtain the structured data; or, extracting the tracing data to obtain the structured data; or, based on medical logic, performing logic calculation on one field or a plurality of fields in the tracing data to obtain the structured data.
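The three mining modes listed above (mapping, extraction, and logic calculation over fields) can each be illustrated with a small function. These are hypothetical examples chosen for illustration only: the code table, the temperature pattern, and the use of body mass index as the "medical logic" are assumptions and do not appear in the disclosure.

```python
import re
from typing import Optional

# Mapping: translate a coded source value into a structured value.
GENDER_MAP = {"1": "male", "2": "female"}  # hypothetical code table


def map_field(code: str) -> str:
    return GENDER_MAP.get(code, "unknown")


# Extraction: pull a structured value out of free text.
def extract_temperature(text: str) -> Optional[float]:
    match = re.search(r"(\d+(?:\.\d+)?)\s*C\b", text)
    return float(match.group(1)) if match else None


# Logic calculation: derive a new field from several source fields.
def compute_bmi(weight_kg: float, height_m: float) -> float:
    return round(weight_kg / height_m ** 2, 1)
```

In each case the output field has one or more identifiable source fields, which is what makes a tracing path from the structured field back to its sources possible.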
In an embodiment of the present disclosure, based on the foregoing scheme, the tracing module 604 is specifically configured to:
determine a tracing path corresponding to each piece of structured data according to the association relationship between the second address and the first address; and obtain target structured data, and obtain a second address storing the target tracing data according to the target tracing path corresponding to the target structured data, so as to obtain the target tracing data from the second address.
In an embodiment of the present disclosure, based on the foregoing scheme, the data tracing processing apparatus 600 further includes a hash processing module.
Wherein the hash processing module is configured to: after the tracing module 604 determines a tracing path corresponding to each structured data according to the association relationship between the second address and the first address: and performing hash processing on the tracing path to obtain a tracing hash value corresponding to each structured data, and performing the tracing processing on the target structured data according to the tracing hash value.
In an embodiment of the present disclosure, based on the foregoing scheme, the data tracing processing apparatus 600 further includes an association module.
Wherein the association module is configured to: before the source tracing processing is carried out on the target structured data: determining first application data corresponding to the first structured data, wherein the first application data is used for showing to a user; performing association processing on a first traceability hash value corresponding to the first structured data and the first application data;
the tracing module 604 is specifically configured to:
acquiring target application data corresponding to target structured data, and acquiring a target traceability hash value associated with the target application data; and performing tracing processing on the target structured data according to the target tracing path corresponding to the target tracing hash value.
As each functional module of the data tracing apparatus in the exemplary embodiment of the present disclosure corresponds to the step of the exemplary embodiment of the data tracing method, please refer to the embodiment of the data tracing method in the present disclosure for details that are not disclosed in the embodiment of the data tracing apparatus in the present disclosure.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer storage medium on which a program product capable of implementing the above-described method of this specification is stored. In some possible embodiments, various aspects of the present disclosure may also be implemented in the form of a program product including program code which, when the program product is run on a terminal device, causes the terminal device to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
Referring to fig. 7, a program product 700 for implementing the above method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product described above may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable Read-Only Memory (EPROM or flash Memory), an optical fiber, a portable compact disk Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
An electronic device 800 according to this embodiment of the disclosure is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 8, electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, and a bus 830 that couples the various system components including the memory unit 820 and the processing unit 810.
The storage unit stores program code that can be executed by the processing unit 810, so that the processing unit 810 performs the steps according to various exemplary embodiments of the present disclosure described in the "exemplary method" section above of this specification. For example, the processing unit 810 may perform the following steps as shown in fig. 2: step S210, performing inclusion/exclusion processing on the medical data according to disease category identifiers, to obtain and store inclusion/exclusion data for each disease category, wherein the address storing the inclusion/exclusion data is a first address; step S220, aggregating the inclusion/exclusion data of each disease category according to patient identifiers, to obtain tracing data for each patient; step S230, performing mining based on the tracing data to obtain and store structured data, wherein the address storing the structured data is a second address; and step S240, performing source tracing on the target structured data according to the association relationship between the second address and the first address.
For example, the processing unit 810 may further perform a data tracing processing method as shown in any one of fig. 3 to fig. 5.
The storage unit 820 may include readable media in the form of volatile storage, such as a Random Access Memory (RAM) 8201 and/or a cache memory 8202, and may further include a Read-Only Memory (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may be through an Input/Output (I/O) interface 850. Further, the I/O interface 850 is connected with the display unit 840 to transmit content to be displayed to the display unit 840 through the I/O interface 850 for viewing by a user.
Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public Network, such as the internet) via the Network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A data tracing processing method, characterized in that the method comprises:
performing inclusion/exclusion processing on medical data according to disease category identifiers, to obtain and store inclusion/exclusion data for each disease category, wherein the address storing the inclusion/exclusion data is a first address;
aggregating the inclusion/exclusion data of each disease category according to patient identifiers, to obtain tracing data for each patient;
performing mining based on the tracing data to obtain and store structured data, wherein the address storing the structured data is a second address; and
performing source tracing on target structured data according to the association relationship between the second address and the first address.
2. The method of claim 1, wherein aggregating the inclusion/exclusion data of each disease category according to patient identifiers to obtain tracing data for each patient comprises:
obtaining target inclusion/exclusion data containing a target patient identifier;
screening the target inclusion/exclusion data belonging to the same disease category according to inclusion/exclusion time, to obtain target tracing data corresponding to the target patient identifier;
wherein the target tracing data comprises inclusion/exclusion data for at least one disease category and a first path of the inclusion/exclusion data.
3. The method of claim 2, wherein screening the target inclusion/exclusion data belonging to the same disease category according to inclusion/exclusion time to obtain the target tracing data corresponding to the target patient identifier comprises:
selecting, from the target inclusion/exclusion data belonging to the same disease category, the data with the latest inclusion/exclusion time, to obtain the target tracing data corresponding to the target patient identifier.
4. The method according to any one of claims 1 to 3, wherein performing mining based on the tracing data to obtain structured data comprises:
performing a mapping operation on the tracing data to obtain the structured data; or
performing extraction on the tracing data to obtain the structured data; or
based on medical logic, performing a logic calculation on one or more fields in the tracing data to obtain the structured data.
5. The method according to any one of claims 1 to 3, wherein performing source tracing on the target structured data according to the association relationship between the second address and the first address comprises:
determining a tracing path corresponding to each piece of structured data according to the association relationship between the second address and the first address;
obtaining target structured data, and obtaining a second address for storing the target tracing data according to a target tracing path corresponding to the target structured data, so as to obtain the target tracing data from the second address.
6. The method according to claim 5, wherein after determining the tracing path corresponding to each piece of structured data according to the association relationship between the second address and the first address, the method further comprises:
performing hash processing on the tracing path to obtain a tracing hash value corresponding to each piece of structured data, so as to perform tracing processing on the target structured data according to the tracing hash value.
7. The method of claim 6, wherein before performing the tracing processing on the target structured data, the method further comprises:
determining first application data corresponding to first structured data, wherein the first application data is used for display to a user;
associating a first tracing hash value corresponding to the first structured data with the first application data;
wherein performing the tracing processing on the target structured data comprises:
obtaining target application data corresponding to the target structured data, and obtaining a target tracing hash value associated with the target application data; and
performing tracing processing on the target structured data according to the target tracing path corresponding to the target tracing hash value.
8. A data tracing processing apparatus, characterized in that the apparatus comprises:
an inclusion/exclusion module configured to: perform inclusion/exclusion processing on medical data according to disease category identifiers, to obtain and store inclusion/exclusion data for each disease category, wherein the address storing the inclusion/exclusion data is a first address;
an aggregation module configured to: aggregate the inclusion/exclusion data of each disease category according to patient identifiers, to obtain tracing data for each patient;
a mining module configured to: perform mining based on the tracing data to obtain and store structured data, wherein the address storing the structured data is a second address;
a tracing module configured to: perform source tracing on target structured data according to the association relationship between the second address and the first address.
9. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data tracing processing method according to any one of claims 1 to 7.
10. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data tracing processing method according to any one of claims 1 to 7.
CN201911320462.0A 2019-12-19 2019-12-19 Data tracing processing method and device, medium and equipment Active CN111046085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911320462.0A CN111046085B (en) 2019-12-19 2019-12-19 Data tracing processing method and device, medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911320462.0A CN111046085B (en) 2019-12-19 2019-12-19 Data tracing processing method and device, medium and equipment

Publications (2)

Publication Number Publication Date
CN111046085A true CN111046085A (en) 2020-04-21
CN111046085B CN111046085B (en) 2023-04-28

Family

ID=70237968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911320462.0A Active CN111046085B (en) 2019-12-19 2019-12-19 Data tracing processing method and device, medium and equipment

Country Status (1)

Country Link
CN (1) CN111046085B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112289394A (en) * 2020-08-12 2021-01-29 上海柯林布瑞信息技术有限公司 Case subscription method and device for disease library, storage medium and terminal
CN113901094A (en) * 2021-09-29 2022-01-07 北京百度网讯科技有限公司 Data processing method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130046558A1 (en) * 2011-08-18 2013-02-21 Siemens Medical Solutions Usa, Inc. System and Method for Identifying Inconsistent and/or Duplicate Data in Health Records
CN108876405A (en) * 2018-06-14 2018-11-23 李兵 Product traceability records processing method, device and electronic equipment
CN109471918A (en) * 2018-11-09 2019-03-15 医渡云(北京)技术有限公司 A kind of middle field retroactive method, equipment and medium
CN110427549A (en) * 2019-06-20 2019-11-08 北京清博大数据科技有限公司 A kind of network public opinion Source Tracing method, apparatus, terminal and storage medium

Also Published As

Publication number Publication date
CN111046085B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
Yaqoob et al. Big data: From beginning to future
US11232365B2 (en) Digital assistant platform
CN111813963B (en) Knowledge graph construction method and device, electronic equipment and storage medium
CA3046247C (en) Data platform for automated data extraction, transformation, and/or loading
US11380116B2 (en) Automatic delineation and extraction of tabular data using machine learning
CN112711581B (en) Medical data checking method and device, electronic equipment and storage medium
CN109524070B (en) Data processing method and device, electronic equipment and storage medium
US10586611B2 (en) Systems and methods employing merge technology for the clinical domain
US9760260B2 (en) System for retrieving, visualizing and editing semantic annotations
CN111709527A (en) Operation and maintenance knowledge map library establishing method, device, equipment and storage medium
WO2022048210A1 (en) Named entity recognition method and apparatus, and electronic device and readable storage medium
CN110597946B (en) Case storage method, device, equipment and storage medium
US20240053307A1 (en) Identifying Repetitive Portions of Clinical Notes and Generating Summaries Pertinent to Treatment of a Patient Based on the Identified Repetitive Portions
CN111046085B (en) Data tracing processing method and device, medium and equipment
CN112507230A (en) Webpage recommendation method and device based on browser, electronic equipment and storage medium
US20210150270A1 (en) Mathematical function defined natural language annotation
CN111383768B (en) Medical data regression analysis method, device, electronic equipment and computer readable medium
CN113204698B (en) News subject term generation method, device, equipment and medium
CN113792138B (en) Report generation method and device, electronic equipment and storage medium
CN110931136B (en) Event searching method and device, computer medium and electronic equipment
CN113053531B (en) Medical data processing method, medical data processing device, computer readable storage medium and equipment
CN113220896A (en) Multi-source knowledge graph generation method and device and terminal equipment
CN112786132B (en) Medical record text data segmentation method and device, readable storage medium and electronic equipment
CN113590845B (en) Knowledge graph-based document retrieval method and device, electronic equipment and medium
CN113806372B (en) New data information construction method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 2021-02-25

Address after: Room 801, 8th Floor, Building 9, 35 Huayuan North Road, Haidian District, Beijing 100191

Applicant after: YIDU CLOUD Ltd.

Address before: Room 1502, 15th Floor, No. 211 Pubin Road, Jiangbei New District, Nanjing, Jiangsu 210000

Applicant before: Nanjing Yirui Technology Co.,Ltd.

Applicant before: Nanjing Yiyi Yunda Data Technology Co.,Ltd.

GR01 Patent grant