CN114741438A - Multi-center research data extraction method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114741438A
Authority
CN
China
Prior art keywords
data
extracting
research data
medical record
patient
Legal status
Pending
Application number
CN202210204375.4A
Other languages
Chinese (zh)
Inventor
冯晓彬
黎成权
吴美龙
温晓夕
Current Assignee
Tsinghua University
Beijing Tsinghua Changgeng Hospital
Original Assignee
Tsinghua University
Beijing Tsinghua Changgeng Hospital
Application filed by Tsinghua University and Beijing Tsinghua Changgeng Hospital
Priority to CN202210204375.4A
Publication of CN114741438A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/258 Data format conversion from or to a database
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application relates to the technical field of data processing, and in particular to a method and a device for extracting multi-center research data, an electronic device, and a storage medium. The method comprises: extracting initial research data from at least one patient medical record from a plurality of centers; standardizing the initial research data to generate at least one item of research data and form a standard data set; and extracting the multi-center research data from the data in the standard data set that satisfy the privacy and security check conditions. In this way, the data sources can be standardized and heterogeneous data can be extracted in standardized form, yielding more complete and higher-quality research data.

Description

Multi-center research data extraction method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for extracting multi-center research data, an electronic device, and a storage medium.
Background
Because multi-center studies can collect more data and allow external validation, their results are more reliable, and a growing number of researchers are conducting them. Their typical drawbacks, however, are high data acquisition cost and difficult data processing. With the development of medical informatization, more and more studies build a database and collect multi-center data into it for research.
Conventional research databases are populated manually by medical or research personnel. With the development of big data and artificial intelligence, automatically collecting research data has become an urgent need for researchers. Medical staff express themselves in diverse ways when writing medical records, so medical data is highly heterogeneous, and this heterogeneity and lack of standardization make automatic extraction difficult. Incomplete and non-standard data in hospital information systems is a further difficulty in multi-center studies: once a patient is discharged, supplementing or correcting the data becomes almost impossible.
Therefore, standardizing data sources and obtaining more complete, higher-quality research data is a pressing need for researchers.
Disclosure of Invention
The application provides a method and a device for extracting multi-center research data, an electronic device, and a storage medium, which can standardize data sources and extract heterogeneous data in standardized form, so as to obtain more complete and higher-quality research data.
An embodiment of a first aspect of the present application provides a method for extracting multi-center research data, including the following steps:
extracting initial research data from at least one patient medical record from a plurality of centers;
standardizing the initial research data to generate at least one item of research data and form a standard data set; and
extracting the multi-center research data from the data in the standard data set that satisfy the privacy and security check conditions.
Optionally, the at least one item of research data includes one or more of the following patient indicators: symptoms, physical examination, treatment, laboratory examination, imaging examination, and pathological examination.
Optionally, extracting the initial research data from the at least one patient medical record from the plurality of centers comprises:
identifying the actual type of each patient medical record; and
when the actual type is unstructured medical text, performing natural language processing and privacy removal on the text data of the patient medical record to obtain the initial research data.
Optionally, before natural language processing and privacy removal are performed on the text data of the patient medical record, the method further includes:
detecting missing data in the text data of the patient medical record; and
displaying a reminder of the missing data to a target person, and supplementing the missing data with information entered by the target person.
Optionally, before the initial research data is extracted from the at least one patient medical record from the plurality of centers, the method further includes:
determining whether the interval between the current extraction time and the last extraction time has reached a preset extraction time interval; and
if the preset extraction time interval has not been reached, postponing the extraction; otherwise, extracting the initial research data.
Optionally, the preset extraction time interval is determined by a medical load, which is calculated as:
medical load = a × written text length + b × number of medical orders + c × surgery difficulty + d × surgery duration + e × clinic duration + f × number of emergency patients + g × number of severe patients + h × number of medical operations,
where a, b, c, d, e, f, g and h are coefficients.
An embodiment of a second aspect of the present application provides an apparatus for extracting multi-center research data, including:
a first extraction module, configured to extract initial research data from at least one patient medical record from a plurality of centers;
a standard processing module, configured to standardize the initial research data to generate at least one item of research data and form a standard data set; and
a second extraction module, configured to extract the multi-center research data from the data in the standard data set that satisfy the privacy and security check conditions.
Optionally, the at least one item of research data includes one or more of the following patient indicators: symptoms, physical examination, treatment, laboratory examination, imaging examination, and pathological examination.
Optionally, the first extraction module is specifically configured to:
identify the actual type of each patient medical record; and
when the actual type is unstructured medical text, perform natural language processing and privacy removal on the text data of the patient medical record to obtain the initial research data.
Optionally, before natural language processing and privacy removal are performed on the text data of the patient medical record, the first extraction module is further configured to:
detect missing data in the text data of the patient medical record; and
display a reminder of the missing data to a target person, and supplement the missing data with information entered by the target person.
Optionally, before the initial research data is extracted from the at least one patient medical record from the plurality of centers, the first extraction module is further configured to:
determine whether the interval between the current extraction time and the last extraction time has reached a preset extraction time interval; and
if the preset extraction time interval has not been reached, postpone the extraction; otherwise, extract the initial research data.
Optionally, the preset extraction time interval is determined by a medical load, which is calculated as:
medical load = a × written text length + b × number of medical orders + c × surgery difficulty + d × surgery duration + e × clinic duration + f × number of emergency patients + g × number of severe patients + h × number of medical operations,
where a, b, c, d, e, f, g and h are coefficients.
An embodiment of a third aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the program to implement the method for extracting multi-center research data according to the foregoing embodiments.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for extracting multi-center research data according to the foregoing embodiments.
Thus, initial research data can be extracted from at least one patient medical record from multiple centers and standardized to generate at least one item of research data forming a standard data set, and the multi-center research data can then be extracted from the data in the standard data set that satisfy the privacy and security check conditions. In this way, the data sources can be standardized and heterogeneous data can be extracted in standardized form, yielding more complete and higher-quality research data.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flowchart of a method for extracting multi-center research data according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for extracting multi-center research data according to an embodiment of the present application;
FIG. 3 is a block diagram of an apparatus for extracting multi-center research data according to an embodiment of the present application; and
FIG. 4 is an exemplary diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes a method, an apparatus, an electronic device, and a storage medium for extracting multi-center research data according to embodiments of the present application with reference to the drawings. To solve the problems mentioned in the Background, namely non-standard data sources, heterogeneous and non-standardized data, and the resulting difficulty of automatic extraction, the present application provides a method for extracting multi-center research data. The method standardizes the data sources and extracts heterogeneous data in standardized form, so as to obtain more complete and higher-quality research data.
Specifically, fig. 1 is a schematic flow chart of a method for extracting multi-center research data according to an embodiment of the present application.
As shown in fig. 1, the method for extracting multi-center research data includes the following steps:
In step S101, initial research data is extracted from at least one patient medical record from multiple centers.
Optionally, extracting the initial research data from the at least one patient medical record from the multiple centers comprises: identifying the actual type of each patient medical record; and when the actual type is unstructured medical text, performing natural language processing and privacy removal on the text data of the patient medical record to obtain the initial research data.
The actual types of patient medical records can include structured medical text and unstructured medical text (e.g., heterogeneous and non-standardized data), among others.
It should be appreciated that, because unstructured medical text makes automatic extraction difficult, the embodiment of the present application may identify the actual type of each patient medical record; if a record is identified as unstructured medical text, natural language processing and privacy removal may be performed on its text data to obtain the initial research data.
It should be noted that both the natural language processing and the privacy removal may use existing techniques in the related art; they are not described in detail here to avoid redundancy.
Optionally, in some embodiments, before natural language processing and privacy removal are performed on the text data of the patient medical record, the method further includes: detecting missing data in the text data of the patient medical record; and displaying a reminder of the missing data to a target person, and supplementing the missing data with information entered by the target person.
It should be understood that a standard data set may be preset in the embodiment of the present application. Before natural language processing and privacy removal are performed on the text data of a patient medical record, the text in the record may be compared with the standard data set and the missing data prompted accordingly, so that a target person (e.g., a doctor) can complete and correct the record according to the prompt, thereby supplementing the missing data.
Optionally, in some embodiments, before the initial research data is extracted from the at least one patient medical record from the multiple centers, the method further includes: determining whether the interval between the current extraction time and the last extraction time has reached a preset extraction time interval; and if it has not been reached, postponing the extraction; otherwise, extracting the initial research data.
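The patent leaves the natural language processing and privacy removal to existing techniques, so the following is only a minimal sketch of the routing step under stated assumptions. It assumes structured records arrive as dicts and unstructured ones as free text, uses two toy regular expressions (18-character resident ID numbers and 11-digit mobile numbers) as a stand-in for de-identification, and uses a keyword lookup as a stand-in for NLP. All names here (`identify_type`, `remove_privacy`, `simple_nlp`, `extract_initial_data`) are hypothetical.

```python
import re
from typing import Dict, List, Union

# Hypothetical record representation (an assumption, not from the patent):
# structured records are dicts of field -> value, unstructured ones are free text.
Record = Union[Dict[str, object], str]

# Toy de-identification patterns, assumed for illustration only:
# 18-character resident ID numbers and 11-digit mobile phone numbers.
ID_PATTERN = re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)")
PHONE_PATTERN = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def identify_type(record: Record) -> str:
    """Classify a record as 'structured' or 'unstructured'."""
    return "structured" if isinstance(record, dict) else "unstructured"


def remove_privacy(text: str) -> str:
    """Mask direct identifiers with placeholder tokens."""
    text = ID_PATTERN.sub("[ID]", text)
    return PHONE_PATTERN.sub("[PHONE]", text)


def simple_nlp(text: str, vocabulary: Dict[str, str]) -> Dict[str, List[str]]:
    """Stand-in for real NLP: find known lowercase terms and group them by field."""
    found: Dict[str, List[str]] = {}
    lowered = text.lower()
    for term, field in vocabulary.items():
        if term in lowered:
            found.setdefault(field, []).append(term)
    return found


def extract_initial_data(record: Record, vocabulary: Dict[str, str]) -> Dict[str, object]:
    """Route a record by type; only unstructured text is de-identified and parsed."""
    if identify_type(record) == "structured":
        return dict(record)  # already field-addressable
    return simple_nlp(remove_privacy(record), vocabulary)
```

Here privacy removal runs before parsing so that identifiers never reach the NLP stage; the patent does not fix the order, so this is a design choice of the sketch.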
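The comparison against a preset standard data set can be sketched as a field-coverage check. This is a hypothetical illustration: it assumes the standard data set is simply a list of required indicator fields, that an empty value counts as missing, and that the target person's input arrives as a dict. The function names and field names are invented for the example.

```python
from typing import Dict, List

# Hypothetical standard data set: the indicator fields each record should cover.
STANDARD_FIELDS: List[str] = ["symptom", "physical_exam", "treatment", "lab_test"]


def detect_missing(extracted: Dict[str, object], standard: List[str]) -> List[str]:
    """Return the standard fields that are absent or empty in the extracted data."""
    return [field for field in standard if not extracted.get(field)]


def build_reminders(missing: List[str]) -> List[str]:
    """Reminder messages to display to the target person (e.g. the doctor)."""
    return [f"Please complete missing field: {field}" for field in missing]


def supplement(extracted: Dict[str, object], entered: Dict[str, object]) -> Dict[str, object]:
    """Fill only the missing fields with values entered by the target person."""
    merged = dict(extracted)  # leave the caller's record unchanged
    for field, value in entered.items():
        if not merged.get(field):
            merged[field] = value
    return merged
```

Only empty fields are overwritten, so a clinician's entry cannot silently replace data already extracted from the record.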
The preset extraction time interval may be preset by a user, obtained through a limited number of experiments, or obtained through a limited number of computer simulations, which is not specifically limited herein.
Preferably, in some embodiments, the preset extraction time interval may be set individually according to changes in the medical load, where the medical load is calculated as:
medical load = a × written text length + b × number of medical orders + c × surgery difficulty + d × surgery duration + e × clinic duration + f × number of emergency patients + g × number of severe patients + h × number of dressing changes and other medical operations,
where a, b, c, d, e, f, g and h are coefficients, which may be determined from questionnaires completed by medical staff or may be preset, and are not specifically limited herein.
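The medical load is a weighted sum of eight workload measures, so computing it is straightforward once the inputs are numeric. The sketch below assumes each quantity has already been measured as a number; the field names and the example coefficients are hypothetical, and the patent leaves both the units of each term and the mapping from load to extraction interval unspecified.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Workload:
    """Hypothetical container for the eight terms of the medical load formula."""
    written_text_length: float  # e.g. characters written in medical records
    medical_orders: float
    surgery_difficulty: float
    surgery_duration: float     # e.g. hours
    clinic_duration: float      # e.g. hours
    emergency_patients: float
    severe_patients: float
    medical_operations: float   # dressing changes and other procedures


def medical_load(w: Workload, coeffs: Sequence[float]) -> float:
    """Weighted sum a*x1 + b*x2 + ... + h*x8; coeffs must hold exactly 8 values."""
    a, b, c, d, e, f, g, h = coeffs
    return (a * w.written_text_length + b * w.medical_orders
            + c * w.surgery_difficulty + d * w.surgery_duration
            + e * w.clinic_duration + f * w.emergency_patients
            + g * w.severe_patients + h * w.medical_operations)
```

For example, with inputs `Workload(1000, 20, 3, 2, 4, 5, 2, 10)` and illustrative coefficients `(0.001, 0.1, 1, 1, 0.5, 0.5, 1, 0.2)`, the load is 16.5.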
It should be understood that the embodiment of the present application may use a timer to determine whether the interval between the current extraction time and the last extraction time has reached the preset extraction time interval. If it has, the initial research data may be extracted; if it has not, extraction may be postponed, which avoids the resource waste caused by real-time extraction.
It should be noted that, for some special cases, the embodiment of the present application may also perform real-time extraction, that is, the preset extraction time interval is set to 0.
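The interval check, including the real-time special case where the interval is 0, reduces to a small predicate. This is a sketch assuming timestamps are `datetime` objects and that the very first run (no previous extraction) should always extract; `should_extract` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_extract(now: datetime, last: Optional[datetime], interval: timedelta) -> bool:
    """Decide whether to run extraction at `now`.

    - interval <= 0 means real-time extraction: always extract.
    - last is None (no previous extraction) also triggers extraction.
    - Otherwise extract only once the preset interval has elapsed.
    """
    if interval <= timedelta(0) or last is None:
        return True
    return now - last >= interval
```

A scheduler would call this on each timer tick and record `now` as the new `last` whenever it returns `True`.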
In step S102, the initial research data is standardized to generate at least one item of research data, which together form a standard data set.
Optionally, the at least one item of research data includes one or more of the following patient indicators: symptoms, physical examination, treatment, laboratory examination, imaging examination, and pathological examination.
It should be understood that after the initial research data is obtained in step S101, the embodiment of the present application may standardize it to generate at least one item of research data. Standardization can be performed in many ways, such as comparison with standard data.
As a possible implementation, the initial research data may be standardized into four patient indicators, namely symptoms, physical examination, treatment measures, and laboratory tests, which are then combined into a standard data set.
As another possible implementation, the initial research data may be standardized into six patient indicators, namely symptoms, physical examination, treatment measures, laboratory tests, imaging tests, and pathological tests, which are then combined into a standard data set.
It should be noted that the above description is only exemplary and does not limit the present application; those skilled in the art can configure it according to the actual situation.
In step S103, the multi-center research data is extracted from the data in the standard data set that satisfy the privacy and security check conditions.
Specifically, when the patient's discharge procedure is processed, the embodiment of the present application may perform a final pre-extraction check and prompt, and then extract the multi-center research data and perform the privacy and security check.
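One way to read "comparison with standard data" is a synonym table that maps raw expressions onto standard terms grouped under the six indicator categories. The sketch below assumes such a table exists; the table contents, category keys, and the `unmapped` bucket for manual review are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table: raw expression -> (indicator category, standard term).
SYNONYMS: Dict[str, Tuple[str, str]] = {
    "pyrexia": ("symptom", "fever"),
    "febrile": ("symptom", "fever"),
    "fever": ("symptom", "fever"),
    "wbc": ("lab_test", "white blood cell count"),
    "white cell count": ("lab_test", "white blood cell count"),
}

CATEGORIES = ["symptom", "physical_exam", "treatment", "lab_test", "imaging", "pathology"]


def standardize(raw_terms: List[str]) -> Dict[str, List[str]]:
    """Map raw terms to standard terms grouped by indicator category.

    Unrecognized terms are collected under 'unmapped' for manual review.
    """
    dataset: Dict[str, List[str]] = {c: [] for c in CATEGORIES}
    dataset["unmapped"] = []
    for term in raw_terms:
        hit = SYNONYMS.get(term.strip().lower())
        if hit is None:
            dataset["unmapped"].append(term)
            continue
        category, standard = hit
        if standard not in dataset[category]:  # avoid duplicate standard terms
            dataset[category].append(standard)
    return dataset
```

Using only four categories instead of six is a matter of trimming `CATEGORIES` and the table entries.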
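The patent does not define the privacy and security check conditions, so the filter below is a hypothetical stand-in: a record passes only if it contains no direct-identifier fields and no string value still matches an ID-number or phone-number pattern. The field names and patterns are assumptions for illustration.

```python
import re
from typing import Dict, List

# Hypothetical check conditions: no direct-identifier fields, and no
# ID-number or phone-number patterns left in any string value.
FORBIDDEN_FIELDS = {"name", "id_number", "phone", "address"}
IDENTIFIER = re.compile(r"(?<!\d)(\d{17}[\dXx]|1\d{10})(?!\d)")


def passes_privacy_check(record: Dict[str, object]) -> bool:
    """Return True if the record contains no detectable direct identifiers."""
    if FORBIDDEN_FIELDS & set(record):
        return False
    return not any(isinstance(v, str) and IDENTIFIER.search(v) for v in record.values())


def extract_research_data(standard_dataset: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only records that satisfy the privacy and security check."""
    return [r for r in standard_dataset if passes_privacy_check(r)]
```

Records that fail are dropped rather than repaired here; in practice they could be routed back for another privacy-removal pass.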
To facilitate further understanding of the method for extracting the multi-center research data according to the embodiment of the present application, the following description is made in detail with reference to fig. 2.
Fig. 2 is a schematic flowchart of a method for extracting multi-center research data according to an embodiment of the present application.
As shown in Fig. 2, unstructured medical text is identified and undergoes natural language processing and privacy removal to obtain initial research data; the initial research data is standardized to form a standard data set; the standard data set is fed back to the unstructured medical text so that the record can be completed; and once the standard is met, formal extraction and the privacy and security check are performed.
With the method for extracting multi-center research data provided by the embodiment of the present application, initial research data can be extracted from at least one patient medical record from multiple centers and standardized to generate at least one item of research data forming a standard data set, and the multi-center research data is extracted from the data in the standard data set that satisfy the privacy and security check conditions. In this way, the data sources can be standardized and heterogeneous data can be extracted in standardized form, yielding more complete and higher-quality research data.
Next, an apparatus for extracting multi-center research data according to an embodiment of the present application is described with reference to the drawings.
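The feedback loop of the Fig. 2 flow (compare against the standard, feed gaps back for completion, then extract formally with a privacy check) can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes records are flat dicts, that a callback `ask_clinician` returns the values a target person enters for the missing fields, that `privacy_ok` implements the check, and that the loop gives up after a hypothetical `max_rounds`.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


def run_pipeline(
    record: Record,
    required: List[str],
    ask_clinician: Callable[[List[str]], Record],
    privacy_ok: Callable[[Record], bool],
    max_rounds: int = 3,
) -> Optional[Record]:
    """Hypothetical Fig. 2 loop: check against the standard, feed gaps back, then extract.

    Returns the record for the research data set, or None if it is still
    incomplete after max_rounds or fails the privacy and security check.
    """
    current = dict(record)
    for _ in range(max_rounds):
        missing = [f for f in required if not current.get(f)]
        if not missing:
            break
        # Feedback step: prompt for the missing fields and merge only those answers.
        answers = ask_clinician(missing)
        current.update({k: v for k, v in answers.items() if k in missing})
    if any(not current.get(f) for f in required):
        return None  # still does not meet the standard
    return current if privacy_ok(current) else None
```

Completeness is re-checked after the loop so that answers given in the final round still count.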
Fig. 3 is a block diagram of an apparatus for extracting multicenter research data according to an embodiment of the present application.
As shown in fig. 3, the apparatus 10 for extracting multicenter research data includes: a first extraction module 100, a standard processing module 200 and a second extraction module 300.
The first extraction module 100 is configured to extract initial research data from at least one patient medical record from multiple centers;
the standard processing module 200 is configured to standardize the initial research data to generate at least one item of research data and form a standard data set; and
the second extraction module 300 is configured to extract the multi-center research data from the data in the standard data set that satisfy the privacy and security check conditions.
Optionally, the at least one item of research data includes one or more of the following patient indicators: symptoms, physical examination, treatment, laboratory examination, imaging examination, and pathological examination.
Optionally, the first extraction module is specifically configured to:
identify the actual type of each patient medical record; and
when the actual type is unstructured medical text, perform natural language processing and privacy removal on the text data of the patient medical record to obtain the initial research data.
Optionally, before natural language processing and privacy removal are performed on the text data of the patient medical record, the first extraction module is further configured to:
detect missing data in the text data of the patient medical record; and
display a reminder of the missing data to a target person, and supplement the missing data with information entered by the target person.
Optionally, before the initial research data is extracted from the at least one patient medical record from the multiple centers, the first extraction module is further configured to:
determine whether the interval between the current extraction time and the last extraction time has reached a preset extraction time interval; and
if the preset extraction time interval has not been reached, postpone the extraction; otherwise, extract the initial research data.
Optionally, the preset extraction time interval is determined by a medical load, which is calculated as:
medical load = a × written text length + b × number of medical orders + c × surgery difficulty + d × surgery duration + e × clinic duration + f × number of emergency patients + g × number of severe patients + h × number of medical operations,
where a, b, c, d, e, f, g and h are coefficients.
It should be noted that the foregoing explanation of the embodiment of the method for extracting multicenter research data is also applicable to the apparatus for extracting multicenter research data of this embodiment, and is not repeated here.
With the apparatus for extracting multi-center research data provided by the embodiment of the present application, initial research data can be extracted from at least one patient medical record from multiple centers and standardized to generate at least one item of research data forming a standard data set, and the multi-center research data is extracted from the data in the standard data set that satisfy the privacy and security check conditions. In this way, the data sources can be standardized and heterogeneous data can be extracted in standardized form, yielding more complete and higher-quality research data.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
memory 401, processor 402, and computer programs stored on memory 401 and executable on processor 402.
The processor 402, when executing the program, implements the method of extracting multicenter research data provided in the above-described embodiments.
Further, the electronic device further includes:
a communication interface 403 for communication between the memory 401 and the processor 402.
A memory 401 for storing computer programs operable on the processor 402.
Memory 401 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
If the memory 401, the processor 402 and the communication interface 403 are implemented independently, the communication interface 403, the memory 401 and the processor 402 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 401, the processor 402, and the communication interface 403 are integrated on a chip, the memory 401, the processor 402, and the communication interface 403 may complete mutual communication through an internal interface.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the method of extracting multicenter research data as above.
In the description of the present specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or to implicitly indicate the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
The logic and/or steps represented in a flowchart or otherwise described herein, which may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device, such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be captured electronically, for instance by optically scanning the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those skilled in the art will understand that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented as a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it should be understood that they are exemplary and shall not be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (10)

1. A method for extracting multi-center research data, characterized by comprising the following steps:
extracting initial research data from at least one patient medical record from a plurality of centers;
performing standard processing on the initial research data to generate at least one item of research data and form a standard data set; and
extracting the multi-center research data from those data in the standard data set that satisfy the privacy security check conditions.
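The three steps of claim 1 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the applicant's implementation: the record model (`PatientRecord`), the field-name mapping (`FIELD_MAP`), and the privacy check (rejecting any row that still carries a field listed in `DIRECT_IDENTIFIERS`) are all hypothetical, since the claims do not define a data model, a standard vocabulary, or the concrete privacy security check conditions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical record model; the claims do not specify one.
@dataclass
class PatientRecord:
    center: str                                       # center the record came from
    raw: Dict[str, str] = field(default_factory=dict)  # center-specific fields

# Hypothetical mapping from center-specific field names to a shared vocabulary.
FIELD_MAP = {
    "temp": "body_temperature",
    "BodyTemp": "body_temperature",
    "dx": "diagnosis",
    "Diagnosis": "diagnosis",
}

# Assumed privacy rule: a row containing any of these fields fails the check.
DIRECT_IDENTIFIERS = ("name", "id_number", "phone")

def extract_initial(records: List[PatientRecord]) -> List[Dict[str, str]]:
    """Step 1: pull raw fields from the records of all centers, tagging the source center."""
    return [dict(r.raw, center=r.center) for r in records]

def standardize(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Step 2: rename fields to standard names, forming the standard data set."""
    return [{FIELD_MAP.get(k, k): v for k, v in row.items()} for row in rows]

def passes_privacy_check(row: Dict[str, str]) -> bool:
    """Step 3 filter: keep a row only if no direct-identifier field remains."""
    return not any(k in row for k in DIRECT_IDENTIFIERS)

def extract_multicenter(records: List[PatientRecord]) -> List[Dict[str, str]]:
    """Run steps 1-3 and return only the rows that pass the privacy check."""
    standard_set = standardize(extract_initial(records))
    return [row for row in standard_set if passes_privacy_check(row)]
```

For example, given one record from center "A" with `{"temp": "37.2", "dx": "pneumonia"}` and one from center "B" that still contains a `"name"` field, only A's row survives, renamed to `body_temperature` and `diagnosis`. Keeping the privacy check as a final filter over the standard data set mirrors the claim's ordering, in which standardization happens before the privacy-based selection.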
2. The method of claim 1, wherein the at least one item of research data includes one or more patient indicators among symptoms, physical examination, treatment, laboratory examination, imaging examination, and pathology examination.
3. The method of claim 1, wherein extracting the initial research data from the at least one patient medical record from the plurality of centers comprises:
identifying an actual type of each patient medical record;
when the actual type is unstructured medical text, performing natural language processing and privacy removal (de-identification) processing on the text data of the patient medical record to obtain the initial research data.
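The branch in claim 3 (identify the record type, then process unstructured text) can be sketched as follows. The patent does not name a natural language processing or de-identification technique, so this sketch swaps in plain regular expressions as a toy stand-in for both. The type heuristic (`identify_type`), the identifier patterns (an 11-digit mobile number starting with 1, an 18-character resident ID number, a `Name:` label), and the temperature extractor are assumptions for illustration only; real de-identification needs a far more complete rule set or a trained model.

```python
import re

def identify_type(record) -> str:
    """Hypothetical heuristic: dicts are structured, plain strings are unstructured text."""
    if isinstance(record, dict):
        return "structured"
    if isinstance(record, str):
        return "unstructured_text"
    return "unknown"

# Assumed identifier patterns (illustrative, not exhaustive).
ID_NUMBER = re.compile(r"\b\d{17}[\dXx]\b")  # 18-character resident ID number
PHONE = re.compile(r"\b1\d{10}\b")           # 11-digit mobile number
NAME_LABEL = re.compile(r"(Name:\s*)\S+")    # value following a "Name:" label

def deidentify(text: str) -> str:
    """Mask identifiers; ID numbers are masked first so no pattern sees a partial ID."""
    text = ID_NUMBER.sub("[ID]", text)
    text = PHONE.sub("[PHONE]", text)
    return NAME_LABEL.sub(r"\1[NAME]", text)

# Toy stand-in for NLP: pick a body temperature out of free text.
TEMPERATURE = re.compile(r"temperature\s+(\d+(?:\.\d+)?)", re.IGNORECASE)

def extract_from_text(text: str) -> dict:
    """De-identify the text, then extract a structured indicator from it."""
    clean = deidentify(text)
    match = TEMPERATURE.search(clean)
    return {"text": clean, "body_temperature": float(match.group(1)) if match else None}

def initial_data(record):
    """Apply text processing only when the record is unstructured medical text."""
    if identify_type(record) == "unstructured_text":
        return extract_from_text(record)
    return record
```

Structured records pass through unchanged in this sketch; the claim only specifies the processing for unstructured text.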
4. The method of claim 3, further comprising, before performing the natural language processing and privacy removal processing on the text data of the patient medical record:
detecting missing data in the text data of the patient medical record;
displaying a reminder of the missing data to a target person, and supplementing the missing data with information input by the target person.
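The missing-data step of claim 4 amounts to a check against the fields a study expects, followed by prompting the target person. A minimal sketch, assuming a hypothetical list of required fields (`REQUIRED_FIELDS`) and treating absent, `None`, or blank values as missing; the `ask` callback stands in for whatever reminder interface the system actually displays.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical fields a study might require from each record.
REQUIRED_FIELDS = ["chief_complaint", "diagnosis", "admission_date"]

def find_missing(record: Dict[str, Optional[str]]) -> List[str]:
    """Return the required fields that are absent, None, or blank."""
    return [f for f in REQUIRED_FIELDS if not (record.get(f) or "").strip()]

def complete_record(record: Dict[str, Optional[str]],
                    ask: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Remind the target person of each missing field and fill it from their input."""
    filled = dict(record)  # copy so the caller's record is left unchanged
    for f in find_missing(record):
        filled[f] = ask(f"Missing field '{f}': please enter a value: ")
    return filled
```

At a console, `complete_record(record, ask=input)` prompts once per missing field; in a hospital system the callback would more likely be a form or notification.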
5. The method of claim 1, further comprising, before extracting the initial research data from the at least one patient medical record from the plurality of centers:
determining whether the interval between the current extraction time and the last extraction time has reached a preset extraction time interval;
if the preset extraction time interval has not been reached, temporarily suspending data extraction; otherwise, extracting the initial research data.
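The interval check in claim 5 is a simple time gate in front of extraction. A minimal sketch, with two assumptions that the claim does not settle: the very first extraction (no previous extraction time) is always allowed, and "reached" is read as "greater than or equal to". The class name `ExtractionScheduler` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

def should_extract(now: datetime, last: Optional[datetime], interval: timedelta) -> bool:
    """True if no extraction has happened yet or the preset interval has elapsed."""
    return last is None or now - last >= interval

class ExtractionScheduler:
    """Hypothetical wrapper that records the last extraction time."""

    def __init__(self, interval: timedelta):
        self.interval = interval
        self.last: Optional[datetime] = None

    def try_extract(self, now: datetime, extract: Callable[[], None]) -> bool:
        if not should_extract(now, self.last, self.interval):
            return False  # interval not reached: hold off on extraction for now
        extract()
        self.last = now
        return True
```

With a 6-hour interval, an attempt 5 hours after the last extraction is skipped and an attempt at exactly 6 hours proceeds.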
6. The method according to claim 5, wherein the preset extraction time interval is determined by a medical load, the medical load being calculated as:
$$\text{medical load} = a \cdot \text{written text length} + b \cdot \text{wishlist number} + c \cdot \text{surgery difficulty} + d \cdot \text{surgery duration} + e \cdot \text{clinic duration} + f \cdot \text{emergency patient number} + g \cdot \text{severe patient number} + h \cdot \text{medical operation number},$$
wherein a, b, c, d, e, f, g and h are coefficients.
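The medical load in claim 6 is a weighted sum of eight workload factors. The sketch below computes only that sum; the field names of the hypothetical `WorkloadFactors` container follow the claim's terms, while the coefficient values, the units of each factor, and the rule that turns the load into an extraction interval are not given in the claims and are left to the caller.

```python
from dataclasses import dataclass, fields
from typing import Sequence

@dataclass
class WorkloadFactors:
    # Order matches coefficients a..h in claim 6; units are not specified by the claim.
    written_text_length: float
    wishlist_number: float
    surgery_difficulty: float
    surgery_duration: float
    clinic_duration: float
    emergency_patient_number: float
    severe_patient_number: float
    medical_operation_number: float

def medical_load(x: WorkloadFactors, coeffs: Sequence[float]) -> float:
    """Weighted sum a*x1 + b*x2 + ... + h*x8, with coeffs given in the order a..h."""
    values = [getattr(x, f.name) for f in fields(x)]
    if len(coeffs) != len(values):
        raise ValueError("expected 8 coefficients (a..h)")
    return sum(c * v for c, v in zip(coeffs, values))
```

For example, with all eight coefficients set to 1 and factor values 1 through 8, the load is 36.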
7. An apparatus for extracting multicenter research data, comprising:
a first extraction module configured to extract initial research data from at least one patient medical record from a plurality of centers;
a standard processing module configured to perform standard processing on the initial research data to generate at least one item of research data and form a standard data set; and
a second extraction module configured to extract the multi-center research data from those data in the standard data set that satisfy the privacy security check conditions.
8. The apparatus of claim 7, wherein the at least one item of research data includes one or more patient indicators among symptoms, physical examination, treatment, laboratory examination, imaging examination, and pathology examination.
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for extracting multi-center research data according to any one of claims 1-5.
10. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method for extracting multi-center research data according to any one of claims 1-5.
CN202210204375.4A 2022-03-03 2022-03-03 Multi-center research data extraction method and device, electronic equipment and storage medium Pending CN114741438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210204375.4A CN114741438A (en) 2022-03-03 2022-03-03 Multi-center research data extraction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210204375.4A CN114741438A (en) 2022-03-03 2022-03-03 Multi-center research data extraction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114741438A true CN114741438A (en) 2022-07-12

Family

ID=82275967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210204375.4A Pending CN114741438A (en) 2022-03-03 2022-03-03 Multi-center research data extraction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114741438A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060052945A1 (en) * 2004-09-07 2006-03-09 Gene Security Network System and method for improving clinical decisions by aggregating, validating and analysing genetic and phenotypic data
CN103761697A (en) * 2014-01-20 2014-04-30 中国中医科学院 Scientific research data generation and patient privacy protection system based on electronic medical record
CN108986919A (en) * 2018-07-19 2018-12-11 清华大学 A kind of processing method and processing device of medical data
CN113345545A (en) * 2021-07-28 2021-09-03 北京惠每云科技有限公司 Clinical data checking method and device, electronic equipment and readable storage medium
CN113821510A (en) * 2021-08-30 2021-12-21 山东健康医疗大数据有限公司 Method and system for realizing conversion from multi-source heterogeneous data to FHIR standard

Similar Documents

Publication Publication Date Title
CN109741804B (en) Information extraction method and device, electronic equipment and storage medium
CN110136788B (en) Medical record quality inspection method, device, equipment and storage medium based on automatic detection
US11170499B2 (en) Method and device for the automated evaluation of at least one image data record recorded with a medical image recording device, computer program and electronically readable data carrier
US8548823B2 (en) Automatically determining ideal treatment plans for complex neuropsychiatric conditions
CN110827941B (en) Electronic medical record information correction method and system
US7996242B2 (en) Automatically developing neuropsychiatric treatment plans based on neuroimage data
JP2018060529A (en) Method and apparatus of context-based patient similarity
CN111833984B (en) Medicine quality control analysis method, device, equipment and medium based on machine learning
JP2015527648A (en) Automated clinical evidence sheet workflow
US20090316969A1 (en) Determining efficacy of therapeutic intervention in neurosychiatric disease
CN109727651A (en) Epilepsy cases data base management method and terminal device
CN112951414A (en) Primary medical clinical assistant decision-making system
CN114155949A (en) Examination and verification method, device and equipment for first page of medical record
CN116665883A (en) Method and system for predicting risk of neurodegenerative disease
Hansen et al. Assigning diagnosis codes using medication history
CN113678147B (en) Search method and information processing system
CN112802598A (en) Real-time auxiliary diagnosis and treatment method and system based on voice diagnosis and treatment data
CN114741438A (en) Multi-center research data extraction method and device, electronic equipment and storage medium
CN115631823A (en) Similar case recommendation method and system
CN114678092A (en) Nursing recording method, terminal, system and storage medium
CN111243750B (en) Method and device for identifying pregnancy status of patient in multiple modes
CN114155968A (en) Method for establishing mapping relation, and method and equipment for auditing surgical operation
CN113674827A (en) Electronic medical record generation method and device, electronic equipment and computer readable medium
CN113571179A (en) Index extraction method and device based on knowledge graph
CN114743683A (en) Method and device for normatively developing clinical research, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination