CN113138980A - Data processing method, device, terminal and storage medium - Google Patents

Data processing method, device, terminal and storage medium

Info

Publication number
CN113138980A
Authority
CN
China
Prior art keywords
data
identity
disease
repeated
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110522186.7A
Other languages
Chinese (zh)
Inventor
王成
王雅洁
赵培祯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dermatology Hospital Of Southern Medical University
Original Assignee
Dermatology Hospital Of Southern Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dermatology Hospital Of Southern Medical University filed Critical Dermatology Hospital Of Southern Medical University
Priority to CN202110522186.7A priority Critical patent/CN113138980A/en
Publication of CN113138980A publication Critical patent/CN113138980A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiments of the invention disclose a data processing method, device, terminal and storage medium applied to disease report data. The method comprises the following steps: acquiring a plurality of original report data of the same disease; converting each identity feature of the patient in each original report data into a preset character string by a random algorithm to obtain data to be processed; processing all data to be processed based on a plurality of preset rules to determine repeated data; removing, from the repeated data, the data with the earliest disease report time, and setting the remaining data as final repeated data; and associating the final repeated data with the correspondence between identity features and preset character strings and storing them separately. Because the conversion is performed after the original report data is acquired, patient information cannot be seen directly during the subsequent duplicate check, which helps protect patient privacy; and because the duplicate check is rule-based, it lends itself to automation, improves data processing efficiency, and can meet large-scale data processing requirements.

Description

Data processing method, device, terminal and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a terminal, and a storage medium for data processing.
Background
At present, many infectious diseases must be reported and registered. Some of them, however, have particular characteristics. Syphilis is one example: even after standard treatment, both syphilis-specific and non-specific antibodies may remain detectably positive for life, so repeated follow-up testing is required. As a result, a patient may be seen repeatedly, for example on a second visit, on multiple later visits, or in a later year.
Repeated visits can lead to repeated reports. In addition, most syphilis cases are currently reported by general hospitals, where the cases mainly come from preoperative screening. Many of the doctors involved in preoperative screening are not dermatology specialists, so the reporting criteria for syphilis are not clearly understood and the same case is easily reported again.
Specifically, repeated reporting of syphilis refers to the same case being reported two or more times after reinfection has been excluded. Once reinfection is excluded, every report of the same case with the same diagnosis after the first report is a repeated report. Repeated reporting hinders accurate surveillance of the disease and an exact count of infected persons; it directly affects the assessment of the true epidemic level and, in turn, government decision-making.
Some duplicate-checking methods already exist for this situation, such as manual duplicate checking and manual screening in Excel (a spreadsheet program in Microsoft's office software suite). These methods, however, involve a heavy workload, are time-consuming and labor-intensive, are error-prone, and cannot meet the needs of large-scale data processing. Moreover, patient information is displayed directly during processing, which is detrimental to patient privacy.
For this reason, there is a need for a better solution to the problems of the prior art.
Disclosure of Invention
In view of the above, the present invention provides a data processing method, device, terminal and storage medium. The original report data is converted after it is acquired, so that patient information cannot be seen directly during the subsequent duplicate check, which helps protect patient privacy. The duplicate check is also rule-based, which facilitates automation, improves data processing efficiency, and can meet large-scale data processing requirements.
Specifically, the present invention proposes the following embodiments:
An embodiment of the invention provides a data processing method, applied to disease report data, comprising:
acquiring a plurality of original report data of the same disease;
converting each identity feature of the patient in each original report data into a preset character string by a random algorithm to obtain data to be processed, and establishing a correspondence between the identity feature and the preset character string, wherein different identity features are converted into different preset character strings;
processing all the data to be processed based on a plurality of preset rules to determine repeated data, wherein each preset rule is generated based on the identity features of the patient and the unique feature of the disease, and the identity features comprise a unique identity feature of the patient and/or other identity features besides the unique identity feature;
removing, from the repeated data, the data with the earliest disease report time, and setting the remaining data as final repeated data;
and associating the final repeated data with the correspondence and storing them separately.
In a specific embodiment, obtaining multiple raw report data for the same disease includes:
acquiring a plurality of original data of the same disease;
performing data cleaning on all the original data to remove the original data which does not comprise the identity features or the unique features of the diseases;
setting the remaining original data as original report data.
In a specific embodiment, the types of identity feature comprise any one or any combination of the following: name, identity document number, gender, age, address, contact information, and date of birth;
when the disease is syphilis, the unique feature of the disease is the syphilis stage.
In a specific embodiment, when the identity feature is a name, the conversion is as follows: the Chinese characters of the name are randomly converted, as a whole, into a preset character string; in addition, each Chinese character of the name is converted into pinyin, and each letter of the pinyin is randomly converted into a preset character; different Chinese characters are converted into different preset character strings, and different letters are converted into different preset characters;
when the identity feature is a feature other than the name, the conversion is as follows: the feature is randomly converted, as a whole, into a preset character string.
In a specific embodiment, when the preset character strings obtained by converting the Chinese characters of two names as a whole are the same, or when every preset character obtained by converting the pinyin of the two names is the same, the name identity features are determined to be the same.
In a specific embodiment, the "processing the data to be processed based on a plurality of preset rules to determine duplicate data" includes:
if the unique characteristics of the diseases in the two or more data to be processed are the same and the unique characteristics of the identities of the patients in the two or more data to be processed are the same, determining the two or more data to be processed as initial repeated data;
if the unique characteristics of diseases in two or more data to be processed are the same, and the other identity characteristics of which the quantity exceeds a preset value in the two or more data to be processed are the same, determining that the two or more data to be processed are initial repeated data;
and summarizing all the initial repeated data and performing deduplication to obtain repeated data.
The embodiment of the invention also provides a data processing device, which is applied to report data of diseases and comprises the following components:
the acquisition module is used for acquiring a plurality of original report data of the same disease;
the conversion module is used for converting each identity characteristic of the patient in each original report data into a preset character string by a random algorithm to obtain data to be processed and establishing a corresponding relation between the identity characteristic and the preset character string; the preset character strings are different after different identity characteristics are converted;
the processing module is used for processing all the data to be processed based on a plurality of preset rules so as to determine repeated data; each preset rule is generated based on the identity characteristics of the patient and the unique characteristics of the disease, and the identity characteristics comprise the unique identity characteristics of the patient and/or other identity characteristics except the unique identity characteristics;
the removing module is used for removing, from the repeated data, the data with the earliest disease report time and setting the remaining data as final repeated data;
and the storage module is used for associating the final repeated data with the correspondence and storing them separately.
In a specific embodiment, the obtaining module is configured to: acquiring a plurality of original data of the same disease; performing data cleaning on all the original data to remove the original data which does not comprise the identity features or the unique features of the diseases; setting the remaining original data as original report data.
The embodiment of the invention also provides a terminal, which comprises a processor and a memory, wherein an application program is stored in the memory, and the application program executes the data processing method when running on the processor.
The embodiment of the present invention further provides a storage medium, where an application program is stored in the storage medium, and the application program executes the data processing method when running on a processor.
Therefore, the embodiments of the invention disclose a data processing method, device, terminal and storage medium applied to disease report data, the method comprising: acquiring a plurality of original report data of the same disease; converting each identity feature of the patient in each original report data into a preset character string by a random algorithm to obtain data to be processed, and establishing a correspondence between the identity feature and the preset character string, wherein different identity features are converted into different preset character strings; processing all the data to be processed based on a plurality of preset rules to determine repeated data, wherein each preset rule is generated based on the identity features of the patient and the unique feature of the disease, and the identity features comprise a unique identity feature of the patient and/or other identity features besides the unique identity feature; removing, from the repeated data, the data with the earliest disease report time, and setting the remaining data as final repeated data; and associating the final repeated data with the correspondence and storing them separately.
In this scheme, the conversion is performed after the original report data is acquired, so that patient information cannot be seen directly during the subsequent duplicate check, which helps protect patient privacy; and because the duplicate check is rule-based, it lends itself to automation, improves data processing efficiency, and can meet large-scale data processing requirements.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention. Like components are numbered similarly in the various figures.
FIG. 1 shows a flow diagram of a method of data processing;
FIG. 2 is a schematic diagram of a data processing apparatus;
FIG. 3 shows a schematic structural diagram of a terminal;
FIG. 4 shows a schematic structural diagram of a storage medium.
Description of reference numerals:
201-acquisition module; 202-conversion module; 203-processing module; 204-removing module; 205-storage module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as first excluding the existence of, or adding to, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
Example 1
Embodiment 1 of the present invention discloses a data processing method, which is applied to report data of diseases, and as shown in fig. 1, the method includes the following steps:
step S101, acquiring a plurality of original report data of the same disease;
specifically, the step S101 of obtaining the original report data of the same disease includes:
acquiring a plurality of original data of the same disease; performing data cleansing on all of the raw data to remove the raw data that does not include patient identity or unique characteristics of the disease; setting the remaining original data as original report data.
Specifically, for example, in the case where the disease is syphilis, the raw data may be downloaded from, for example, a national infectious disease report information management system, or downloaded from another data source.
The raw data must include a feature unique to the disease, such as the stage of syphilis. Syphilis cases have particular characteristics: syphilis can be divided into primary syphilis (corresponding to an ulcer or chancre at the site of infection), secondary syphilis (corresponding to rash, skin and mucosal lesions, lymph node lesions and the like), and tertiary syphilis (corresponding to cardiac lesions or gummas). The stage can therefore serve as the unique feature of the disease.
In addition, the report data must include identity information. The raw data is therefore cleaned: raw data that does not include the patient's identity features or the unique feature of the disease is removed, and the raw data remaining after removal is used as the original report data.
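The cleaning step can be sketched as follows. This is a minimal illustration only: the field names (`name`, `id_number`, `stage` and so on) are hypothetical, since the text does not specify a data schema, and the sketch assumes that a record is kept when it carries at least one identity feature together with the disease feature.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical field names; the text does not define a schema.
IDENTITY_FIELDS = ["name", "id_number", "gender", "age",
                   "address", "phone", "birth_date"]
DISEASE_FIELD = "stage"


def has_value(record: Record, field: str) -> bool:
    """True when the field is present and not blank."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def clean(raw_records: List[Record]) -> List[Record]:
    """Keep records carrying an identity feature and the disease feature."""
    kept = []
    for record in raw_records:
        has_identity = any(has_value(record, f) for f in IDENTITY_FIELDS)
        if has_identity and has_value(record, DISEASE_FIELD):
            kept.append(record)
    return kept


raw = [
    {"name": "Zhang San", "id_number": "ID-001", "stage": "primary"},
    {"name": "Li Si", "stage": ""},   # no disease feature: removed
    {"stage": "secondary"},           # no identity feature: removed
]
original_reports = clean(raw)
```

A stricter variant could require specific identity features (for example the identity document number) rather than any one of them; the text leaves this choice open.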
Furthermore, the types of identity feature comprise any one or any combination of the following: name, identity document number, gender, age, address, contact information, and date of birth. When the disease is syphilis, the unique feature of the disease is the syphilis stage.
Step S102, converting each identity characteristic of the patient in each original report data into a preset character string by a random algorithm to obtain data to be processed, and establishing a corresponding relation between the identity characteristic and the preset character string; the preset character strings are different after different identity characteristics are converted;
Specifically, when the identity feature is a name, the conversion is as follows: the Chinese characters of the name are randomly converted, as a whole, into a preset character string; in addition, each Chinese character of the name is converted into pinyin, and each letter of the pinyin is randomly converted into a preset character; different Chinese characters are converted into different preset character strings, and different letters are converted into different preset characters.
For example, when the patient's name is "Zhang San", "Zhang San" is converted as a whole into a preset character string, for example "Asc". In addition, the pinyin of "Zhang San" is "zhang san", and each letter is converted into a preset character; for example, "z" is converted into "a", "h" is converted into "S", and so on.
As another example, when the patient's name is "Li Si", "Li Si" is converted as a whole into a preset character string, for example "sdc". In addition, the pinyin of "Li Si" is "li si", and each letter is converted into a preset character; for example, "l" is converted into "g", "i" is converted into "o", and so on.
Identity features of other types, such as the identity document number and age, are converted as a whole, and the resulting preset character string uniquely identifies the identity feature before conversion.
When the identity feature is a feature other than the name, the conversion is as follows: the feature is randomly converted, as a whole, into a preset character string. For example, the gender "male" is converted as a whole into "P" and the gender "female" into "U"; likewise, the date of birth is converted into a preset character string.
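The whole-value conversion and the correspondence it establishes can be sketched as follows. This is one possible reading, not the patented implementation: the class name `Pseudonymizer`, the token length and the field names are all assumptions, and random hexadecimal tokens from Python's `secrets` module stand in for the unspecified "random algorithm". The same value always receives the same token so that converted records can still be compared, and different values receive different tokens.

```python
import secrets
from typing import Dict, Iterable, Tuple


class Pseudonymizer:
    """Maps each distinct (feature type, value) pair to its own random token."""

    def __init__(self, token_bytes: int = 4) -> None:
        self.token_bytes = token_bytes
        self.forward: Dict[Tuple[str, str], str] = {}  # (field, value) -> token
        self.reverse: Dict[str, Tuple[str, str]] = {}  # token -> (field, value)

    def convert(self, field: str, value: str) -> str:
        key = (field, value)
        if key in self.forward:          # same input, same token
            return self.forward[key]
        token = secrets.token_hex(self.token_bytes)
        while token in self.reverse:     # retry on the rare collision
            token = secrets.token_hex(self.token_bytes)
        self.forward[key] = token
        self.reverse[token] = key        # the stored correspondence
        return token

    def convert_record(self, record: Dict[str, str],
                       identity_fields: Iterable[str]) -> Dict[str, str]:
        """Convert identity fields only; other fields pass through unchanged."""
        fields = set(identity_fields)
        return {
            name: self.convert(name, value) if name in fields else value
            for name, value in record.items()
        }


p = Pseudonymizer()
identity = ["gender", "birth_date"]
a = p.convert_record(
    {"gender": "male", "birth_date": "1990-01-01", "stage": "primary"}, identity)
b = p.convert_record(
    {"gender": "male", "birth_date": "1985-06-30", "stage": "primary"}, identity)
```

Here `a["gender"]` and `b["gender"]` are the same token, while the two birth-date tokens differ; `p.reverse` plays the role of the correspondence used later for restoration.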
After the conversion, when the preset character strings obtained by converting the Chinese characters of two names as a whole are the same, or when every preset character obtained by converting the pinyin of the two names is the same, the name identity features are determined to be the same.
Taking the name "Zhang San" again as an example: if two names are both written "Zhang San", or both have the pinyin "zhang san", their name identity features are regarded as the same.
Matching on pinyin is included because, when information is communicated orally in practice, speech rate, accent and similar factors can easily cause a name to be misheard and recorded with different characters.
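The two-way name conversion and the resulting match test can be sketched as follows. Several points are assumptions made for illustration: the pinyin is supplied by the caller rather than derived from the characters, the letter table is a random permutation of the lowercase alphabet (the example in the text also maps to uppercase characters), and the names `NameConverter`, `ConvertedName` and `same_name` are hypothetical.

```python
import random
import string
from typing import Dict, NamedTuple


class ConvertedName(NamedTuple):
    whole: str     # token for the name as written
    phonetic: str  # letter-by-letter substitution of the pinyin


class NameConverter:
    def __init__(self) -> None:
        self._rng = random.SystemRandom()
        letters = list(string.ascii_lowercase)
        shuffled = letters[:]
        self._rng.shuffle(shuffled)
        # A permutation guarantees different letters map to different characters.
        self._letters: Dict[str, str] = dict(zip(letters, shuffled))
        self._whole: Dict[str, str] = {}

    def _new_token(self) -> str:
        used = set(self._whole.values())
        while True:
            token = "".join(
                self._rng.choice(string.ascii_letters) for _ in range(6))
            if token not in used:
                return token

    def convert(self, name: str, pinyin: str) -> ConvertedName:
        if name not in self._whole:
            self._whole[name] = self._new_token()
        # Non-letter characters such as spaces are left unchanged.
        phonetic = "".join(self._letters.get(ch, ch) for ch in pinyin.lower())
        return ConvertedName(self._whole[name], phonetic)


def same_name(a: ConvertedName, b: ConvertedName) -> bool:
    """Names match when written identically or when they sound identical."""
    return a.whole == b.whole or a.phonetic == b.phonetic


converter = NameConverter()
first = converter.convert("张三", "zhang san")
second = converter.convert("章三", "zhang san")  # homophone, different character
third = converter.convert("李四", "li si")
```

In this sketch `same_name(first, second)` holds because the converted pinyin agrees even though the whole-name tokens differ, while `first` and `third` do not match.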
Step S103, processing all the data to be processed based on a plurality of preset rules to determine repeated data; each preset rule is generated based on the identity characteristics of the patient and the unique characteristics of the disease, and the identity characteristics comprise the unique identity characteristics of the patient and/or other identity characteristics except the unique identity characteristics;
specifically, the step S103 of processing all the to-be-processed data based on a plurality of preset rules to determine the repeated data includes: if the unique characteristics of the diseases in the two or more data to be processed are the same and the unique characteristics of the identities of the patients in the two or more data to be processed are the same, determining the two or more data to be processed as initial repeated data; if the unique characteristics of diseases in two or more data to be processed are the same, and the other identity characteristics of which the quantity exceeds a preset value in the two or more data to be processed are the same, determining that the two or more data to be processed are initial repeated data; and summarizing all the initial repeated data and performing deduplication to obtain repeated data.
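The two general rules of step S103 can be sketched as follows, operating on already converted records. The field names, the choice of `id_number` as the unique identity feature and the preset value `MIN_MATCHES = 3` are assumptions for illustration; the text does not fix them. The pairwise comparison is quadratic and is used here only for clarity.

```python
from itertools import combinations
from typing import Dict, List

Record = Dict[str, str]

UNIQUE_FIELD = "id_number"   # hypothetical unique identity feature
OTHER_FIELDS = ["name", "gender", "age", "phone", "address", "birth_date"]
DISEASE_FIELD = "stage"
MIN_MATCHES = 3              # hypothetical preset value


def same(a: Record, b: Record, field: str) -> bool:
    """Records agree on a field only when both carry the same non-empty value."""
    return bool(a.get(field)) and a.get(field) == b.get(field)


def is_duplicate_pair(a: Record, b: Record) -> bool:
    if not same(a, b, DISEASE_FIELD):
        return False
    if same(a, b, UNIQUE_FIELD):                 # rule 1: unique identity feature
        return True
    matches = sum(same(a, b, f) for f in OTHER_FIELDS)
    return matches > MIN_MATCHES                 # rule 2: enough other features


def find_duplicates(records: List[Record]) -> List[int]:
    """Return sorted indices of all records flagged by either rule."""
    flagged = set()                              # the set removes repeats
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if is_duplicate_pair(a, b):
            flagged.update((i, j))
    return sorted(flagged)


records = [
    {"id_number": "t1", "name": "n1", "gender": "g1", "age": "a1",
     "phone": "p1", "stage": "primary"},
    {"id_number": "t1", "name": "n9", "gender": "g1", "age": "a2",
     "phone": "p7", "stage": "primary"},
    {"id_number": "", "name": "n1", "gender": "g1", "age": "a1",
     "phone": "p1", "stage": "primary"},
    {"id_number": "t4", "name": "n1", "gender": "g1", "age": "a1",
     "phone": "p1", "stage": "secondary"},
]
duplicate_indices = find_duplicates(records)
```

Record 1 is flagged through the shared identity token, record 2 through four matching other features, and record 3 is excluded because its stage differs.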
In one particular embodiment, reference is made to table 1 below:
TABLE 1
[Table 1 appears as an image in the original publication.]
Note: ✓ indicates agreement, and ○ indicates homophones written with different characters. For cases reported in the same year, ages may differ by ±1 year.
Each condition in Table 1 corresponds to a preset rule based on the feature types listed there. In a specific processing procedure, data satisfying any one of the following conditions is regarded as repeated data.
1. Select from the database the case data whose identity card numbers and syphilis stages are consistent; the selected case data is repeated data, and a tag is added in the database to mark it;
2. Select from the database the case data whose names are fully or substantially consistent (homophones written with different characters) and whose gender, age (for cases reported in the same year, ages may differ by ±1 year), telephone number and syphilis stage are consistent; the selected case data is repeated data, and a tag is added in the database to mark it;
3. Select from the database the case data whose names are fully or substantially consistent (homophones written with different characters) and whose gender, age (for cases reported in the same year, ages may differ by ±1 year), current address (specific to the township or street level) and syphilis stage are consistent; the selected case data is repeated data, and a tag is added in the database to mark it;
4. Select from the database the case data whose names are fully or substantially consistent (homophones written with different characters) and whose gender, date of birth and syphilis stage are consistent; the selected case data is repeated data, and a tag is added in the database to mark it;
5. Select from the database the case data whose gender, telephone number, age (for cases reported in the same year, ages may differ by ±1 year) and syphilis stage are consistent; the selected case data is repeated data, and a tag is added in the database to mark it.
The tags obtained under the five criteria are then consolidated and deduplicated to obtain the repeated data.
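The five criteria can be sketched as pairwise predicates that tag the records they flag. For readability this sketch compares untransformed values; how the ±1 year age tolerance is evaluated on converted data is not specified in the text. The field names are hypothetical, the pinyin is assumed to be stored alongside the name, and the "same report year" qualifier on the age tolerance is omitted for brevity.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set

Record = Dict[str, object]


def eq(a: Record, b: Record, field: str) -> bool:
    """Both records carry the same non-empty value for the field."""
    return a.get(field) not in (None, "") and a.get(field) == b.get(field)


def name_ok(a: Record, b: Record) -> bool:
    """Fully consistent, or homophones written with different characters."""
    return eq(a, b, "name") or eq(a, b, "pinyin")


def age_ok(a: Record, b: Record) -> bool:
    """Ages may differ by at most one year."""
    if a.get("age") is None or b.get("age") is None:
        return False
    return abs(int(a["age"]) - int(b["age"])) <= 1


RULES: Dict[int, Callable[[Record, Record], bool]] = {
    1: lambda a, b: eq(a, b, "id_number") and eq(a, b, "stage"),
    2: lambda a, b: (name_ok(a, b) and eq(a, b, "gender") and age_ok(a, b)
                     and eq(a, b, "phone") and eq(a, b, "stage")),
    3: lambda a, b: (name_ok(a, b) and eq(a, b, "gender") and age_ok(a, b)
                     and eq(a, b, "address") and eq(a, b, "stage")),
    4: lambda a, b: (name_ok(a, b) and eq(a, b, "gender")
                     and eq(a, b, "birth_date") and eq(a, b, "stage")),
    5: lambda a, b: (eq(a, b, "gender") and eq(a, b, "phone")
                     and age_ok(a, b) and eq(a, b, "stage")),
}


def tag_duplicates(records: List[Record]) -> Dict[int, Set[int]]:
    """Map record index -> set of criterion numbers that flagged it."""
    tags: Dict[int, Set[int]] = {}
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        for number, rule in RULES.items():
            if rule(a, b):
                tags.setdefault(i, set()).add(number)
                tags.setdefault(j, set()).add(number)
    return tags


records = [
    {"id_number": "A1", "name": "张三", "pinyin": "zhang san", "gender": "M",
     "age": 30, "phone": "555-0100", "address": "Town X",
     "birth_date": "1990-01-01", "stage": "primary"},
    {"id_number": "", "name": "章三", "pinyin": "zhang san", "gender": "M",
     "age": 31, "phone": "555-0100", "address": "",
     "birth_date": "", "stage": "primary"},
    {"id_number": "B2", "name": "李四", "pinyin": "li si", "gender": "F",
     "age": 40, "phone": "555-0199", "address": "Town Y",
     "birth_date": "1980-05-05", "stage": "primary"},
]
tags = tag_duplicates(records)
```

Because the tags are keyed by record index, a record flagged by several criteria appears only once in the consolidated result, which corresponds to the merge-and-deduplicate step.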
The duplicate determination may also be performed based on features and rules other than those described above.
Step S104, removing, from the repeated data, the data with the earliest disease report time, and setting the remaining data as final repeated data;
specifically, the repeated data includes the disease report time, that is, the time at which the case was reported. After the duplicate check of this scheme, the first report of a case also appears in the repeated data. The data with the earliest report time therefore needs to be excluded from the repeated cases, and the remaining repeated data is the final repeated data.
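Step S104 can be sketched as follows, assuming that the repeated data has already been grouped per case and that the earliest report is removed within each group. The grouping key `case_key` and the other field names are hypothetical; the text does not say how groups are identified.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List

Record = Dict[str, object]


def drop_first_reports(duplicates: List[Record]) -> List[Record]:
    """Remove the earliest report of each case; the rest are final duplicates."""
    groups: Dict[object, List[Record]] = defaultdict(list)
    for record in duplicates:
        groups[record["case_key"]].append(record)
    final: List[Record] = []
    for members in groups.values():
        ordered = sorted(members, key=lambda r: r["report_date"])
        final.extend(ordered[1:])  # skip the first (earliest) report
    return final


duplicates = [
    {"report_id": 1, "case_key": "k1", "report_date": date(2019, 3, 1)},
    {"report_id": 2, "case_key": "k1", "report_date": date(2018, 7, 15)},
    {"report_id": 3, "case_key": "k1", "report_date": date(2020, 1, 9)},
    {"report_id": 4, "case_key": "k2", "report_date": date(2019, 5, 2)},
    {"report_id": 5, "case_key": "k2", "report_date": date(2019, 11, 20)},
]
final_duplicates = drop_first_reports(duplicates)
```

Reports 2 and 4 are the first reports of their cases and are kept out of the result, so the final repeated data consists of reports 1, 3 and 5.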
Step S105, associating the final repeated data with the correspondence and storing them separately.
Specifically, after the final repeated data is obtained, it may need to be restored to the original data in application scenarios that require this; the restoration can be performed based on the correspondence.
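Restoration from the stored correspondence can be sketched as follows. The layout of the correspondence (token mapped to a pair of field name and original value) and the example tokens, which reuse the "Asc" and "P" examples from the description, are assumptions for illustration.

```python
from typing import Dict, Tuple

# Hypothetical correspondence saved during conversion:
# token -> (field name, original value).
correspondence: Dict[str, Tuple[str, str]] = {
    "Asc": ("name", "Zhang San"),
    "P": ("gender", "male"),
}


def restore(record: Dict[str, str],
            mapping: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Replace each token with its original value; leave other fields as-is."""
    restored = {}
    for field, value in record.items():
        source = mapping.get(value)
        if source is not None and source[0] == field:
            restored[field] = source[1]
        else:
            restored[field] = value
    return restored


final_record = {"name": "Asc", "gender": "P", "stage": "primary"}
original_record = restore(final_record, correspondence)
```

Storing the correspondence separately from the final repeated data, as the step describes, means that the converted records alone do not reveal patient identities.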
In addition, after the final repeated data is obtained, further analysis can be performed; for example, the final repeated data can be analyzed statistically by year, region and other factors. The final repeated data can also be passed to the individual cities so that they can re-verify the data.
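A statistical summary by year and region could look like the following minimal sketch. The field names `report_year` and `region` and the sample values are hypothetical and are not taken from the source.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical final repeated data; values are illustrative only.
final_duplicates: List[Dict[str, object]] = [
    {"report_year": 2019, "region": "City A"},
    {"report_year": 2019, "region": "City B"},
    {"report_year": 2020, "region": "City A"},
]


def count_by(records: List[Dict[str, object]], *fields: str) -> Counter:
    """Count records per combination of the given fields."""
    return Counter(tuple(r[f] for f in fields) for r in records)


by_year = count_by(final_duplicates, "report_year")
by_year_region = count_by(final_duplicates, "report_year", "region")
```

The per-region counts from such a summary could also be used to split the final repeated data into the lists passed to each city for re-verification.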
Example 2
Embodiment 2 of the present invention discloses a data processing apparatus, which is applied to report data of diseases, and as shown in fig. 2, the apparatus includes:
an obtaining module 201, configured to obtain multiple original report data of the same disease;
a conversion module 202, configured to convert each identity feature of the patient in each original report data into a preset character string by using a random algorithm, obtain data to be processed, and establish a corresponding relationship between the identity feature and the preset character string; the preset character strings are different after different identity characteristics are converted;
the processing module 203 is configured to process all the to-be-processed data based on a plurality of preset rules to determine duplicate data; each preset rule is generated based on the identity characteristics of the patient and the unique characteristics of the disease, and the identity characteristics comprise the unique identity characteristics of the patient and/or other identity characteristics except the unique identity characteristics;
the removing module 204 is configured to remove, from the repeated data, the data with the earliest disease report time, and set the remaining data as final repeated data;
and the storage module 205 is configured to associate the final duplicate data with the corresponding relationship and then store the final duplicate data respectively.
In a specific embodiment, the obtaining module 201 is configured to: acquiring a plurality of original data of the same disease; performing data cleaning on all the original data to remove the original data which does not comprise the identity features or the unique features of the diseases; setting the remaining original data as original report data.
In a specific embodiment, the types of identity feature comprise any one or any combination of the following: name, identity document number, gender, age, address, contact information, and date of birth; when the disease is syphilis, the unique feature of the disease is the syphilis stage.
In a specific embodiment, when the identity is a name, the conversion is: randomly converting all Chinese characters in the name into preset character strings; converting each Chinese character in the name into pinyin, and randomly converting each letter in the pinyin into a preset character; converting different Chinese characters into different preset character strings, and converting different letters into different preset characters; when the identity feature is other than name, the conversion is: and converting all the other characteristics into preset character strings at random.
In a specific embodiment, two names are determined to be the same identity feature when the preset character strings obtained by converting their Chinese characters as a whole are the same, or when every preset character obtained by converting the pinyin of their Chinese characters is the same.
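The two name conversions and the matching rule can be illustrated with the sketch below. It is an assumption-laden example rather than the method itself: `RandomMapper`, `make_letter_table`, `convert_name` and `names_match` are hypothetical names, the token length and alphabet are arbitrary, and `TOY_PINYIN` is a three-entry stand-in for a real pinyin dictionary. The same `RandomMapper` could also be applied to a non-name feature taken as a whole.

```python
import random
import secrets
import string

TOKEN_ALPHABET = string.ascii_uppercase + string.digits

# Hypothetical toy pinyin table; a real system would need a full dictionary.
TOY_PINYIN = {"张": "zhang", "伟": "wei", "维": "wei"}


class RandomMapper:
    """Maps each distinct value to a distinct random token and records the pairing."""

    def __init__(self, length=6):
        self.length = length
        self.forward = {}   # value -> token
        self.reverse = {}   # token -> value (the stored correspondence)

    def convert(self, value):
        if value not in self.forward:
            token = self._new_token()
            self.forward[value] = token
            self.reverse[token] = value
        return self.forward[value]

    def _new_token(self):
        # Redraw on collision so different values never share a token.
        while True:
            token = "".join(secrets.choice(TOKEN_ALPHABET) for _ in range(self.length))
            if token not in self.reverse:
                return token


def make_letter_table():
    """Random permutation of a-z, so different letters get different substitutes."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.SystemRandom().shuffle(shuffled)
    return dict(zip(letters, shuffled))


def convert_name(name, char_mapper, letter_table):
    """Return (character-level form, pinyin-letter-level form) of a name."""
    char_form = "-".join(char_mapper.convert(ch) for ch in name)
    pinyin_form = tuple(
        "".join(letter_table[c] for c in TOY_PINYIN[ch]) for ch in name
    )
    return char_form, pinyin_form


def names_match(a, b):
    """Names count as the same if either converted form is identical."""
    return a[0] == b[0] or a[1] == b[1]


char_mapper = RandomMapper()
letter_table = make_letter_table()
first = convert_name("张伟", char_mapper, letter_table)
second = convert_name("张维", char_mapper, letter_table)
```

Here the two sample names differ in one character but share the same pinyin, so their character-level forms differ while their pinyin-level forms coincide, and `names_match` treats them as the same name. This is what lets the check tolerate homophone characters entered in place of the correct one.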
In a specific embodiment, the processing module 203 is configured to: if two or more pieces of data to be processed have the same unique feature of the disease and the same unique identity feature of the patient, determine them to be initial duplicate data; if two or more pieces of data to be processed have the same unique feature of the disease and the number of other identity features that are the same among them exceeds a preset value, determine them to be initial duplicate data; and pool all the initial duplicate data and de-duplicate the pooled data to obtain the duplicate data.
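The two rules and the final pooling step might look like the sketch below. It is a hedged illustration only: the certificate number is assumed to be the unique identity feature, `THRESHOLD` is a hypothetical preset value, all field and function names are invented for the example, and the field values stand for already-converted preset character strings.

```python
from itertools import combinations

UNIQUE_FIELD = "certificate_number"   # assumed unique identity feature
OTHER_FIELDS = ("name", "gender", "age", "address", "contact", "birth_time")
THRESHOLD = 3                          # hypothetical preset value


def same_disease(a, b):
    return a["disease_stage"] == b["disease_stage"]


def rule_unique(a, b):
    """Rule 1: same disease feature and same unique identity feature."""
    u, v = a.get(UNIQUE_FIELD), b.get(UNIQUE_FIELD)
    return same_disease(a, b) and u is not None and u == v


def rule_others(a, b, threshold=THRESHOLD):
    """Rule 2: same disease feature and more than `threshold` other features equal."""
    matches = sum(
        1 for f in OTHER_FIELDS
        if a.get(f) is not None and a.get(f) == b.get(f)
    )
    return same_disease(a, b) and matches > threshold


def find_duplicates(records):
    """Apply every rule to every pair, pool the hits, and drop repeated pairs."""
    initial = []
    for a, b in combinations(records, 2):
        for rule in (rule_unique, rule_others):
            if rule(a, b):
                initial.append((a["id"], b["id"]))
    return sorted(set(initial))


common = {"name": "N1", "gender": "G1", "age": "A1", "address": "D1"}
records = [
    {"id": "r1", "disease_stage": "primary", "certificate_number": "T1", **common},
    {"id": "r2", "disease_stage": "primary", "certificate_number": "T1", **common},
    {"id": "r3", "disease_stage": "primary", "certificate_number": None, **common},
    {"id": "r4", "disease_stage": "secondary", "certificate_number": "T1", **common},
]
duplicate_pairs = find_duplicates(records)
```

The pair (`r1`, `r2`) satisfies both rules but appears only once after pooling, `r3` is caught by the second rule alone, and `r4` is excluded because its disease feature differs. The pairwise comparison is quadratic in the number of records, so a large data set would likely need grouping by disease feature or by token before comparison.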
Example 3
Embodiment 3 of the present invention further discloses a terminal, as shown in fig. 3, including a processor and a memory, where the memory stores an application program, and the application program executes the data processing method described in embodiment 1 when running on the processor.
Example 4
Embodiment 4 of the present invention further discloses a storage medium, as shown in fig. 4, where an application program is stored in the storage medium, and the application program executes the data processing method described in embodiment 1 when running on a processor.
In summary, the embodiments of the invention disclose a data processing method, device, terminal and storage medium applied to report data of diseases. The method comprises: acquiring a plurality of original report data of the same disease; converting each identity feature of the patient in each original report data into a preset character string by a random algorithm to obtain data to be processed, and establishing a corresponding relationship between the identity feature and the preset character string, wherein different identity features are converted into different preset character strings; processing all the data to be processed based on a plurality of preset rules to determine duplicate data, wherein each preset rule is generated based on the identity features of the patient and the unique features of the disease, and the identity features comprise the unique identity feature of the patient and/or other identity features except the unique identity feature; removing, from the duplicate data, the data with the earliest disease reporting time, and setting the data that remain after the removal as final duplicate data; and associating the final duplicate data with the corresponding relationship and then storing them respectively. Because the identity features are converted as soon as the original report data are acquired, patient information cannot be read directly during the subsequent duplicate check, which protects patient privacy. The rule-based duplicate check lends itself to automatic processing, which improves data processing efficiency and allows large-scale data to be processed.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, or the part of it that in essence contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The above description covers only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto; any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method of data processing, applied to report data on a disease, the method comprising:
acquiring a plurality of original report data of the same disease;
converting each identity characteristic of the patient in each original report data into a preset character string by a random algorithm to obtain data to be processed, and establishing a corresponding relation between the identity characteristic and the preset character string; the preset character strings are different after different identity characteristics are converted;
processing all the data to be processed based on a plurality of preset rules to determine repeated data; each preset rule is generated based on the identity characteristics of the patient and the unique characteristics of the disease, and the identity characteristics comprise the unique identity characteristics of the patient and/or other identity characteristics except the unique identity characteristics;
removing, from the repeated data, the data with the earliest disease reporting time, and setting the data that remain after the removal as final repeated data;
and associating the final repeated data with the corresponding relation and then storing them respectively.
2. The method of claim 1, wherein obtaining multiple raw report data for the same disease comprises:
acquiring a plurality of original data of the same disease;
performing data cleaning on all the original data to remove the original data which does not comprise the identity features or the unique features of the diseases;
setting the remaining original data as original report data.
3. The method of claim 1, wherein the type of identity feature comprises any combination of one or more of: name, certificate number, gender, age, address, contact, birth time;
when the disease is syphilis, the unique characteristic of the disease is the stage of syphilis.
4. The method of claim 1, wherein when the identity feature is a name, the converting is to: randomly converting all Chinese characters in the name into preset character strings; converting each Chinese character in the name into pinyin, and randomly converting each letter in the pinyin into a preset character; converting different Chinese characters into different preset character strings, and converting different letters into different preset characters;
when the identity feature is a feature other than the name, the converting is: randomly converting the feature as a whole into a preset character string.
5. The method of claim 4, wherein the identity feature of the name is determined to be the same when the preset character strings obtained by converting the Chinese characters in the name as a whole are the same, or when every preset character obtained by converting the pinyin of the Chinese characters in the name is the same.
6. The method according to claim 1, wherein the "processing all the data to be processed based on a plurality of preset rules to determine duplicate data" comprises:
if the unique characteristics of the diseases in the two or more data to be processed are the same and the unique characteristics of the identities of the patients in the two or more data to be processed are the same, determining the two or more data to be processed as initial repeated data;
if the unique characteristics of diseases in two or more data to be processed are the same, and the other identity characteristics of which the quantity exceeds a preset value in the two or more data to be processed are the same, determining that the two or more data to be processed are initial repeated data;
and summarizing all the initial repeated data and performing deduplication to obtain repeated data.
7. An apparatus for data processing, applied to report data of a disease, the apparatus comprising:
the acquisition module is used for acquiring a plurality of original report data of the same disease;
the conversion module is used for converting each identity characteristic of the patient in each original report data into a preset character string by a random algorithm to obtain data to be processed and establishing a corresponding relation between the identity characteristic and the preset character string; the preset character strings are different after different identity characteristics are converted;
the processing module is used for processing all the data to be processed based on a plurality of preset rules so as to determine repeated data; each preset rule is generated based on the identity characteristics of the patient and the unique characteristics of the disease, and the identity characteristics comprise the unique identity characteristics of the patient and/or other identity characteristics except the unique identity characteristics;
the removing module is used for removing, from the repeated data, the data with the earliest disease reporting time, and setting the data that remain after the removal as final repeated data;
and the storage module is used for associating the final repeated data with the corresponding relation and then storing them respectively.
8. The apparatus of claim 7, wherein the acquisition module is to:
acquiring a plurality of original data of the same disease;
performing data cleaning on all the original data to remove the original data which does not comprise the identity features or the unique features of the diseases;
setting the remaining original data as original report data.
9. A terminal, characterized in that it comprises a processor and a memory, in which an application program is stored, which, when run on the processor, performs the method of data processing according to any one of claims 1 to 6.
10. A storage medium, in which an application program is stored, which, when run on a processor, performs the method of data processing according to any one of claims 1-6.
CN202110522186.7A 2021-05-13 2021-05-13 Data processing method, device, terminal and storage medium Pending CN113138980A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110522186.7A CN113138980A (en) 2021-05-13 2021-05-13 Data processing method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN113138980A true CN113138980A (en) 2021-07-20

Family

ID=76817400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110522186.7A Pending CN113138980A (en) 2021-05-13 2021-05-13 Data processing method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN113138980A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination