CN114564444A - System for extracting, identifying and classifying files by using binary system - Google Patents

Info

Publication number
CN114564444A
Authority
CN
China
Prior art keywords
data
binary
extracting
data information
identifying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210174166.XA
Other languages
Chinese (zh)
Inventor
汪路
祝林杰
Other inventors have requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Aijia Medical Technology Co ltd
Lancet Technology Co Ltd
Original Assignee
Lancet Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lancet Technology Co Ltd
Priority to CN202210174166.XA
Publication of CN114564444A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of medical file data processing and discloses a system for extracting, identifying and classifying files by using binary system. The system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database. The data acquisition module is used for acquiring a binary data packet of a medical database in the medical system; the identification module is used for identifying the data information of the binary data packet acquired by the acquisition module. By acquiring the binary data packet of the medical database in the medical system, the system can identify, extract and classify the information data of medical files, and can process the classified data information into one preset data structure so that the front end outputs the data information in a unified style. Because no interface integration with the existing system is required, hospital data maintenance costs can be reduced.

Description

System for extracting, identifying and classifying files by using binary system
Technical Field
The invention relates to the technical field of medical file data processing, in particular to a system for extracting, identifying and classifying files by using binary system.
Background
Document classification is a task in information science and computer science: assigning a file to one or more categories. The classification can be done manually or by a computer algorithm. Classification makes it possible to unify data structures and produce standardized output.
At present, many of the medical information systems used by hospitals are old and their interfaces can no longer be maintained normally. In some hospitals, the data structures of data such as case records and examination results are irregular because the hospital has changed vendors, so the data is displayed inconsistently and data acquisition is impaired.
Disclosure of Invention
To achieve the purpose of extracting, identifying and classifying files by using binary system, the invention adopts the following technical scheme: a system for extracting, identifying and classifying files by using binary system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database;
the data acquisition module is used for acquiring a binary data packet of a medical database in the medical system;
the identification module is used for identifying the data information of the binary data packet acquired by the acquisition module;
the extraction module is used for extracting the data information of the binary data packet identified by the identification module;
the classification module is used for classifying the data information extracted by the extraction module;
the new database is used for storing the data information after it has been processed into the new data structure.
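The module arrangement described above can be pictured as a small pipeline. The following is a minimal sketch, not the patented implementation: the `Record` and `Pipeline` names, the callable-based module interfaces and the in-memory list standing in for the new database are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    """One packet as it moves through the pipeline (hypothetical structure)."""
    raw: bytes        # binary data packet obtained by the data acquisition module
    text: str = ""    # data information after identification and extraction
    label: str = ""   # classification label assigned by the classification module


@dataclass
class Pipeline:
    identify_and_extract: Callable[[bytes], str]  # identification + extraction modules
    classify: Callable[[str], str]                # classification module
    new_database: list = field(default_factory=list)  # stand-in for the new database

    def process(self, packet: bytes) -> Record:
        record = Record(raw=packet)
        record.text = self.identify_and_extract(record.raw)
        record.label = self.classify(record.text)
        self.new_database.append(record)
        return record


# Toy stand-ins for the real modules, for illustration only.
pipeline = Pipeline(
    identify_and_extract=lambda packet: packet.decode("utf-8"),
    classify=lambda text: "lab" if "glucose" in text else "other",
)
result = pipeline.process(b"glucose 5.4 mmol/L")
```

Passing the modules in as callables keeps each stage replaceable, which mirrors the way the text treats them as separate modules.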
A method for extracting, identifying and classifying files by using binary system specifically comprises the following steps:
S1, acquiring a binary data packet of a medical database in the medical system by using the data acquisition module;
S2, identifying the acquired binary data packet by using the identification module and the extraction module, and extracting the data information in the binary data packet;
S3, classifying the extracted data information by using the classification module;
S4, storing the classified data packets according to a unified data structure, and establishing a new database.
Further, when the binary data packet is acquired in S1, the category of the binary data packet in the medical system is synchronously acquired, and a data tag is generated.
Further, the extraction of the data information in S2 specifically includes: extracting the data information from the identified binary data packet, and extracting a plurality of keywords according to the frequency of the words appearing in the data information.
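Keyword extraction by word frequency can be sketched as follows. The source does not say how words are tokenized or how many keywords are kept, so the regular-expression tokenizer, the stop-word list and the `top_n` parameter are assumptions; real Chinese medical text would need a word segmenter instead of this English-only tokenizer.

```python
import re
from collections import Counter


def extract_keywords(text: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Return the top_n most frequent tokens together with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Hypothetical stop-word list; a real deployment would use a fuller one.
    stop_words = {"the", "of", "and", "a", "in", "is"}
    counts = Counter(token for token in tokens if token not in stop_words)
    return counts.most_common(top_n)


sample = "Blood test: glucose normal. Repeat blood test in a week; blood pressure stable."
keywords = extract_keywords(sample)
```

Here `blood` appears three times and `test` twice, so they rank first and second; the remaining words are tied at one occurrence each.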
Further, the step of classifying the data information in S3 includes:
S301, analyzing the degree of correlation between the data information extracted in S2 and the data label by using the SharkSearch algorithm;
S302, judging whether the degree of correlation between the data label and the data information extracted in S2 reaches a set correlation rate;
S3021, if the degree of correlation between the data label and the data information extracted in S2 reaches the set correlation rate, establishing a new classification label by using the data label;
S3022, if the degree of correlation between the data label and the data information extracted in S2 does not reach the set correlation rate, resetting the classification label of the data information.
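The branch in S302 to S3022 reduces to a threshold test. A minimal sketch, assuming the relevance score has already been computed and lies in [0, 1]; the function name `choose_label` and the default threshold of 0.5 are hypothetical, since the source only speaks of "a set correlation rate".

```python
from typing import Optional


def choose_label(data_label: str, relevance: float, threshold: float = 0.5) -> Optional[str]:
    """Keep the source system's data label if it is relevant enough.

    Returns the data label when the relevance reaches the threshold (S3021),
    or None to signal that the label must be reset from keywords (S3022).
    """
    if relevance >= threshold:
        return data_label
    return None


kept = choose_label("lab report", relevance=0.8)
reset = choose_label("lab report", relevance=0.2)
```

Returning `None` rather than an empty string makes the "label must be reset" case explicit for the caller.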
Further, the resetting of the classification label of the data information in S3022 includes:
S3031, analyzing the degree of correlation between the data information extracted in S2 and each keyword by using the SharkSearch algorithm;
S3032, taking the correlation rate of each keyword as the weight of that keyword in the data information;
S3033, calculating a coincidence value for each keyword according to its weight in the data information and its word frequency in the data information;
S3034, comparing the coincidence values of the keywords, and establishing a new classification label from the keyword with the highest coincidence value.
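The label-reset steps can be sketched as a weighted ranking. The source says only that the coincidence value is calculated "according to" the keyword's weight and word frequency; taking it as their product is an assumption made here for illustration, and `pick_label` and the sample numbers are hypothetical.

```python
def pick_label(weights: dict[str, float], frequencies: dict[str, int]) -> tuple[str, dict[str, float]]:
    """Score each keyword and return the best one plus all scores.

    Assumption: coincidence value = weight * word frequency.
    """
    scores = {kw: weights[kw] * frequencies.get(kw, 0) for kw in weights}
    best = max(scores, key=scores.get)
    return best, scores


# Weights stand in for the per-keyword correlation rates; values are made up.
weights = {"blood": 0.5, "glucose": 0.75, "test": 0.25}
frequencies = {"blood": 3, "glucose": 1, "test": 2}
label, scores = pick_label(weights, frequencies)
```

With these numbers `blood` wins even though `glucose` has the higher weight, because its frequency is three times as large.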
Further, the establishing of the new database in S4 specifically includes:
S401, presetting a data structure template;
S402, uniformly processing the classified data packets according to the data structure template of S401;
S403, storing the processed data packets according to their classification labels, thereby establishing a new database.
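Steps S401 to S403 can be sketched with a dictionary template and an SQLite table. The template fields, the table layout and the use of SQLite are assumptions for illustration; the source does not name a storage engine or specify the data structure.

```python
import json
import sqlite3

# S401: hypothetical preset data structure template (field name -> default value).
TEMPLATE = {"patient_id": "", "category": "", "content": ""}


def apply_template(record: dict, label: str) -> dict:
    """S402: keep only the template fields and fill missing ones with defaults."""
    unified = {key: record.get(key, default) for key, default in TEMPLATE.items()}
    unified["category"] = label
    return unified


def store(conn: sqlite3.Connection, unified: dict) -> None:
    """S403: store the unified record under its classification label."""
    conn.execute(
        "INSERT INTO records (category, payload) VALUES (?, ?)",
        (unified["category"], json.dumps(unified, ensure_ascii=False)),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (category TEXT, payload TEXT)")
row = apply_template({"patient_id": "p-001", "content": "glucose 5.4", "vendor_field": "x"}, "lab")
store(conn, row)
stored = conn.execute("SELECT category, payload FROM records").fetchall()
```

Vendor-specific fields such as `vendor_field` are dropped, so every stored record has the same shape regardless of which system produced it.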
Compared with the prior art, the invention has the following beneficial effects:
1. By acquiring the binary data packet of the medical database in the medical system, the system for extracting, identifying and classifying files by using binary system can identify, extract and classify the information data of medical files. It can also process the classified data information into one preset data structure and store it in the database, so that the front end outputs the data information in a unified style. Because no interface integration with the existing system is required, hospital data maintenance costs can be reduced.
2. The system for extracting, identifying and classifying files by using binary system identifies and extracts the medical file information data and also judges and corrects the category of the medical file data, which improves the match between the file data and its assigned category and makes acquisition of the medical file information more accurate.
Drawings
FIG. 1 is a flow chart of data categorization and acquisition for a medical system of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The embodiment of the system for extracting, identifying and classifying the files by using the binary system is as follows:
a system for extracting, identifying and classifying files by using binary system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database;
the data acquisition module is used for acquiring a binary data packet of a medical database in the medical system;
the identification module is used for identifying the data information of the binary data packet acquired by the acquisition module;
the extraction module is used for extracting the data information of the binary data packet identified by the identification module;
the classification module is used for classifying the data information extracted by the extraction module;
Referring to FIG. 1, the method carried out by the system for extracting, identifying and classifying files by using binary system includes the following steps:
S1, acquiring a binary data packet of a medical database in the medical system by using the data acquisition module, and, while acquiring the binary data packet, synchronously acquiring the category of the binary data packet in the medical system and generating a data label;
S2, identifying the acquired binary data packet by using the identification module and the extraction module, and extracting the data information in the binary data packet, where the extraction of the data information specifically is: extracting the data information from the identified binary data packet, and extracting a plurality of keywords according to the frequency of the words appearing in the data information;
S3, classifying the extracted data information by using the classification module;
S301, analyzing the degree of correlation between the data information extracted in S2 and the data label by using the SharkSearch algorithm;
S302, judging whether the degree of correlation between the data label and the data information extracted in S2 reaches a set correlation rate;
S3021, if the degree of correlation between the data label and the data information extracted in S2 reaches the set correlation rate, establishing a new classification label by using the data label;
S3022, if the degree of correlation between the data label and the data information extracted in S2 does not reach the set correlation rate, resetting the classification label of the data information;
S3031, analyzing the degree of correlation between the data information extracted in S2 and each keyword by using the SharkSearch algorithm;
S3032, taking the correlation rate of each keyword as the weight of that keyword in the data information;
S3033, calculating a coincidence value for each keyword according to its weight in the data information and its word frequency in the data information;
S3034, comparing the coincidence values of the keywords, and establishing a new classification label from the keyword with the highest coincidence value;
S4, storing the classified data packets according to a unified data structure, and establishing a new database:
S401, presetting a data structure template;
S402, uniformly processing the classified data packets according to the data structure template of S401;
S403, storing the processed data packets according to their classification labels, thereby establishing a new database.
The system is installed on a computer in the hospital that runs the medical system software, or on a server. A user first sends a data acquisition request from the user side, and the foreground program of the medical system software forwards the request to the medical database in the medical system. After receiving the request, the medical database begins to reply with data, and as it replies the data acquisition module acquires the binary data packet that the medical database returns.
While the binary data packet is being acquired, its category in the medical system is acquired at the same time and a data label is generated. The identification module and the extraction module then identify the acquired binary data packet and extract the data information in it, and a plurality of keywords are extracted according to the frequency of the words appearing in the data information.
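The source does not describe how the identification module turns a binary data packet into readable data information. One simple possibility, shown here purely as an assumption, is to try a short list of text encodings in order; `decode_packet` and the choice of UTF-8 followed by GB18030 are hypothetical, and the ordering is only a heuristic.

```python
def decode_packet(packet: bytes, encodings: tuple[str, ...] = ("utf-8", "gb18030")) -> str:
    """Decode a binary packet by trying each candidate encoding in turn."""
    for encoding in encodings:
        try:
            return packet.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be read and mark undecodable bytes.
    return packet.decode("utf-8", errors="replace")


# "中文病历" means "Chinese medical record"; its GB18030 bytes are not valid UTF-8,
# so the first attempt fails and the second encoding is used.
text = decode_packet("中文病历".encode("gb18030"))
```

Real database replies usually carry a protocol-specific framing around the payload, so a production identification module would have to parse that framing before any text decoding.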
The classification module then classifies the extracted data information. The SharkSearch algorithm is used to analyze the degree of correlation between the extracted data information and the data label, and the result is compared with a set correlation rate. If the degree of correlation between the data label and the extracted data information reaches the set correlation rate, a new classification label is established from the data label.
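The source names the SharkSearch algorithm but gives no formula for the degree of correlation. As a stand-in, the sketch below uses a plain term-frequency cosine similarity between two texts; this is a swapped-in technique chosen for illustration, not the SharkSearch algorithm itself, and `term_vector` and `cosine_relevance` are hypothetical names.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Count lowercase alphabetic tokens (English-only tokenizer, for illustration)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_relevance(text_a: str, text_b: str) -> float:
    """Cosine similarity of the two term-frequency vectors, in [0, 1]."""
    a, b = term_vector(text_a), term_vector(text_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


score = cosine_relevance("blood test report", "blood test: glucose normal")
```

The score can then be compared with the set correlation rate to decide whether the data label is kept.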
If the degree of correlation between the data label and the extracted data information does not reach the set correlation rate, the classification label of the data information is reset. The SharkSearch algorithm is used to analyze the degree of correlation between the extracted data information and each keyword, and the correlation rate of each keyword is taken as its weight in the data information. A coincidence value is calculated for each keyword from that weight and the keyword's word frequency in the data information; the coincidence values are compared, and a new classification label is established from the keyword with the highest coincidence value.
After the classification label is established, the data packet can be output through the front end of the medical system software and presented to the user as displayed data. Alternatively, a data structure template is preset, the classified data packets are processed uniformly according to the template, and the processed data packets are stored according to their classification labels to establish a new database; the new database then sends the reply data packets to the user in the new data structure.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A system for extracting, identifying and classifying files by using binary system is characterized in that: the system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database;
the data acquisition module is used for acquiring a binary data packet of a medical database in the medical system;
the identification module is used for identifying the data information of the binary data packet acquired by the acquisition module;
the extraction module is used for extracting the data information of the binary data packet identified by the identification module;
the classification module is used for classifying the data information extracted by the extraction module;
the new database is used for storing the data information after it has been processed into the new data structure.
2. A method for extracting, identifying and classifying files by using binary system, which is applied to the system for extracting, identifying and classifying files by using binary system of claim 1, characterized in that: the method specifically comprises the following steps:
S1, acquiring a binary data packet of a medical database in the medical system by using the data acquisition module;
S2, identifying the acquired binary data packet by using the identification module and the extraction module, and extracting the data information in the binary data packet;
S3, classifying the extracted data information by using the classification module;
S4, storing the classified data packets according to a unified data structure, and establishing a new database.
3. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: when the binary data packet is acquired in S1, the category of the binary data packet in the medical system is synchronously acquired, and a data label is generated.
4. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: the extraction of the data information in S2 specifically includes: extracting the data information from the identified binary data packet, and extracting a plurality of keywords according to the frequency of the words appearing in the data information.
5. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: the specific step of classifying the data information in S3 includes:
S301, analyzing the degree of correlation between the data information extracted in S2 and the data label by using the SharkSearch algorithm;
S302, judging whether the degree of correlation between the data label and the data information extracted in S2 reaches a set correlation rate;
S3021, if the degree of correlation between the data label and the data information extracted in S2 reaches the set correlation rate, establishing a new classification label by using the data label;
S3022, if the degree of correlation between the data label and the data information extracted in S2 does not reach the set correlation rate, resetting the classification label of the data information.
6. The method for extracting, identifying and classifying files by using binary system as claimed in claim 5, wherein: the resetting of the classification label of the data information in S3022 includes:
S3031, analyzing the degree of correlation between the data information extracted in S2 and each keyword by using the SharkSearch algorithm;
S3032, taking the correlation rate of each keyword as the weight of that keyword in the data information;
S3033, calculating a coincidence value for each keyword according to its weight in the data information and its word frequency in the data information;
S3034, comparing the coincidence values of the keywords, and establishing a new classification label from the keyword with the highest coincidence value.
7. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: the establishing of the new database in S4 specifically includes:
S401, presetting a data structure template;
S402, uniformly processing the classified data packets according to the data structure template of S401;
S403, storing the processed data packets according to their classification labels, thereby establishing a new database.
CN202210174166.XA 2022-02-24 2022-02-24 System for extracting, identifying and classifying files by using binary system Pending CN114564444A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210174166.XA CN114564444A (en) 2022-02-24 2022-02-24 System for extracting, identifying and classifying files by using binary system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210174166.XA CN114564444A (en) 2022-02-24 2022-02-24 System for extracting, identifying and classifying files by using binary system

Publications (1)

Publication Number Publication Date
CN114564444A (en) 2022-05-31

Family

ID=81715620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210174166.XA Pending CN114564444A (en) 2022-02-24 2022-02-24 System for extracting, identifying and classifying files by using binary system

Country Status (1)

Country Link
CN (1) CN114564444A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845071A (en) * 2016-12-15 2017-06-13 扬州立兴科技发展合伙企业(有限合伙) A kind of trans-regional medical data information obtains system
CN107818153A (en) * 2017-10-27 2018-03-20 中航信移动科技有限公司 Data classification method and device
CN110415831A (en) * 2019-07-18 2019-11-05 天宜(天津)信息科技有限公司 A kind of medical treatment big data cloud service analysis platform
CN111177372A (en) * 2019-12-06 2020-05-19 绍兴市上虞区理工高等研究院 Scientific and technological achievement classification method, device, equipment and medium
WO2020155760A1 (en) * 2019-01-28 2020-08-06 平安科技(深圳)有限公司 Multi-database data processing method, apparatus, computer device, and storage medium
CN111899832A (en) * 2020-08-13 2020-11-06 东北电力大学 Medical theme management system and method based on context semantic analysis
CN113380414A (en) * 2021-05-20 2021-09-10 心医国际数字医疗系统(大连)有限公司 Data acquisition method and system based on big data

Similar Documents

Publication Publication Date Title
CN110716868B (en) Abnormal program behavior detection method and device
CN109656999B (en) Method, device, storage medium and apparatus for synchronizing large data volume data
CN111553137B (en) Report generation method and device, storage medium and computer equipment
CN111639077B (en) Data management method, device, electronic equipment and storage medium
CN115879017A (en) Automatic classification and grading method and device for power sensitive data and storage medium
CN112148750B (en) Data integration method and system
CN112559526A (en) Data table export method and device, computer equipment and storage medium
CN113642327A (en) Method and device for constructing standard knowledge base
CN116452212B (en) Intelligent customer service commodity knowledge base information management method and system
CN111581299A (en) Inter-library data conversion system and method of multi-source data warehouse based on big data
CN114564444A (en) System for extracting, identifying and classifying files by using binary system
US10614102B2 (en) Method and system for creating entity records using existing data sources
CN112598226B (en) Equipment checking method, device, equipment and storage medium
CN115016929A (en) Data processing method, device, equipment and storage medium
CN114416847A (en) Data conversion method, device, server and storage medium
CN113239126A (en) Business activity information standardization scheme based on BOR method
CN113626387A (en) Task data export method and device, electronic equipment and storage medium
CN111475657A (en) Display device, display system and entity alignment method
CN116489047B (en) Intelligent communication management system and method based on edge calculation
CN112966101B (en) Statement clustering method, transaction clustering method, statement clustering device and transaction clustering device
WO2024125183A1 (en) Traffic identification method, terminal device, and storage medium
CN114860847B (en) Data link processing method, system and medium applied to big data platform
CN117112846B (en) Multi-information source license information management method, system and medium
CN112883727B (en) Method and device for determining association relationship between people
CN117454892B (en) Metadata management method, device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230421

Address after: 12 / F, main building, high tech Zone building, hengsan Road, high tech Zone, Yangzhou City, Jiangsu Province, 225000

Applicant after: LANCET TECHNOLOGY CO.,LTD.

Applicant after: Jiangsu Aijia Medical Technology Co.,Ltd.

Address before: 12 / F, main building, high tech Zone building, hengsan Road, high tech Zone, Yangzhou City, Jiangsu Province, 225000

Applicant before: LANCET TECHNOLOGY CO.,LTD.