CN114564444A - System for extracting, identifying and classifying files by using binary system
- Publication number: CN114564444A
- Application number: CN202210174166.XA
- Authority: CN (China)
- Prior art keywords: data, binary, extracting, data information, identifying
- Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Classifications
- G06F16/113: Details of archiving
- G06F16/122: File system administration, e.g. details of archiving or snapshots, using management policies
- G06F16/13: File access structures, e.g. distributed indices
- G06F16/148: File search processing
- G06F16/178: Techniques for file synchronisation in file systems
- G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
Abstract
The invention relates to the technical field of medical file data processing and discloses a system for extracting, identifying and classifying files by using binary system. The system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database. The data acquisition module acquires binary data packets from a medical database in the medical system, and the identification module identifies the data information in the acquired packets. By working on the binary data packets directly, the system can identify, extract and classify the information in medical files, and can process the classified data information according to a single preset data structure so that the front end outputs it in a unified style. Because no interface integration is required, hospital data maintenance costs can be reduced.
Description
Technical Field
The invention relates to the technical field of medical file data processing, in particular to a system for extracting, identifying and classifying files by using binary system.
Background
Document classification is a task in information science and computer science: assigning a file to one or more categories. Classification can be done manually or by a computer algorithm. Classification makes it possible to unify data structures and standardize output.
At present, many of the medical information systems used by hospitals are old, and their interfaces can no longer be maintained properly. In addition, some hospitals have changed vendors, which leaves data such as medical records and examination results in inconsistent data structures. As a result, data is displayed inconsistently and data acquisition is impaired.
Disclosure of Invention
To provide a system for extracting, identifying and classifying files by using binary system, the invention adopts the following technical solution: a system for extracting, identifying and classifying files by using binary system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database;
the data acquisition module is used for acquiring a binary data packet of a medical database in the medical system;
the identification module is used for identifying the data information of the binary data packet acquired by the acquisition module;
the extraction module is used for extracting the data information of the binary data packet identified by the identification module;
the classification module is used for classifying the data information extracted by the extraction module;
the new database is used for storing the data information after being processed by the new data structure.
A method for extracting, identifying and classifying files by using binary system specifically comprises the following steps:
S1, acquiring a binary data packet of a medical database in the medical system by using the data acquisition module;
S2, identifying the acquired binary data packet by using the identification module and the extraction module, and extracting the data information in the binary data packet;
S3, classifying the extracted data information by using the classification module;
S4, storing the classified data packets according to a unified data structure, and establishing a new database.
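The four steps can be read as a simple pipeline over acquired packets. The following is a minimal sketch of that wiring only, assuming each module is supplied as a plain function; the names `Record` and `run_pipeline` and the callable signatures are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Record:
    """Data information extracted from one packet, plus its classification label."""
    text: str
    label: str


def run_pipeline(
    acquire: Callable[[], list[bytes]],             # S1: data acquisition module
    identify_and_extract: Callable[[bytes], str],   # S2: identification and extraction modules
    classify: Callable[[str], str],                 # S3: classification module
    store: Callable[[Record], None],                # S4: write into the new database
) -> int:
    """Run S1 to S4 over every acquired packet and return how many were stored."""
    count = 0
    for packet in acquire():
        text = identify_and_extract(packet)
        store(Record(text=text, label=classify(text)))
        count += 1
    return count
```

Passing the modules in as functions keeps the sketch independent of how any one step is actually implemented.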
Further, when the binary data packet is acquired in S1, the category of the binary data packet in the medical system is acquired at the same time, and a data label is generated.
Further, the extraction of the data information in S2 specifically comprises: extracting the data information from the identified binary data packet, and extracting a plurality of keywords according to the word frequencies appearing in the data information.
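Keyword extraction by word frequency can be sketched as a term count. The example below is an illustration under stated assumptions: the text is already decoded, tokens are runs of ASCII letters and digits, and the `STOPWORDS` set and `top_n` parameter are hypothetical. Chinese text would need a word segmenter in place of the regular expression.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the patent does not specify one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "for", "with"}


def extract_keywords(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most frequent non-stop-word tokens with their counts."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)
```

The returned counts are the word frequencies that the later classification steps reuse.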
Further, the step of classifying the data information in S3 includes:
S301, analyzing the degree of correlation between the data information extracted in S2 and the data label by using the SharkSearch algorithm;
S302, judging whether the degree of correlation between the data label and the data information extracted in S2 reaches a set correlation rate;
S3021, if the degree of correlation between the data label and the data information extracted in S2 reaches the set correlation rate, establishing a new classification label by using the data label;
S3022, if the degree of correlation between the data label and the data information extracted in S2 does not reach the set correlation rate, resetting the classification label of the data information.
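The threshold decision in S302 can be sketched as follows. The patent names the SharkSearch algorithm but gives no formula; shark-search is commonly described as building on vector-space similarity, so this sketch substitutes a plain cosine similarity over term counts. That substitution, the function names and the default `threshold` of 0.2 are all assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts (ASCII tokens only, as a simplification)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keeps_source_label(data_label: str, text: str, threshold: float = 0.2) -> bool:
    """S302: True means reuse the data label (S3021); False means reset it (S3022)."""
    return cosine_similarity(term_vector(data_label), term_vector(text)) >= threshold
```

A real deployment would tune the threshold against labeled records rather than rely on a fixed default.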
Further, the resetting of the classification label of the data information in S3022 includes:
S3031, analyzing the degree of correlation between the data information extracted in S2 and each keyword by using the SharkSearch algorithm;
S3032, taking the correlation rate of each keyword as the weight of that keyword in the data information;
S3033, calculating a conformity value for each keyword according to the weight of the keyword in the data information and its word frequency in the data information;
S3034, comparing the conformity values of the keywords, and establishing a new classification label from the keyword with the highest conformity value.
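The relabeling steps reduce to scoring each keyword and taking the maximum. The patent says the conformity (coincidence) value is calculated from the keyword's weight and its word frequency but does not state how they are combined; the sketch below assumes a simple product, and the function names are hypothetical.

```python
def conformity_values(weights: dict[str, float], frequencies: dict[str, int]) -> dict[str, float]:
    """Score each keyword as weight * word frequency (the product is an assumption)."""
    return {kw: weights.get(kw, 0.0) * freq for kw, freq in frequencies.items()}


def pick_label(weights: dict[str, float], frequencies: dict[str, int]) -> str:
    """Return the keyword with the highest conformity value as the new classification label."""
    scores = conformity_values(weights, frequencies)
    if not scores:
        raise ValueError("no keywords to choose a label from")
    return max(scores, key=scores.get)
```

With weights `{"ct": 0.9, "chest": 0.4}` and frequencies `{"ct": 2, "chest": 3}`, the keyword `ct` wins because its high weight outweighs the higher frequency of `chest`.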
Further, the establishing of the new database in S4 specifically includes:
S401, presetting a data structure template;
S402, uniformly processing the classified data packets according to the data structure template of S401;
S403, storing the processed data packets according to their classification labels, thereby establishing a new database.
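Steps S401 to S403 can be illustrated with a fixed field template and a small SQLite table. This is a sketch under assumptions: the template fields, the table layout and all function names are hypothetical, and the patent does not say which database engine is used.

```python
import json
import sqlite3

# S401: hypothetical preset data structure template (field names are assumptions).
TEMPLATE_FIELDS = ("patient_id", "recorded_at", "body")


def apply_template(raw: dict) -> dict:
    """S402: keep only the template fields, filling missing ones with None."""
    return {field: raw.get(field) for field in TEMPLATE_FIELDS}


def open_new_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the new database with one table keyed by classification label."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id INTEGER PRIMARY KEY, label TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def store_record(conn: sqlite3.Connection, label: str, raw: dict) -> None:
    """S403: store the normalized record under its classification label."""
    payload = json.dumps(apply_template(raw), ensure_ascii=False)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO records (label, payload) VALUES (?, ?)", (label, payload))


def fetch_by_label(conn: sqlite3.Connection, label: str) -> list[dict]:
    """Read back every record stored under a label, in insertion order."""
    rows = conn.execute("SELECT payload FROM records WHERE label = ? ORDER BY id", (label,))
    return [json.loads(payload) for (payload,) in rows]
```

Because every stored record passes through the same template, a front end can render all of them with one layout regardless of which vendor system produced the source data.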
Compared with the prior art, the invention has the following beneficial effects:
1. By acquiring binary data packets from the medical database in the medical system, the system for extracting, identifying and classifying files by using binary system can identify, extract and classify the information in medical files. It can also process the classified data information according to a preset data structure and store it in the database, so that the front end outputs the data in a unified style. No interface integration is required, which reduces hospital data maintenance costs.
2. The system for extracting, identifying and classifying files by using binary system can identify and extract medical file information and can also check and correct the category of medical file data. This improves the agreement between file data and its assigned category, and the accuracy with which medical file content is acquired.
Drawings
FIG. 1 is a flow chart of data categorization and acquisition for a medical system of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The embodiment of the system for extracting, identifying and classifying the files by using the binary system is as follows:
a system for extracting, identifying and classifying files by using binary system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database;
the data acquisition module is used for acquiring a binary data packet of a medical database in the medical system;
the identification module is used for identifying the data information of the binary data packet acquired by the acquisition module;
the extraction module is used for extracting the data information of the binary data packet identified by the identification module;
the classification module is used for classifying the data information extracted by the extraction module;
Referring to FIG. 1, the method for extracting, identifying and classifying files by using binary system includes the following steps:
S1, acquiring a binary data packet of a medical database in the medical system by using the data acquisition module, and, when the binary data packet is acquired, acquiring its category in the medical system at the same time and generating a data label;
S2, identifying the acquired binary data packet by using the identification module and the extraction module, and extracting the data information in the binary data packet, specifically: extracting the data information from the identified binary data packet, and extracting a plurality of keywords according to the word frequencies appearing in the data information;
S3, classifying the extracted data information by using the classification module:
S301, analyzing the degree of correlation between the data information extracted in S2 and the data label by using the SharkSearch algorithm;
S302, judging whether the degree of correlation between the data label and the data information extracted in S2 reaches a set correlation rate;
S3021, if the degree of correlation between the data label and the data information extracted in S2 reaches the set correlation rate, establishing a new classification label by using the data label;
S3022, if the degree of correlation between the data label and the data information extracted in S2 does not reach the set correlation rate, resetting the classification label of the data information:
S3031, analyzing the degree of correlation between the data information extracted in S2 and each keyword by using the SharkSearch algorithm;
S3032, taking the correlation rate of each keyword as the weight of that keyword in the data information;
S3033, calculating a conformity value for each keyword according to the weight of the keyword in the data information and its word frequency in the data information;
S3034, comparing the conformity values of the keywords, and establishing a new classification label from the keyword with the highest conformity value;
S4, storing the classified data packets according to a unified data structure, and establishing a new database:
S401, presetting a data structure template;
S402, uniformly processing the classified data packets according to the data structure template of S401;
S403, storing the processed data packets according to their classification labels, thereby establishing a new database.
The system is installed on the hospital computer that runs the medical system software, or on a server. First, a user issues a data acquisition request from the client, and the foreground program of the medical system software forwards the request to the medical database in the medical system. On receiving the request, the medical database begins to reply with data, and the data acquisition module captures the binary data packets of that reply.
When a binary data packet is acquired, its category in the medical system is acquired at the same time and a data label is generated. The identification module and the extraction module then identify the acquired binary data packet and extract the data information in it, and a plurality of keywords are extracted according to the word frequencies appearing in the data information.
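The patent does not describe how a binary data packet is identified. One plausible first step, shown here purely as an assumption, is to try decoding the raw bytes as text in a short list of candidate encodings; the encoding list and the function name `identify_text` are hypothetical.

```python
# Assumed candidate encodings, tried in order; a real system may need others.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030")


def identify_text(packet: bytes) -> tuple[str, str]:
    """Return (encoding, decoded text) for the first candidate that decodes cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return encoding, packet.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("packet is not text in any candidate encoding")
```

Strict UTF-8 is tried first because it rejects most byte sequences from legacy encodings, which makes the fallback meaningful.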
The classification module then classifies the extracted data information. The SharkSearch algorithm is used to analyze the degree of correlation between the extracted data information and the data label, and the result is compared with a set correlation rate. If the degree of correlation between the data label and the extracted data information reaches the set correlation rate, a new classification label is established from the data label.
If the degree of correlation between the data label and the extracted data information does not reach the set correlation rate, the classification label of the data information is reset. The SharkSearch algorithm is used to analyze the degree of correlation between the extracted data information and each keyword, and the correlation rate of each keyword is taken as the weight of that keyword in the data information. A conformity value is calculated for each keyword from its weight and its word frequency in the data information, the conformity values are compared, and a new classification label is established from the keyword with the highest conformity value.
After the classification label is established, the data packet can be output through the front end of the medical system software and presented to the user. Alternatively, a data structure template is preset, the classified data packets are processed uniformly according to the template, and the processed data packets are stored according to their classification labels to establish a new database; reply data packets are then sent to the user from the new database in the new data structure.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (7)
1. A system for extracting, identifying and classifying files by using binary system is characterized in that: the system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database;
the data acquisition module is used for acquiring a binary data packet of a medical database in the medical system;
the identification module is used for identifying the data information of the binary data packet acquired by the acquisition module;
the extraction module is used for extracting the data information of the binary data packet identified by the identification module;
the classification module is used for classifying the data information extracted by the extraction module;
the new database is used for storing the data information after being processed by the new data structure.
2. A method for extracting, identifying and classifying files by using binary system, which is applied to the system for extracting, identifying and classifying files by using binary system of claim 1, characterized in that: the method specifically comprises the following steps:
S1, acquiring a binary data packet of a medical database in the medical system by using the data acquisition module;
S2, identifying the acquired binary data packet by using the identification module and the extraction module, and extracting the data information in the binary data packet;
S3, classifying the extracted data information by using the classification module;
S4, storing the classified data packets according to a unified data structure, and establishing a new database.
3. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: when the binary data packet is acquired in S1, the category of the binary data packet in the medical system is acquired at the same time, and a data label is generated.
4. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: the extraction of the data information in S2 specifically comprises: extracting the data information from the identified binary data packet, and extracting a plurality of keywords according to the word frequencies appearing in the data information.
5. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: the specific step of classifying the data information in S3 includes:
S301, analyzing the degree of correlation between the data information extracted in S2 and the data label by using the SharkSearch algorithm;
S302, judging whether the degree of correlation between the data label and the data information extracted in S2 reaches a set correlation rate;
S3021, if the degree of correlation between the data label and the data information extracted in S2 reaches the set correlation rate, establishing a new classification label by using the data label;
S3022, if the degree of correlation between the data label and the data information extracted in S2 does not reach the set correlation rate, resetting the classification label of the data information.
6. The method for extracting, identifying and classifying files by using binary system as claimed in claim 5, wherein: the resetting of the classification label of the data information in S3022 includes:
S3031, analyzing the degree of correlation between the data information extracted in S2 and each keyword by using the SharkSearch algorithm;
S3032, taking the correlation rate of each keyword as the weight of that keyword in the data information;
S3033, calculating a conformity value for each keyword according to the weight of the keyword in the data information and its word frequency in the data information;
S3034, comparing the conformity values of the keywords, and establishing a new classification label from the keyword with the highest conformity value.
7. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: the establishing of the new database in S4 specifically includes:
S401, presetting a data structure template;
S402, uniformly processing the classified data packets according to the data structure template of S401;
S403, storing the processed data packets according to their classification labels, thereby establishing a new database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210174166.XA CN114564444A (en) | 2022-02-24 | 2022-02-24 | System for extracting, identifying and classifying files by using binary system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114564444A (en) | 2022-05-31 |
Family
ID=81715620
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210174166.XA Pending CN114564444A (en) | 2022-02-24 | 2022-02-24 | System for extracting, identifying and classifying files by using binary system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114564444A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845071A (en) * | 2016-12-15 | 2017-06-13 | 扬州立兴科技发展合伙企业(有限合伙) | Cross-regional medical data information acquisition system |
CN107818153A (en) * | 2017-10-27 | 2018-03-20 | 中航信移动科技有限公司 | Data classification method and device |
CN110415831A (en) * | 2019-07-18 | 2019-11-05 | 天宜(天津)信息科技有限公司 | Medical big data cloud service analysis platform |
CN111177372A (en) * | 2019-12-06 | 2020-05-19 | 绍兴市上虞区理工高等研究院 | Scientific and technological achievement classification method, device, equipment and medium |
WO2020155760A1 (en) * | 2019-01-28 | 2020-08-06 | 平安科技(深圳)有限公司 | Multi-database data processing method, apparatus, computer device, and storage medium |
CN111899832A (en) * | 2020-08-13 | 2020-11-06 | 东北电力大学 | Medical theme management system and method based on context semantic analysis |
CN113380414A (en) * | 2021-05-20 | 2021-09-10 | 心医国际数字医疗系统(大连)有限公司 | Data acquisition method and system based on big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | TA01 | Transfer of patent application right | Effective date of registration: 20230421; Address after: 12/F, main building, high tech Zone building, hengsan Road, high tech Zone, Yangzhou City, Jiangsu Province, 225000; Applicant after: LANCET TECHNOLOGY CO.,LTD.; Applicant after: Jiangsu Aijia Medical Technology Co.,Ltd.; Address before: 12/F, main building, high tech Zone building, hengsan Road, high tech Zone, Yangzhou City, Jiangsu Province, 225000; Applicant before: LANCET TECHNOLOGY CO.,LTD. |