CN114564444A

CN114564444A - System for extracting, identifying and classifying files by using binary system

Info

Publication number: CN114564444A
Application number: CN202210174166.XA
Authority: CN
Inventors: 汪路; 祝林杰; other inventors requested that their names not be disclosed
Original assignee: Lancet Technology Co Ltd
Current assignee: Jiangsu Aijia Medical Technology Co Ltd; Lancet Technology Co Ltd
Priority date: 2022-02-24
Filing date: 2022-02-24
Publication date: 2022-05-31

Abstract

The invention relates to the technical field of medical file data processing and discloses a system for extracting, identifying and classifying files by using binary system. The system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database. The data acquisition module acquires binary data packets from a medical database in a medical system, and the identification module identifies the data information in the packets acquired by the data acquisition module. By acquiring binary data packets from the medical database, the system can identify, extract and classify the information data of medical files, and can process the classified data information into a single preset data structure so that the front end outputs the data information in a unified style. Because no interface integration is required, hospital data maintenance costs can be reduced.

Description

System for extracting, identifying and classifying files by using binary system

Technical Field

The invention relates to the technical field of medical file data processing, in particular to a system for extracting, identifying and classifying files by using binary system.

Background

Document classification is a task in information science and computer science: assigning a file to one or more categories. Classification can be done manually or by a computer algorithm. Through classification, data structures can be unified and output can be standardized.

At present, many of the medical information systems used by hospitals are old, and their interfaces can no longer be maintained properly. In some hospitals, a change of vendor has left data such as case records and examination results with inconsistent data structures, so the data is displayed inconsistently and data acquisition suffers.

Disclosure of Invention

To provide a system for extracting, identifying and classifying files by using binary system, the invention adopts the following technical solution: a system for extracting, identifying and classifying files by using binary system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database;

the data acquisition module is used for acquiring a binary data packet of a medical database in the medical system;

the identification module is used for identifying the data information of the binary data packet acquired by the acquisition module;

the extraction module is used for extracting the data information of the binary data packet identified by the identification module;

the classification module is used for classifying the data information extracted by the extraction module;

the new database is used for storing the data information after it has been processed into the new data structure.

A method for extracting, identifying and classifying files by using binary system specifically comprises the following steps:

S1, acquiring a binary data packet of a medical database in the medical system by using the data acquisition module;

S2, identifying the obtained binary data packet by using the identification module and the extraction module, and extracting data information in the binary data packet;

S3, classifying the extracted data information by using the classification module;

and S4, storing the classified data packets according to a uniform data structure, and establishing a new database.
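The four steps can be read as a simple pipeline. The following Python sketch is illustrative only: the `Record` container, the function names, the UTF-8 payload assumption and the placeholder rule in `classify` are all hypothetical and are not taken from the source.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical container for one packet as it moves through S1 to S4."""
    raw: bytes                      # binary data packet (S1)
    data_label: str                 # category reported by the source system (S1)
    text: str = ""                  # data information extracted in S2
    keywords: list = field(default_factory=list)
    label: str = ""                 # classification label set in S3


def acquire(raw: bytes, category: str) -> Record:
    # S1: keep the packet together with its source category as a data label.
    return Record(raw=raw, data_label=category)


def identify_and_extract(rec: Record, top_n: int = 3) -> Record:
    # S2: assume the payload is UTF-8 text; real packets may need a
    # vendor-specific decoder.
    rec.text = rec.raw.decode("utf-8", errors="replace")
    counts = Counter(rec.text.lower().split())
    rec.keywords = [word for word, _ in counts.most_common(top_n)]
    return rec


def classify(rec: Record) -> Record:
    # S3 (placeholder rule): accept the data label if it occurs in the text,
    # otherwise fall back to the most frequent keyword.
    if rec.data_label.lower() in rec.text.lower():
        rec.label = rec.data_label
    else:
        rec.label = rec.keywords[0] if rec.keywords else "unclassified"
    return rec


def store(rec: Record, database: dict) -> None:
    # S4: store under a uniform structure keyed by classification label.
    database.setdefault(rec.label, []).append(
        {"label": rec.label, "keywords": rec.keywords, "content": rec.text}
    )


new_database: dict = {}
packet = "blood test result: blood glucose 5.4 mmol/L".encode("utf-8")
store(classify(identify_and_extract(acquire(packet, "blood"))), new_database)
```

An in-memory dictionary stands in for the new database here; any persistent store could take its place.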

Further, when the binary data packet is acquired in S1, the category of the binary data packet in the medical system is synchronously acquired, and a data label is generated.

Further, extracting the data information in S2 specifically includes: extracting the data information from the identified binary data packet, and extracting a plurality of keywords according to the frequency of the words appearing in the data information.
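Keyword extraction by word frequency can be sketched as follows. This is a minimal example under stated assumptions: the packet is assumed to decode as UTF-8 English text, and `extract_keywords`, `STOP_WORDS` and `top_n` are hypothetical names that do not appear in the source.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would use a domain-specific one.
STOP_WORDS = {"the", "a", "of", "and", "in", "was", "is", "to", "with"}


def extract_keywords(packet: bytes, top_n: int = 3, encoding: str = "utf-8"):
    """Decode a packet and return its top_n most frequent terms with counts."""
    text = packet.decode(encoding, errors="replace").lower()
    tokens = [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOP_WORDS]
    return Counter(tokens).most_common(top_n)


packet = b"Chest CT: nodule in the left lung. Lung nodule size 6 mm. Lung clear otherwise."
keywords = extract_keywords(packet)
```

For Chinese-language records, the regular-expression tokenizer would need to be replaced by a word segmenter; the frequency counting itself would stay the same.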

Further, the step of classifying the data information in S3 includes:

S301, analyzing the degree of correlation between the data information extracted in S2 and the data label by using the SharkSearch algorithm;

S302, judging whether the degree of correlation between the data label and the data information extracted in S2 reaches a set correlation rate;

S3021, if the degree of correlation between the data label and the data information extracted in S2 reaches the set correlation rate, establishing a new classification label by using the data label;

and S3022, if the degree of correlation between the data label and the data information extracted in S2 does not reach the set correlation rate, resetting the classification label of the data information.
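The threshold decision in S302 and its two branches can be sketched as below. The source does not spell out how the SharkSearch relevance score is computed, so this example substitutes a plain cosine similarity over term-frequency vectors as a stand-in; the function names and the threshold value of 0.3 are assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Term-frequency vector of lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_relevance(a: str, b: str) -> float:
    """Cosine similarity of term-frequency vectors, in [0, 1]."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def classify_by_label(data_label: str, text: str, threshold: float = 0.3):
    """S302: compare relevance with the set rate, then pick a branch."""
    score = cosine_relevance(data_label, text)
    if score >= threshold:
        return data_label, score    # S3021: keep the data label as the label
    return None, score              # S3022: the label must be reset


label, score = classify_by_label("blood test", "blood test report: blood glucose normal")
missing, low = classify_by_label("radiology", "blood test report: blood glucose normal")
```

Returning `None` for the label signals that the caller should fall through to the keyword-based reset.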

Further, the resetting of the classification label of the data information in S3022 includes:

S3031, analyzing the degree of correlation between the data information extracted in S2 and each keyword by using the SharkSearch algorithm;

S3032, taking the correlation rate of each keyword as the weight of that keyword in the data information;

S3033, calculating a coincidence value for each keyword according to the weight of the keyword in the data information and the frequency with which it appears in the data information;

S3034, comparing the coincidence values of the keywords, and establishing a new classification label from the keyword with the highest coincidence value.
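The reset procedure amounts to scoring each keyword and taking the maximum. The source does not state how weight and word frequency are combined into the coincidence value; the sketch below assumes a simple product, and `reset_label` and its parameters are hypothetical.

```python
def reset_label(weights: dict, frequencies: dict) -> tuple:
    """Return (keyword, coincidence value) for the best-scoring keyword.

    weights:     keyword -> correlation rate used as weight (S3031, S3032)
    frequencies: keyword -> occurrences in the data information
    The product weight * frequency is an assumption; the source does not
    give the exact formula.
    """
    if not weights:
        raise ValueError("at least one keyword is required")
    # S3033: one coincidence value per keyword.
    scores = {k: weights[k] * frequencies.get(k, 0) for k in weights}
    # S3034: the keyword with the highest value becomes the new label.
    best = max(scores, key=scores.get)
    return best, scores[best]


weights = {"lung": 0.5, "nodule": 0.5, "chest": 0.25}
frequencies = {"lung": 3, "nodule": 2, "chest": 1}
label, value = reset_label(weights, frequencies)
```

With these illustrative inputs the values are 1.5, 1.0 and 0.25, so "lung" becomes the new classification label.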

Further, the establishing of the new database in S4 specifically includes:

S401, presetting a data structure template;

S402, uniformly processing the classified data packets according to the data structure template of S401;

and S403, storing the processed data packets according to their classification labels, thereby establishing a new database.
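A minimal sketch of S401 to S403 follows, assuming records have already been parsed into dictionaries and using an in-memory SQLite database in place of the new database. The template fields, the table layout and the function names are illustrative assumptions, not details from the source.

```python
import copy
import json
import sqlite3

# S401: hypothetical template; field names and defaults are illustrative only.
TEMPLATE = {"patient_id": None, "record_type": None, "content": "", "keywords": []}


def apply_template(record: dict, template: dict = TEMPLATE) -> dict:
    """S402: keep only template fields and fill missing ones with defaults."""
    return {
        key: record.get(key, copy.deepcopy(default))
        for key, default in template.items()
    }


def store(conn: sqlite3.Connection, label: str, record: dict) -> None:
    """S403: store the normalized record under its classification label."""
    conn.execute(
        "INSERT INTO records (label, payload) VALUES (?, ?)",
        (label, json.dumps(record, ensure_ascii=False)),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (label TEXT NOT NULL, payload TEXT NOT NULL)")

# A legacy record with a vendor-specific field and a missing record_type.
legacy = {"patient_id": "P001", "content": "blood glucose 5.4 mmol/L", "vendor_flag": 7}
normalized = apply_template(legacy)
store(conn, "blood test", normalized)
rows = conn.execute("SELECT label, payload FROM records").fetchall()
```

Dropping fields that are not in the template is one possible design choice; a deployment that must not lose vendor-specific data could instead keep them in a separate overflow field.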

Compared with the prior art, the invention has the following beneficial effects:

1. By acquiring binary data packets from the medical database in the medical system, the system for extracting, identifying and classifying files by using binary system can identify, extract and classify the information data of medical files. It can also process the classified data information into a single preset data structure and store it in the database, so that the front end outputs the data information in a unified style. Because no interface integration is required, hospital data maintenance costs can be reduced.

2. The system not only identifies and extracts medical file information data but also checks and corrects the category of the medical file data. This improves the match between the file data and its assigned category, and improves the accuracy with which the content of medical file information data is acquired.

Drawings

FIG. 1 is a flow chart of data categorization and acquisition for a medical system of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

The embodiment of the system for extracting, identifying and classifying the files by using the binary system is as follows:

A system for extracting, identifying and classifying files by using binary system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database.

Referring to FIG. 1, the method for extracting, identifying and classifying files by using binary system includes the following steps:

S1, acquiring a binary data packet of a medical database in the medical system by using the data acquisition module, and, when acquiring the binary data packet, synchronously acquiring the category of the binary data packet in the medical system and generating a data label;

S2, identifying the obtained binary data packet by using the identification module and the extraction module, and extracting data information in the binary data packet, where extracting the data information specifically means: extracting the data information from the identified binary data packet, and extracting a plurality of keywords according to the frequency of the words appearing in the data information;

S4, storing the classified data packets according to a unified data structure, and establishing a new database:

S401, presetting a data structure template;

The system is installed on a hospital computer that runs the medical system software, or on a server. First, a user issues a data acquisition request from the user side, and the foreground program of the medical system software sends the request to the medical database in the medical system. On receiving the request, the medical database begins to reply with data, and the data acquisition module acquires the binary data packets of that reply.

When a binary data packet is acquired, the category of the packet in the medical system is acquired at the same time and a data label is generated. The identification module and the extraction module then identify the acquired packet and extract the data information it contains, and a plurality of keywords are extracted according to the frequency of the words appearing in the data information.

The classification module then classifies the extracted data information. The SharkSearch algorithm is used to analyze the degree of correlation between the extracted data information and the data label, and the system judges whether this degree of correlation reaches a set correlation rate. If it does, a new classification label is established from the data label.

If the degree of correlation between the data label and the extracted data information does not reach the set correlation rate, the classification label of the data information is reset. The SharkSearch algorithm is used to analyze the degree of correlation between the extracted data information and each keyword, and the correlation rate of each keyword is taken as the weight of that keyword in the data information. A coincidence value is calculated for each keyword from its weight and the frequency with which it appears in the data information. The coincidence values are compared, and the keyword with the highest coincidence value is used to establish the new classification label.

After the classification label is established, the data packet can be output through the front end of the medical system software and shown to the user as displayed data. Alternatively, a data structure template is preset, the classified data packets are processed uniformly according to the template, and the processed packets are stored according to their classification labels, thereby establishing a new database; the new database then sends the reply data packets to the user in the new data structure.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A system for extracting, identifying and classifying files by using binary system is characterized in that: the system comprises a data acquisition module, an identification module, an extraction module, a classification module and a new database;

2. A method for extracting, identifying and classifying files by using binary system, which is applied to the system for extracting, identifying and classifying files by using binary system of claim 1, characterized in that: the method specifically comprises the following steps:

3. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: when the binary data packet is acquired in step S1, the category of the binary data packet in the medical system is synchronously acquired, and a data label is generated.

4. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: extracting the data information in S2 specifically includes: extracting the data information from the identified binary data packet, and extracting a plurality of keywords according to the frequency of the words appearing in the data information.

5. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: the specific step of classifying the data information in S3 includes:

S302, judging whether the degree of correlation between the data label and the data information extracted in S2 reaches a set correlation rate;

and S3022, if the degree of correlation between the data label and the data information extracted in S2 does not reach the set correlation rate, resetting the classification label of the data information.

6. The method for extracting, identifying and classifying files by using binary system as claimed in claim 5, wherein: the resetting of the classification label of the data information in S3022 includes:

S3031, analyzing the degree of correlation between the data information extracted in S2 and each keyword by using the SharkSearch algorithm;

7. The method for extracting, identifying and classifying files by using binary system as claimed in claim 2, wherein: the establishing of the new database in S4 specifically includes:

S401, presetting a data structure template;