CN117688927B - Medical record chapter reconfiguration method, system, terminal and storage medium - Google Patents

Medical record chapter reconfiguration method, system, terminal and storage medium

Info

Publication number
CN117688927B
Authority
CN
China
Prior art keywords
chapter
section
keywords
paragraph
paragraphs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410145758.8A
Other languages
Chinese (zh)
Other versions
CN117688927A (en
Inventor
王兵卡
马杰
金剑
邓小宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North Health Medical Big Data Technology Co ltd
Original Assignee
North Health Medical Big Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North Health Medical Big Data Technology Co ltd
Priority to CN202410145758.8A
Publication of CN117688927A
Application granted
Publication of CN117688927B
Legal status: Active
Anticipated expiration

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention belongs to the technical field of data processing, and particularly provides a medical record chapter reconfiguration method, a system, a terminal and a storage medium, wherein the method comprises the following steps: performing character recognition on the electronic medical record, and dividing the recognized content by paragraph to obtain a plurality of paragraphs; extracting keywords from the paragraphs by using a keyword extraction technique, and generating weights for the keywords based on the positions of the keywords in the paragraphs; calculating the weighted similarity between the keywords of each paragraph and the preset keyword groups of a plurality of chapters; selecting the chapter with the highest similarity as the chapter to which the paragraph belongs, and merging adjacent paragraphs that belong to the same chapter to form the content of that chapter; and calling the corresponding chapter contents based on the chapters and chapter sequence to be configured, and arranging the chapter contents according to the chapter sequence to obtain the reconfigured medical record. The invention realizes the integration and effective analysis of medical records and can meet various query requirements.

Description

Medical record chapter reconfiguration method, system, terminal and storage medium
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a medical record chapter reconfiguration method, a system, a terminal and a storage medium.
Background
When a large number of medical records are arranged, because the formats of the medical records are not uniform, the relevance between the data is low, and the information in the medical records is difficult to integrate and effectively analyze. In addition, when inquiring medical record data, only the whole medical record document can be output, which is not beneficial to the effective utilization of the medical record data.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a medical record chapter reconfiguration method, a system, a terminal and a storage medium, so as to solve the technical problems.
In a first aspect, the present invention provides a medical record chapter reconfiguration method, including:
performing character recognition on the electronic medical record, and dividing character recognition contents according to paragraphs to obtain a plurality of paragraphs;
extracting keywords from the paragraphs by using a keyword extraction technique, and generating weights for the keywords based on the positions of the keywords in the paragraphs;
calculating weighted similarity between the keywords of the paragraph and preset keyword groups of a plurality of chapters;
selecting the chapter with the highest similarity as the chapter to which the paragraph belongs, and merging adjacent paragraphs that belong to the same chapter to form the content of that chapter;
and calling the corresponding chapter contents based on the chapter and the chapter sequence to be configured, and arranging the corresponding chapter contents according to the chapter sequence to obtain the reconfiguration medical record.
In an alternative embodiment, extracting keywords from paragraphs using keyword extraction techniques and generating weights for the keywords based on their location in the paragraphs includes:
Sequentially generating sequence numbers from small to large for characters according to the arrangement sequence of the characters in the paragraphs;
Obtaining the maximum sequence number;
acquiring the sequence numbers of the keywords, and calculating the intermediate value of the sequence numbers of each keyword;
the keyword weight is calculated by the following formula:
[formula not reproduced in this text]
wherein Q is the weight of the keyword and is computed from the intermediate value of the keyword's sequence numbers and the largest sequence number.
In an alternative embodiment, calculating the weighted similarity between the keyword of the paragraph and the keyword group of the preset chapters includes:
storing the names of a plurality of chapters and the keyword group corresponding to each chapter name in a database in advance;
calculating the Euclidean distance between a keyword of the paragraph and each keyword in the keyword group of the chapter, and selecting the shortest Euclidean distance;
collecting the shortest Euclidean distances between all keywords of the paragraph and the keyword group of the chapter, and converting each shortest Euclidean distance into a word segmentation similarity;
and calculating the weighted sum of the word segmentation similarities of all the keywords of the paragraph based on the weights of those keywords, to obtain the similarity between the paragraph and the chapter.
In an alternative embodiment, selecting the chapter with the highest similarity as the chapter to which the paragraph belongs, and merging adjacent paragraphs that belong to the same chapter to form the content of that chapter includes:
sequentially generating segment numbers for the paragraphs based on the positions of the paragraphs in the overall document;
merging the paragraphs which belong to the same chapter and have adjacent segment numbers into the content of the same chapter;
screening out interval paragraphs, wherein the paragraph before and the paragraph after an interval paragraph both belong to a first chapter, while the interval paragraph itself was matched to a different, second chapter;
judging whether the similarity between the interval paragraph and the second chapter exceeds a set first threshold value:
if yes, judging that the interval paragraph belongs to the second chapter;
if not, acquiring the similarity between the interval paragraph and the keywords of the first chapter, and if this similarity reaches a set second threshold value, judging that the interval paragraph belongs to the first chapter.
In a second aspect, the present invention provides a medical record chapter reconfiguration system comprising:
The character recognition module is used for recognizing characters of the electronic medical record and dividing the character recognition content according to the paragraphs to obtain a plurality of paragraphs;
The feature extraction module is used for extracting keywords from the paragraphs by using a keyword extraction technology and generating weights for the keywords based on the positions of the keywords in the paragraphs;
The chapter matching module is used for calculating weighted similarity between keywords of the paragraphs and preset keyword groups of a plurality of chapters;
The content combination module is used for selecting the chapter with the highest similarity as the chapter to which the paragraph belongs, and merging adjacent paragraphs that belong to the same chapter to form the content of that chapter;
And the chapter configuration module is used for calling the corresponding chapter contents based on chapters and chapter sequences to be configured and arranging the corresponding chapter contents according to the chapter sequences to obtain the reconfiguration medical record.
In an alternative embodiment, the feature extraction module includes:
Sequentially generating sequence numbers from small to large for characters according to the arrangement sequence of the characters in the paragraphs;
Obtaining the maximum sequence number;
acquiring the sequence numbers of the keywords, and calculating the intermediate value of the sequence numbers of each keyword;
the keyword weight is calculated by the following formula:
[formula not reproduced in this text]
wherein Q is the weight of the keyword and is computed from the intermediate value of the keyword's sequence numbers and the largest sequence number.
In an alternative embodiment, the chapter matching module includes:
The standard storage unit is used for storing the names of a plurality of chapters and the keyword group corresponding to each chapter name in a database in advance;
the distance calculation unit is used for calculating Euclidean distance between the keywords of the paragraph and the keywords in the keyword group of the chapter, and screening out the shortest Euclidean distance;
The distance conversion unit is used for summarizing the shortest Euclidean distance between all keywords of the paragraph and the keyword groups of the chapter and converting the shortest Euclidean distance into word segmentation similarity;
and the weighting calculation unit is used for calculating the weighted sum of the segmentation similarity of all the keywords of the paragraph based on the weight of all the keywords of the paragraph, so as to obtain the similarity of the paragraph and the chapter.
In an alternative embodiment, the content combining module includes:
a segment number generation unit for sequentially generating segment numbers for the paragraphs based on the positions of the paragraphs in the overall document;
the paragraph merging unit is used for merging the paragraphs which belong to the same chapter and have adjacent segment numbers into the content of the same chapter;
the target screening unit is used for screening out interval paragraphs, wherein the paragraph before and the paragraph after an interval paragraph both belong to a first chapter, while the interval paragraph itself was matched to a different, second chapter;
a threshold value judging unit for judging whether the similarity between the interval paragraph and the second chapter exceeds a set first threshold value;
a first judging unit, configured to judge that the interval paragraph belongs to the second chapter if the similarity between the interval paragraph and the second chapter exceeds the set first threshold value;
and the second judging unit is used for acquiring the similarity between the interval paragraph and the keywords of the first chapter if the similarity between the interval paragraph and the second chapter does not exceed the set first threshold value, and judging that the interval paragraph belongs to the first chapter if this similarity reaches a set second threshold value.
In a third aspect, a terminal is provided, including:
a processor, a memory, wherein,
The memory is used for storing a computer program,
The processor is configured to call and run the computer program from the memory, so that the terminal performs the method described above.
In a fourth aspect, there is provided a computer storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the above aspects.
The medical record chapter reconfiguration method, system, terminal and storage medium provided by the invention have the following beneficial effects: the medical record is divided into paragraphs based on text recognition technology; keywords are then extracted from the paragraphs and matched against the chapter keyword groups, so that each paragraph is matched with a chapter; the paragraphs are integrated according to the chapters to which they belong, so that the medical record is quickly disassembled; and the disassembled contents are combined based on configuration requirements. The integration and effective analysis of medical records is thereby realized, and various query requirements can be met.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a method of one embodiment of the invention.
FIG. 2 is a schematic block diagram of a system of one embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
FIG. 1 is a schematic flow chart of a method of one embodiment of the invention. Wherein, the execution body of fig. 1 can be a medical record chapter reconfiguration system. The order of the steps in the flow chart may be changed and some may be omitted according to different needs.
As shown in fig. 1, the method includes:
Step 110, performing character recognition on the electronic medical record, and dividing the character recognition content according to paragraphs to obtain a plurality of paragraphs;
step 120, extracting keywords from the paragraphs by using a keyword extraction technique, and generating weights for the keywords based on the positions of the keywords in the paragraphs;
step 130, calculating weighted similarity between the keywords of the paragraph and preset keyword groups of a plurality of chapters;
Step 140, selecting the chapter with the highest similarity as the chapter to which the paragraph belongs, and merging adjacent paragraphs that belong to the same chapter to form the content of that chapter;
and step 150, calling the corresponding chapter contents based on the chapters and the chapter sequence to be configured, and arranging the corresponding chapter contents according to the chapter sequence to obtain the reconfiguration medical record.
To facilitate understanding of the present invention, the medical record chapter reconfiguration method is further described below through the process of reconfiguring medical record chapters in an embodiment.
Specifically, the medical record chapter reconfiguration method includes:
S1, performing character recognition on the electronic medical record, and dividing the recognized content by paragraph to obtain a plurality of paragraphs.
For the image that has undergone sharpness analysis, the circumscribed (bounding) rectangles of all contours are calculated, and a size threshold for the circumscribed rectangles is set according to the length and width of the font contours. The positions of all contours in the adaptively binarized image are then extracted; these include the positions of the font contours and of their circumscribed rectangles.
The total length of the circumscribed rectangles in each row is obtained, and the average length over all rows is calculated. A row whose circumscribed-rectangle length is smaller than the average length is marked as a paragraph starting row, and the picture is divided into region pictures of a plurality of paragraphs based on the positions of the paragraph starting rows.
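The row-length rule above can be sketched in a few lines. This is only an illustrative sketch: the function name `split_rows_into_paragraphs` and the input format (one total bounding-rectangle length per text row, already measured) are assumptions, and the contour detection itself is left out.

```python
from statistics import mean


def split_rows_into_paragraphs(row_lengths):
    """Group row indices into paragraphs.

    A row whose total bounding-rectangle length is below the average over
    all rows is treated as a paragraph starting row, as the text describes.
    The first row always opens a paragraph so that no row is left ungrouped.
    """
    if not row_lengths:
        return []
    average = mean(row_lengths)
    paragraphs = []
    for index, length in enumerate(row_lengths):
        starts_paragraph = length < average
        if starts_paragraph or not paragraphs:
            paragraphs.append([index])
        else:
            paragraphs[-1].append(index)
    return paragraphs
```

With row lengths `[80, 100, 100, 78, 100, 100, 100]` the average is 94, so rows 0 and 3 are marked as starting rows and the rows fall into two paragraphs. Note that under this rule two consecutive short rows would each open their own paragraph.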
The characters in each region picture are then recognized with an OCR engine to obtain the paragraph contents.
S2, extracting keywords from the paragraphs by using a keyword extraction technology, and generating weights for the keywords based on the positions of the keywords in the paragraphs.
The TextRank algorithm is a graph-based ranking algorithm for keyword extraction and document summarization, derived from PageRank, the web page importance ranking algorithm. It uses co-occurrence information between the words of a document to extract keywords and key phrases from a given document, and it can also extract key sentences from the text by way of extractive automatic summarization.
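A minimal TextRank-style sketch follows, to make the co-occurrence idea concrete. It assumes the paragraph has already been segmented into word tokens (the segmenter is not specified in the text), and the function name, window size, damping factor and iteration count are illustrative choices rather than values taken from the source.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank tokens with a PageRank-style iteration over a co-occurrence graph."""
    # Build an undirected graph: two different tokens are linked when they
    # appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, token in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != token:
                neighbors[token].add(tokens[j])
                neighbors[tokens[j]].add(token)
    if not neighbors:
        return []

    # Every node in the graph has at least one neighbor, so the division
    # by len(neighbors[other]) below is always safe.
    scores = {token: 1.0 for token in neighbors}
    for _ in range(iterations):
        updated = {}
        for token in neighbors:
            incoming = sum(
                scores[other] / len(neighbors[other]) for other in neighbors[token]
            )
            updated[token] = (1 - damping) + damping * incoming
        scores = updated

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]
```

For the token list `["fever", "cough", "fever", "headache", "fever", "nausea"]` with `window=1`, the token "fever" co-occurs with every other token and is ranked first.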
Sequence numbers are generated for the characters from small to large according to the order of the characters in the paragraph; the largest sequence number is obtained; the sequence numbers of each keyword are acquired, and the intermediate value of the keyword's sequence numbers is calculated.
The keyword weight is calculated by the following formula:
[formula not reproduced in this text]
wherein Q is the weight of the keyword and is computed from the intermediate value of the keyword's sequence numbers and the largest sequence number.
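The two inputs to the weight can be computed as sketched below. Because the weight formula itself is not legible in this text, the sketch takes the formula as a parameter; the default `linear_decay_weight` is a purely hypothetical stand-in, not the patent's formula. Reading "intermediate value" as the median of the keyword's character sequence numbers, and using the keyword's first occurrence, are also assumptions.

```python
from statistics import median


def keyword_position_stats(paragraph, keyword):
    """Return (intermediate sequence number of the keyword, largest sequence number).

    Characters are numbered 1..len(paragraph) in reading order. The keyword's
    sequence numbers are those of the characters of its first occurrence.
    """
    if not keyword:
        raise ValueError("keyword must not be empty")
    start = paragraph.find(keyword)
    if start < 0:
        raise ValueError(f"keyword {keyword!r} not found in paragraph")
    numbers = range(start + 1, start + len(keyword) + 1)
    return median(numbers), len(paragraph)


def linear_decay_weight(middle, largest):
    # HYPOTHETICAL placeholder: earlier keywords get larger weights.
    # The actual formula from the source is not available here.
    return 1.0 - (middle - 1) / largest


def keyword_weight(paragraph, keyword, weight_fn=linear_decay_weight):
    """Compute a position-based weight Q using a caller-supplied formula."""
    middle, largest = keyword_position_stats(paragraph, keyword)
    return weight_fn(middle, largest)
```

For the paragraph `"abcdefghij"` and the keyword `"cde"`, the keyword occupies sequence numbers 3 to 5, so the intermediate value is 4 and the largest sequence number is 10. Any other formula can be supplied through `weight_fn` once it is known.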
And S3, calculating weighted similarity between the keywords of the paragraph and preset keyword groups of a plurality of chapters.
The names of a plurality of chapters and the keyword group corresponding to each chapter name are stored in a database in advance. The chapter names and keyword groups are set as follows:
Based on the types of the medical records and the names of the documents, the document types of the medical records are automatically standardized and compared by means of name semantic matching. From each type of medical record document, typical medical record data are sampled according to characteristic identifiers such as text length and number of chapters. Keywords are extracted from the content of each chapter of the typical medical record data; the keywords whose number of occurrences exceeds a set frequency threshold are set as the keywords of the chapter, and all keywords of the chapter are stored as the keyword group of the chapter.
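The frequency-threshold step for building keyword groups might look like the following sketch. The function name and the input shape (pairs of chapter name and extracted keywords, one pair per sampled chapter) are assumptions for illustration; document-type matching and sampling are not shown.

```python
from collections import Counter


def build_keyword_groups(sampled_chapters, frequency_threshold):
    """Build {chapter name: keyword group} from sampled chapter keywords.

    sampled_chapters: iterable of (chapter_name, keywords) pairs, one pair
    per chapter of each sampled typical medical record.
    A keyword is kept when its occurrence count exceeds the threshold.
    """
    counts = {}
    for chapter_name, keywords in sampled_chapters:
        counts.setdefault(chapter_name, Counter()).update(keywords)
    return {
        name: sorted(word for word, n in counter.items() if n > frequency_threshold)
        for name, counter in counts.items()
    }
```

With a threshold of 1, a keyword has to appear at least twice across the samples of a chapter to enter that chapter's keyword group.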
The Euclidean distance between a keyword of the paragraph and each keyword in the keyword group of the chapter is calculated, and the shortest Euclidean distance is selected. The shortest Euclidean distances between all keywords of the paragraph and the keyword group of the chapter are collected, and each is converted into a word segmentation similarity. The weighted sum of the word segmentation similarities of all the keywords of the paragraph is then calculated based on the weights of those keywords, giving the similarity between the paragraph and the chapter.
Take the calculation of the similarity between paragraph 1 and chapter 1 as an example. Paragraph 1 contains keyword 1 and keyword 2, with weights Q1 and Q2 respectively, and the keyword group of chapter 1 comprises keyword a and keyword b. First, the Euclidean distance L1 between keyword 1 and keyword a and the Euclidean distance L2 between keyword 1 and keyword b are calculated; if L1 is smaller than L2, the Euclidean distance between keyword 1 and the keyword group is taken to be L1. The Euclidean distance L3 between keyword 2 and the keyword group is calculated in the same way. L1 and L3 are converted into similarities K1 and K2 respectively, and the similarity between paragraph 1 and chapter 1 is calculated as K = K1×Q1 + K2×Q2. The similarity between the paragraph and each of the other chapters is calculated in the same way.
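The worked example generalizes to the sketch below. Two things are assumed because the text does not specify them: keywords are taken to be already represented as numeric vectors (the embedding method is not given), and the distance-to-similarity conversion is written as 1 / (1 + d), which is a common choice but only a placeholder here. All function names are illustrative.

```python
import math


def distance_to_similarity(distance):
    # ASSUMPTION: the conversion is not specified in the text; 1 / (1 + d)
    # maps distance 0 to similarity 1 and decreases as distance grows.
    return 1.0 / (1.0 + distance)


def paragraph_chapter_similarity(paragraph_keywords, chapter_vectors):
    """Weighted similarity K = sum(K_i * Q_i) between a paragraph and a chapter.

    paragraph_keywords: list of (vector, weight) pairs for the paragraph.
    chapter_vectors: list of vectors for the chapter's keyword group.
    """
    total = 0.0
    for vector, weight in paragraph_keywords:
        # Shortest Euclidean distance from this keyword to the keyword group.
        shortest = min(math.dist(vector, other) for other in chapter_vectors)
        total += weight * distance_to_similarity(shortest)
    return total


def best_chapter(paragraph_keywords, chapter_groups):
    """Return (chapter with the highest similarity, all similarity scores)."""
    scores = {
        name: paragraph_chapter_similarity(paragraph_keywords, vectors)
        for name, vectors in chapter_groups.items()
    }
    return max(scores, key=scores.get), scores
```

A paragraph whose keyword vectors coincide with a chapter's keyword group gets similarity equal to the sum of its keyword weights under this conversion, and that chapter is selected as the best match.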
S4, selecting the chapter with the highest similarity as the chapter to which the paragraph belongs, and merging adjacent paragraphs that belong to the same chapter to form the content of that chapter.
Segment numbers are generated sequentially for the paragraphs based on the positions of the paragraphs in the overall document, and the paragraphs which belong to the same chapter and have adjacent segment numbers are merged into the content of the same chapter. For example, if the second paragraph and the third paragraph belong to the same chapter, the second paragraph is merged with the third paragraph.
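Merging runs of adjacent paragraphs that share a chapter can be sketched with `itertools.groupby`. The function name and the tuple layout of the result are illustrative assumptions.

```python
from itertools import groupby


def merge_adjacent_paragraphs(labeled_paragraphs):
    """Merge consecutive paragraphs that were matched to the same chapter.

    labeled_paragraphs: list of (chapter_name, text) pairs in document order.
    Returns a list of (chapter_name, segment_numbers, merged_text) tuples,
    where segment numbers start at 1.
    """
    numbered = [
        (number, chapter, text)
        for number, (chapter, text) in enumerate(labeled_paragraphs, start=1)
    ]
    merged = []
    # groupby only groups consecutive items, which is exactly the
    # "adjacent segment numbers" condition.
    for chapter, run in groupby(numbered, key=lambda item: item[1]):
        run = list(run)
        segment_numbers = [number for number, _, _ in run]
        merged_text = "\n".join(text for _, _, text in run)
        merged.append((chapter, segment_numbers, merged_text))
    return merged
```

Paragraphs of the same chapter that are separated by a paragraph of another chapter stay in separate blocks at this stage; the interval paragraph handling decides whether they should be joined.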
Interval paragraphs are screened out, wherein the paragraph before and the paragraph after an interval paragraph both belong to a first chapter. For example, if the third paragraph and the fifth paragraph belong to the same chapter, and the fourth paragraph belongs to a different chapter, the fourth paragraph is an interval paragraph; the chapter it was matched to is referred to as the second chapter.
It is determined whether the similarity between the interval paragraph and the second chapter exceeds a set first threshold; for example, the first threshold is 95%. If it does, the interval paragraph is judged to belong to the second chapter.
If the similarity between the interval paragraph and the second chapter does not exceed the set first threshold, the similarity between the interval paragraph and the keywords of the first chapter is acquired. If this similarity reaches a set second threshold, the interval paragraph is judged to belong to the first chapter; if it does not reach the second threshold, the interval paragraph is sent to a manual processing terminal for manual judgment.
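The secondary screening can be sketched as a small decision function. The function names and the string return values are illustrative; the first threshold of 0.95 in the usage below comes from the example in the text, while the second threshold of 0.6 is a hypothetical value chosen only to exercise the code.

```python
def find_interval_paragraphs(chapters):
    """Return indices of paragraphs whose neighbors share a chapter they lack.

    chapters: list of chapter names, one per paragraph, in document order.
    """
    return [
        i
        for i in range(1, len(chapters) - 1)
        if chapters[i - 1] == chapters[i + 1] != chapters[i]
    ]


def resolve_interval_paragraph(sim_second, sim_first, first_threshold, second_threshold):
    """Decide where an interval paragraph belongs.

    sim_second: similarity to the chapter it was matched to (second chapter).
    sim_first: similarity to the chapter of its neighbors (first chapter).
    Returns "second", "first", or "manual" (send to manual judgment).
    """
    if sim_second > first_threshold:
        return "second"
    if sim_first >= second_threshold:
        return "first"
    return "manual"
```

For example, `resolve_interval_paragraph(0.80, 0.70, 0.95, 0.6)` returns `"first"`: the match to the second chapter is not strong enough, and the paragraph is sufficiently similar to the chapter of its neighbors.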
This secondary screening of interval paragraphs improves the accuracy of chapter splitting and reduces the recognition error rate.
S5, calling the corresponding chapter content based on the chapter and chapter sequence to be configured, and arranging the corresponding chapter content according to the chapter sequence to obtain the reconfiguration medical record.
For example, a query request is received that covers only a small number of key chapters and specifies the chapter sequence; the target chapter contents are called and then arranged in that sequence to obtain the required medical record. The method standardizes medical record files and facilitates the effective integration and analysis of medical record information.
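The final arrangement step reduces to a lookup in the requested order, as in this sketch. The function name, the dictionary layout, and the choice to silently skip chapters that are missing from the record are assumptions; the text does not say how missing chapters are handled.

```python
def reconfigure_record(chapter_contents, requested_order):
    """Assemble a reconfigured record from the requested chapters, in order.

    chapter_contents: dict mapping chapter name to a list of content blocks.
    requested_order: list of chapter names from the query request.
    Chapters not present in the record are skipped (an assumption).
    """
    return [
        (name, "\n".join(chapter_contents[name]))
        for name in requested_order
        if name in chapter_contents
    ]
```

A request for `["Exam", "Chief complaint"]` therefore yields only those two chapters, with "Exam" first, regardless of their order in the original document.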
In some embodiments, the medical record section reconfiguration system can include a plurality of functional modules that are comprised of computer program segments. The computer program of each program segment in the medical record chapter reconfiguration system can be stored in a memory of a computer device and executed by at least one processor to perform (see fig. 1 for details) the medical record chapter reconfiguration functions.
In this embodiment, the medical record chapter reconfiguration system may be divided into a plurality of functional modules according to the functions it performs, as shown in fig. 2. The functional modules of system 200 may include: a character recognition module 210, a feature extraction module 220, a chapter matching module 230, a content combination module 240, and a chapter configuration module 250. A module, as referred to in the present invention, is a series of computer program segments that are stored in a memory, can be executed by at least one processor, and perform a fixed function. The functions of the respective modules are described in detail below.
The character recognition module is used for recognizing characters of the electronic medical record and dividing the character recognition content according to the paragraphs to obtain a plurality of paragraphs;
The feature extraction module is used for extracting keywords from the paragraphs by using a keyword extraction technology and generating weights for the keywords based on the positions of the keywords in the paragraphs;
The chapter matching module is used for calculating weighted similarity between keywords of the paragraphs and preset keyword groups of a plurality of chapters;
The content combination module is used for selecting the chapter with the highest similarity as the chapter to which the paragraph belongs, and merging adjacent paragraphs that belong to the same chapter to form the content of that chapter;
And the chapter configuration module is used for calling the corresponding chapter contents based on chapters and chapter sequences to be configured and arranging the corresponding chapter contents according to the chapter sequences to obtain the reconfiguration medical record.
Optionally, as an embodiment of the present invention, the feature extraction module includes:
Sequentially generating sequence numbers from small to large for characters according to the arrangement sequence of the characters in the paragraphs;
Obtaining the maximum sequence number;
acquiring the sequence numbers of the keywords, and calculating the intermediate value of the sequence numbers of each keyword;
the keyword weight is calculated by the following formula:
[formula not reproduced in this text]
wherein Q is the weight of the keyword and is computed from the intermediate value of the keyword's sequence numbers and the largest sequence number.
Optionally, as an embodiment of the present invention, the chapter matching module includes:
The standard storage unit is used for storing the names of a plurality of chapters and the keyword group corresponding to each chapter name in a database in advance;
the distance calculation unit is used for calculating Euclidean distance between the keywords of the paragraph and the keywords in the keyword group of the chapter, and screening out the shortest Euclidean distance;
The distance conversion unit is used for summarizing the shortest Euclidean distance between all keywords of the paragraph and the keyword groups of the chapter and converting the shortest Euclidean distance into word segmentation similarity;
and the weighting calculation unit is used for calculating the weighted sum of the segmentation similarity of all the keywords of the paragraph based on the weight of all the keywords of the paragraph, so as to obtain the similarity of the paragraph and the chapter.
Optionally, as an embodiment of the present invention, the content combining module includes:
a segment number generation unit for sequentially generating segment numbers for the paragraphs based on the positions of the paragraphs in the overall document;
the paragraph merging unit is used for merging the paragraphs which belong to the same chapter and have adjacent segment numbers into the content of the same chapter;
the target screening unit is used for screening out interval paragraphs, wherein the paragraph before and the paragraph after an interval paragraph both belong to a first chapter, while the interval paragraph itself was matched to a different, second chapter;
a threshold value judging unit for judging whether the similarity between the interval paragraph and the second chapter exceeds a set first threshold value;
a first judging unit, configured to judge that the interval paragraph belongs to the second chapter if the similarity between the interval paragraph and the second chapter exceeds the set first threshold value;
and the second judging unit is used for acquiring the similarity between the interval paragraph and the keywords of the first chapter if the similarity between the interval paragraph and the second chapter does not exceed the set first threshold value, and judging that the interval paragraph belongs to the first chapter if this similarity reaches a set second threshold value.
Fig. 3 is a schematic structural diagram of a terminal 300 according to an embodiment of the present invention, where the terminal 300 may be used to execute the medical record chapter reconfiguration method according to the embodiment of the present invention.
The terminal 300 may include: a processor 310, a memory 320 and a communication unit 330. The components may communicate via one or more buses, and it will be appreciated by those skilled in the art that the configuration of the server as shown in the drawings is not limiting of the invention, as it may be a bus-like structure, a star-like structure, or include more or fewer components than shown, or may be a combination of certain components or a different arrangement of components.
The memory 320 may be used to store instructions for execution by the processor 310, and the memory 320 may be implemented by any type of volatile or non-volatile memory terminal or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The execution of the instructions in memory 320, when executed by processor 310, enables terminal 300 to perform some or all of the steps in the method embodiments described below.
The processor 310 is a control center of the storage terminal, connects various parts of the entire electronic terminal using various interfaces and lines, and performs various functions of the electronic terminal and/or processes data by running or executing software programs and/or modules stored in the memory 320, and invoking data stored in the memory. The processor may be comprised of an integrated circuit (INTEGRATED CIRCUIT, simply referred to as an IC), for example, a single packaged IC, or may be comprised of multiple packaged ICs connected to one another for the same function or for different functions. For example, the processor 310 may include only a central processing unit (Central Processing Unit, CPU for short). In the embodiment of the invention, the CPU can be a single operation core or can comprise multiple operation cores.
And a communication unit 330 for establishing a communication channel so that the storage terminal can communicate with other terminals. Receiving user data sent by other terminals or sending the user data to other terminals.
The present invention also provides a computer storage medium in which a program may be stored; when executed, the program may perform some or all of the steps of the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Therefore, the present invention divides the medical record into paragraphs based on text recognition technology, then performs keyword extraction and keyword matching on the paragraphs so that each paragraph is matched to a chapter, and merges the paragraphs according to the chapters to which they belong. The medical record is thus quickly disassembled, and the disassembled contents are recombined according to the configuration requirements, which integrates and effectively analyzes the medical record and can meet various query requirements. The technical effects achieved by this embodiment can be seen from the description above and are not repeated here.
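By way of non-limiting illustration, the merging and rearranging steps summarized above might be sketched as follows. The sketch assumes each paragraph has already been labelled with a chapter name; the function names `merge_adjacent` and `reconfigure`, the tuple layout, and the choice to join non-adjacent blocks of the same chapter with a newline are hypothetical and are not taken from the specification.

```python
from itertools import groupby


def merge_adjacent(labelled_paragraphs):
    """Merge consecutive paragraphs that share a chapter label.

    labelled_paragraphs: list of (chapter, text) tuples in document order.
    Returns a list of (chapter, merged_text) blocks, still in document order.
    """
    merged = []
    for chapter, group in groupby(labelled_paragraphs, key=lambda p: p[0]):
        merged.append((chapter, "\n".join(text for _, text in group)))
    return merged


def reconfigure(merged_blocks, chapter_order):
    """Arrange chapter contents according to a configured chapter order.

    Assumption: blocks of the same chapter that were not adjacent in the
    source are concatenated; chapters absent from chapter_order are omitted.
    """
    by_chapter = {}
    for chapter, content in merged_blocks:
        by_chapter.setdefault(chapter, []).append(content)
    return [(c, "\n".join(by_chapter[c])) for c in chapter_order if c in by_chapter]
```

For example, paragraphs labelled history, history, complaint, history would first be merged into three blocks and then, for a configured order of complaint followed by history, emitted as one complaint block followed by one history block.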
It will be apparent to those skilled in the art that the techniques of the embodiments of the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code, and includes several instructions for causing a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention.
For the same or similar parts of the various embodiments in this specification, the embodiments may be referred to one another. In particular, since the terminal embodiment is substantially similar to the method embodiment, its description is relatively brief, and reference should be made to the description of the method embodiment for the relevant points.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems or modules, and may be electrical, mechanical or in another form.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.

Claims (10)

1. A medical record chapter reconfiguration method, comprising:
performing character recognition on the electronic medical record, and dividing character recognition contents according to paragraphs to obtain a plurality of paragraphs;
extracting keywords from the paragraphs by using a keyword extraction technique, and generating weights for the keywords based on the positions of the keywords in the paragraphs;
calculating a weighted similarity between the keywords of each paragraph and the preset keyword groups of a plurality of chapters;
selecting the chapter with the highest similarity as the chapter to which the paragraph belongs, and merging adjacent paragraphs that belong to the same chapter into the chapter content of that chapter;
and retrieving the corresponding chapter contents based on the chapters to be configured and their chapter order, and arranging the chapter contents in that order to obtain the reconfigured medical record.
2. The method of claim 1, wherein extracting keywords from the paragraphs using a keyword extraction technique and generating weights for the keywords based on their location in the paragraphs comprises:
sequentially generating sequence numbers, from small to large, for the characters according to their order in the paragraph;
obtaining the largest sequence number;
acquiring the sequence numbers of each keyword, and calculating the intermediate value of the sequence numbers of the keyword;
calculating the keyword weight by a formula (the formula image is not reproduced in this text),
wherein Q is the weight of the keyword, and the two remaining symbols of the formula denote, respectively, the intermediate value of the sequence numbers of the keyword and the largest sequence number.
3. The method of claim 2, wherein calculating weighted similarity of keywords of a paragraph and preset keyword groups of a plurality of chapters comprises:
storing in advance, in a database, the names of a plurality of chapters and the keyword group corresponding to each chapter name;
calculating the Euclidean distance between each keyword of the paragraph and the keywords in the keyword group of a chapter, and selecting the shortest Euclidean distance;
collecting the shortest Euclidean distances between all keywords of the paragraph and the keyword group of the chapter, and converting each shortest Euclidean distance into a word segmentation similarity;
and calculating the weighted sum of the word segmentation similarities of all keywords of the paragraph based on the weights of those keywords, to obtain the similarity between the paragraph and the chapter.
4. The method according to claim 1, wherein selecting the chapter with the highest similarity as the chapter to which the paragraph belongs, and merging adjacent paragraphs that belong to the same chapter into the chapter content of that chapter, comprises:
sequentially generating segment numbers for the paragraphs based on the positions of the paragraphs in the overall document;
merging the paragraphs which belong to the same chapter and have adjacent segment numbers into the content of the same chapter;
screening out interval paragraphs, wherein the paragraph preceding an interval paragraph and the paragraph following it both belong to a first chapter;
judging whether the similarity between the interval paragraph and a second chapter exceeds a set first threshold:
if yes, judging that the interval paragraph belongs to the second chapter;
if not, acquiring the similarity between the interval paragraph and the keywords of the first chapter, and if that similarity reaches a set second threshold, judging that the interval paragraph belongs to the first chapter.
5. A medical record chapter reconfiguration system, comprising:
The character recognition module is used for recognizing characters of the electronic medical record and dividing the character recognition content according to the paragraphs to obtain a plurality of paragraphs;
The feature extraction module is used for extracting keywords from the paragraphs by using a keyword extraction technology and generating weights for the keywords based on the positions of the keywords in the paragraphs;
The chapter matching module is used for calculating weighted similarity between keywords of the paragraphs and preset keyword groups of a plurality of chapters;
the content combination module is used for selecting the chapter with the highest similarity as the chapter to which the paragraph belongs, and merging adjacent paragraphs that belong to the same chapter into the chapter content of that chapter;
and the chapter configuration module is used for retrieving the corresponding chapter contents based on the chapters to be configured and their chapter order, and arranging the chapter contents in that order to obtain the reconfigured medical record.
6. The system of claim 5, wherein the feature extraction module is configured for:
sequentially generating sequence numbers, from small to large, for the characters according to their order in the paragraph;
obtaining the largest sequence number;
acquiring the sequence numbers of each keyword, and calculating the intermediate value of the sequence numbers of the keyword;
calculating the keyword weight by a formula (the formula image is not reproduced in this text),
wherein Q is the weight of the keyword, and the two remaining symbols of the formula denote, respectively, the intermediate value of the sequence numbers of the keyword and the largest sequence number.
7. The system of claim 6, wherein the chapter matching module comprises:
the standard storage unit is used for storing in advance, in a database, the names of a plurality of chapters and the keyword group corresponding to each chapter name;
the distance calculation unit is used for calculating the Euclidean distance between each keyword of the paragraph and the keywords in the keyword group of a chapter, and selecting the shortest Euclidean distance;
the distance conversion unit is used for collecting the shortest Euclidean distances between all keywords of the paragraph and the keyword group of the chapter, and converting each shortest Euclidean distance into a word segmentation similarity;
and the weighting calculation unit is used for calculating the weighted sum of the word segmentation similarities of all keywords of the paragraph based on the weights of those keywords, so as to obtain the similarity between the paragraph and the chapter.
8. The system of claim 5, wherein the content combining module comprises:
a segment number generation unit for sequentially generating segment numbers for the paragraphs based on the positions of the paragraphs in the overall document;
the paragraph merging unit is used for merging the paragraphs which belong to the same chapter and have adjacent paragraph numbers into the same chapter content;
the target screening unit is used for screening out interval paragraphs, wherein the paragraph preceding an interval paragraph and the paragraph following it both belong to a first chapter;
a threshold judging unit for judging whether the similarity between the interval paragraph and a second chapter exceeds a set first threshold;
a first judging unit, configured to judge that the interval paragraph belongs to the second chapter if the similarity between the interval paragraph and the second chapter exceeds the set first threshold;
and a second judging unit, configured to acquire the similarity between the interval paragraph and the keywords of the first chapter if the similarity between the interval paragraph and the second chapter does not exceed the set first threshold, and to judge that the interval paragraph belongs to the first chapter if that similarity reaches a set second threshold.
9. A terminal, comprising:
a memory for storing a medical record chapter reconfiguration program;
and a processor for implementing the steps of the medical record chapter reconfiguration method according to any one of claims 1-4 when executing the medical record chapter reconfiguration program.
10. A computer readable storage medium storing a computer program, wherein a medical record chapter reconfiguration program is stored on the readable storage medium, and the medical record chapter reconfiguration program, when executed by a processor, implements the steps of the medical record chapter reconfiguration method as claimed in any one of claims 1-4.
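By way of non-limiting illustration, the weighted similarity of claim 3 and the interval paragraph decision of claim 4 might be sketched as follows. Several points are assumptions rather than features of the claims: keywords are assumed to be represented by numeric vectors (the hypothetical `vectors` mapping), the distance is converted to a similarity as 1 / (1 + d), the keyword weights are taken as given inputs because the weight formula is not reproduced in this text, and the fall-through case of the interval paragraph decision keeps the second chapter.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_similarity(d):
    # Assumption: the claims do not state the conversion; 1 / (1 + d) is one
    # simple choice that maps distance 0 to similarity 1.
    return 1.0 / (1.0 + d)


def paragraph_chapter_similarity(weights, chapter_keywords, vectors):
    """Weighted sum, over the paragraph keywords, of the similarity to the
    nearest keyword in the chapter keyword group.

    weights: {paragraph keyword: weight}
    chapter_keywords: keywords preset for one chapter
    vectors: {keyword: vector}, a hypothetical keyword embedding
    """
    total = 0.0
    for keyword, weight in weights.items():
        shortest = min(
            euclidean(vectors[keyword], vectors[c]) for c in chapter_keywords
        )
        total += weight * distance_to_similarity(shortest)
    return total


def resolve_interval_paragraph(sim_second, sim_first, first, second, t1, t2):
    """Decide the chapter of a paragraph whose neighbours both belong to `first`.

    sim_second / sim_first: similarity of the paragraph to the second / first chapter.
    t1 / t2: the set first and second thresholds.
    """
    if sim_second > t1:
        return second
    if sim_first >= t2:
        return first
    # Assumption: the claim does not cover this case; keep the original match.
    return second
```

With toy vectors in which a paragraph keyword coincides with a chapter keyword, the shortest distance is 0 and the keyword contributes its full weight to the paragraph and chapter similarity.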
CN202410145758.8A 2024-02-02 2024-02-02 Medical record chapter reconfiguration method, system, terminal and storage medium Active CN117688927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410145758.8A CN117688927B (en) 2024-02-02 2024-02-02 Medical record chapter reconfiguration method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410145758.8A CN117688927B (en) 2024-02-02 2024-02-02 Medical record chapter reconfiguration method, system, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN117688927A CN117688927A (en) 2024-03-12
CN117688927B true CN117688927B (en) 2024-04-30

Family

ID=90135729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410145758.8A Active CN117688927B (en) 2024-02-02 2024-02-02 Medical record chapter reconfiguration method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN117688927B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317786A (en) * 2014-10-13 2015-01-28 安徽华贞信息科技有限公司 Method and system for segmenting text paragraphs
CN105474211A (en) * 2013-08-21 2016-04-06 微软技术许可有限责任公司 Presenting fixed format documents in reflowed format
KR20200036333A (en) * 2018-09-28 2020-04-07 배재대학교 산학협력단 Document analysis-based key element extraction system and method
CN113988082A (en) * 2021-10-28 2022-01-28 泰康保险集团股份有限公司 Text processing method and device, electronic equipment and storage medium
CN114817449A (en) * 2022-05-11 2022-07-29 平安国际智慧城市科技股份有限公司 Text search ordering method and device based on artificial intelligence and related equipment
CN116578696A (en) * 2023-05-16 2023-08-11 平安科技(深圳)有限公司 Text abstract generation method, device, equipment and storage medium
CN117473984A (en) * 2023-10-27 2024-01-30 北京天顶星智能信息技术有限公司 Method and system for dividing txt document content chapters

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI549003B (en) * 2014-08-18 2016-09-11 葆光資訊有限公司 Method for automatic sections division

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
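Study on semantic paragraph partition in automatic abstracting system; Wan Min et al.; 《2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat.No.01CH37236)》; 20020806; full text *
Structure function recognition of academic text: paragraph-based recognition; Huang Yong; Lu Wei; Cheng Qikai; Gui Sisi; Journal of the China Society for Scientific and Technical Information; 20160524(05); full text *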

Also Published As

Publication number Publication date
CN117688927A (en) 2024-03-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant