CN112016274A - Medical text structuring method and device, computer equipment and storage medium - Google Patents

Medical text structuring method and device, computer equipment and storage medium

Publication number
CN112016274A
Authority
CN
China
Prior art keywords
text
medical
code file
characteristic
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010935255.2A
Other languages
Chinese (zh)
Other versions
CN112016274B (en)
Inventor
朱威
何义龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010935255.2A (patent CN112016274B)
Priority to PCT/CN2020/124215 (patent WO2021164301A1)
Publication of CN112016274A
Application granted
Publication of CN112016274B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to an artificial intelligence technology, is applied to the field of medical text processing, and particularly discloses a medical text structuring method, a medical text structuring device, computer equipment and a storage medium. The method comprises the following steps: capturing an unstructured medical knowledge text; splitting the unstructured text into a plurality of first feature sentences; after the first feature sentence is input into a preset language identification model, a semantic feature vector is obtained; after all semantic feature vectors are input into a preset article semantic recognition model, acquiring output second feature sentences; calling a first code file of the medical source text to be processed, and inserting a segmentation symbol into a position, corresponding to the position to be segmented of the second characteristic sentence, in the first code file to obtain a second code file; and running the second code file to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the medical source text to be processed. The method and the device can improve the conversion efficiency of the structured medical knowledge text.

Description

Medical text structuring method and device, computer equipment and storage medium
Technical Field
The invention relates to the field of intelligent decision making in artificial intelligence, in particular to a medical text structuring method, a medical text structuring device, computer equipment and a storage medium.
Background
At present, a single medical source text often contains a large amount of medical knowledge text covering many topics in the medical field. When this text is to be displayed in an interface, it must be edited manually so that it is structured and easy to read. However, the text formats within a source text are generally irregular and mostly unstructured, so manual editing is error-prone, inefficient and time-consuming. In particular, when emerging medical knowledge texts (for example, new product specifications in the medical field) need to be presented to users, they must follow a specific structured format, such as correct paragraph segmentation and reasonable indentation. Producing an externally displayable structured medical text through manual editing is time-consuming and labor-intensive. A new technical solution is therefore needed to solve these problems.
Disclosure of Invention
Therefore, it is necessary to provide a method, an apparatus, a computer device and a storage medium for structuring a medical text, which are used to avoid the problems of high error rate of manual editing and much time spent on manual editing, and can improve the efficiency of converting an unstructured medical knowledge text into a structured medical knowledge text.
A medical text structuring method, comprising:
grabbing the whole section of unstructured medical knowledge text in the medical source text to be processed;
identifying all punctuation marks in the unstructured medical knowledge text, and splitting the unstructured text into a plurality of first characteristic sentences according to the punctuation marks;
after the first characteristic sentences are input into a preset language identification model, a semantic characteristic vector corresponding to each first characteristic sentence is obtained;
after all the semantic feature vectors are input into a preset article semantic recognition model, acquiring a second feature sentence output by the preset article semantic recognition model; the second characteristic sentence comprises a preset number of positions to be segmented, which are determined by the preset article semantic recognition model according to the context association relation of the unstructured medical knowledge text;
calling a first code file of the medical source text to be processed, inquiring the second characteristic sentence from the first code file, and inserting a segmentation symbol into a position, corresponding to the position to be segmented of the second characteristic sentence, in the first code file to obtain a second code file;
and running the second code file to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the to-be-processed medical source text.
A medical text structuring apparatus comprising:
the grabbing module is used for grabbing the whole section of unstructured medical knowledge text in the medical source text to be processed;
the splitting module is used for identifying all punctuation marks in the unstructured medical knowledge text and splitting the unstructured text into a plurality of first characteristic sentences according to the punctuation marks;
the first acquisition module is used for acquiring a semantic feature vector corresponding to each first feature sentence after the first feature sentences are input into a preset language identification model;
the second acquisition module is used for acquiring a second characteristic sentence output by the preset article semantic recognition model after all the semantic feature vectors are input to the preset article semantic recognition model; the second characteristic sentence comprises a preset number of positions to be segmented, which are determined by the preset article semantic recognition model according to the context association relation of the unstructured medical knowledge text;
the inserting module is used for calling a first code file of the medical source text to be processed, inquiring the second characteristic sentence from the first code file, and inserting a segmentation symbol into the first code file at a position corresponding to the position to be segmented of the second characteristic sentence to obtain a second code file;
and the display module is used for operating the second code file so as to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the to-be-processed medical source text.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the above medical text structuring method when executing the computer program.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the above-mentioned medical text structuring method.
According to the medical text structuring method, the medical text structuring device, the computer equipment and the storage medium, the model and the segmentation symbols are used for replacing the mode of manually editing the unstructured medical knowledge text in the medical source text to be processed in the prior art, the problems that manual editing is high in error rate and time is consumed in manual editing are solved, and the efficiency of converting the unstructured medical knowledge text into the structured medical knowledge text is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a diagram illustrating an application environment of a method for structuring medical texts according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method for structuring medical text in one embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a medical text structuring apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The medical text structuring method provided by the invention can be applied to the application environment shown in fig. 1, in which a client communicates with a server through a network. The client may be, but is not limited to, a personal computer, laptop, smartphone, tablet or portable wearable device. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In an embodiment, as shown in fig. 2, a medical text structuring method is provided, which is described by taking the server in fig. 1 as an example, and includes the following steps:
S10, grabbing the whole section of unstructured medical knowledge text in the medical source text to be processed;
Understandably, the medical source text to be processed may be a web page containing unstructured medical knowledge text in the medical field, where the unstructured medical knowledge text may include, but is not limited to, formulas of various medical drugs, descriptions of the medical functions of those drugs, or product specifications of the drugs on the web page. Unstructured medical knowledge text refers to text without a fixed format, where fixed formats include, but are not limited to, text paragraph format, text format, indentation format and spacing format. In this embodiment the text in the medical source text to be processed is uploaded by different users who use inconsistent formats while editing, so when different users upload text to the same medical source text to be processed, the whole text finally presented has an inconsistent format. In addition, the various input components or display components in a medical source text to be processed can cause text incompatibility: copying a whole section of text from a display component in one medical source text to be processed into an input component in another may turn previously structured medical knowledge text into unstructured medical knowledge text. Specifically, in this embodiment the unstructured medical knowledge text may be captured in either of two ways. After all text in the display interface of the medical source text to be processed has been identified, the text selected by the user may be taken as the unstructured medical knowledge text. Alternatively, an NLP model may be used: when the NLP model recognizes that the text in the medical source text to be processed has several inconsistent formats or no format at all, that text may be captured as the unstructured medical knowledge text.
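As an illustrative sketch only, and not part of the disclosed embodiment, the capture step could be approximated by collecting the visible text of a web page. The sketch assumes the medical source text is an HTML page; the names TextGrabber and grab_text are hypothetical, and the user-selection and NLP-based detection described above are not modeled.

```python
from html.parser import HTMLParser


class TextGrabber(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def grab_text(html: str) -> str:
    """Hypothetical helper: returns the page's visible text as one string."""
    parser = TextGrabber()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The returned string plays the role of the whole section of unstructured medical knowledge text passed on to step S20.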
S20, recognizing all punctuation marks in the unstructured medical knowledge text, and splitting the unstructured text into a plurality of first characteristic sentences according to the punctuation marks;
Understandably, the punctuation marks in the unstructured medical knowledge text can be recognized by a punctuation recognition component or by an NLP model, and the text can then be segmented at the recognized punctuation marks to obtain the plurality of first feature sentences. The punctuation marks used for splitting are those that end a complete sentence, such as periods, exclamation marks or question marks. In this embodiment the text is split into a plurality of first feature sentences, each of which represents the features of one complete sentence; these features supply the relationships between complete sentences in the subsequent semantic recognition process and prevent sentences from being confused with one another during recognition.
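A minimal sketch of step S20, assuming a plain regular expression stands in for the punctuation recognition component. The function name split_sentences is hypothetical, and the set of sentence-ending marks (Western and full-width Chinese forms) is an assumption.

```python
import re

# Sentence-ending marks, in both Western and full-width Chinese forms.
# Assumption: a period inside a number such as "2.5 mg" would also split;
# a real punctuation recognition component would need to handle that case.
_SENTENCE_END = re.compile(r"([.!?。！？])")


def split_sentences(text: str) -> list[str]:
    """Splits text into first feature sentences at sentence-ending marks."""
    # With a capture group, re.split returns [body, mark, body, mark, ..., tail].
    parts = _SENTENCE_END.split(text)
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    tail = parts[-1].strip()  # trailing text with no closing mark
    if tail:
        sentences.append(tail)
    return sentences
```

Each sentence keeps its closing mark, so the sentences can later be located verbatim in the first code file.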
S30, inputting the first feature sentences into a preset language identification model, and then acquiring a semantic feature vector corresponding to each first feature sentence;
Understandably, the preset language identification model may be a bert model. The bert model can be used to capture representations of the first feature sentence and of each word in it; its goal is to train on a large-scale unlabeled corpus to obtain a representation of the first feature sentence that contains rich semantic information. The core of the bert model is the Transformer module, which is built on the Attention mechanism; Transformer modules are assembled to form the bert model. In this embodiment, the semantic feature vector corresponding to a first feature sentence is obtained through the word-to-sentence relationships in the bert model.
S40, after all the semantic feature vectors are input into a preset article semantic recognition model, acquiring a second feature sentence output by the preset article semantic recognition model; the second characteristic sentence comprises a preset number of positions to be segmented, which are determined by the preset article semantic recognition model according to the context association relation of the unstructured medical knowledge text;
Understandably, the preset article semantic recognition model is an LSTM model, whose purpose is to retain information over long spans so that each complete sentence in the input text can be recognized. The core processing of the LSTM model is performed by three gates (thresholds): the forget gate, the input gate and the output gate. By combining the context of the input text, the LSTM model can determine a complete second feature sentence, and each second feature sentence yields two positions to be segmented.
S50, calling out a first code file of the medical source text to be processed, inquiring the second characteristic sentence from the first code file, and inserting a segmentation symbol into a position, corresponding to the position to be segmented, of the second characteristic sentence in the first code file to obtain a second code file;
Understandably, the first code file is the background code file corresponding to the medical source text to be processed, and it can be called through a script language. Because the second feature sentence is derived from the unstructured medical knowledge text in the medical source text to be processed, the first code file contains a text display position in which that sentence appears (a text display position may contain several second feature sentences). Specifically, the code that writes to the text display position is located in the first code file, and the words corresponding to the second feature sentence are then identified within the text display position to determine the second feature sentence. The segmentation symbol can be understood as an HTML tag. Specifically, two segmentation symbols can be inserted at the two positions to be segmented within the text display position (that is, the text display position contains at least two positions to be segmented), and together the two symbols delimit one second feature sentence. The segmentation symbols include div tags, h1 to h6 heading tags, and the like.
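A minimal sketch of step S50, assuming the first code file is available as an HTML string and that the second feature sentence appears in it verbatim. The function name insert_segmentation is hypothetical, and the default div tag is only one of the segmentation symbols mentioned above.

```python
def insert_segmentation(code: str, sentence: str, tag: str = "div") -> str:
    """Wraps the first occurrence of `sentence` in `code` with <tag>...</tag>.

    The opening and closing tags are inserted at the two positions to be
    segmented, that is, immediately before and after the sentence.
    """
    start = code.find(sentence)
    if start == -1:
        return code  # sentence not found: leave the code file unchanged
    end = start + len(sentence)
    return code[:start] + f"<{tag}>" + sentence + f"</{tag}>" + code[end:]
```

Calling the function once per second feature sentence, feeding each result into the next call, would yield the second code file.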
S60, running the second code file to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the medical source text to be processed.
Understandably, the second code file is a background code file including segmentation symbols, and the second code file is required to be run when a specific medical source text to be processed is displayed.
Further, after the capturing the whole unstructured medical knowledge text in the medical source text to be processed, the method further includes:
detecting the unstructured medical knowledge text through a preset natural language processing model, marking words with errors in the unstructured medical knowledge text and obtaining a marking result;
calling a first code file of the medical source text to be processed, correcting the words with errors in the first code file according to the marking result to obtain a third code file, and operating the third code file to obtain the corrected unstructured medical knowledge text.
Understandably, the preset natural language processing model may be an NLP model. Repeated or miswritten words in the unstructured medical knowledge text are marked through the semantic recognition function of the model, and the miswritten words in the first code file are then corrected according to the marking result, where the correction covers both repeated words, which are deleted, and wrongly written words.
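The marking and deletion of repeated words can be sketched as follows. This is an assumption-laden stand-in: a regular expression replaces the preset natural language processing model, only adjacent repeated words are handled, and the names mark_repeats and delete_repeats are hypothetical. Detecting wrongly written words would require a dictionary or a trained model and is not shown.

```python
import re

# Matches a word followed by one or more adjacent repetitions of itself.
_REPEATED = re.compile(r"\b(\w+)(\s+\1\b)+", flags=re.IGNORECASE)


def mark_repeats(text: str) -> list[tuple[int, int, str]]:
    """Returns (start, end, word) for each run of adjacent repeated words."""
    return [(m.start(), m.end(), m.group(1)) for m in _REPEATED.finditer(text)]


def delete_repeats(text: str) -> str:
    """Collapses each run of adjacent repeated words into a single word."""
    return _REPEATED.sub(r"\1", text)
```

The list returned by mark_repeats corresponds to the marking result, and delete_repeats applies the correction to the text found in the first code file.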
Further, the first feature statement is stored in a block chain; the preset language identification model is a bert model;
after the first feature sentences are input into a preset language identification model, obtaining a semantic feature vector corresponding to each first feature sentence, including:
after the first characteristic statement is input into the bert model, querying a word vector of each word in the first characteristic statement through the bert model;
selecting one word vector in the first characteristic statement as a Query vector through an Attention mechanism in the bert model, and using other word vectors of the first characteristic statement as Key vectors;
performing similarity calculation on the Query vector and each Key vector to obtain a weight coefficient, and performing weighted operation on the Value values corresponding to the Query vector and the Key vectors through the weight coefficient to obtain a first enhanced semantic feature vector corresponding to the Query vector output by the Attention mechanism;
performing linear conversion on the first enhanced semantic feature vector through a plurality of stacked Transformer encoders in the bert model to obtain a second enhanced semantic feature vector;
and combining the second enhanced semantic feature vectors corresponding to the word vectors of each word in the first feature sentence to obtain the semantic feature vectors corresponding to the first feature sentence.
Understandably, this embodiment mainly uses the Attention mechanism in the bert model to let the model attend to the input first feature sentence. The Attention mechanism in this embodiment involves Query vectors, Key vectors and Value values, where both the Query vectors and the Key vectors are derived from word vectors and each word vector has a corresponding Value; in essence, Attention can be described as a mapping from a query (Query) to a series of key-value (Key-Value) pairs. Specifically, after a first feature sentence is input into the bert model, each word in the sentence is looked up and converted into a one-dimensional word vector by the bert model. One word vector of the first feature sentence is then taken as the target Query vector, and the other word vectors in the sentence are taken as Key vectors. Similarity is calculated between the Query vector and each Key vector to obtain weight coefficients, where common similarity functions include, but are not limited to, dot product, concatenation and perceptron. A preset softmax function then normalizes the weight coefficients, and the normalized weight coefficients are used in a weighted sum over the Value values corresponding to the Query vector and the Key vectors to obtain the first enhanced semantic feature vector, output by the Attention mechanism, that corresponds to the Query vector. Finally, each Transformer Encoder built from the Attention mechanism processes the first enhanced semantic feature vector to obtain the second enhanced semantic feature vector. This processing includes a residual connection (the word vector and the first enhanced semantic feature vector are added directly to form the output), normalization of the nodes of a given neural network layer to zero mean and unit variance, and linear conversion (a linear transformation of the first enhanced semantic feature vector that enhances the expressive capability of the bert model). The second enhanced semantic feature vectors corresponding to the word vectors are then combined to obtain the semantic feature vector corresponding to the first feature sentence. Using the bert model as the preset language identification model in this embodiment serves two purposes: 1. the relationships between the first feature sentences, that is, the context, can be learned; 2. sentence-level semantic representations (the second enhanced semantic feature vectors) are acquired well.
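The attention computation described above can be sketched in plain Python for a single Query vector. Several simplifications are assumptions of this sketch rather than statements about the bert model: the word vectors serve directly as Keys and Values with no learned projection matrices, the query word is included among the keys as in standard self-attention, dot product is the similarity function, and the names softmax, attend and residual_layer_norm are hypothetical.

```python
import math


def softmax(scores):
    """Normalizes raw similarity scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Dot-product attention: the weighted sum of values for one query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def residual_layer_norm(word_vector, attended, eps=1e-6):
    """Residual connection followed by zero-mean, unit-variance scaling."""
    summed = [a + b for a, b in zip(word_vector, attended)]
    mean = sum(summed) / len(summed)
    var = sum((s - mean) ** 2 for s in summed) / len(summed)
    return [(s - mean) / math.sqrt(var + eps) for s in summed]
```

In this sketch the output of attend corresponds to the first enhanced semantic feature vector; the linear conversion that would follow residual_layer_norm is omitted.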
It should be emphasized that, to further ensure the privacy and security of the first feature sentence, the first feature sentence may also be stored in a node of a blockchain. Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked using cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like. The decentralized, fully distributed DNS service provided by a blockchain can query and resolve domain names through point-to-point data transmission among the nodes of the network; it can be used to ensure that the operating system and firmware of important infrastructure are not tampered with, to monitor the state and integrity of software and detect malicious tampering, and to ensure that transmitted data are not tampered with. Storing the first feature sentence in the blockchain can therefore ensure its privacy and security.
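The idea of data blocks linked by cryptographic methods can be illustrated with a simple hash chain. This is only a sketch under stated assumptions: it runs in a single process with no network, consensus mechanism or encryption, SHA-256 is an assumed hash function, and the names append_block and verify_chain are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder hash preceding the first block


def _digest(sentence: str, prev_hash: str) -> str:
    """Hashes a sentence together with the hash of the previous block."""
    payload = json.dumps(
        {"sentence": sentence, "prev_hash": prev_hash},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, sentence: str) -> None:
    """Appends a block that stores one first feature sentence."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {"sentence": sentence, "prev_hash": prev_hash, "hash": _digest(sentence, prev_hash)}
    )


def verify_chain(chain: list) -> bool:
    """Returns False if any stored sentence or link has been altered."""
    prev_hash = GENESIS_HASH
    for block in chain:
        expected = _digest(block["sentence"], prev_hash)
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Because each block's hash depends on the previous block's hash, changing a stored sentence invalidates verification from that block onward.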
Further, after inserting a segmentation symbol into a position in the first code file corresponding to a position to be segmented of the second feature sentence to obtain a second code file, the method further includes:
calling out a corresponding cascading style sheet according to a preset style format, and adding the cascading style sheet into the second code file.
Understandably, this embodiment mainly adds a corresponding cascading style sheet, covering properties such as color, font size and box, to the second code file so that specific formatting is shown in the medical source text to be processed, for example the color, font-size and box properties in CSS.
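Adding a cascading style sheet to the second code file might look like the following sketch. It assumes the code file is an HTML string and that the style sheet is embedded in a style element placed before the closing head tag; the function name add_style_sheet and the example CSS rule are hypothetical.

```python
def add_style_sheet(code: str, css: str) -> str:
    """Embeds `css` in a <style> element just before </head>.

    If the code file has no </head> tag, the style block is prepended.
    """
    style_block = f"<style>\n{css}\n</style>"
    idx = code.lower().find("</head>")
    if idx == -1:
        return style_block + "\n" + code
    return code[:idx] + style_block + "\n" + code[idx:]
```

For example, a preset style format could be the rule `div { color: #333; font-size: 14px; }`, applied to the div segmentation symbols inserted earlier.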
Further, the preset article semantic recognition model is an LSTM model;
after all the semantic feature vectors are input into a preset article semantic recognition model, the method includes:
selecting discarded information through a forgetting threshold in the LSTM model;
selecting required information from the semantic feature vector through an input threshold in the LSTM model and the discarded information;
and outputting the second characteristic statement through an output threshold in the LSTM model and the required information.
Understandably, the LSTM model is a gated RNN whose key component is the cell state. Each gate (threshold) of the LSTM model is therefore designed to remove information from, or add information to, the cell state (which can be regarded as the semantic feature vector). Each gate consists of a Sigmoid neural network layer and a pointwise multiplication operation; the Sigmoid layer outputs a value between 0 and 1 that describes how much of each component may pass, where 0 means nothing passes and 1 means everything passes. The forget gate determines which information in the cell state is discarded; the discarded information is the subject corresponding to the previous semantic feature vector. The input gate updates the information stored in the cell state: specifically, the discarded information is removed from the semantic feature vector, and the required information to be updated is determined from what remains. The output gate then determines and outputs the second feature sentence according to the required information determined at the input gate.
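The gate arithmetic can be sketched as a single step of a standard LSTM cell. To stay short the sketch uses scalar input and state, whereas a real model operates on vectors with learned weight matrices. The parameter layout, the candidate entry and the function names are assumptions of this sketch, and how the disclosed model decodes the output into a second feature sentence is not shown.

```python
import math


def sigmoid(x: float) -> float:
    """Squashes x into (0, 1): 0 lets nothing pass, 1 lets everything pass."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step; params maps a gate name to (w_x, w_h, bias)."""

    def gate(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much new information to admit
    g = gate("candidate", math.tanh)  # the new information itself
    o = gate("output", sigmoid)       # how much of the cell state to expose
    c = f * c_prev + i * g            # discard via f, then add via i * g
    h = o * math.tanh(c)              # gated output of this step
    return h, c
```

The pointwise multiplications by f, i and o correspond to the multiplication operation that follows each Sigmoid layer in the description above.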
In summary, the medical text structuring method described above uses the models and the segmentation symbols to replace the manual editing of the unstructured medical knowledge text in the medical source text to be processed, which avoids the high error rate and long time cost of manual editing and improves the efficiency of converting the unstructured medical knowledge text into the structured medical knowledge text. The method can be applied to smart healthcare, thereby promoting the construction of smart cities.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, a medical text structuring apparatus is provided, which corresponds to the medical text structuring method in the above embodiments one to one. As shown in fig. 3, the medical text structuring device comprises a grabbing module 11, a splitting module 12, a first obtaining module 13, a second obtaining module 14, an inserting module 15 and a displaying module 16. The functional modules are explained in detail as follows:
the grabbing module 11 is used for grabbing the whole unstructured medical knowledge text in the medical source text to be processed;
the splitting module 12 is configured to identify all punctuation marks in the unstructured medical knowledge text, and split the unstructured text into a plurality of first feature sentences according to the punctuation marks;
a first obtaining module 13, configured to obtain a semantic feature vector corresponding to each first feature sentence after the first feature sentence is input to a preset language identification model;
the second obtaining module 14 is configured to obtain a second feature sentence output by a preset article semantic recognition model after all the semantic feature vectors are input to the preset article semantic recognition model; the second characteristic sentence comprises a preset number of positions to be segmented, which are determined by the preset article semantic recognition model according to the context association relation of the unstructured medical knowledge text;
the inserting module 15 is configured to call a first code file of the medical source text to be processed, query the second feature sentence from the first code file, and insert a segmentation symbol in a position of the first code file corresponding to a position to be segmented of the second feature sentence to obtain a second code file;
a display module 16, configured to run the second code file to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the to-be-processed medical source text.
Further, the medical text structuring apparatus further comprises:
the marking module is used for detecting the unstructured medical knowledge text through a preset natural language processing model, marking words with errors in the unstructured text and acquiring a marking result;
and the operation module is used for calling the first code file of the medical source text to be processed, correcting the words with errors in the first code file according to the marking result to obtain a third code file, and operating the third code file to obtain the corrected unstructured medical knowledge text.
Further, the preset language identification model is a BERT model, and the first obtaining module includes:
the input sub-module is used for querying, through the BERT model, the word vector of each word in the first feature sentence after the first feature sentence is input to the BERT model;
the selecting sub-module is used for selecting, through the Attention mechanism in the BERT model, one word vector in the first feature sentence as a Query vector, and using the other word vectors of the first feature sentence as Key vectors;
the weighted operation sub-module is used for performing similarity calculation between the Query vector and each Key vector to obtain weight coefficients, and performing a weighted operation on the Values corresponding to the Query vector and the Key vectors using the weight coefficients, to obtain a first enhanced semantic feature vector which is output by the Attention mechanism and corresponds to the Query vector;
the linear conversion sub-module is used for performing linear transformation on the first enhanced semantic feature vector through a plurality of stacked Transformer encoders in the BERT model to obtain a second enhanced semantic feature vector;
and the combination sub-module is used for combining the second enhanced semantic feature vectors corresponding to the word vectors of the words in the first feature sentence to obtain the semantic feature vector corresponding to the first feature sentence.
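By way of illustration only, the similarity calculation and weighted operation performed by the weighted operation sub-module might be sketched as scaled dot-product attention for a single Query vector. This is a simplified sketch under stated assumptions: the function name `attention`, the dot-product similarity, the scaling by the square root of the dimension, and the softmax normalisation are assumptions of the sketch; an actual BERT model additionally applies learned projections and multiple attention heads, which are omitted here.

```python
import math


def attention(query, keys, values):
    """Return (output, weights) for one Query vector.

    Each weight is softmax(query . key / sqrt(d)); the output is the
    weighted sum of the Value vectors using those weights.
    """
    d = len(query)
    # Similarity between the Query vector and each Key vector.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax turns the similarities into weight coefficients.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the Value vectors, one output component at a time.
    width = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]
    return output, weights
```

A Key that is more similar to the Query receives a larger weight, so its Value contributes more to the enhanced vector for that Query.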
Further, the medical text structuring apparatus further comprises:
and the adding module is used for calling out a corresponding cascading style sheet according to a preset style format and adding the cascading style sheet into the second code file.
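By way of illustration only, the adding module might be sketched as follows, assuming that the second code file is an HTML document and that the cascading style sheet is embedded in a `<style>` element. The function name `add_stylesheet` and the choice of embedding rather than linking an external file are assumptions of this sketch.

```python
import re


def add_stylesheet(html: str, css: str) -> str:
    """Embed `css` in a <style> element, placed just before </head> when present.

    If the document has no </head> tag, the <style> element is prepended.
    """
    style = f"<style>\n{css}\n</style>"
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return style + "\n" + html
    return html[:match.start()] + style + "\n" + html[match.start():]
```

For example, `add_stylesheet(page, "p { margin: 1em; }")` returns the page with the rule placed in its head section, so that the paragraphs produced by the segmentation symbols are rendered in the preset style format.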
Further, the preset article semantic recognition model is an LSTM model, and the second obtaining module includes:
the first selection sub-module is used for selecting the information to be discarded through the forget gate in the LSTM model;
a second selection sub-module for selecting the required information from the semantic feature vectors through the input gate in the LSTM model and the discarded information;
and the output sub-module is used for outputting the second feature sentence through the output gate in the LSTM model and the required information.
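By way of illustration only, the three gating operations of an LSTM model might be sketched as one time step of a single-unit cell using the standard LSTM equations. The scalar state, the function name `lstm_step`, and the layout of `params` are assumptions of this sketch; the embodiments do not specify the dimensions or parameters of the model.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    `params` maps each of "forget", "input", "candidate" and "output"
    to a tuple (w_x, w_h, b) of scalar weights and bias.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # forget gate: how much old cell state to discard
    i = sigmoid(pre("input"))        # input gate: how much new information to admit
    g = math.tanh(pre("candidate"))  # candidate information computed from the input
    o = sigmoid(pre("output"))       # output gate: how much of the state to emit
    c = f * c_prev + i * g           # new cell state
    h = o * math.tanh(c)             # new hidden state (the cell's output)
    return h, c
```

Feeding the semantic feature vectors of successive sentences through such a cell lets earlier sentences influence later outputs, which is the sense in which the model uses the context association relation of the text.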
For specific definitions of the medical text structuring apparatus, reference may be made to the above definitions of the medical text structuring method, which are not repeated here. The modules in the medical text structuring apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data involved in the medical text structuring method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a medical text structuring method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the steps of the medical text structuring method in the above-mentioned embodiments, such as steps S10 to S60 shown in fig. 2. Alternatively, the processor, when executing the computer program, implements the functions of the modules/units of the medical text structuring apparatus in the above-described embodiments, such as the functions of the modules 11 to 16 shown in fig. 3. To avoid repetition, further description is omitted here.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the medical text structuring method in the above-described embodiments, such as the steps S10 to S60 shown in fig. 2. Alternatively, the computer program, when executed by the processor, implements the functions of the modules/units of the medical text structuring apparatus in the above-described embodiments, such as the functions of the modules 11 to 16 shown in fig. 3. To avoid repetition, further description is omitted here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
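By way of illustration only, the linking of data blocks by a cryptographic method might be sketched as a simple hash chain. The SHA-256 digest, the JSON serialisation, and the names `make_block` and `chain_is_valid` are assumptions of this sketch; a real blockchain platform additionally involves a consensus mechanism and distributed storage, which are not shown.

```python
import hashlib
import json


def _digest(transactions, prev_hash):
    """Hash a block body (its transactions plus the previous block's hash)."""
    body = {"transactions": transactions, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(transactions, prev_hash):
    """Create a block that records a batch of transactions and links to its predecessor."""
    return {
        "transactions": transactions,
        "prev_hash": prev_hash,
        "hash": _digest(transactions, prev_hash),
    }


def chain_is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    for index, block in enumerate(chain):
        if block["hash"] != _digest(block["transactions"], block["prev_hash"]):
            return False
        if index > 0 and block["prev_hash"] != chain[index - 1]["hash"]:
            return False
    return True
```

Because each block stores the hash of its predecessor, altering the transactions of an earlier block changes its hash and causes validation of the chain to fail.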
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A medical text structuring method, comprising:
grabbing the whole section of unstructured medical knowledge text in the medical source text to be processed;
identifying all punctuation marks in the unstructured medical knowledge text, and splitting the unstructured text into a plurality of first characteristic sentences according to the punctuation marks;
after the first characteristic sentences are input into a preset language identification model, a semantic characteristic vector corresponding to each first characteristic sentence is obtained;
after all the semantic feature vectors are input into a preset article semantic recognition model, acquiring a second feature sentence output by the preset article semantic recognition model; the second characteristic sentence comprises a preset number of positions to be segmented, which are determined by the preset article semantic recognition model according to the context association relation of the unstructured medical knowledge text;
calling a first code file of the medical source text to be processed, inquiring the second characteristic sentence from the first code file, and inserting a segmentation symbol into a position, corresponding to the position to be segmented of the second characteristic sentence, in the first code file to obtain a second code file;
and running the second code file to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the to-be-processed medical source text.
2. The medical text structuring method according to claim 1, wherein after the grabbing of the whole section of unstructured medical knowledge text in the medical source text to be processed, the method further comprises:
detecting the unstructured medical knowledge text through a preset natural language processing model, marking words with errors in the unstructured medical knowledge text and obtaining a marking result;
calling a first code file of the medical source text to be processed, correcting the words with errors in the first code file according to the marking result to obtain a third code file, and operating the third code file to obtain the corrected unstructured medical knowledge text.
3. The medical text structuring method according to claim 1, wherein the preset language identification model is a BERT model;
after the first feature sentences are input into a preset language identification model, obtaining a semantic feature vector corresponding to each first feature sentence, including:
after the first feature sentence is input into the BERT model, querying a word vector of each word in the first feature sentence through the BERT model;
selecting one word vector in the first feature sentence as a Query vector through an Attention mechanism in the BERT model, and using the other word vectors of the first feature sentence as Key vectors;
performing similarity calculation between the Query vector and each Key vector to obtain weight coefficients, and performing a weighted operation on the Values corresponding to the Query vector and the Key vectors using the weight coefficients to obtain a first enhanced semantic feature vector corresponding to the Query vector output by the Attention mechanism;
performing linear transformation on the first enhanced semantic feature vector through a plurality of stacked Transformer encoders in the BERT model to obtain a second enhanced semantic feature vector;
and combining the second enhanced semantic feature vectors corresponding to the word vectors of the words in the first feature sentence to obtain the semantic feature vector corresponding to the first feature sentence.
4. The medical text structuring method according to claim 1, wherein after inserting a segmentation symbol into the first code file at a position corresponding to a position to be segmented of the second feature sentence to obtain a second code file, the method further comprises:
calling a corresponding cascading style sheet according to a preset style format, and nesting the cascading style sheet into the second code file.
5. The medical text structuring method according to claim 1, wherein the preset article semantic recognition model is an LSTM model;
after all the semantic feature vectors are input into a preset article semantic recognition model, the method includes:
selecting the information to be discarded through a forget gate in the LSTM model;
selecting required information from the semantic feature vectors through an input gate in the LSTM model and the discarded information;
and outputting the second feature sentence through an output gate in the LSTM model and the required information.
6. A medical text structuring apparatus, comprising:
the grabbing module is used for grabbing the whole section of unstructured medical knowledge text in the medical source text to be processed;
the splitting module is used for identifying all punctuation marks in the unstructured medical knowledge text and splitting the unstructured text into a plurality of first characteristic sentences according to the punctuation marks;
the first acquisition module is used for acquiring a semantic feature vector corresponding to each first feature sentence after the first feature sentences are input into a preset language identification model;
the second acquisition module is used for acquiring a second characteristic sentence output by the preset article semantic recognition model after all the semantic feature vectors are input to the preset article semantic recognition model; the second characteristic sentence comprises a preset number of positions to be segmented, which are determined by the preset article semantic recognition model according to the context association relation of the unstructured medical knowledge text;
the inserting module is used for calling a first code file of the medical source text to be processed, inquiring the second characteristic sentence from the first code file, and inserting a segmentation symbol into the first code file at a position corresponding to the position to be segmented of the second characteristic sentence to obtain a second code file;
and the display module is used for operating the second code file so as to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the to-be-processed medical source text.
7. The medical text structuring apparatus according to claim 6, further comprising:
the marking module is used for detecting the unstructured medical knowledge text through a preset natural language processing model, marking words with errors in the unstructured text and acquiring a marking result;
and the operation module is used for calling the first code file of the medical source text to be processed, correcting the words with errors in the first code file according to the marking result to obtain a third code file, and operating the third code file to obtain the corrected unstructured medical knowledge text.
8. The medical text structuring apparatus according to claim 6, wherein the preset language identification model is a BERT model; the first acquisition module comprises:
the input sub-module is used for querying, through the BERT model, the word vector of each word in the first feature sentence after the first feature sentence is input to the BERT model;
the selecting sub-module is used for selecting, through the Attention mechanism in the BERT model, one word vector in the first feature sentence as a Query vector, and using the other word vectors of the first feature sentence as Key vectors;
the weighted operation sub-module is used for performing similarity calculation between the Query vector and each Key vector to obtain weight coefficients, and performing a weighted operation on the Values corresponding to the Query vector and the Key vectors using the weight coefficients, to obtain a first enhanced semantic feature vector which is output by the Attention mechanism and corresponds to the Query vector;
the linear conversion sub-module is used for performing linear transformation on the first enhanced semantic feature vector through a plurality of stacked Transformer encoders in the BERT model to obtain a second enhanced semantic feature vector;
and the combination sub-module is used for combining the second enhanced semantic feature vectors corresponding to the word vectors of the words in the first feature sentence to obtain the semantic feature vector corresponding to the first feature sentence.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the medical text structuring method according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a medical text structuring method as defined in any one of claims 1 to 5.
CN202010935255.2A 2020-09-08 2020-09-08 Medical text structuring method, device, computer equipment and storage medium Active CN112016274B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010935255.2A CN112016274B (en) 2020-09-08 2020-09-08 Medical text structuring method, device, computer equipment and storage medium
PCT/CN2020/124215 WO2021164301A1 (en) 2020-09-08 2020-10-28 Medical text structuring method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010935255.2A CN112016274B (en) 2020-09-08 2020-09-08 Medical text structuring method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112016274A (en) 2020-12-01
CN112016274B (en) 2024-03-08

Family

ID=73516342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010935255.2A Active CN112016274B (en) 2020-09-08 2020-09-08 Medical text structuring method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112016274B (en)
WO (1) WO2021164301A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113138773A (en) * 2021-04-19 2021-07-20 杭州科技职业技术学院 Cloud computing distributed service clustering method
WO2021164301A1 (en) * 2020-09-08 2021-08-26 平安科技(深圳)有限公司 Medical text structuring method and apparatus, computer device and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115034204B (en) * 2022-05-12 2023-05-23 浙江大学 Method for generating structured medical text, computer device and storage medium
CN116882496B (en) * 2023-09-07 2023-12-05 中南大学湘雅医院 Medical knowledge base construction method for multistage logic reasoning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032739A (en) * 2019-04-18 2019-07-19 清华大学 Chinese electronic health record name entity abstracting method and system
CN110032648A (en) * 2019-03-19 2019-07-19 微医云(杭州)控股有限公司 A kind of case history structuring analytic method based on medical domain entity
CN111191456A (en) * 2018-11-15 2020-05-22 零氪科技(天津)有限公司 Method for identifying text segmentation by using sequence label

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222654A (en) * 2019-06-10 2019-09-10 北京百度网讯科技有限公司 Text segmenting method, device, equipment and storage medium
CN112016274B (en) * 2020-09-08 2024-03-08 平安科技(深圳)有限公司 Medical text structuring method, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王晓璐 等: "基于XML的医学病案案例化研究", 《电脑知识与技术》, vol. 8, no. 25, pages 5952 - 5954 *

Also Published As

Publication number Publication date
WO2021164301A1 (en) 2021-08-26
CN112016274B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN112016274A (en) Medical text structuring method and device, computer equipment and storage medium
CN111814466A (en) Information extraction method based on machine reading understanding and related equipment thereof
CN111177184A (en) Structured query language conversion method based on natural language and related equipment thereof
CN109977394B (en) Text model training method, text analysis method, device, equipment and medium
CN115618371A (en) Desensitization method and device for non-text data and storage medium
CN113255583B (en) Data annotation method and device, computer equipment and storage medium
CN113343677B (en) Intention identification method and device, electronic equipment and storage medium
CN112131888A (en) Method, device and equipment for analyzing semantic emotion and storage medium
CN112016300B (en) Pre-training model processing method, pre-training model processing device, downstream task processing device and storage medium
CN114580424A (en) Labeling method and device for named entity identification of legal document
CN112036172A (en) Entity identification method and device based on abbreviated data of model and computer equipment
CN113010679A (en) Question and answer pair generation method, device and equipment and computer readable storage medium
CN115617614A (en) Log sequence anomaly detection method based on time interval perception self-attention mechanism
CN113868419B (en) Text classification method, device, equipment and medium based on artificial intelligence
CN112989829B (en) Named entity recognition method, device, equipment and storage medium
CN110826325A (en) Language model pre-training method and system based on confrontation training and electronic equipment
CN113434652B (en) Intelligent question-answering method, intelligent question-answering device, equipment and storage medium
CN115169370A (en) Corpus data enhancement method and device, computer equipment and medium
CN115238653A (en) Report generation method, device, equipment and medium
Ichise et al. Unified Workbench for Knowledge Graph Management.
CN111796830A (en) Protocol analysis processing method, device, equipment and medium
CN110569401A (en) paper marking method and device, computer equipment and storage medium
CN111863268B (en) Method suitable for extracting and structuring medical report content
EP4361847A1 (en) Method and system for restoring consistency of a digital twin database
CN117540004B (en) Industrial domain intelligent question-answering method and system based on knowledge graph and user behavior

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant