CN112016274A - Medical text structuring method and device, computer equipment and storage medium

Info

Publication number: CN112016274A
Application number: CN202010935255.2A
Authority: CN
Inventors: 朱威; 何义龙
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-09-08
Filing date: 2020-09-08
Publication date: 2020-12-01
Anticipated expiration: 2040-09-08
Also published as: WO2021164301A1; CN112016274B

Abstract

The invention relates to artificial intelligence technology, is applied to the field of medical text processing, and in particular discloses a medical text structuring method, a medical text structuring device, computer equipment and a storage medium. The method comprises the following steps: capturing an unstructured medical knowledge text; splitting the unstructured text into a plurality of first feature sentences; inputting the first feature sentences into a preset language identification model to obtain semantic feature vectors; inputting all the semantic feature vectors into a preset article semantic recognition model and acquiring the second feature sentences that it outputs; calling a first code file of the medical source text to be processed, and inserting a segmentation symbol in the first code file at the position corresponding to each position to be segmented of the second feature sentence to obtain a second code file; and running the second code file to display, on the medical source text to be processed, the structured medical knowledge text corresponding to the unstructured medical knowledge text. The method and the device can improve the efficiency of converting unstructured medical knowledge text into structured medical knowledge text.

Description

Medical text structuring method and device, computer equipment and storage medium

Technical Field

The invention relates to the field of intelligent decision making in artificial intelligence, in particular to a medical text structuring method, a medical text structuring device, computer equipment and a storage medium.

Background

At present, a single medical source text often contains a large amount of medical knowledge text covering many topics in the medical field. When this text is to be displayed in an interface, it must be edited by hand so that it is structured and easy to read. However, the text formats found in the source text are generally irregular, and most of the text is unstructured, so manual editing is error-prone, inefficient and time-consuming. In particular, when emerging medical knowledge texts (new product specifications in the medical field and the like) need to be presented to the user, they must follow a specific structured format, such as correct segmentation and reasonable indentation. Producing a structured medical text suitable for external display through manual editing is time-consuming and labor-intensive. Therefore, a new technical solution is needed to solve the above problems.

Disclosure of Invention

Therefore, it is necessary to provide a medical text structuring method, a medical text structuring apparatus, a computer device and a storage medium that avoid the high error rate and long editing time of manual editing and improve the efficiency of converting an unstructured medical knowledge text into a structured medical knowledge text.

A medical text structuring method, comprising:

grabbing the whole section of unstructured medical knowledge text in the medical source text to be processed;

identifying all punctuation marks in the unstructured medical knowledge text, and splitting the unstructured text into a plurality of first characteristic sentences according to the punctuation marks;

after the first characteristic sentences are input into a preset language identification model, a semantic characteristic vector corresponding to each first characteristic sentence is obtained;

after all the semantic feature vectors are input into a preset article semantic recognition model, acquiring a second feature sentence output by the preset article semantic recognition model; the second characteristic sentence comprises a preset number of positions to be segmented, which are determined by the preset article semantic recognition model according to the context association relation of the unstructured medical knowledge text;

calling a first code file of the medical source text to be processed, inquiring the second characteristic sentence from the first code file, and inserting a segmentation symbol into a position, corresponding to the position to be segmented of the second characteristic sentence, in the first code file to obtain a second code file;

and running the second code file to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the to-be-processed medical source text.

A medical text structuring apparatus comprising:

the grabbing module is used for grabbing the whole section of unstructured medical knowledge text in the medical source text to be processed;

the splitting module is used for identifying all punctuation marks in the unstructured medical knowledge text and splitting the unstructured text into a plurality of first characteristic sentences according to the punctuation marks;

the first acquisition module is used for acquiring a semantic feature vector corresponding to each first feature sentence after the first feature sentences are input into a preset language identification model;

the second acquisition module is used for acquiring a second characteristic sentence output by the preset article semantic recognition model after all the semantic feature vectors are input to the preset article semantic recognition model; the second characteristic sentence comprises a preset number of positions to be segmented, which are determined by the preset article semantic recognition model according to the context association relation of the unstructured medical knowledge text;

the inserting module is used for calling a first code file of the medical source text to be processed, inquiring the second characteristic sentence from the first code file, and inserting a segmentation symbol into the first code file at a position corresponding to the position to be segmented of the second characteristic sentence to obtain a second code file;

and the display module is used for operating the second code file so as to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the to-be-processed medical source text.

A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the above medical text structuring method when executing the computer program.

A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the above-mentioned medical text structuring method.

According to the medical text structuring method, the medical text structuring device, the computer equipment and the storage medium, the models and the segmentation symbols replace the prior-art practice of manually editing the unstructured medical knowledge text in the medical source text to be processed. This avoids the high error rate and long editing time of manual editing and improves the efficiency of converting the unstructured medical knowledge text into the structured medical knowledge text.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive labor.

FIG. 1 is a diagram illustrating an application environment of a method for structuring medical texts according to an embodiment of the present invention;

FIG. 2 is a flow diagram of a method for structuring medical text in one embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a medical text structuring apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a computer device according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The medical text structuring method provided by the invention can be applied to the application environment shown in fig. 1, in which a client communicates with a server through a network. The client may be, but is not limited to, a personal computer, laptop, smartphone, tablet or portable wearable device. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.

In an embodiment, as shown in fig. 2, a medical text structuring method is provided, which is described by taking the server in fig. 1 as an example, and includes the following steps:

S10, grabbing the whole section of unstructured medical knowledge text in the medical source text to be processed;

Understandably, the medical source text to be processed may be a web page containing unstructured medical knowledge text in the medical field. The unstructured medical knowledge text may include, but is not limited to, formulas of various medical drugs, descriptions of their medical functions, or their product specifications as shown on the web page. Unstructured medical knowledge text refers to text without a fixed format, where fixed formats include, but are not limited to, paragraph format, text format, indentation format and spacing format. Such text arises for two reasons. First, the text in the medical source text to be processed in this embodiment is uploaded by different users whose editing formats are inconsistent, so a passage assembled from uploads by different users of the same medical source text ends up with an inconsistent format. Second, the various input components or display components in a medical source text may be incompatible with one another: copying a whole passage from a display component in one medical source text into an input component in another may turn previously structured medical knowledge text into unstructured medical knowledge text. Specifically, the unstructured medical knowledge text may be captured in this embodiment in either of two ways. After all text in the display interface of the medical source text to be processed has been recognized, the text selected by the user may be taken as the unstructured medical knowledge text. Alternatively, an NLP model may be used: when the NLP model recognizes that the text in the medical source text to be processed has several inconsistent formats, or no format at all, that text may be captured as the unstructured medical knowledge text.
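As a rough illustration of the capture step, the sketch below pulls the visible text out of a page's source using Python's standard `html.parser`. It assumes the medical source text is an HTML page; the class name `TextGrabber` and the function `grab_text` are hypothetical, and the sketch does not attempt the user-selection or NLP-based detection that the embodiment describes.

```python
from html.parser import HTMLParser


class TextGrabber(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def grab_text(html_source):
    """Return the page's visible text as one whitespace-joined string."""
    parser = TextGrabber()
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.chunks)
```

The result is a single unformatted string, which is the kind of input the splitting step expects.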

S20, recognizing all punctuation marks in the unstructured medical knowledge text, and splitting the unstructured text into a plurality of first characteristic sentences according to the punctuation marks;

Understandably, the punctuation marks in the unstructured medical knowledge text can be recognized by a punctuation mark recognition component or by an NLP model. The sentences in the unstructured medical knowledge text are then segmented at the recognized punctuation marks to obtain the plurality of first feature sentences. The punctuation marks used here are marks that can terminate a complete sentence, such as periods, exclamation marks or question marks. In this embodiment, the text is split into a plurality of first feature sentences, each of which represents the features of one complete sentence; these features carry the relationships between complete sentences into the subsequent semantic recognition process, which avoids confusing one sentence with another during recognition.
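The splitting step can be sketched with a regular expression, as a simple stand-in for the punctuation mark recognition component. The set of sentence-ending marks (ASCII and full-width period, exclamation mark and question mark) and the function name `split_sentences` are assumptions made for illustration.

```python
import re

# One sentence = a run of non-terminator characters followed by any
# sentence-ending marks (ASCII or full-width).
SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation mark."""
    sentences = []
    for match in SENTENCE.finditer(text):
        sentence = match.group().strip()
        if sentence:
            sentences.append(sentence)
    return sentences
```

A purely punctuation-based rule also splits at decimal points such as "2.5 mg", which is one reason the text allows an NLP model to do the recognition instead.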

S30, inputting the first feature sentences into a preset language identification model, and then acquiring a semantic feature vector corresponding to each first feature sentence;

Understandably, the preset language identification model may be a BERT model. The BERT model can be used to capture representations of the first feature sentence at both the sentence level and the level of each word, and its goal is to train on a large-scale unlabeled corpus so as to obtain a representation of the first feature sentence that contains rich semantic information. The core of the BERT model is the Transformer module, which is built on the Attention mechanism; the Transformer modules are assembled into the BERT model. In this embodiment, the semantic feature vector corresponding to a first feature sentence is obtained by using the word-to-sentence relationships in the BERT model.

S40, after all the semantic feature vectors are input into a preset article semantic recognition model, acquiring a second feature sentence output by the preset article semantic recognition model; the second characteristic sentence comprises a preset number of positions to be segmented, which are determined by the preset article semantic recognition model according to the context association relation of the unstructured medical knowledge text;

Understandably, the preset article semantic recognition model is an LSTM model, whose aim is to retain information over long spans so that each complete sentence in the input text can be recognized. The core processing of the LSTM model is carried out by three gates (thresholds): a forget gate, an input gate and an output gate. In addition, by taking the context of the input text into account, the LSTM model can determine a complete second feature sentence, and each second feature sentence yields two positions to be segmented.

S50, calling out a first code file of the medical source text to be processed, inquiring the second characteristic sentence from the first code file, and inserting a segmentation symbol into a position, corresponding to the position to be segmented, of the second characteristic sentence in the first code file to obtain a second code file;

Understandably, the first code file is the background code file corresponding to the medical source text to be processed, and it can be called through a script language. Because the second feature sentence is derived from the unstructured medical knowledge text in the medical source text to be processed, it also has a text display position in the first code file (a text display position may contain several second feature sentences). The second feature sentence can therefore be located by querying the first code file for the code corresponding to the text display position and then identifying, within that position, the words that make up the second feature sentence. The segmentation symbol can be understood as an HTML tag. Specifically, two segmentation symbols can be inserted at the two positions to be segmented within the text display position (that is, the text display position contains at least two positions to be segmented), and together the two symbols delimit one second feature sentence. The segmentation symbols include div tags, h1 to h6 heading tags, and the like.
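A minimal sketch of the insertion step follows, assuming the first code file is available as an HTML string and that each second feature sentence appears in it verbatim. The function name `wrap_sentences` and the default `div` tag are illustrative choices, not taken from the source.

```python
def wrap_sentences(code_file, sentences, tag="div"):
    """Insert an opening and a closing tag around each sentence found in the source.

    The two tags correspond to the two positions to be segmented of a sentence.
    """
    result = code_file
    cursor = 0  # search only after the previous insertion
    for sentence in sentences:
        start = result.find(sentence, cursor)
        if start == -1:
            continue  # sentence not present verbatim; leave the source unchanged
        end = start + len(sentence)
        wrapped = f"<{tag}>{sentence}</{tag}>"
        result = result[:start] + wrapped + result[end:]
        cursor = start + len(wrapped)
    return result
```

Searching from `cursor` keeps the sentences in document order and prevents a later sentence from matching inside text that has already been wrapped.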

S60, running the second code file to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the medical source text to be processed.

Understandably, the second code file is a background code file that includes the segmentation symbols, and it must be run whenever the specific medical source text to be processed is to be displayed.

Further, after the capturing the whole unstructured medical knowledge text in the medical source text to be processed, the method further includes:

detecting the unstructured medical knowledge text through a preset natural language processing model, marking words with errors in the unstructured medical knowledge text and obtaining a marking result;

calling a first code file of the medical source text to be processed, correcting the words with errors in the first code file according to the marking result to obtain a third code file, and operating the third code file to obtain the corrected unstructured medical knowledge text.

Understandably, the preset natural language processing model may be an NLP model. Using the semantic recognition function established by the model, repeated or wrongly written words in the unstructured medical knowledge text are marked, and the corresponding words in the first code file are corrected according to the marking result. The correction covers repeated words, which are deleted, and wrongly written words.
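The repeated-word part of this correction can be illustrated with a rule-based stand-in. The sketch below is not the NLP model of the embodiment: it only marks and deletes immediately repeated words, and the names `mark_repeats` and `remove_repeats` are hypothetical.

```python
import re

# A word followed by one or more whitespace-separated copies of itself.
REPEATED_WORD = re.compile(r"\b(\w+)(\s+\1\b)+", flags=re.IGNORECASE)


def mark_repeats(text):
    """Return (start, end, word) for every run of immediately repeated words."""
    return [(m.start(), m.end(), m.group(1)) for m in REPEATED_WORD.finditer(text)]


def remove_repeats(text):
    """Collapse each run of repeated words to a single occurrence."""
    return REPEATED_WORD.sub(r"\1", text)
```

A rule like this also removes legitimate repetitions (for example "had had"), which is why a model with semantic recognition is preferable to a fixed pattern in practice.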

Further, the first feature sentence is stored in a blockchain; the preset language identification model is a BERT model;

after the first feature sentences are input into a preset language identification model, obtaining a semantic feature vector corresponding to each first feature sentence, including:

after the first feature sentence is input into the BERT model, querying a word vector of each word in the first feature sentence through the BERT model;

selecting one word vector in the first feature sentence as a Query vector through the Attention mechanism in the BERT model, and using the other word vectors of the first feature sentence as Key vectors;

performing similarity calculation on the Query vector and each Key vector to obtain a weight coefficient, and performing a weighted operation, using the weight coefficient, on the Value values corresponding to the Query vector and the Key vectors to obtain a first enhanced semantic feature vector that corresponds to the Query vector and is output by the Attention mechanism;

performing linear conversion on the first enhanced semantic feature vector through a plurality of stacked Transformer encoders in the BERT model to obtain a second enhanced semantic feature vector;

and combining the second enhanced semantic feature vectors corresponding to the word vectors of each word in the first feature sentence to obtain the semantic feature vector corresponding to the first feature sentence.

Understandably, this embodiment mainly uses the Attention mechanism in the BERT model to make the model attend to the input first feature sentence. The Attention mechanism in this embodiment involves Query vectors, Key vectors and Value values: both the Query vectors and the Key vectors are derived from word vectors, each word vector has a corresponding Value value, and Attention can in essence be described as a mapping from a query (Query) to a series of key-value (Key-Value) pairs. Specifically, in this embodiment, after a first feature sentence is input into the BERT model, each word in the first feature sentence is looked up by the BERT model and converted into a one-dimensional word vector. One word vector of the first feature sentence is then taken as the target vector, that is, the Query vector, and the other word vectors in the first feature sentence are taken as Key vectors. Similarity calculation is performed on the Query vector and each Key vector to obtain weight coefficients, where common similarity functions include, but are not limited to, dot product, concatenation and perceptron. A preset softmax function is then used to normalize the weight coefficients, and a weighted summation of the Value values corresponding to the Query vector and the Key vectors is performed with the normalized weight coefficients to obtain the first enhanced semantic feature vector that corresponds to the Query vector and is output by the Attention mechanism. Finally, each Transformer encoder built on the Attention mechanism processes the first enhanced semantic feature vector. This processing comprises a residual connection (the word vector is added directly to the first enhanced semantic feature vector to form the final output), normalization to zero mean and unit variance over the nodes of a given neural network layer, and linear conversion (a linear transformation of the first enhanced semantic feature vector that enhances the expressive capability of the BERT model). The second enhanced semantic feature vectors corresponding to the word vectors are then combined to obtain the semantic feature vector corresponding to the first feature sentence. Using the BERT model as the preset language identification model in this embodiment serves two purposes: 1. the relationships between the first feature sentences, that is, their context, can be learned; 2. sentence-level semantic representations (the second enhanced semantic feature vectors) are acquired well.
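The similarity, softmax and weighted-sum steps described above can be sketched in plain Python. This is a simplified illustration, not the BERT implementation: it uses the raw word vectors directly as Query, Key and Value, omits the learned projections and multiple heads of a trained model, and scales the dot product by the square root of the vector length, which is an assumption borrowed from the standard Transformer formulation. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot-product similarity of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Return (weights, output) for one Query vector.

    weights: softmax-normalized similarity of the Query to each Key.
    output:  weighted sum of the Value vectors, i.e. the enhanced vector.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in keys])
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output
```

Calling `attend` once per word, with that word's vector as the Query, gives one enhanced vector per word; combining those vectors corresponds to the final step of the embodiment.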

It should be emphasized that, to further ensure the privacy and security of the first feature sentence, the first feature sentence may also be stored in a node of a blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like. The decentralized, fully distributed DNS service provided by a blockchain can resolve domain names through point-to-point data transmission among the nodes of the network; it can also be used to ensure that the operating system and firmware of important infrastructure have not been tampered with, to monitor the state and integrity of software and detect malicious tampering, and to ensure that transmitted data has not been tampered with. Storing the first feature sentence in the blockchain can therefore ensure its privacy and security.
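The idea of data blocks linked by a cryptographic method can be illustrated with a toy hash chain. The sketch below is only an assumption-laden illustration of tamper evidence: it has no network, consensus or encryption, so it does not by itself provide the privacy mentioned above, and the block layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def make_block(sentences, prev_hash):
    """Build a block whose hash covers its sentences and the previous block's hash."""
    payload = json.dumps(
        {"sentences": sentences, "prev_hash": prev_hash},
        ensure_ascii=False,
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"sentences": sentences, "prev_hash": prev_hash, "hash": digest}


def verify_chain(chain):
    """Return True if every block links to its predecessor and matches its own hash."""
    prev_hash = GENESIS_HASH
    for block in chain:
        expected = make_block(block["sentences"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous block's hash, changing a stored sentence invalidates that block and every block after it.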

Further, after inserting a segmentation symbol into a position in the first code file corresponding to a position to be segmented of the second feature sentence to obtain a second code file, the method further includes:

calling out a corresponding cascading style sheet according to a preset style format, and adding the cascading style sheet into the second code file.

Understandably, this embodiment mainly adds corresponding cascading style sheet rules, such as color, font size and box (frame), to the second code file so that specific formatting is shown in the medical source text to be processed, for example color in CSS, font-size in CSS, and box properties in CSS.
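One way to picture this step is a helper that builds a style block from a preset style format and adds it to the second code file. The sketch assumes the second code file is an HTML string and that the preset style format is a mapping from CSS selectors to property values; the function name `add_stylesheet` and the placement before the closing head tag are illustrative assumptions.

```python
def add_stylesheet(code_file, rules):
    """Insert a <style> block built from `rules` before </head>, or prepend it.

    rules maps a CSS selector to a dict of property/value pairs.
    """
    css_lines = []
    for selector, props in rules.items():
        declarations = " ".join(f"{prop}: {value};" for prop, value in props.items())
        css_lines.append(f"{selector} {{ {declarations} }}")
    style_block = "<style>\n" + "\n".join(css_lines) + "\n</style>"
    marker = "</head>"
    if marker in code_file:
        # Place the style sheet at the end of the document head.
        return code_file.replace(marker, style_block + "\n" + marker, 1)
    return style_block + "\n" + code_file
```

For example, `{"div": {"color": "#333", "font-size": "14px"}}` would give every inserted div segment a uniform text color and font size.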

Further, the preset article semantic recognition model is an LSTM model;

after all the semantic feature vectors are input into a preset article semantic recognition model, the method includes:

selecting discarded information through a forget gate in the LSTM model;

selecting required information from the semantic feature vector through an input gate in the LSTM model and the discarded information;

and outputting the second feature sentence through an output gate in the LSTM model and the required information.

Understandably, the LSTM model is a gated RNN whose key component is the cell state. Each gate (threshold) in the LSTM model is therefore designed to be able to remove information from, or add information to, the cell state (which can be regarded as a semantic feature vector). Each gate comprises a Sigmoid neural network layer and a pointwise multiplication operation; the Sigmoid layer outputs a value between 0 and 1 that describes how much of each component may pass, where 0 means that nothing passes and 1 means that everything passes. The forget gate determines which information in the cell state is discarded; the discarded information is the subject corresponding to the previous semantic feature vector. The input gate updates the information stored in the cell state: specifically, the discarded information is removed from the semantic feature vector, and the required information to be updated is determined from what remains. The output gate then determines and outputs the second feature sentence according to the required information determined at the input gate.
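The gate arithmetic can be made concrete with a single LSTM time step on scalar values. This is the standard textbook LSTM update written as a toy sketch, not the trained model of the embodiment; the parameter layout (one `(w_x, w_h, b)` triple per gate) and the function names are hypothetical, and how the output is mapped to a second feature sentence is not shown.

```python
import math


def sigmoid(x):
    """Squash x into (0, 1): 0 lets nothing pass, 1 lets everything pass."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for scalar input x, hidden state h_prev, cell state c_prev.

    params maps "forget", "input", "candidate" and "output" to (w_x, w_h, b).
    """
    def layer(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = layer("forget", sigmoid)        # share of the old cell state to keep
    i = layer("input", sigmoid)         # share of the new candidate to admit
    g = layer("candidate", math.tanh)   # candidate information to add
    o = layer("output", sigmoid)        # share of the cell state to expose

    c = f * c_prev + i * g              # pointwise update of the cell state
    h = o * math.tanh(c)                # output for this time step
    return h, c
```

With all parameters set to zero, every gate outputs 0.5 and the candidate is 0, so the step simply halves the previous cell state; this is a convenient sanity check on the formulas.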

In summary, the medical text structuring method described above uses the models and the segmentation symbols to replace the manual editing of the unstructured medical knowledge text in the medical source text to be processed. This avoids the high error rate and long editing time of manual editing and improves the efficiency of converting the unstructured medical knowledge text into the structured medical knowledge text. The method can be applied to intelligent medical treatment and can thereby promote the construction of smart cities.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

In an embodiment, a medical text structuring apparatus is provided, which corresponds to the medical text structuring method in the above embodiments one to one. As shown in fig. 3, the medical text structuring device comprises a grabbing module 11, a splitting module 12, a first obtaining module 13, a second obtaining module 14, an inserting module 15 and a displaying module 16. The functional modules are explained in detail as follows:

the grabbing module 11 is used for grabbing the whole unstructured medical knowledge text in the medical source text to be processed;

the splitting module 12 is configured to identify all punctuation marks in the unstructured medical knowledge text, and split the unstructured text into a plurality of first feature sentences according to the punctuation marks;

a first obtaining module 13, configured to obtain a semantic feature vector corresponding to each first feature sentence after the first feature sentence is input to a preset language identification model;

the second obtaining module 14 is configured to obtain a second feature sentence output by a preset article semantic recognition model after all the semantic feature vectors are input to the preset article semantic recognition model; the second characteristic sentence comprises a preset number of positions to be segmented, which are determined by the preset article semantic recognition model according to the context association relation of the unstructured medical knowledge text;

the inserting module 15 is configured to call a first code file of the medical source text to be processed, query the second feature sentence from the first code file, and insert a segmentation symbol in a position of the first code file corresponding to a position to be segmented of the second feature sentence to obtain a second code file;

a display module 16, configured to run the second code file to display the structured medical knowledge text corresponding to the unstructured medical knowledge text on the to-be-processed medical source text.
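The way the modules hand data to one another can be sketched as a small pipeline object. The class name `MedicalTextStructurer`, the field names and the callable signatures are hypothetical and chosen only to mirror modules 11 to 15; the display module, which runs the resulting code file, is left out, and the models behind `encode` and `segment` are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class MedicalTextStructurer:
    """Wires the grabbing, splitting, acquisition and inserting modules together."""

    grab: Callable[[str], str]                                   # module 11
    split: Callable[[str], List[str]]                            # module 12
    encode: Callable[[str], Vector]                              # module 13
    segment: Callable[[Sequence[str], Sequence[Vector]], List[str]]  # module 14
    insert: Callable[[str, Sequence[str]], str]                  # module 15

    def run(self, first_code_file: str) -> str:
        """Return the second code file produced from the first code file."""
        text = self.grab(first_code_file)
        first_sentences = self.split(text)
        vectors = [self.encode(sentence) for sentence in first_sentences]
        second_sentences = self.segment(first_sentences, vectors)
        return self.insert(first_code_file, second_sentences)
```

Keeping each module as a plain callable makes it easy to swap a stub for a trained model when testing the data flow.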

Further, the medical text structuring apparatus further comprises:

the marking module is used for detecting the unstructured medical knowledge text through a preset natural language processing model, marking words with errors in the unstructured text and acquiring a marking result;

and the operation module is used for calling the first code file of the medical source text to be processed, correcting the words with errors in the first code file according to the marking result to obtain a third code file, and operating the third code file to obtain the corrected unstructured medical knowledge text.

Further, the preset language identification model is a BERT model, and the first obtaining module includes:

the input submodule is used for querying a word vector of each word in the first feature sentence through the BERT model after the first feature sentence is input into the BERT model;

the selecting submodule is used for selecting one word vector in the first feature sentence as a Query vector through the Attention mechanism in the BERT model, and using the other word vectors of the first feature sentence as Key vectors;

the weighted operation submodule is used for performing similarity calculation on the Query vector and each Key vector to obtain a weight coefficient, and performing a weighted operation, using the weight coefficient, on the Value values corresponding to the Query vector and the Key vectors to obtain a first enhanced semantic feature vector that corresponds to the Query vector and is output by the Attention mechanism;

the linear conversion submodule is used for performing linear conversion on the first enhanced semantic feature vector through a plurality of stacked Transformer encoders in the BERT model to obtain a second enhanced semantic feature vector;

and the combination submodule is used for combining the second enhanced semantic feature vectors corresponding to the word vectors of each word in the first feature sentence to obtain the semantic feature vector corresponding to the first feature sentence.

Further, the medical text structuring apparatus further comprises:

and the adding module is used for calling out a corresponding cascading style sheet according to a preset style format and adding the cascading style sheet into the second code file.

Further, the preset article semantic recognition model is an LSTM model, and the second obtaining module includes:

the first selection sub-module is used for selecting the information to be discarded through the forget gate in the LSTM model;

the second selection sub-module is used for selecting the required information from the semantic feature vector through the input gate in the LSTM model and the discarded information;

and the output sub-module is used for outputting the second feature sentence through the output gate in the LSTM model and the required information.
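The three stages listed above (discard, select, output) correspond to what the standard LSTM literature calls the forget gate, the input gate, and the output gate. A minimal Python sketch of one LSTM time step is given below for illustration. It assumes the textbook LSTM cell equations. The parameter names (`Wf`, `Wi`, `Wc`, `Wo` and their biases), the untrained constant weights, and the toy input vectors are hypothetical, and the step that maps the LSTM output back to a second feature sentence is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(weights, bias, vector):
    """Compute weights @ vector + bias for a list-of-lists weight matrix."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step: forget gate, input gate, then output gate."""
    z = h_prev + x  # concatenate previous hidden state and current input
    forget = [sigmoid(a) for a in affine(params["Wf"], params["bf"], z)]
    write = [sigmoid(a) for a in affine(params["Wi"], params["bi"], z)]
    candidate = [math.tanh(a) for a in affine(params["Wc"], params["bc"], z)]
    emit = [sigmoid(a) for a in affine(params["Wo"], params["bo"], z)]
    # Discard part of the old cell state, then add the newly selected information.
    c = [f * cp + i * g for f, cp, i, g in zip(forget, c_prev, write, candidate)]
    # Expose a filtered view of the cell state as the output.
    h = [o * math.tanh(ci) for o, ci in zip(emit, c)]
    return h, c

def run_lstm(sequence, hidden_size, params):
    """Feed the semantic feature vectors through the cell one at a time."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs

def constant_params(value, hidden_size, input_size):
    """Hypothetical untrained parameters: every weight equals `value`."""
    def matrix():
        return [[value] * (hidden_size + input_size) for _ in range(hidden_size)]
    def bias():
        return [0.0] * hidden_size
    return {"Wf": matrix(), "bf": bias(), "Wi": matrix(), "bi": bias(),
            "Wc": matrix(), "bc": bias(), "Wo": matrix(), "bo": bias()}

# Toy semantic feature vectors for three sentences (assumed values).
sentence_vectors = [[0.5, -0.2], [0.1, 0.3], [0.0, 0.9]]
outputs = run_lstm(sentence_vectors, 2, constant_params(0.1, 2, 2))
```

Because the cell state `c` is carried from one step to the next, each output can depend on the sentences that came before it, which is the property an article-level model relies on.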

For specific limitations of the medical text structuring apparatus, reference may be made to the limitations of the medical text structuring method above, which are not repeated here. The modules in the medical text structuring apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the data involved in the medical text structuring method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the medical text structuring method.

In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the steps of the medical text structuring method in the above-mentioned embodiments, such as steps S10 to S60 shown in fig. 2. Alternatively, the processor, when executing the computer program, implements the functions of the modules/units of the medical text structuring apparatus in the above-described embodiments, such as the functions of the modules 11 to 16 shown in fig. 3. To avoid repetition, further description is omitted here.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the medical text structuring method in the above-described embodiments, such as the steps S10 to S60 shown in fig. 2. Alternatively, the computer program, when executed by the processor, implements the functions of the modules/units of the medical text structuring apparatus in the above-described embodiments, such as the functions of the modules 11 to 16 shown in fig. 3. To avoid repetition, further description is omitted here.

It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains information on a batch of network transactions that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
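The idea of data blocks linked by a cryptographic method can be illustrated with a minimal hash chain in Python. This is a simplified sketch under stated assumptions, not a description of any particular blockchain platform: the helper names (`block_hash`, `append_block`, `is_valid`), the use of SHA-256 over a JSON serialization, and the sample records are hypothetical, and distributed storage and consensus are left out entirely.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor

def block_hash(index, records, previous_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "records": records, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, records):
    """Create the next block from a batch of records and link it to the chain."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "records": records,
        "previous_hash": previous_hash,
        "hash": block_hash(index, records, previous_hash),
    }
    chain.append(block)
    return block

def is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    expected_previous = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["previous_hash"] != expected_previous:
            return False
        recomputed = block_hash(block["index"], block["records"], block["previous_hash"])
        if block["hash"] != recomputed:
            return False
        expected_previous = block["hash"]
    return True

chain = []
append_block(chain, ["structured text A"])
append_block(chain, ["structured text B"])
```

Changing the records of an earlier block changes its recomputed hash, so `is_valid` would then reject the chain. This is the anti-counterfeiting property the paragraph above refers to.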

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A medical text structuring method, comprising:

2. The method for structuring medical texts according to claim 1, wherein after the capturing of the unstructured medical knowledge text of the whole segment in the medical source text to be processed, the method further comprises:

3. The medical text structuring method according to claim 1, wherein the preset language recognition model is a BERT model;

4. The method according to claim 1, wherein after inserting a segmentation symbol into the first code file at a position corresponding to a position to be segmented of the second feature sentence to obtain a second code file, the method further comprises:

calling a corresponding cascading style sheet according to a preset style format, and nesting the cascading style sheet into the second code file.

5. The medical text structuring method according to claim 1, wherein the preset article semantic recognition model is an LSTM model;

6. A medical text structuring apparatus, comprising:

7. The medical text structuring device according to claim 6, further comprising:

8. The medical text structuring device according to claim 6, wherein the preset language recognition model is a BERT model; the first obtaining module comprises:

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the medical text structuring method according to any one of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a medical text structuring method as defined in any one of claims 1 to 5.