CN118072901A - Outpatient electronic medical record generation method and system based on voice recognition - Google Patents

Outpatient electronic medical record generation method and system based on voice recognition

Info

Publication number
CN118072901A
Authority
CN
China
Prior art keywords
voice
training
medical record
electronic medical
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410465169.8A
Other languages
Chinese (zh)
Inventor
张爱珍
马丽
刘晓春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
People's Liberation Army Navy Qingdao Special Service Sanatorium
Original Assignee
People's Liberation Army Navy Qingdao Special Service Sanatorium
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by People's Liberation Army Navy Qingdao Special Service Sanatorium
Priority to CN202410465169.8A priority Critical patent/CN118072901A/en
Publication of CN118072901A publication Critical patent/CN118072901A/en
Pending legal-status Critical Current

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention belongs to the field of voice recognition, and in particular relates to a method and a system for generating an outpatient electronic medical record based on voice recognition. A small original voice data set is used to train a WGAN model, and the additional similar voices generated by training are used to expand the original voice data set into a voice recognition data set for training a voice recognition model. The communication voice is processed to obtain a plurality of voice frame segments with a time sequence; voice frame segments at adjacent moments are bound as input to the voice recognition model, and the target output of the voice frame segments is predicted to obtain the communication text, which is then structured to generate and output the electronic medical record of the current patient. According to the invention, the electronic medical record of the current patient is generated through the voice recognition model, so that doctors have more time to concentrate on patient care, working efficiency is improved, and operating costs are reduced.

Description

Outpatient electronic medical record generation method and system based on voice recognition
Technical Field
The invention belongs to the technical field of voice recognition, and particularly relates to an outpatient electronic medical record generation method and system based on voice recognition.
Background
With the rapid development of information technology, the medical industry is gradually introducing advanced information technology to improve working efficiency and service quality. The outpatient medical record is an important document recording patient visit information, and its accuracy and integrity are critical for doctors to formulate treatment plans and for patients' subsequent health management. However, conventional medical records often depend on manual input by physicians, which is time-consuming, labor-intensive, and prone to omissions and errors. Therefore, how to generate outpatient medical records efficiently and accurately has become a problem to be solved.
Disclosure of Invention
The embodiment of the invention aims to provide a method and a system for generating an outpatient electronic medical record based on voice recognition, which are used for converting the dialogue content between a doctor and a patient into characters by utilizing a voice recognition technology so as to generate a structured medical record, so as to solve the technical problems in the background technology.
In order to achieve the above purpose, the present invention provides the following technical solutions.
In a first aspect of the present invention, there is provided a method for generating an electronic medical record for an outpatient service based on speech recognition, the method comprising the steps of:
Training a WGAN model using a small original voice data set, and expanding the original voice data set with the additional similar voices generated by training to form a voice recognition data set for training a voice recognition model;
Dividing the voice recognition data set into a training set, a verification set and a test set; constructing an LFR-DFSMN model, training it with the voice data of the training set, evaluating model performance with the verification set during training and updating the model parameters, and, after training is completed, evaluating the model's final performance on the test set to obtain the required voice recognition model;
Acquiring communication voice between a doctor and a patient in real time with a sound pickup, processing the communication voice to obtain a plurality of voice frame segments with a time sequence, binding voice frame segments at adjacent moments as input to the voice recognition model, predicting the target output of the voice frame segments to obtain the communication text, and structuring the communication text to generate and output the electronic medical record of the current patient.
In some embodiments of the invention, the objective function of the WGAN model is expressed as:

$$W(\mathbb{P}_r,\mathbb{P}_\theta)=\sup_{\|f\|_L\le 1}\left(\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)]-\mathbb{E}_{x\sim\mathbb{P}_\theta}[f(x)]\right)\qquad(1)$$

where $x$ represents an original speech segment from the small original speech data set, $\mathbb{P}_r$ is a fixed distribution over $x$, and $\sup$ denotes the supremum over all 1-Lipschitz continuous functions $f$ in equation (1). Given a latent random-variable space $\mathcal{Z}$, the generator $g_\theta$ maps $z\in\mathcal{Z}$ to $x$, where $\theta$ is the parameter of the mapping, i.e. $g_\theta(z)=x$, and $\mathbb{P}_\theta$ is the distribution parameterized by $\theta$.

Speech converted from the original speech segments is input into equation (1) for training, where $\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)]$ is the expectation over the original speech as real samples and $\mathbb{E}_{x\sim\mathbb{P}_\theta}[f(x)]$ is the expectation over the generated speech. When the generator and the critic reach a Nash equilibrium, training stops and the additional similar voices are obtained; when Nash equilibrium cannot be reached, training continues until it is.
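As a rough illustration of equation (1), the following minimal numpy sketch estimates the critic objective (mean of f over real samples minus mean of f over generated samples) using a linear critic whose weights are clipped to bound its Lipschitz constant. All function names, array shapes, and the synthetic data here are illustrative assumptions, not details from the patent.

```python
import numpy as np

def critic(x, w, b):
    """Linear critic f(x) = x @ w + b; clipping w bounds its Lipschitz constant."""
    return x @ w + b

def wgan_objective(real, fake, w, b, clip=0.01):
    """Empirical estimate of equation (1): mean f(real) minus mean f(fake)."""
    w = np.clip(w, -clip, clip)  # crude enforcement of the 1-Lipschitz constraint
    return float(critic(real, w, b).mean() - critic(fake, w, b).mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(64, 8))  # stand-in for real speech features
fake = rng.normal(0.5, 1.0, size=(64, 8))  # stand-in for generated speech features
gap = wgan_objective(real, fake, rng.normal(size=8), 0.0)
```

In an actual WGAN the critic is a neural network trained to maximize this quantity while the generator is trained to minimize it, alternating until the equilibrium described above.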
In some embodiments of the invention, the step of constructing an LFR-DFSMN model includes:
constructing a basic network structure, wherein the basic network structure comprises an input layer, a hidden layer, CFSMN layers, DFSMN layers and an output layer;
each CFSMN layer is introduced with a memory module, jump connection (skip-connection) is further introduced between the memory modules of the adjacent CFSMN layers, and step length factors s are added to form DFSMN layers of the model;
and integrating an input layer, a hidden layer, a CFSMN layer, a DFSMN layer and an output layer of the basic network structure to obtain the LFR-DFSMN model.
In some embodiments of the invention, the output of the $\ell$-th layer memory module at time $t$ is expressed as:

$$\tilde{p}_t^{\ell}=\tilde{p}_t^{\ell-1}+p_t^{\ell}+\sum_{i=0}^{N_1^{\ell}}a_i^{\ell}\odot p_{t-s_1\cdot i}^{\ell}+\sum_{j=1}^{N_2^{\ell}}c_j^{\ell}\odot p_{t+s_2\cdot j}^{\ell}\qquad(2)$$

where $\tilde{p}_t^{\ell-1}$ is the output of the $(\ell-1)$-th layer memory module at time $t$; $\tilde{p}_t^{\ell}$ is the output of the $\ell$-th layer memory module at time $t$; $N_1^{\ell}$ and $N_2^{\ell}$ are the look-back order and the look-ahead order, respectively; $\odot$ denotes element-wise multiplication; $a_i^{\ell}$ is the weight of the $i$-th historical time step in layer $\ell$; $c_j^{\ell}$ is the weight of the $j$-th future time step in layer $\ell$; $p_{t+s_2\cdot j}^{\ell}$ is the $j$-th future time step of layer $\ell$, with $s_2$ the stride factor for future frames; and $p_{t-s_1\cdot i}^{\ell}$ is the $i$-th historical time step of layer $\ell$, with $s_1$ the stride factor for historical frames.
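A direct, unoptimized reading of equation (2) can be sketched in numpy as follows; the function name, the treatment of out-of-range frames as zero, and the array shapes are illustrative assumptions rather than details fixed by the patent.

```python
import numpy as np

def dfsmn_memory(p, p_prev_tilde, a, c, s1=1, s2=1):
    """Output of one DFSMN memory module per equation (2).

    p            : (T, D) hidden outputs p_t of the current layer
    p_prev_tilde : (T, D) memory output of the previous layer (skip connection)
    a            : (N1 + 1, D) look-back weights a_i, i = 0..N1
    c            : (N2, D) look-ahead weights c_j, j = 1..N2
    s1, s2       : stride factors for historical / future frames
    Frames falling outside the sequence are simply skipped (treated as zero).
    """
    T, _ = p.shape
    out = p_prev_tilde + p  # skip connection plus current hidden output
    for t in range(T):
        for i in range(a.shape[0]):         # look-back sum, i = 0..N1
            ti = t - s1 * i
            if ti >= 0:
                out[t] = out[t] + a[i] * p[ti]
        for j in range(1, c.shape[0] + 1):  # look-ahead sum, j = 1..N2
            tj = t + s2 * j
            if tj < T:
                out[t] = out[t] + c[j - 1] * p[tj]
    return out
```

A production implementation would vectorize these loops (e.g. as a dilated 1-D convolution), but the nested-sum form above maps one-to-one onto the terms of equation (2).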
In some embodiments of the present invention, the constructed LFR-DFSMN model uses 8 DFSMN layers, where the number of units per DFSMN layer (the number of neurons in the hidden layer) is set to 1024, the look-back and look-ahead orders are both set to 40, and the DFSMN output is normalized and then passed through the swish activation function to obtain the next layer's output; in each DFSMN layer, the dropout rate used for regularization is set to 0.5.
In some embodiments of the present invention, the step of processing the communication speech to obtain a plurality of speech frame segments with a time sequence comprises:
performing noise reduction on the communication speech using a noise suppression algorithm, and dividing the continuous speech signal into a series of short frames;
Windowing is carried out on each short frame by applying a Hamming window function, FFT is carried out on the windowed voice frame, and the voice frame is converted from a time domain to a frequency domain, so that the frequency spectrum representation of the frame is obtained;
Processing the FFT results by a mel filter bank, which typically consists of a set of triangular filters that cover the hearing range of the human ear and output the response of each filter, simulating the sensitivity of the human ear to different frequencies;
calculating the logarithmic energy of the response of each Mel filter, and performing discrete cosine transform DCT on the logarithmic energy to obtain MFCCs features;
And splicing the features of the continuous frames to form a feature sequence, and capturing the time continuity of the voice to obtain a plurality of voice frame fragments with the time sequence.
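The framing, windowing, and FFT steps above can be sketched as follows (the Mel filter bank and DCT stages are omitted for brevity); the frame and hop sizes are illustrative choices, not values specified by the patent.

```python
import numpy as np

def frame_signal(signal, sr=16000, frame_ms=25, hop_ms=10):
    """Split a 1-D signal into overlapping short frames (illustrative sizes)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

def spectra(frames):
    """Hamming-window each frame and take the FFT magnitude spectrum."""
    windowed = frames * np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(windowed, axis=1))

sr = 16000
t = np.arange(sr) / sr             # one second of synthetic audio
sig = np.sin(2 * np.pi * 440 * t)  # 440 Hz tone as a stand-in for speech
S = spectra(frame_signal(sig, sr))
```

With a 400-sample frame at 16 kHz the FFT bins are 40 Hz apart, so the 440 Hz tone peaks around bin 11; log energies of Mel-filtered versions of these spectra, followed by a DCT, would yield the MFCC features described above.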
In some embodiments of the present invention, the step of structuring the communication text to generate and output an electronic medical record of the current patient includes:
removing irrelevant information in the communication text, wherein the irrelevant information comprises non-medical greetings and non-critical words;
dividing the text into words or phrases, and tagging each segment with its part of speech;
identifying medical related entities in the text, and identifying relationships among the entities;
and mapping the identified entities and relations to corresponding fields of the template according to the defined structured template of the electronic medical record, and generating the electronic medical record of the current patient.
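As a toy illustration of mapping recognized entities into the template fields, the sketch below fills a dictionary-based record; the entity labels, field names, and sample entities are invented for illustration and are not the patent's actual schema.

```python
# Structured electronic-medical-record template fields (illustrative).
TEMPLATE_FIELDS = ("patient_info", "symptoms", "diagnosis", "medications")

def fill_record(entities):
    """Fill the record template from (label, text) entity pairs.

    Entities whose label has no mapped field are dropped.
    """
    record = {field: [] for field in TEMPLATE_FIELDS}
    label_to_field = {
        "PATIENT": "patient_info",
        "SYMPTOM": "symptoms",
        "DISEASE": "diagnosis",
        "DRUG": "medications",
    }
    for label, text in entities:
        field = label_to_field.get(label)
        if field:
            record[field].append(text)
    return record

entities = [("SYMPTOM", "persistent cough"), ("DISEASE", "bronchitis"),
            ("DRUG", "amoxicillin")]
record = fill_record(entities)
```

In practice the `entities` list would come from a medical named-entity-recognition model run over the communication text, and relation extraction would decide which entries belong to the same diagnosis.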
In a second aspect, in another embodiment of the present invention, there is provided an outpatient electronic medical record generating system based on voice recognition, the outpatient electronic medical record generating system including:
The corpus creation module is used for training WGAN models by using a small amount of original voice data sets, expanding the original voice data sets by using more similar voices generated by training, and forming voice recognition data sets for training the voice recognition models;
The voice recognition model training module is used for dividing the voice recognition data set into a training set, a verification set and a test set; constructing an LFR-DFSMN model, training it with the voice data of the training set, evaluating model performance with the verification set during training and updating the model parameters, and, after training is completed, evaluating the model's final performance on the test set to obtain the required voice recognition model;
The electronic medical record generation module is used for acquiring the communication voice of a doctor and a patient in real time by utilizing the pickup, processing the communication voice to obtain a plurality of voice frame fragments with time sequences, binding the voice frame fragments at adjacent moments as input of a voice recognition model, predicting target output of the voice frame fragments to obtain communication text, and carrying out structural processing on the communication text to generate and output the electronic medical record of the current patient.
In a third aspect, in yet another embodiment of the present invention, there is provided a computer apparatus comprising:
A memory for storing a computer program;
A processor for implementing the method for generating an electronic medical record for clinic based on voice recognition as provided in the first aspect when executing the computer program.
In a fourth aspect, in a further embodiment of the present invention, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the speech recognition based outpatient electronic medical record generation method as provided in the first aspect.
Compared with the prior art, the outpatient electronic medical record generation method and system based on voice recognition have the beneficial effects that:
Firstly, in the voice recognition model provided by the invention, a memory module is introduced into each CFSMN layer, skip-connections are further introduced between the memory modules of adjacent CFSMN layers, and stride factors are added to form the DFSMN layers of the model; the input layer adopts LFR, that is, voice frame segments at adjacent moments are bound as input to the voice recognition model and the target output of the voice frame segments is predicted to obtain the communication text, which reduces input and output and greatly improves the efficiency of voice recognition;
Secondly, when training the voice recognition model, the invention trains the WGAN model with a small amount of voice material to expand the corpus used for training the voice recognition model, which alleviates the problems of scarce and unbalanced corpus samples; moreover, the WGAN model is simple, converges quickly, has low computational complexity and fast response, and is easy to operate;
Thirdly, the invention generates the electronic medical record of the current patient by carrying out structuring treatment on the communication text obtained by the voice recognition model, thereby accelerating the speed of medical record recording and enabling doctors to have more time to concentrate on patient nursing; and the dictation content of the doctor can be more accurately identified and transcribed, so that the accuracy of medical records is improved; and through the automatic medical record process, the voice recognition technology can simplify the medical workflow, improve the working efficiency, thereby reducing the operation cost.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
In the figure:
FIG. 1 is an application scenario architecture diagram of an outpatient electronic medical record generating method based on speech recognition according to the present invention;
FIG. 2 is a flow chart of an implementation of a method for generating an electronic medical record for clinic based on voice recognition;
FIG. 3 is a sub-flowchart of a method for generating an electronic medical record for an clinic based on speech recognition according to the present invention;
FIG. 4 is another sub-flowchart of a method for generating an electronic medical record for an clinic based on speech recognition according to the present invention;
FIG. 5 is a further sub-flowchart of a method for generating an electronic medical record for an clinic based on speech recognition according to the present invention;
FIG. 6 is a block diagram of an outpatient electronic medical record generating system based on speech recognition according to the present invention;
fig. 7 is a block diagram of the electronic medical record generating module in the outpatient electronic medical record generating system according to the present invention.
Detailed Description
The present application will be further described with reference to the accompanying drawings and detailed description, wherein it is to be understood that, on the premise of no conflict, the following embodiments or technical features may be arbitrarily combined to form new embodiments.
In order to make the objects, technical solutions and advantages of the present application more apparent, the following embodiments of the present application will be described in further detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that, in the embodiments of the present invention, all the expressions "first" and "second" are used to distinguish two non-identical entities with the same name or non-identical parameters, and it is noted that the "first" and "second" are only used for convenience of expression, and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such as a process, method, system, article, or other step or unit that comprises a list of steps or units.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Currently, the outpatient medical record is an important document recording patient visit information, and its accuracy and integrity are critical for doctors to formulate treatment plans and for patients' subsequent health management. However, conventional medical records often depend on manual input by physicians, which is time-consuming, labor-intensive, and prone to omissions and errors. Therefore, how to generate outpatient medical records efficiently and accurately has become a problem to be solved.
In order to solve the above problems, the voice recognition model provided by the invention binds voice frame segments at adjacent moments as input to the voice recognition model, predicts the target output of the voice frame segments, and obtains the communication text, which reduces input and output and greatly improves the efficiency of voice recognition. When training the voice recognition model, the WGAN model is trained with a small amount of voice material to expand the corpus used for training the voice recognition model, alleviating the problems of scarce and unbalanced corpus samples. By structuring the communication text obtained from the voice recognition model to generate the electronic medical record of the current patient, medical record keeping is accelerated so that doctors have more time to concentrate on patient care; and doctors' dictation can be recognized and transcribed more accurately, improving the accuracy of medical records.
Referring to fig. 1, an application scenario diagram of the outpatient electronic medical record generating method based on voice recognition is shown, in which a sound pickup is installed in the doctor and patient consultation environment; it records the dialogue between the patient and the doctor to acquire communication voice, and the communication voice data acquired by the pickup are sent to a database server;
The invention uses the database server to train the voice recognition model and to recognize the communication voice into communication text, and then structures the communication text to generate a temporary electronic medical record, which is temporarily stored in a buffer; a terminal computer used by the doctor can display, modify, and store the electronic medical record in the buffer, and the final electronic medical record is then returned to the database server;
After the clinic is finished, the patient can print the electronic medical record of the patient by the printer so as to obtain paper medical records.
In an exemplary embodiment of the present disclosure, the microphone and the database server are connected through a network. The network may include various types of wired or wireless communication links;
for example: the wired communication link includes optical fiber, and the wireless communication link includes a Bluetooth communication link and a wireless fidelity (Wi-Fi) communication link.
The specific implementation of the outpatient electronic medical record generating method based on voice recognition according to the present invention is described in detail below with reference to specific embodiments.
As shown in fig. 2, in one embodiment of the present invention, there is provided a method for generating an electronic medical record for clinic based on voice recognition, including the steps of:
Step S101: training WGAN models by using a small amount of original voice data sets, and expanding the original voice data sets by using more similar voices generated by training to form voice recognition data sets for training a voice recognition model;
Step S102: dividing a voice recognition data set to form a training set, a verification set and a test set; constructing an LFR-DFSMN model, training the LFR-DFSMN model by using voice data of a training set, evaluating model performance by using a verification set in the training process, updating model parameters, and obtaining a required voice recognition model by using final performance of the evaluation model on a test set after training is completed;
Step S103: and acquiring communication voice between a doctor and a patient in real time by using a pickup, processing the communication voice to obtain a plurality of voice frame fragments with time sequences, binding the voice frame fragments at adjacent moments as input of a voice recognition model, predicting target output of the voice frame fragments to obtain communication text, carrying out structural processing on the communication text, generating and outputting an electronic medical record of the current patient.
Preferably, the objective function of the WGAN model in step S101 in the embodiment of the present invention is expressed as:

$$W(\mathbb{P}_r,\mathbb{P}_\theta)=\sup_{\|f\|_L\le 1}\left(\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)]-\mathbb{E}_{x\sim\mathbb{P}_\theta}[f(x)]\right)\qquad(1)$$

where $x$ represents an original speech segment from the small original speech data set, $\mathbb{P}_r$ is a fixed distribution over $x$, and $\sup$ denotes the supremum over all 1-Lipschitz continuous functions $f$ in equation (1). Given a latent random-variable space $\mathcal{Z}$, the generator $g_\theta$ maps $z\in\mathcal{Z}$ to $x$, where $\theta$ is the parameter of the mapping, i.e. $g_\theta(z)=x$, and $\mathbb{P}_\theta$ is the distribution parameterized by $\theta$.

Speech converted from the original speech segments is input into equation (1) for training, where $\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)]$ is the expectation over the original speech as real samples and $\mathbb{E}_{x\sim\mathbb{P}_\theta}[f(x)]$ is the expectation over the generated speech. When the generator and the critic reach a Nash equilibrium, training stops and the additional similar voices are obtained; when Nash equilibrium cannot be reached, training continues until it is.
When training the voice recognition model, the invention trains the WGAN model with a small amount of voice material to expand the corpus used for training the voice recognition model, which alleviates the problems of scarce and unbalanced corpus samples; moreover, the WGAN model is simple, converges quickly, has low computational complexity and fast response, and is easy to operate.
Further, as shown in fig. 2 and 3, the method of the present invention includes the steps of:
Step S201: constructing a basic network structure, wherein the basic network structure comprises an input layer, a hidden layer, CFSMN layers, DFSMN layers and an output layer;
step S202: each CFSMN layer is introduced with a memory module, jump connection (skip-connection) is further introduced between the memory modules of the adjacent CFSMN layers, and step factors (stride) are added to form DFSMN layers of the model;
Step S203: integrating an input layer, a hidden layer, a CFSMN layer, a DFSMN layer and an output layer of the basic network structure to obtain an LFR-DFSMN model;
Wherein, in the LFR-DFSMN model of the invention, the output of the $\ell$-th layer memory module at time $t$ is expressed as:

$$\tilde{p}_t^{\ell}=\tilde{p}_t^{\ell-1}+p_t^{\ell}+\sum_{i=0}^{N_1^{\ell}}a_i^{\ell}\odot p_{t-s_1\cdot i}^{\ell}+\sum_{j=1}^{N_2^{\ell}}c_j^{\ell}\odot p_{t+s_2\cdot j}^{\ell}\qquad(2)$$

where $\tilde{p}_t^{\ell-1}$ is the output of the $(\ell-1)$-th layer memory module at time $t$; $\tilde{p}_t^{\ell}$ is the output of the $\ell$-th layer memory module at time $t$; $N_1^{\ell}$ and $N_2^{\ell}$ are the look-back order and the look-ahead order, respectively; $\odot$ denotes element-wise multiplication; $a_i^{\ell}$ is the weight of the $i$-th historical time step in layer $\ell$; $c_j^{\ell}$ is the weight of the $j$-th future time step in layer $\ell$; $p_{t+s_2\cdot j}^{\ell}$ is the $j$-th future time step of layer $\ell$, with $s_2$ the stride factor for future frames; and $p_{t-s_1\cdot i}^{\ell}$ is the $i$-th historical time step of layer $\ell$, with $s_1$ the stride factor for historical frames.
Preferably, the constructed LFR-DFSMN model uses 8 DFSMN layers, where the number of units per DFSMN layer (the number of neurons in the hidden layer) is set to 1024, the look-back and look-ahead orders are both set to 40, and the DFSMN output is normalized and then passed through the swish activation function to finally obtain the next layer's output; in each DFSMN layer, the dropout rate used for regularization is set to 0.5.
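The per-layer normalize-then-swish step described above might look as follows in numpy; the layer-norm formulation is a common choice and an assumption here, since the text does not specify the exact normalization used.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row (one frame's hidden vector) to zero mean, unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def swish(x):
    """swish(x) = x * sigmoid(x), the activation named in the text."""
    return x / (1.0 + np.exp(-x))

# Illustrative: 4 frames of a 1024-unit DFSMN layer's output.
h = np.random.default_rng(1).normal(size=(4, 1024))
out = swish(layer_norm(h))
```

Each DFSMN layer's normalized, activated output then feeds the next layer, with dropout applied during training for regularization.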
According to the voice recognition model of the invention, a memory module is introduced into each CFSMN layer, skip-connections are further introduced between the memory modules of adjacent CFSMN layers, and stride factors are added to form the DFSMN layers of the model; the input layer adopts LFR, that is, voice frame segments at adjacent moments are bound as input to the voice recognition model and the target output of the voice frame segments is predicted to obtain the communication text, which reduces input and output and greatly improves the efficiency of voice recognition.
In the embodiment of the invention, in the process of acquiring the communication voice of doctors and patients, a microphone of a sound pickup or other recording equipment captures the communication voice signal, the analog voice signal is converted into a digital signal, and a suitable sampling rate is chosen according to the sampling theorem; here a sampling rate of 16 kHz is used. The sampling rate determines the maximum frequency that can be captured, and other sampling rates, such as 8 kHz or 44.1 kHz, may also be used in implementations.
Further, as shown in fig. 4, in the embodiment of the present invention, the step of processing the communication speech to obtain a plurality of speech frame segments with a time sequence includes:
Step S301: performing noise reduction on the communication voice with a noise suppression algorithm, and dividing the continuous voice signal into a series of short frames, each frame usually comprising a fixed number of sampling points, such as a 20 ms or 30 ms window, the window length usually being related to the sampling rate;
Step S302: windowing is performed on each short frame by applying a hamming window function to reduce the discontinuity of frame boundaries, which helps to reduce the frequency spectrum leakage effect in frequency domain analysis; performing FFT on the windowed speech frame, and converting the speech frame from a time domain to a frequency domain to obtain a frequency spectrum representation of the frame;
Step S303: processing the FFT results by a mel filter bank, which typically consists of a set of triangular filters that cover the hearing range of the human ear and output the response of each filter, simulating the sensitivity of the human ear to different frequencies;
Step S304: calculating the logarithmic energy of the response of each Mel filter, and performing discrete cosine transform DCT on the logarithmic energy to obtain MFCCs features, MFCCs being a common feature in speech recognition;
Step S305: and splicing the features of the continuous frames to form a feature sequence, and capturing the time continuity of the voice to obtain a plurality of voice frame fragments with the time sequence.
Furthermore, in the process of processing the communication voice, the characteristics of the front frame and the rear frame are considered to construct a context window, so that the method is beneficial to capturing the context information of the voice; further, post-processing of features is also required, including feature filling and feature normalization:
And (3) feature filling: for short audio segments, padding features may be required to maintain consistent length;
Feature standardization: before obtaining a plurality of speech frame segments with a time sequence, the features need to be normalized to ensure that they are on the same scale.
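The padding and standardization post-processing can be sketched as below; the frame counts and the 13-dimensional feature size are illustrative choices, not values specified by the patent.

```python
import numpy as np

def pad_features(feats, target_len):
    """Zero-pad (or truncate) a (T, D) feature matrix along time to target_len frames."""
    T, D = feats.shape
    if T >= target_len:
        return feats[:target_len]
    return np.vstack([feats, np.zeros((target_len - T, D))])

def standardize(feats, eps=1e-8):
    """Per-dimension mean/variance normalization so features share one scale."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + eps)

feats = np.random.default_rng(2).normal(size=(50, 13))  # e.g. 50 MFCC frames
padded = standardize(pad_features(feats, 80))
```

Whether padding happens before or after standardization (and whether statistics are computed per utterance or over the whole corpus) is a design choice; the order shown here is just one option.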
Through the above steps, a series of speech frame segments carrying time-sequence information can be obtained from the communication speech, and these segments can be used as input for speech recognition or other speech processing tasks. It should be noted that the specific steps and parameter settings of feature extraction need to be adjusted according to the specific application scenario and data set.
Further, in the embodiment of the present invention, the text of the communication is structured, which involves Natural Language Processing (NLP) and information extraction technology, and text data of the communication between the doctor and the patient needs to be collected before the structuring process, which may include diagnosis records of the doctor, nursing records, self-description of the patient, and the like.
Further, as shown in fig. 5, preprocessing needs to be performed on the text of the communication, including:
step S401: removing irrelevant information from the communication text, where the irrelevant information includes non-medical greetings, non-critical words, and the like;
step S402: dividing the text into words or phrases, and tagging each resulting token with its part of speech, such as noun, verb, or adjective.
Further, it is also necessary to identify entities in the text, specifically including:
Step S403: medical related entities in the text are identified, and relationships between the entities are identified.
Wherein, the medical related entity in the text is identified, such as disease name, drug name, symptom description, etc.; relationships between entities are identified, for example, an association between a symptom and a particular disease is determined.
Further, the method further comprises the following steps:
step S404: and mapping the identified entities and relations to corresponding fields of the template according to the defined structured template of the electronic medical record, and generating the electronic medical record of the current patient.
In the embodiment of the invention, a structured template, such as basic information, medical history, diagnosis result, treatment scheme and the like of a patient, is defined according to the requirements of the electronic medical record; the identified entities and relationships are mapped into corresponding fields of the template.
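The preprocessing, entity recognition, and template-mapping steps (S401–S404) can be sketched as a minimal rule-based pipeline. This is an illustration only: the greeting stop-list, the medical lexicons, and the template fields below are hypothetical stand-ins for a real stop-list, a trained medical NER model, and the hospital's actual electronic medical record template.

```python
import re

# Hypothetical stand-ins: a greeting stop-list (S401) and medical lexicons (S403)
GREETINGS = ("good morning", "hello", "thank you")
SYMPTOMS = ("cough", "fever", "chest pain")
DISEASES = ("bronchitis", "pneumonia")
DRUGS = ("amoxicillin", "ibuprofen")

def preprocess(text):
    """S401: strip non-medical greetings from the communication text."""
    for g in GREETINGS:
        text = re.sub(re.escape(g), "", text, flags=re.I)
    return text

def extract_entities(text):
    """S403: dictionary lookup standing in for a trained NER model."""
    found = []
    for label, lexicon in (("symptom", SYMPTOMS), ("disease", DISEASES), ("drug", DRUGS)):
        for term in lexicon:
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.I):
                found.append((label, term))
    return found

def fill_template(raw_text):
    """S404: map recognized entities into the structured EMR template."""
    record = {"symptoms": [], "diagnosis": [], "medications": []}
    field = {"symptom": "symptoms", "disease": "diagnosis", "drug": "medications"}
    for label, term in extract_entities(preprocess(raw_text)):
        record[field[label]].append(term)
    # Crude relation extraction: link each symptom to each diagnosed disease
    record["relations"] = [(s, d) for s in record["symptoms"]
                           for d in record["diagnosis"]]
    return record

emr = fill_template("Good morning, doctor. I have a cough and fever. "
                    "Assessment: bronchitis; start amoxicillin.")
print(emr["diagnosis"])  # ['bronchitis']
```

A production system would replace the dictionary lookup with a statistical or neural NER model, but the data flow from raw text to template fields is the same.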
Preferably, the step of structuring the communication text may further include:
Entity filling: the medical entities identified by NER are filled into the corresponding positions of the template;
Relation filling: the relations among the entities are filled into the template in logical order, ensuring the continuity and accuracy of the information;
Rule checking: the filled-in information is checked against predefined rules to ensure its integrity and consistency;
Error correction: errors or inconsistencies found during verification are corrected manually or automatically with a machine learning model;
Time information extraction: time-related information is extracted from the text, such as when symptoms appeared or when treatment began;
Timeline construction: a timeline of the patient's condition is built from the extracted time information, so that changes in the condition and the progress of treatment can be tracked;
Information integration: all the structured information is integrated into the electronic medical record template to generate a complete electronic medical record document;
Formatted output: the generated electronic medical record is formatted to meet medical standards and readability requirements;
Dynamic updating: as the patient's condition changes and new medical information is generated, the electronic medical record is updated regularly to ensure that it reflects the latest medical situation.
Through the steps, the key medical information can be extracted from the unstructured alternating text and structured into the electronic medical record. The process can greatly improve the efficiency and quality of medical records, and is convenient for medical staff to manage and analyze patient information. It is noted that the processing of medical information requires compliance with associated privacy regulations and standards.
According to the invention, through carrying out structural processing on the communication text obtained by the voice recognition model, the electronic medical record of the current patient is generated, so that the speed of medical record recording can be increased, and a doctor can have more time to concentrate on patient care; and the dictation content of the doctor can be more accurately identified and transcribed, so that the accuracy of medical records is improved; and through the automatic medical record process, the voice recognition technology can simplify the medical workflow, improve the working efficiency, thereby reducing the operation cost.
It is noted that the above-described figures are only schematic illustrations of processes involved in a method according to an exemplary embodiment of the invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be understood that although the steps are described in a certain order, they are not necessarily performed sequentially in that order. Unless explicitly stated herein, the steps are not strictly limited in their order of execution and may be performed in other orders. Moreover, some steps of this embodiment may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with at least part of the sub-steps or stages of other steps.
In a second aspect, as shown in fig. 6, in another embodiment of the present invention, there is provided an outpatient electronic medical record generating system based on voice recognition, the outpatient electronic medical record generating system including:
the corpus creation module 501 is configured to train a WGAN model using a small amount of original speech data, and to expand the original speech data set with the more similar speech generated by training, forming a speech recognition data set for speech recognition model training;
The speech recognition model training module 502 is configured to divide a speech recognition data set to form a training set, a verification set and a test set; constructing an LFR-DFSMN model, training the LFR-DFSMN model by using voice data of a training set, evaluating model performance by using a verification set in the training process, updating model parameters, and obtaining a required voice recognition model by using final performance of the evaluation model on a test set after training is completed;
The electronic medical record generating module 503 is configured to obtain, in real time, communication voice between a doctor and a patient by using a pickup, process the communication voice to obtain a plurality of voice frame segments with time sequences, bind the voice frame segments at adjacent moments as input of a voice recognition model, predict target output of the voice frame segments to obtain communication text, and perform structural processing on the communication text to generate and output an electronic medical record of the current patient.
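How the three modules hand data to one another can be sketched with placeholder callables. The interfaces below are hypothetical: they stand in for the real feature extractor, the trained LFR-DFSMN recognizer, and the NLP structuring step, and only illustrate the data flow from audio to structured record.

```python
class OutpatientEMRPipeline:
    """Sketch of how the system's modules could be wired together.
    The three injected callables are assumptions standing in for the
    concrete feature extraction, recognition, and structuring logic."""

    def __init__(self, feature_extractor, recognizer, structurer):
        self.feature_extractor = feature_extractor  # voice -> frame segments
        self.recognizer = recognizer                # bound segments -> text
        self.structurer = structurer                # text -> EMR dict

    def run(self, audio):
        segments = self.feature_extractor(audio)
        # Bind frame segments at adjacent moments before recognition,
        # as the electronic medical record generating module describes
        bound = list(zip(segments, segments[1:])) or [tuple(segments)]
        text = self.recognizer(bound)
        return self.structurer(text)

# Toy stand-ins to show the data flow only
pipeline = OutpatientEMRPipeline(
    feature_extractor=lambda audio: [audio[i:i + 2] for i in range(0, len(audio), 2)],
    recognizer=lambda bound: f"{len(bound)} bound segments recognized",
    structurer=lambda text: {"note": text},
)
print(pipeline.run([0.1, 0.2, 0.3, 0.4]))
```

In a deployment, the recognizer callable would wrap batched LFR-DFSMN inference and the structurer would apply the NLP steps of the previous section.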
Further, as shown in fig. 7, in an embodiment of the present invention, the electronic medical record generating module further includes:
A sound pick-up 5031 for acquiring communication voice between a doctor and a patient in real time;
a speech processor 5032 for processing the communication voice to obtain a plurality of speech frame segments with a time sequence;
The voice recognition submodule 5033 is used for binding voice frame fragments at adjacent moments as input of the voice recognition model and predicting their target output to obtain the communication text;
The text structuring processing sub-module 5034 is configured to perform structuring processing on the communication text, generate an electronic medical record of the current patient, and output the electronic medical record.
The outpatient electronic medical record generating system based on voice recognition has wide application prospect in the field of intelligent outpatient service, can remarkably improve the diagnosis and treatment efficiency of patients, and provides higher-level support for the creation of intelligent hospitals.
In a third aspect of the embodiments of the present invention, there is also provided a computer device, including a memory and a processor, where the memory stores a computer program that, when executed by the processor, implements the voice-recognition-based outpatient electronic medical record generation method of any one of the above embodiments.
The memory, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the voice-recognition-based outpatient electronic medical record generation method in the embodiments of the present application. The memory may include a program storage area and a data storage area, where the program storage area may store the operating system and at least one application program required for a function, and the data storage area may store data created by the use of the method. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory optionally includes memory located remotely relative to the processor; such remote memory may be connected to the local module through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, and is typically used to control the overall operation of the computer device. In this embodiment, the processor is configured to execute the program code stored in the memory or to process data. The processor of the computer device executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions and modules stored in the memory, that is, it implements the steps of the voice-recognition-based outpatient electronic medical record generation method of the method embodiments above.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
Finally, it should be noted that the computer-readable storage media (e.g., memory) herein can be volatile memory or nonvolatile memory, or can include both. By way of example, and not limitation, nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of example, and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform such functions: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP and/or any other such configuration.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein includes any and all possible combinations of one or more of the associated listed items. The sequence numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and many other variations of the different aspects of the embodiments of the invention as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (10)

1. The outpatient electronic medical record generation method based on voice recognition is characterized by comprising the following steps of:
Training WGAN models by using a small amount of original voice data sets, and expanding the original voice data sets by using more similar voices generated by training to form voice recognition data sets for training a voice recognition model;
Dividing a voice recognition data set to form a training set, a verification set and a test set; constructing an LFR-DFSMN model, training the LFR-DFSMN model by using voice data of a training set, evaluating model performance by using a verification set in the training process, updating model parameters, and obtaining a required voice recognition model by using final performance of the evaluation model on a test set after training is completed;
and acquiring communication voice between a doctor and a patient in real time by using a pickup, processing the communication voice to obtain a plurality of voice frame fragments with time sequences, binding the voice frame fragments at adjacent moments as input of a voice recognition model, predicting target output of the voice frame fragments to obtain communication text, carrying out structural processing on the communication text, generating and outputting an electronic medical record of the current patient.
2. The method for generating an outpatient electronic medical record based on voice recognition according to claim 1, wherein the objective function of the WGAN model is expressed as:

$$W(P_r, P_\theta) = \sup_{\|f\|_L \le 1} \left( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{\tilde{x} \sim P_\theta}[f(\tilde{x})] \right) \qquad (1)$$

In formula (1): $x$ represents an original speech segment from the small original speech data set, and $P_r$ is a fixed distribution over $x$; $\sup$ denotes the supremum over all 1-Lipschitz continuous functions $f$ in formula (1); a latent random variable $z$ from an implicit random variable space is assumed in formula (1), and the generator $g_\theta$ maps $z$ to $\tilde{x}$, i.e. $\tilde{x} = g_\theta(z)$, where $\theta$ is the parameter of the mapping and $P_\theta$ represents the distribution of the generated samples parameterized by $\theta$;

the speech converted from the original speech segments is input into formula (1) for training, where $\mathbb{E}_{x \sim P_r}[f(x)]$ represents the expectation over the original speech as real samples and $\mathbb{E}_{\tilde{x} \sim P_\theta}[f(\tilde{x})]$ represents the expectation over the generated speech; when the two terms reach Nash equilibrium, training is stopped and the more similar voices are obtained; when Nash equilibrium cannot be reached, training continues until it is reached.
3. The method for generating an electronic medical record for an outpatient service based on speech recognition according to claim 2, wherein the step of constructing an LFR-DFSMN model comprises:
constructing a basic network structure, wherein the basic network structure comprises an input layer, a hidden layer, CFSMN layers, DFSMN layers and an output layer;
a memory module is introduced into each CFSMN layer, skip connections are further introduced between the memory modules of adjacent CFSMN layers, and a stride factor s is added to form the DFSMN layers of the model;
and integrating an input layer, a hidden layer, a CFSMN layer, a DFSMN layer and an output layer of the basic network structure to obtain the LFR-DFSMN model.
4. The method for generating an outpatient electronic medical record based on speech recognition according to claim 3, wherein the output of the memory module of the $\ell$-th layer at time $t$ is expressed as:

$$\tilde{p}_t^{\,\ell} = \mathcal{H}\!\left(\tilde{p}_t^{\,\ell-1}\right) + p_t^{\,\ell} + \sum_{i=0}^{N_1} a_i^{\ell} \odot p_{t - s_1 i}^{\,\ell} + \sum_{j=1}^{N_2} c_j^{\ell} \odot p_{t + s_2 j}^{\,\ell} \qquad (2)$$

where $\tilde{p}_t^{\,\ell-1}$ represents the output of the memory module of the $(\ell-1)$-th layer at time $t$; $p_t^{\,\ell}$ represents the output of the $\ell$-th layer at time $t$; $N_1$ and $N_2$ represent the look-back order and the look-ahead order, respectively; $\odot$ denotes element-wise multiplication; $a_i^{\ell}$ represents the weight of the $i$-th historical time step in the $\ell$-th layer; $c_j^{\ell}$ represents the weight of the $j$-th future time step in the $\ell$-th layer; $t + s_2 j$ is the $j$-th future time step of the $\ell$-th layer, with $s_2$ the stride factor at future times; and $t - s_1 i$ is the $i$-th historical time step of the $\ell$-th layer, with $s_1$ the stride factor at historical times.
5. The method for generating an outpatient electronic medical record based on voice recognition according to claim 4, wherein the LFR-DFSMN model is constructed with 8 DFSMN layers; the number of units (the number of neurons in a hidden layer) of each DFSMN layer is set to 1024, the forward and backward step sizes are set to 40, the output of each DFSMN layer is normalized and then passed through the swish activation function to obtain the input of the next layer, and the dropout rate for regularization is set to 0.5 in each DFSMN layer.
6. The method for generating an electronic medical record for an outpatient service based on voice recognition according to any one of claims 2 to 5, wherein the step of processing the communication voice to obtain a plurality of voice frame segments with a time sequence includes:
noise reduction processing is performed on the communication voice using a noise suppression algorithm, and the continuous voice signal is divided into a series of short frames;
Windowing is carried out on each short frame by applying a Hamming window function, FFT is carried out on the windowed voice frame, and the voice frame is converted from a time domain to a frequency domain, so that the frequency spectrum representation of the frame is obtained;
processing the FFT result by a mel-filter bank, typically consisting of a set of triangular filters, and outputting the response of each filter;
calculating the logarithmic energy of the response of each Mel filter, and performing discrete cosine transform DCT on the logarithmic energy to obtain MFCCs features;
And splicing the features of the continuous frames to form a feature sequence, and capturing the time continuity of the voice to obtain a plurality of voice frame fragments with the time sequence.
7. The method for generating an electronic medical record for an outpatient service based on speech recognition according to claim 6, wherein the step of structuring the text of the communication to generate and output an electronic medical record for the current patient comprises:
removing irrelevant information in the communication text, wherein the irrelevant information comprises non-medical greetings and non-critical words;
dividing the text into words or phrases, marking the parts of speech of the divided text, and identifying the parts of speech;
identifying medical related entities in the text, and identifying relationships among the entities;
and mapping the identified entities and relations to corresponding fields of the template according to the defined structured template of the electronic medical record, and generating the electronic medical record of the current patient.
8. An outpatient electronic medical record generating system based on voice recognition, which is characterized by comprising:
The corpus creation module is used for training WGAN models by using a small amount of original voice data sets, expanding the original voice data sets by using more similar voices generated by training, and forming voice recognition data sets for training the voice recognition models;
the voice recognition model training module is used for dividing a voice recognition data set to form a training set, a verification set and a test set; constructing an LFR-DFSMN model, training the LFR-DFSMN model by using voice data of a training set, evaluating model performance by using a verification set in the training process, updating model parameters, and obtaining a required voice recognition model by using final performance of the evaluation model on a test set after training is completed;
The electronic medical record generation module is used for acquiring the communication voice of a doctor and a patient in real time by utilizing the pickup, processing the communication voice to obtain a plurality of voice frame fragments with time sequences, binding the voice frame fragments at adjacent moments as input of a voice recognition model, predicting target output of the voice frame fragments to obtain communication text, and carrying out structural processing on the communication text to generate and output the electronic medical record of the current patient.
9. A computer device, comprising:
A memory for storing a computer program;
A processor for implementing the method for generating an electronic medical record for clinic based on speech recognition according to any one of claims 1-7 when executing the computer program.
10. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the speech recognition based outpatient electronic medical record generation method according to any of claims 1 to 7.
CN202410465169.8A 2024-04-18 2024-04-18 Outpatient electronic medical record generation method and system based on voice recognition Pending CN118072901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410465169.8A CN118072901A (en) 2024-04-18 2024-04-18 Outpatient electronic medical record generation method and system based on voice recognition

Publications (1)

Publication Number Publication Date
CN118072901A true CN118072901A (en) 2024-05-24

Family

ID=91097400



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102290047A (en) * 2011-09-22 2011-12-21 哈尔滨工业大学 Robust speech characteristic extraction method based on sparse decomposition and reconfiguration
KR102298330B1 (en) * 2021-01-27 2021-09-06 주식회사 두유비 System for generating medical consultation summary and electronic medical record based on speech recognition and natural language processing algorithm
CN115472157A (en) * 2022-08-22 2022-12-13 成都信息工程大学 Traditional Chinese medicine clinical speech recognition method and model based on deep learning
CN117253576A (en) * 2023-10-30 2023-12-19 来未来科技(浙江)有限公司 Outpatient electronic medical record generation method based on Chinese medical large model
US20240021202A1 (en) * 2020-11-20 2024-01-18 Beijing Youzhuju Network Technology Co., Ltd. Method and apparatus for recognizing voice, electronic device and medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Fu Jing et al.: "A Survey of the Application of Feedforward Sequential Memory Networks in Speech Recognition", Journal of Neijiang Normal University (内江师范学院学报), vol. 35, no. 4, 30 April 2020, p. 44 *
Fu Jing; Luo Jian; Long Yanlin; Miao Chen; Cheng Yuqin: "A Survey of the Application of Feedforward Sequential Memory Networks in Speech Recognition", Journal of Neijiang Normal University (内江师范学院学报), no. 4, 25 April 2020 *
Yang Boxiong: "Deep Learning Theory and Practice" (深度学习理论与实践), Beijing University of Posts and Telecommunications Press, 30 September 2020, pp. 197-198 *
Pan Yiting et al.: "Introduction to Artificial Intelligence Technology Applications" (人工智能技术应用导论), China Machine Press (机械工业出版社), 31 August 2022, pp. 117-122 *
Wang Haibo et al.: "Information Technology and Foreign Language Experimental Teaching" (信息技术与外语实验教学), Beijing University of Posts and Telecommunications Press, 31 March 2022, pp. 111-113 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination