CN118072901A - Outpatient electronic medical record generation method and system based on voice recognition - Google Patents

Outpatient electronic medical record generation method and system based on voice recognition

Info

Publication number
CN118072901A
Authority
CN
China
Prior art keywords
voice
training
medical record
electronic medical
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410465169.8A
Other languages
Chinese (zh)
Inventor
张爱珍
马丽
刘晓春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
People's Liberation Army Navy Qingdao Special Service Sanatorium
Original Assignee
People's Liberation Army Navy Qingdao Special Service Sanatorium
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by People's Liberation Army Navy Qingdao Special Service Sanatorium
Priority to CN202410465169.8A priority Critical patent/CN118072901A/en
Publication of CN118072901A publication Critical patent/CN118072901A/en
Pending legal-status Critical Current

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention belongs to the field of voice recognition, and in particular relates to a method and a system for generating an outpatient electronic medical record based on voice recognition. A small original voice data set is used to train a WGAN model, and the additional similar voices generated by training are used to expand the original voice data set into a voice recognition data set for training a voice recognition model. The communication voice is processed to obtain a plurality of voice frame segments with a time sequence; voice frame segments at adjacent moments are bound as input to the voice recognition model, and the target output of the voice frame segments is predicted to obtain the communication text, which is then structured to generate and output the electronic medical record of the current patient. According to the invention, the electronic medical record of the current patient is generated through the voice recognition model, so that doctors have more time to concentrate on patient care, working efficiency is improved, and operating costs are reduced.

Description

Outpatient electronic medical record generation method and system based on voice recognition
Technical Field
The invention belongs to the technical field of voice recognition, and particularly relates to an outpatient electronic medical record generation method and system based on voice recognition.
Background
With the rapid development of information technology, the medical industry is gradually introducing advanced information technology to improve working efficiency and service quality. The outpatient medical record is an important document recording patient visit information, and its accuracy and integrity are critical for doctors to formulate treatment plans and for patients' subsequent health management. However, conventional medical records often depend on manual input by physicians, which is time-consuming, labor-intensive, and prone to omissions and errors. Therefore, how to generate outpatient medical records efficiently and accurately has become a problem to be solved.
Disclosure of Invention
The embodiment of the invention aims to provide a method and a system for generating an outpatient electronic medical record based on voice recognition, which are used for converting the dialogue content between a doctor and a patient into characters by utilizing a voice recognition technology so as to generate a structured medical record, so as to solve the technical problems in the background technology.
In order to achieve the above purpose, the present invention provides the following technical solutions.
In a first aspect of the present invention, there is provided a method for generating an electronic medical record for an outpatient service based on speech recognition, the method comprising the steps of:
Training a WGAN model using a small original voice data set, and expanding the original voice data set with the additional similar voices generated by training to form a voice recognition data set for training a voice recognition model;
Dividing the voice recognition data set into a training set, a verification set and a test set; constructing an LFR-DFSMN model, training it with the voice data of the training set, evaluating model performance with the verification set during training and updating the model parameters, and, after training is completed, evaluating the model's final performance on the test set to obtain the required voice recognition model;
Acquiring communication voice between a doctor and a patient in real time with a sound pickup, processing the communication voice to obtain a plurality of voice frame segments with a time sequence, binding voice frame segments at adjacent moments as input to the voice recognition model, predicting the target output of the voice frame segments to obtain the communication text, and structuring the communication text to generate and output the electronic medical record of the current patient.
In some embodiments of the invention, the objective function of the WGAN model is expressed as:

$$W(\mathbb{P}_r,\mathbb{P}_\theta)=\sup_{\|f\|_L\le 1}\left(\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)]-\mathbb{E}_{x\sim\mathbb{P}_\theta}[f(x)]\right)\qquad(1)$$

where $x$ represents an original speech segment from the small original speech data set, $\mathbb{P}_r$ is a fixed distribution over $x$, and $\sup$ denotes the supremum over all 1-Lipschitz continuous functions $f$ in equation (1). Given a latent random-variable space $\mathcal{Z}$, the generator $g_\theta$ maps $z\in\mathcal{Z}$ to $x$, where $\theta$ is the parameter of the mapping, i.e. $g_\theta(z)=x$, and $\mathbb{P}_\theta$ is the distribution parameterized by $\theta$.

Speech converted from the original speech segments is input into equation (1) for training, where $\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)]$ is the expectation over the original speech as real samples and $\mathbb{E}_{x\sim\mathbb{P}_\theta}[f(x)]$ is the expectation over the generated speech. When the generator and the critic reach a Nash equilibrium, training stops and the additional similar voices are obtained; when Nash equilibrium cannot be reached, training continues until it is.
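As a rough illustration of equation (1), the following minimal numpy sketch estimates the critic objective (mean of f over real samples minus mean of f over generated samples) using a linear critic whose weights are clipped to bound its Lipschitz constant. All function names, array shapes, and the synthetic data here are illustrative assumptions, not details from the patent.

```python
import numpy as np

def critic(x, w, b):
    """Linear critic f(x) = x @ w + b; clipping w bounds its Lipschitz constant."""
    return x @ w + b

def wgan_objective(real, fake, w, b, clip=0.01):
    """Empirical estimate of equation (1): mean f(real) minus mean f(fake)."""
    w = np.clip(w, -clip, clip)  # crude enforcement of the 1-Lipschitz constraint
    return float(critic(real, w, b).mean() - critic(fake, w, b).mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(64, 8))  # stand-in for real speech features
fake = rng.normal(0.5, 1.0, size=(64, 8))  # stand-in for generated speech features
gap = wgan_objective(real, fake, rng.normal(size=8), 0.0)
```

In an actual WGAN the critic is a neural network trained to maximize this quantity while the generator is trained to minimize it, alternating until the equilibrium described above.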
In some embodiments of the invention, the step of constructing an LFR-DFSMN model includes:
constructing a basic network structure, wherein the basic network structure comprises an input layer, a hidden layer, CFSMN layers, DFSMN layers and an output layer;
each CFSMN layer is introduced with a memory module, jump connection (skip-connection) is further introduced between the memory modules of the adjacent CFSMN layers, and step length factors s are added to form DFSMN layers of the model;
and integrating an input layer, a hidden layer, a CFSMN layer, a DFSMN layer and an output layer of the basic network structure to obtain the LFR-DFSMN model.
In some embodiments of the invention, the output of the $\ell$-th layer memory module at time $t$ is expressed as:

$$\tilde{p}_t^{\ell}=\tilde{p}_t^{\ell-1}+p_t^{\ell}+\sum_{i=0}^{N_1^{\ell}}a_i^{\ell}\odot p_{t-s_1\cdot i}^{\ell}+\sum_{j=1}^{N_2^{\ell}}c_j^{\ell}\odot p_{t+s_2\cdot j}^{\ell}\qquad(2)$$

where $\tilde{p}_t^{\ell-1}$ is the output of the $(\ell-1)$-th layer memory module at time $t$; $\tilde{p}_t^{\ell}$ is the output of the $\ell$-th layer memory module at time $t$; $N_1^{\ell}$ and $N_2^{\ell}$ are the look-back order and the look-ahead order, respectively; $\odot$ denotes element-wise multiplication; $a_i^{\ell}$ is the weight of the $i$-th historical time step in layer $\ell$; $c_j^{\ell}$ is the weight of the $j$-th future time step in layer $\ell$; $p_{t+s_2\cdot j}^{\ell}$ is the $j$-th future time step of layer $\ell$, with $s_2$ the stride factor for future frames; and $p_{t-s_1\cdot i}^{\ell}$ is the $i$-th historical time step of layer $\ell$, with $s_1$ the stride factor for historical frames.
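A direct, unoptimized reading of equation (2) can be sketched in numpy as follows; the function name, the treatment of out-of-range frames as zero, and the array shapes are illustrative assumptions rather than details fixed by the patent.

```python
import numpy as np

def dfsmn_memory(p, p_prev_tilde, a, c, s1=1, s2=1):
    """Output of one DFSMN memory module per equation (2).

    p            : (T, D) hidden outputs p_t of the current layer
    p_prev_tilde : (T, D) memory output of the previous layer (skip connection)
    a            : (N1 + 1, D) look-back weights a_i, i = 0..N1
    c            : (N2, D) look-ahead weights c_j, j = 1..N2
    s1, s2       : stride factors for historical / future frames
    Frames falling outside the sequence are simply skipped (treated as zero).
    """
    T, _ = p.shape
    out = p_prev_tilde + p  # skip connection plus current hidden output
    for t in range(T):
        for i in range(a.shape[0]):         # look-back sum, i = 0..N1
            ti = t - s1 * i
            if ti >= 0:
                out[t] = out[t] + a[i] * p[ti]
        for j in range(1, c.shape[0] + 1):  # look-ahead sum, j = 1..N2
            tj = t + s2 * j
            if tj < T:
                out[t] = out[t] + c[j - 1] * p[tj]
    return out
```

A production implementation would vectorize these loops (e.g. as a dilated 1-D convolution), but the nested-sum form above maps one-to-one onto the terms of equation (2).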
In some embodiments of the present invention, the constructed LFR-DFSMN model uses 8 DFSMN layers, where the number of units per DFSMN layer (the number of neurons in the hidden layer) is set to 1024, the look-back and look-ahead orders are both set to 40, and the DFSMN output is normalized and then passed through the swish activation function to obtain the next layer's output; in each DFSMN layer, the dropout rate used for regularization is set to 0.5.
In some embodiments of the present invention, the step of processing the communication speech to obtain a plurality of speech frame segments with a time sequence comprises:
performing noise reduction on the communication speech using a noise suppression algorithm, and dividing the continuous speech signal into a series of short frames;
Windowing is carried out on each short frame by applying a Hamming window function, FFT is carried out on the windowed voice frame, and the voice frame is converted from a time domain to a frequency domain, so that the frequency spectrum representation of the frame is obtained;
Processing the FFT results by a mel filter bank, which typically consists of a set of triangular filters that cover the hearing range of the human ear and output the response of each filter, simulating the sensitivity of the human ear to different frequencies;
calculating the logarithmic energy of the response of each Mel filter, and performing discrete cosine transform DCT on the logarithmic energy to obtain MFCCs features;
And splicing the features of the continuous frames to form a feature sequence, and capturing the time continuity of the voice to obtain a plurality of voice frame fragments with the time sequence.
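The framing, windowing, and FFT steps above can be sketched as follows (the Mel filter bank and DCT stages are omitted for brevity); the frame and hop sizes are illustrative choices, not values specified by the patent.

```python
import numpy as np

def frame_signal(signal, sr=16000, frame_ms=25, hop_ms=10):
    """Split a 1-D signal into overlapping short frames (illustrative sizes)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

def spectra(frames):
    """Hamming-window each frame and take the FFT magnitude spectrum."""
    windowed = frames * np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(windowed, axis=1))

sr = 16000
t = np.arange(sr) / sr             # one second of synthetic audio
sig = np.sin(2 * np.pi * 440 * t)  # 440 Hz tone as a stand-in for speech
S = spectra(frame_signal(sig, sr))
```

With a 400-sample frame at 16 kHz the FFT bins are 40 Hz apart, so the 440 Hz tone peaks around bin 11; log energies of Mel-filtered versions of these spectra, followed by a DCT, would yield the MFCC features described above.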
In some embodiments of the present invention, the step of structuring the communication text to generate and output an electronic medical record of the current patient includes:
removing irrelevant information in the communication text, wherein the irrelevant information comprises non-medical greetings and non-critical words;
dividing the text into words or phrases, and tagging each segment with its part of speech;
identifying medical related entities in the text, and identifying relationships among the entities;
and mapping the identified entities and relations to corresponding fields of the template according to the defined structured template of the electronic medical record, and generating the electronic medical record of the current patient.
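As a toy illustration of mapping recognized entities into the template fields, the sketch below fills a dictionary-based record; the entity labels, field names, and sample entities are invented for illustration and are not the patent's actual schema.

```python
# Structured electronic-medical-record template fields (illustrative).
TEMPLATE_FIELDS = ("patient_info", "symptoms", "diagnosis", "medications")

def fill_record(entities):
    """Fill the record template from (label, text) entity pairs.

    Entities whose label has no mapped field are dropped.
    """
    record = {field: [] for field in TEMPLATE_FIELDS}
    label_to_field = {
        "PATIENT": "patient_info",
        "SYMPTOM": "symptoms",
        "DISEASE": "diagnosis",
        "DRUG": "medications",
    }
    for label, text in entities:
        field = label_to_field.get(label)
        if field:
            record[field].append(text)
    return record

entities = [("SYMPTOM", "persistent cough"), ("DISEASE", "bronchitis"),
            ("DRUG", "amoxicillin")]
record = fill_record(entities)
```

In practice the `entities` list would come from a medical named-entity-recognition model run over the communication text, and relation extraction would decide which entries belong to the same diagnosis.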
In a second aspect, in another embodiment of the present invention, there is provided an outpatient electronic medical record generating system based on voice recognition, the outpatient electronic medical record generating system including:
The corpus creation module is used for training WGAN models by using a small amount of original voice data sets, expanding the original voice data sets by using more similar voices generated by training, and forming voice recognition data sets for training the voice recognition models;
The voice recognition model training module is used for dividing the voice recognition data set into a training set, a verification set and a test set; constructing an LFR-DFSMN model, training it with the voice data of the training set, evaluating model performance with the verification set during training and updating the model parameters, and, after training is completed, evaluating the model's final performance on the test set to obtain the required voice recognition model;
The electronic medical record generation module is used for acquiring the communication voice of a doctor and a patient in real time by utilizing the pickup, processing the communication voice to obtain a plurality of voice frame fragments with time sequences, binding the voice frame fragments at adjacent moments as input of a voice recognition model, predicting target output of the voice frame fragments to obtain communication text, and carrying out structural processing on the communication text to generate and output the electronic medical record of the current patient.
In a third aspect, in yet another embodiment of the present invention, there is provided a computer apparatus comprising:
A memory for storing a computer program;
A processor for implementing the method for generating an electronic medical record for clinic based on voice recognition as provided in the first aspect when executing the computer program.
In a fourth aspect, in a further embodiment of the present invention, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the speech recognition based outpatient electronic medical record generation method as provided in the first aspect.
Compared with the prior art, the outpatient electronic medical record generation method and system based on voice recognition have the beneficial effects that:
Firstly, in the voice recognition model provided by the invention, a memory module is introduced into each CFSMN layer, skip-connections are further introduced between the memory modules of adjacent CFSMN layers, and stride factors are added to form the DFSMN layers of the model; the input layer adopts LFR, that is, voice frame segments at adjacent moments are bound as input to the voice recognition model and the target output of the voice frame segments is predicted to obtain the communication text, which reduces input and output and greatly improves the efficiency of voice recognition;
Secondly, when training the voice recognition model, the invention trains the WGAN model with a small amount of voice material to expand the corpus used for training the voice recognition model, which alleviates the problems of scarce and unbalanced corpus samples; moreover, the WGAN model is simple, converges quickly, has low computational complexity and fast response, and is easy to operate;
Thirdly, the invention generates the electronic medical record of the current patient by carrying out structuring treatment on the communication text obtained by the voice recognition model, thereby accelerating the speed of medical record recording and enabling doctors to have more time to concentrate on patient nursing; and the dictation content of the doctor can be more accurately identified and transcribed, so that the accuracy of medical records is improved; and through the automatic medical record process, the voice recognition technology can simplify the medical workflow, improve the working efficiency, thereby reducing the operation cost.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
In the figure:
FIG. 1 is an application scenario architecture diagram of an outpatient electronic medical record generating method based on speech recognition according to the present invention;
FIG. 2 is a flow chart of an implementation of a method for generating an electronic medical record for clinic based on voice recognition;
FIG. 3 is a sub-flowchart of a method for generating an electronic medical record for an clinic based on speech recognition according to the present invention;
FIG. 4 is another sub-flowchart of a method for generating an electronic medical record for an clinic based on speech recognition according to the present invention;
FIG. 5 is a further sub-flowchart of a method for generating an electronic medical record for an clinic based on speech recognition according to the present invention;
FIG. 6 is a block diagram of an outpatient electronic medical record generating system based on speech recognition according to the present invention;
fig. 7 is a block diagram of the electronic medical record generating module in the outpatient electronic medical record generating system according to the present invention.
Detailed Description
The present application will be further described with reference to the accompanying drawings and detailed description, wherein it is to be understood that, on the premise of no conflict, the following embodiments or technical features may be arbitrarily combined to form new embodiments.
In order to make the objects, technical solutions and advantages of the present application more apparent, the following embodiments of the present application will be described in further detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that, in the embodiments of the present invention, all the expressions "first" and "second" are used to distinguish two non-identical entities with the same name or non-identical parameters, and it is noted that the "first" and "second" are only used for convenience of expression, and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such as a process, method, system, article, or other step or unit that comprises a list of steps or units.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Currently, the outpatient medical record is an important document recording patient visit information, and its accuracy and integrity are critical for doctors to formulate treatment plans and for patients' subsequent health management. However, conventional medical records often depend on manual input by physicians, which is time-consuming, labor-intensive, and prone to omissions and errors. Therefore, how to generate outpatient medical records efficiently and accurately has become a problem to be solved.
In order to solve the above problems, the voice recognition model provided by the invention binds voice frame segments at adjacent moments as input to the voice recognition model, predicts the target output of the voice frame segments, and obtains the communication text, which reduces input and output and greatly improves the efficiency of voice recognition. When training the voice recognition model, the WGAN model is trained with a small amount of voice material to expand the corpus used for training the voice recognition model, alleviating the problems of scarce and unbalanced corpus samples. By structuring the communication text obtained from the voice recognition model to generate the electronic medical record of the current patient, medical record keeping is accelerated so that doctors have more time to concentrate on patient care; and doctors' dictation can be recognized and transcribed more accurately, improving the accuracy of medical records.
Referring to fig. 1, an application scenario diagram of the outpatient electronic medical record generating method based on voice recognition is shown, in which a sound pickup is installed in the doctor and patient consultation environment; it records the dialogue between the patient and the doctor to acquire communication voice, and the communication voice data acquired by the pickup are sent to a database server;
The invention uses the database server to train the voice recognition model and to recognize the communication voice into communication text, and then structures the communication text to generate a temporary electronic medical record, which is temporarily stored in a buffer; a terminal computer used by the doctor can display, modify, and store the electronic medical record in the buffer, and the final electronic medical record is then returned to the database server;
After the clinic is finished, the patient can print the electronic medical record of the patient by the printer so as to obtain paper medical records.
In an exemplary embodiment of the present disclosure, the microphone and the database server are connected through a network. The network may include various types of wired or wireless communication links;
for example: the wired communication link includes optical fiber, and the wireless communication link includes a Bluetooth communication link and a wireless fidelity (Wi-Fi) communication link.
The specific implementation of the outpatient electronic medical record generating method based on voice recognition according to the present invention is described in detail below with reference to specific embodiments.
As shown in fig. 2, in one embodiment of the present invention, there is provided a method for generating an electronic medical record for clinic based on voice recognition, including the steps of:
Step S101: training WGAN models by using a small amount of original voice data sets, and expanding the original voice data sets by using more similar voices generated by training to form voice recognition data sets for training a voice recognition model;
Step S102: dividing a voice recognition data set to form a training set, a verification set and a test set; constructing an LFR-DFSMN model, training the LFR-DFSMN model by using voice data of a training set, evaluating model performance by using a verification set in the training process, updating model parameters, and obtaining a required voice recognition model by using final performance of the evaluation model on a test set after training is completed;
Step S103: and acquiring communication voice between a doctor and a patient in real time by using a pickup, processing the communication voice to obtain a plurality of voice frame fragments with time sequences, binding the voice frame fragments at adjacent moments as input of a voice recognition model, predicting target output of the voice frame fragments to obtain communication text, carrying out structural processing on the communication text, generating and outputting an electronic medical record of the current patient.
Preferably, the objective function of the WGAN model in step S101 in the embodiment of the present invention is expressed as:

$$W(\mathbb{P}_r,\mathbb{P}_\theta)=\sup_{\|f\|_L\le 1}\left(\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)]-\mathbb{E}_{x\sim\mathbb{P}_\theta}[f(x)]\right)\qquad(1)$$

where $x$ represents an original speech segment from the small original speech data set, $\mathbb{P}_r$ is a fixed distribution over $x$, and $\sup$ denotes the supremum over all 1-Lipschitz continuous functions $f$ in equation (1). Given a latent random-variable space $\mathcal{Z}$, the generator $g_\theta$ maps $z\in\mathcal{Z}$ to $x$, where $\theta$ is the parameter of the mapping, i.e. $g_\theta(z)=x$, and $\mathbb{P}_\theta$ is the distribution parameterized by $\theta$.

Speech converted from the original speech segments is input into equation (1) for training, where $\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)]$ is the expectation over the original speech as real samples and $\mathbb{E}_{x\sim\mathbb{P}_\theta}[f(x)]$ is the expectation over the generated speech. When the generator and the critic reach a Nash equilibrium, training stops and the additional similar voices are obtained; when Nash equilibrium cannot be reached, training continues until it is.
When training the voice recognition model, the invention trains the WGAN model with a small amount of voice material to expand the corpus used for training the voice recognition model, which alleviates the problems of scarce and unbalanced corpus samples; moreover, the WGAN model is simple, converges quickly, has low computational complexity and fast response, and is easy to operate.
Further, as shown in fig. 2 and 3, the method of the present invention includes the steps of:
Step S201: constructing a basic network structure, wherein the basic network structure comprises an input layer, a hidden layer, CFSMN layers, DFSMN layers and an output layer;
step S202: each CFSMN layer is introduced with a memory module, jump connection (skip-connection) is further introduced between the memory modules of the adjacent CFSMN layers, and step factors (stride) are added to form DFSMN layers of the model;
Step S203: integrating an input layer, a hidden layer, a CFSMN layer, a DFSMN layer and an output layer of the basic network structure to obtain an LFR-DFSMN model;
Wherein, in the LFR-DFSMN model of the invention, the output of the $\ell$-th layer memory module at time $t$ is expressed as:

$$\tilde{p}_t^{\ell}=\tilde{p}_t^{\ell-1}+p_t^{\ell}+\sum_{i=0}^{N_1^{\ell}}a_i^{\ell}\odot p_{t-s_1\cdot i}^{\ell}+\sum_{j=1}^{N_2^{\ell}}c_j^{\ell}\odot p_{t+s_2\cdot j}^{\ell}\qquad(2)$$

where $\tilde{p}_t^{\ell-1}$ is the output of the $(\ell-1)$-th layer memory module at time $t$; $\tilde{p}_t^{\ell}$ is the output of the $\ell$-th layer memory module at time $t$; $N_1^{\ell}$ and $N_2^{\ell}$ are the look-back order and the look-ahead order, respectively; $\odot$ denotes element-wise multiplication; $a_i^{\ell}$ is the weight of the $i$-th historical time step in layer $\ell$; $c_j^{\ell}$ is the weight of the $j$-th future time step in layer $\ell$; $p_{t+s_2\cdot j}^{\ell}$ is the $j$-th future time step of layer $\ell$, with $s_2$ the stride factor for future frames; and $p_{t-s_1\cdot i}^{\ell}$ is the $i$-th historical time step of layer $\ell$, with $s_1$ the stride factor for historical frames.
Preferably, the constructed LFR-DFSMN model uses 8 DFSMN layers, where the number of units per DFSMN layer (the number of neurons in the hidden layer) is set to 1024, the look-back and look-ahead orders are both set to 40, and the DFSMN output is normalized and then passed through the swish activation function to finally obtain the next layer's output; in each DFSMN layer, the dropout rate used for regularization is set to 0.5.
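The per-layer normalize-then-swish step described above might look as follows in numpy; the layer-norm formulation is a common choice and an assumption here, since the text does not specify the exact normalization used.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row (one frame's hidden vector) to zero mean, unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def swish(x):
    """swish(x) = x * sigmoid(x), the activation named in the text."""
    return x / (1.0 + np.exp(-x))

# Illustrative: 4 frames of a 1024-unit DFSMN layer's output.
h = np.random.default_rng(1).normal(size=(4, 1024))
out = swish(layer_norm(h))
```

Each DFSMN layer's normalized, activated output then feeds the next layer, with dropout applied during training for regularization.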
According to the voice recognition model of the invention, a memory module is introduced into each CFSMN layer, skip-connections are further introduced between the memory modules of adjacent CFSMN layers, and stride factors are added to form the DFSMN layers of the model; the input layer adopts LFR, that is, voice frame segments at adjacent moments are bound as input to the voice recognition model and the target output of the voice frame segments is predicted to obtain the communication text, which reduces input and output and greatly improves the efficiency of voice recognition.
In the embodiment of the invention, in the process of acquiring the communication voice of doctors and patients, a microphone of a sound pickup or other recording equipment captures the communication voice signal, the analog voice signal is converted into a digital signal, and a suitable sampling rate is chosen according to the sampling theorem; here a sampling rate of 16 kHz is used. The sampling rate determines the maximum frequency that can be captured, and other sampling rates, such as 8 kHz or 44.1 kHz, may also be used in implementations.
Further, as shown in fig. 4, in the embodiment of the present invention, the step of processing the communication speech to obtain a plurality of speech frame segments with a time sequence includes:
Step S301: performing noise reduction on the communication voice with a noise suppression algorithm, and dividing the continuous voice signal into a series of short frames, each frame usually comprising a fixed number of sampling points, such as a 20 ms or 30 ms window, the window length usually being related to the sampling rate;
Step S302: windowing is performed on each short frame by applying a hamming window function to reduce the discontinuity of frame boundaries, which helps to reduce the frequency spectrum leakage effect in frequency domain analysis; performing FFT on the windowed speech frame, and converting the speech frame from a time domain to a frequency domain to obtain a frequency spectrum representation of the frame;
Step S303: processing the FFT results by a mel filter bank, which typically consists of a set of triangular filters that cover the hearing range of the human ear and output the response of each filter, simulating the sensitivity of the human ear to different frequencies;
Step S304: calculating the logarithmic energy of the response of each Mel filter, and performing discrete cosine transform DCT on the logarithmic energy to obtain MFCCs features, MFCCs being a common feature in speech recognition;
Step S305: and splicing the features of the continuous frames to form a feature sequence, and capturing the time continuity of the voice to obtain a plurality of voice frame fragments with the time sequence.
Furthermore, in the process of processing the communication voice, the characteristics of the front frame and the rear frame are considered to construct a context window, so that the method is beneficial to capturing the context information of the voice; further, post-processing of features is also required, including feature filling and feature normalization:
And (3) feature filling: for short audio segments, padding features may be required to maintain consistent length;
Feature standardization: before obtaining a plurality of speech frame segments with a time sequence, the features need to be normalized to ensure that they are on the same scale.
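The padding and standardization post-processing can be sketched as below; the frame counts and the 13-dimensional feature size are illustrative choices, not values specified by the patent.

```python
import numpy as np

def pad_features(feats, target_len):
    """Zero-pad (or truncate) a (T, D) feature matrix along time to target_len frames."""
    T, D = feats.shape
    if T >= target_len:
        return feats[:target_len]
    return np.vstack([feats, np.zeros((target_len - T, D))])

def standardize(feats, eps=1e-8):
    """Per-dimension mean/variance normalization so features share one scale."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + eps)

feats = np.random.default_rng(2).normal(size=(50, 13))  # e.g. 50 MFCC frames
padded = standardize(pad_features(feats, 80))
```

Whether padding happens before or after standardization (and whether statistics are computed per utterance or over the whole corpus) is a design choice; the order shown here is just one option.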
Through the above steps, a series of speech frame segments carrying time-sequence information can be obtained from the communication speech, and these segments can be used as input for speech recognition or other speech processing tasks. It should be noted that the specific steps and parameter settings of feature extraction need to be adjusted according to the specific application scenario and data set.
Further, in the embodiment of the present invention, the text of the communication is structured, which involves Natural Language Processing (NLP) and information extraction technology, and text data of the communication between the doctor and the patient needs to be collected before the structuring process, which may include diagnosis records of the doctor, nursing records, self-description of the patient, and the like.
Further, as shown in fig. 5, preprocessing needs to be performed on the text of the communication, including:
step S401: removing irrelevant information from the communication text, where the irrelevant information includes non-medical greetings, non-critical words, and the like;
step S402: dividing the text into words or phrases, and tagging each resulting token with its part of speech, such as noun, verb, or adjective.
Further, it is also necessary to identify entities in the text, specifically including:
Step S403: medical related entities in the text are identified, and relationships between the entities are identified.
Wherein, the medical related entity in the text is identified, such as disease name, drug name, symptom description, etc.; relationships between entities are identified, for example, an association between a symptom and a particular disease is determined.
Further, the method further comprises the following steps:
step S404: and mapping the identified entities and relations to corresponding fields of the template according to the defined structured template of the electronic medical record, and generating the electronic medical record of the current patient.
In the embodiment of the invention, a structured template, such as basic information, medical history, diagnosis result, treatment scheme and the like of a patient, is defined according to the requirements of the electronic medical record; the identified entities and relationships are mapped into corresponding fields of the template.
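The preprocessing, entity recognition, and template-mapping steps (S401–S404) can be sketched as a minimal rule-based pipeline. This is an illustration only: the greeting stop-list, the medical lexicons, and the template fields below are hypothetical stand-ins for a real stop-list, a trained medical NER model, and the hospital's actual electronic medical record template.

```python
import re

# Hypothetical stand-ins: a greeting stop-list (S401) and medical lexicons (S403)
GREETINGS = ("good morning", "hello", "thank you")
SYMPTOMS = ("cough", "fever", "chest pain")
DISEASES = ("bronchitis", "pneumonia")
DRUGS = ("amoxicillin", "ibuprofen")

def preprocess(text):
    """S401: strip non-medical greetings from the communication text."""
    for g in GREETINGS:
        text = re.sub(re.escape(g), "", text, flags=re.I)
    return text

def extract_entities(text):
    """S403: dictionary lookup standing in for a trained NER model."""
    found = []
    for label, lexicon in (("symptom", SYMPTOMS), ("disease", DISEASES), ("drug", DRUGS)):
        for term in lexicon:
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.I):
                found.append((label, term))
    return found

def fill_template(raw_text):
    """S404: map recognized entities into the structured EMR template."""
    record = {"symptoms": [], "diagnosis": [], "medications": []}
    field = {"symptom": "symptoms", "disease": "diagnosis", "drug": "medications"}
    for label, term in extract_entities(preprocess(raw_text)):
        record[field[label]].append(term)
    # Crude relation extraction: link each symptom to each diagnosed disease
    record["relations"] = [(s, d) for s in record["symptoms"]
                           for d in record["diagnosis"]]
    return record

emr = fill_template("Good morning, doctor. I have a cough and fever. "
                    "Assessment: bronchitis; start amoxicillin.")
print(emr["diagnosis"])  # ['bronchitis']
```

A production system would replace the dictionary lookup with a statistical or neural NER model, but the data flow from raw text to template fields is the same.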
Preferably, the step of structuring the communication text may further include:
Entity filling: the medical entities identified by NER are filled into the corresponding positions of the template;
Relation filling: the relations among the entities are filled into the template in logical order, ensuring the continuity and accuracy of the information;
Rule checking: the filled-in information is checked against predefined rules to ensure its integrity and consistency;
Error correction: errors or inconsistencies found during verification are corrected manually or automatically with a machine learning model;
Time information extraction: time-related information is extracted from the text, such as when symptoms appeared or when treatment began;
Timeline construction: a timeline of the patient's condition is built from the extracted time information, so that changes in the condition and the progress of treatment can be tracked;
Information integration: all the structured information is integrated into the electronic medical record template to generate a complete electronic medical record document;
Formatted output: the generated electronic medical record is formatted to meet medical standards and readability requirements;
Dynamic updating: as the patient's condition changes and new medical information is generated, the electronic medical record is updated regularly to ensure that it reflects the latest medical situation.
Through the steps, the key medical information can be extracted from the unstructured alternating text and structured into the electronic medical record. The process can greatly improve the efficiency and quality of medical records, and is convenient for medical staff to manage and analyze patient information. It is noted that the processing of medical information requires compliance with associated privacy regulations and standards.
According to the invention, through carrying out structural processing on the communication text obtained by the voice recognition model, the electronic medical record of the current patient is generated, so that the speed of medical record recording can be increased, and a doctor can have more time to concentrate on patient care; and the dictation content of the doctor can be more accurately identified and transcribed, so that the accuracy of medical records is improved; and through the automatic medical record process, the voice recognition technology can simplify the medical workflow, improve the working efficiency, thereby reducing the operation cost.
It is noted that the above-described figures are only schematic illustrations of processes involved in a method according to an exemplary embodiment of the invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be understood that although the steps are described in a certain order, they are not necessarily performed sequentially in that order. Unless explicitly stated herein, the steps are not strictly limited in their order of execution and may be performed in other orders. Moreover, some steps of this embodiment may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with at least part of the sub-steps or stages of other steps.
In a second aspect, as shown in fig. 6, in another embodiment of the present invention, there is provided an outpatient electronic medical record generating system based on voice recognition, the outpatient electronic medical record generating system including:
the corpus creation module 501 is configured to train a WGAN model using a small amount of original speech data, and to expand the original speech data set with the more similar speech generated by training, forming a speech recognition data set for speech recognition model training;
The speech recognition model training module 502 is configured to divide a speech recognition data set to form a training set, a verification set and a test set; constructing an LFR-DFSMN model, training the LFR-DFSMN model by using voice data of a training set, evaluating model performance by using a verification set in the training process, updating model parameters, and obtaining a required voice recognition model by using final performance of the evaluation model on a test set after training is completed;
The electronic medical record generating module 503 is configured to obtain, in real time, communication voice between a doctor and a patient by using a pickup, process the communication voice to obtain a plurality of voice frame segments with time sequences, bind the voice frame segments at adjacent moments as input of a voice recognition model, predict target output of the voice frame segments to obtain communication text, and perform structural processing on the communication text to generate and output an electronic medical record of the current patient.
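How the three modules hand data to one another can be sketched with placeholder callables. The interfaces below are hypothetical: they stand in for the real feature extractor, the trained LFR-DFSMN recognizer, and the NLP structuring step, and only illustrate the data flow from audio to structured record.

```python
class OutpatientEMRPipeline:
    """Sketch of how the system's modules could be wired together.
    The three injected callables are assumptions standing in for the
    concrete feature extraction, recognition, and structuring logic."""

    def __init__(self, feature_extractor, recognizer, structurer):
        self.feature_extractor = feature_extractor  # voice -> frame segments
        self.recognizer = recognizer                # bound segments -> text
        self.structurer = structurer                # text -> EMR dict

    def run(self, audio):
        segments = self.feature_extractor(audio)
        # Bind frame segments at adjacent moments before recognition,
        # as the electronic medical record generating module describes
        bound = list(zip(segments, segments[1:])) or [tuple(segments)]
        text = self.recognizer(bound)
        return self.structurer(text)

# Toy stand-ins to show the data flow only
pipeline = OutpatientEMRPipeline(
    feature_extractor=lambda audio: [audio[i:i + 2] for i in range(0, len(audio), 2)],
    recognizer=lambda bound: f"{len(bound)} bound segments recognized",
    structurer=lambda text: {"note": text},
)
print(pipeline.run([0.1, 0.2, 0.3, 0.4]))
```

In a deployment, the recognizer callable would wrap batched LFR-DFSMN inference and the structurer would apply the NLP steps of the previous section.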
Further, as shown in fig. 7, in an embodiment of the present invention, the electronic medical record generating module further includes:
A sound pick-up 5031 for acquiring communication voice between a doctor and a patient in real time;
a speech processor 5032 for processing the communication voice to obtain a plurality of speech frame segments with a time sequence;
The voice recognition submodule 5033 is used for binding voice frame fragments at adjacent moments as input of the voice recognition model and predicting their target output to obtain the communication text;
The text structuring processing sub-module 5034 is configured to perform structuring processing on the communication text, generate an electronic medical record of the current patient, and output the electronic medical record.
The outpatient electronic medical record generating system based on voice recognition has wide application prospect in the field of intelligent outpatient service, can remarkably improve the diagnosis and treatment efficiency of patients, and provides higher-level support for the creation of intelligent hospitals.
In a third aspect of the embodiments of the present invention, there is also provided a computer device, including a memory and a processor, where the memory stores a computer program that, when executed by the processor, implements the voice-recognition-based outpatient electronic medical record generation method of any one of the above embodiments.
The memory, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the voice-recognition-based outpatient electronic medical record generation method in the embodiments of the present application. The memory may include a program storage area and a data storage area, where the program storage area may store the operating system and at least one application program required for a function, and the data storage area may store data created by the use of the method. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory optionally includes memory located remotely relative to the processor; such remote memory may be connected to the local module through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, and is typically used to control the overall operation of the computer device. In this embodiment, the processor is configured to execute the program code stored in the memory or to process data. The processor of the computer device executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions and modules stored in the memory, that is, it implements the steps of the voice-recognition-based outpatient electronic medical record generation method of the method embodiments above.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
Finally, it should be noted that the computer-readable storage media (e.g., memory) herein can be volatile memory or nonvolatile memory, or can include both. By way of example, and not limitation, nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of example, and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform such functions: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP and/or any other such configuration.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein includes any and all possible combinations of one or more of the associated listed items. The sequence numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and many other variations of the different aspects of the embodiments of the invention as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (10)

1. The outpatient electronic medical record generation method based on voice recognition is characterized by comprising the following steps of:
Training WGAN models by using a small amount of original voice data sets, and expanding the original voice data sets by using more similar voices generated by training to form voice recognition data sets for training a voice recognition model;
Dividing a voice recognition data set to form a training set, a verification set and a test set; constructing an LFR-DFSMN model, training the LFR-DFSMN model by using voice data of a training set, evaluating model performance by using a verification set in the training process, updating model parameters, and obtaining a required voice recognition model by using final performance of the evaluation model on a test set after training is completed;
and acquiring communication voice between a doctor and a patient in real time by using a pickup, processing the communication voice to obtain a plurality of voice frame fragments with time sequences, binding the voice frame fragments at adjacent moments as input of a voice recognition model, predicting target output of the voice frame fragments to obtain communication text, carrying out structural processing on the communication text, generating and outputting an electronic medical record of the current patient.
2. The method for generating an outpatient electronic medical record based on voice recognition according to claim 1, wherein the objective function of the WGAN model is expressed as:

$$W(P_r, P_\theta) = \sup_{\|f\|_L \le 1} \left( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{\tilde{x} \sim P_\theta}[f(\tilde{x})] \right) \qquad (1)$$

In formula (1): $x$ represents an original speech segment from the small original speech data set, and $P_r$ is a fixed distribution over $x$; $\sup$ denotes the supremum over all 1-Lipschitz continuous functions $f$ in formula (1); a latent random variable $z$ from an implicit random variable space is assumed in formula (1), and the generator $g_\theta$ maps $z$ to $\tilde{x}$, i.e. $\tilde{x} = g_\theta(z)$, where $\theta$ is the parameter of the mapping and $P_\theta$ represents the distribution of the generated samples parameterized by $\theta$;

the speech converted from the original speech segments is input into formula (1) for training, where $\mathbb{E}_{x \sim P_r}[f(x)]$ represents the expectation over the original speech as real samples and $\mathbb{E}_{\tilde{x} \sim P_\theta}[f(\tilde{x})]$ represents the expectation over the generated speech; when the two terms reach Nash equilibrium, training is stopped and the more similar voices are obtained; when Nash equilibrium cannot be reached, training continues until it is reached.
3. The method for generating an electronic medical record for an outpatient service based on speech recognition according to claim 2, wherein the step of constructing an LFR-DFSMN model comprises:
constructing a basic network structure, wherein the basic network structure comprises an input layer, a hidden layer, CFSMN layers, DFSMN layers and an output layer;
a memory module is introduced into each CFSMN layer, skip connections are further introduced between the memory modules of adjacent CFSMN layers, and a stride factor s is added to form the DFSMN layers of the model;
and integrating an input layer, a hidden layer, a CFSMN layer, a DFSMN layer and an output layer of the basic network structure to obtain the LFR-DFSMN model.
4. The method for generating an outpatient electronic medical record based on speech recognition according to claim 3, wherein the output of the memory module of the $\ell$-th layer at time $t$ is expressed as:

$$\tilde{p}_t^{\,\ell} = \mathcal{H}\!\left(\tilde{p}_t^{\,\ell-1}\right) + p_t^{\,\ell} + \sum_{i=0}^{N_1} a_i^{\ell} \odot p_{t - s_1 i}^{\,\ell} + \sum_{j=1}^{N_2} c_j^{\ell} \odot p_{t + s_2 j}^{\,\ell} \qquad (2)$$

where $\tilde{p}_t^{\,\ell-1}$ represents the output of the memory module of the $(\ell-1)$-th layer at time $t$; $p_t^{\,\ell}$ represents the output of the $\ell$-th layer at time $t$; $N_1$ and $N_2$ represent the look-back order and the look-ahead order, respectively; $\odot$ denotes element-wise multiplication; $a_i^{\ell}$ represents the weight of the $i$-th historical time step in the $\ell$-th layer; $c_j^{\ell}$ represents the weight of the $j$-th future time step in the $\ell$-th layer; $t + s_2 j$ is the $j$-th future time step of the $\ell$-th layer, with $s_2$ the stride factor at future times; and $t - s_1 i$ is the $i$-th historical time step of the $\ell$-th layer, with $s_1$ the stride factor at historical times.
5. The method for generating an outpatient electronic medical record based on voice recognition according to claim 4, wherein the LFR-DFSMN model is constructed with 8 DFSMN layers; the number of units (the number of neurons in a hidden layer) of each DFSMN layer is set to 1024, the forward and backward step sizes are set to 40, the output of each DFSMN layer is normalized and then passed through the swish activation function to obtain the input of the next layer, and the dropout rate for regularization is set to 0.5 in each DFSMN layer.
6. The method for generating an electronic medical record for an outpatient service based on voice recognition according to any one of claims 2 to 5, wherein the step of processing the communication voice to obtain a plurality of voice frame segments with a time sequence includes:
noise reduction processing is performed on the communication voice using a noise suppression algorithm, and the continuous voice signal is divided into a series of short frames;
Windowing is carried out on each short frame by applying a Hamming window function, FFT is carried out on the windowed voice frame, and the voice frame is converted from a time domain to a frequency domain, so that the frequency spectrum representation of the frame is obtained;
processing the FFT result by a mel-filter bank, typically consisting of a set of triangular filters, and outputting the response of each filter;
calculating the logarithmic energy of the response of each Mel filter, and performing discrete cosine transform DCT on the logarithmic energy to obtain MFCCs features;
And splicing the features of the continuous frames to form a feature sequence, and capturing the time continuity of the voice to obtain a plurality of voice frame fragments with the time sequence.
7. The method for generating an electronic medical record for an outpatient service based on speech recognition according to claim 6, wherein the step of structuring the text of the communication to generate and output an electronic medical record for the current patient comprises:
removing irrelevant information in the communication text, wherein the irrelevant information comprises non-medical greetings and non-critical words;
dividing the text into words or phrases, marking the parts of speech of the divided text, and identifying the parts of speech;
identifying medical related entities in the text, and identifying relationships among the entities;
and mapping the identified entities and relations to corresponding fields of the template according to the defined structured template of the electronic medical record, and generating the electronic medical record of the current patient.
8. An outpatient electronic medical record generating system based on voice recognition, which is characterized by comprising:
The corpus creation module is used for training WGAN models by using a small amount of original voice data sets, expanding the original voice data sets by using more similar voices generated by training, and forming voice recognition data sets for training the voice recognition models;
the voice recognition model training module is used for dividing a voice recognition data set to form a training set, a verification set and a test set; constructing an LFR-DFSMN model, training the LFR-DFSMN model by using voice data of a training set, evaluating model performance by using a verification set in the training process, updating model parameters, and obtaining a required voice recognition model by using final performance of the evaluation model on a test set after training is completed;
The electronic medical record generation module is used for acquiring the communication voice of a doctor and a patient in real time by utilizing the pickup, processing the communication voice to obtain a plurality of voice frame fragments with time sequences, binding the voice frame fragments at adjacent moments as input of a voice recognition model, predicting target output of the voice frame fragments to obtain communication text, and carrying out structural processing on the communication text to generate and output the electronic medical record of the current patient.
9. A computer device, comprising:
A memory for storing a computer program;
A processor for implementing the method for generating an electronic medical record for clinic based on speech recognition according to any one of claims 1-7 when executing the computer program.
10. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the speech recognition based outpatient electronic medical record generation method according to any of claims 1 to 7.
CN202410465169.8A 2024-04-18 2024-04-18 Outpatient electronic medical record generation method and system based on voice recognition Pending CN118072901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410465169.8A CN118072901A (en) 2024-04-18 2024-04-18 Outpatient electronic medical record generation method and system based on voice recognition

Publications (1)

Publication Number Publication Date
CN118072901A true CN118072901A (en) 2024-05-24

Family

ID=91097400



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102290047A (en) * 2011-09-22 2011-12-21 哈尔滨工业大学 Robust speech characteristic extraction method based on sparse decomposition and reconfiguration
KR102298330B1 (en) * 2021-01-27 2021-09-06 주식회사 두유비 System for generating medical consultation summary and electronic medical record based on speech recognition and natural language processing algorithm
CN115472157A (en) * 2022-08-22 2022-12-13 成都信息工程大学 Traditional Chinese medicine clinical speech recognition method and model based on deep learning
CN117253576A (en) * 2023-10-30 2023-12-19 来未来科技(浙江)有限公司 Outpatient electronic medical record generation method based on Chinese medical large model
US20240021202A1 (en) * 2020-11-20 2024-01-18 Beijing Youzhuju Network Technology Co., Ltd. Method and apparatus for recognizing voice, electronic device and medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Fu Jing et al.: "A Survey of the Application of Feedforward Sequential Memory Networks in Speech Recognition", Journal of Neijiang Normal University (内江师范学院学报), vol. 35, no. 4, 30 April 2020, p. 44 *
Fu Jing; Luo Jian; Long Yanlin; Miao Chen; Cheng Yuqin: "A Survey of the Application of Feedforward Sequential Memory Networks in Speech Recognition", Journal of Neijiang Normal University (内江师范学院学报), no. 4, 25 April 2020 *
Yang Boxiong: "Deep Learning Theory and Practice" (深度学习理论与实践), Beijing University of Posts and Telecommunications Press, 30 September 2020, pp. 197-198 *
Pan Yiting et al.: "Introduction to Artificial Intelligence Technology Applications" (人工智能技术应用导论), China Machine Press (机械工业出版社), 31 August 2022, pp. 117-122 *
Wang Haibo et al.: "Information Technology and Foreign Language Experimental Teaching" (信息技术与外语实验教学), Beijing University of Posts and Telecommunications Press, 31 March 2022, pp. 111-113 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination