CN113555133A - Medical inquiry data processing method and device - Google Patents

Medical inquiry data processing method and device

Info

Publication number
CN113555133A
CN113555133A (application CN202110601186.6A)
Authority
CN
China
Prior art keywords
voice
data
text
inquiry
user identity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110601186.6A
Other languages
Chinese (zh)
Inventor
赖伟
周昌伟
陈良军
Current Assignee
Beijing Yikang Medical Technology Co ltd
Original Assignee
Beijing Yikang Medical Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yikang Medical Technology Co ltd filed Critical Beijing Yikang Medical Technology Co ltd
Priority: CN202110601186.6A
Publication: CN113555133A
Legal status: Pending


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical data mining, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The disclosure relates to a medical inquiry data processing method and device. The method comprises: acquiring inquiry voice data of a target duration during a medical inquiry; cutting the inquiry voice data according to voice direction information to obtain a first voice fragment set and a second voice fragment set; performing voice recognition on the first voice fragment set and the second voice fragment set to generate a first text set and a second text set; performing semantic understanding on the first text set and the second text set to determine a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction; and performing semantic analysis on the first text set and the second text set according to the first user identity and the second user identity to obtain structured data, and storing the structured data and the inquiry voice data in a preset database. In this way, more efficient information recording and preservation of the original data are achieved during clinical inquiry.

Description

Medical inquiry data processing method and device
Technical Field
The present disclosure relates to the field of medical data processing technologies, and in particular, to a medical inquiry data processing method and device.
Background
With the spread of medical informatization, many hospitals (particularly Grade-A tertiary hospitals) already operate complete medical information systems in which large amounts of medical data are recorded in digital form, such as electronic medical records and imaging data. These data are used not only for querying and tracing patient record information but also for retrospective clinical research by expert physicians, so the demands on data quality and completeness are extremely high.
In current medical practice, most of this data is recorded during clinical diagnosis. Because most of an expert physician's time and energy during clinical diagnosis is devoted to diagnosing and treating the patient, especially in outpatient settings, the doctor spends only a small amount of time on record keeping. Information is generally recorded in one of two ways: first, after finishing the diagnosis and treatment of one patient and before the next patient arrives, the doctor enters the most important information into the system in a relatively brief form as quickly as possible (for example, within one minute); second, each doctor is paired with a dedicated assistant who transcribes and enters the inquiry information during the diagnosis and treatment process.
However, in the first mode, because the time the doctor can spend on entry is extremely limited, very little information is entered, so the record is incomplete, and manual entry under time pressure also raises the error rate. In the second mode, each doctor needs a dedicated assistant, which adds substantial labor cost that ordinary doctors cannot bear; only a small number of senior experts can afford this, and the assistants are mostly the expert's students, whose knowledge and clinical experience may be insufficient, so the entered information may not match the expert's expectations and may be missing. In short, during clinical inquiry, information recording is inefficient, incomplete, and even error-prone.
Disclosure of Invention
To solve, or at least partially solve, the above technical problems, the present disclosure provides a medical inquiry data processing method and apparatus. They address the problems of the original inquiry workflow, in which information is summarized and transcribed manually: the process is inefficient, only brief partial text is saved, the original inquiry voice is not preserved, and the diagnosis-and-treatment record is incomplete, which hampers subsequent tracing of patient history and retrospective study.
The present disclosure provides a medical inquiry data processing method, including:
acquiring inquiry voice data of a target duration in a medical inquiry process;
cutting the inquiry voice data according to the voice direction information to obtain a first voice fragment set belonging to a first voice direction and a second voice fragment set belonging to a second voice direction;
performing voice recognition on the first voice fragment set and the second voice fragment set to generate a first text set and a second text set;
semantic understanding is carried out on the first text set and the second text set, and a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction are determined;
and performing semantic analysis on the first text set and the second text set according to the first user identity and the second user identity to obtain structured data, and storing the inquiry voice data and the structured data in a preset database.
In an optional embodiment of the present disclosure, the performing semantic understanding on the first text set and the second text set and determining a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction includes:
determining a first probability that each first text belongs to the first user identity and a second probability of the second user identity according to a semantic understanding result of each first text in the first text set;
determining a third probability that each second text belongs to the first user identity and a fourth probability of the second user identity according to semantic understanding results of each second text in the second text set;
determining a first total probability of the first user identity according to the plurality of first probabilities, determining a second total probability of the second user identity according to the plurality of second probabilities, and determining the first voice direction as the first user identity if the first total probability is greater than or equal to the second total probability;
and determining a third total probability of the first user identity according to the plurality of third probabilities, determining a fourth total probability of the second user identity according to the plurality of fourth probabilities, and determining the second voice direction as the second user identity if the third total probability is smaller than the fourth total probability.
In an optional embodiment of the present disclosure, the performing speech recognition on the first set of speech segments and the second set of speech segments to generate a first set of texts and a second set of texts includes:
performing feature extraction on each first voice fragment in the first voice fragment set and each second voice fragment in the second voice fragment set to obtain a plurality of first acoustic features and a plurality of second acoustic features;
and respectively decoding and searching the first acoustic features and the second acoustic features through a pre-trained acoustic model and a language model to obtain the first text set and the second text set.
In an optional embodiment of the present disclosure, a labeled voice data sample is obtained, and the voice data sample is input to a neural network for training to obtain a basic model;
and inputting the medical inquiry voice data into the basic model for training through the labeled medical inquiry voice data, and adjusting the model parameters of the basic model to obtain the acoustic model.
In an optional embodiment of the present disclosure, professional text data in the medical field is acquired, the professional text data and general text data are mixed according to a preset weight, and the language model is trained.
In an optional embodiment of the present disclosure, the performing semantic analysis on the first text set and the second text set according to the first user identity and the second user identity to obtain structured data, and storing the inquiry voice data and the structured data in a preset database includes:
determining a plurality of groups of question and answer texts according to the first text set and the second text set, classifying the plurality of groups of question and answer texts through a pre-trained classifier, and acquiring a question and answer type corresponding to each group of question and answer texts;
determining a target text according to the question and answer type, and extracting information of the target text through a pre-trained information extraction model to obtain a plurality of keywords;
carrying out data standardization processing on the plurality of keywords according to a preset dictionary and a mapping model to obtain target words;
and generating the structured data according to the target words, and storing the inquiry voice data and the structured data in a preset database.
In an optional embodiment of the present disclosure, the medical inquiry data processing method further includes:
sending the structured data to a terminal for display;
and receiving confirmation information or update information of the structured data, and acquiring the inquiry voice data and the confirmed or updated text information to train the acoustic model and the language model.
In an optional embodiment of the present disclosure, the medical inquiry data processing method further includes:
sending the structured data to a terminal for display;
and receiving confirmation information of the structured data, acquiring identified inquiry dialogue text data, and optimizing a classifier according to the inquiry dialogue text data.
In an optional embodiment of the present disclosure, the medical inquiry data processing method further includes:
sending the structured data to a terminal for display;
receiving an updating instruction of the structured data, and updating the structured data according to the updating instruction;
and marking the updated structured data as a training sample for training the information extraction model.
The present disclosure provides a medical inquiry data processing apparatus, including:
the acquisition module is used for acquiring inquiry voice data of a target duration in the medical inquiry process;
the cutting module is used for cutting the inquiry voice data according to the voice direction information to obtain a first voice fragment set belonging to a first voice direction and a second voice fragment set belonging to a second voice direction;
the recognition module is used for carrying out voice recognition on the first voice fragment set and the second voice fragment set to generate a first text set and a second text set;
a semantic understanding module, configured to perform semantic understanding on the first text set and the second text set, and determine a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction;
and the acquisition and storage module is used for performing semantic analysis on the first text set and the second text set according to the first user identity and the second user identity to acquire structured data, and storing the inquiry voice data and the structured data in a preset database.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
in the medical inquiry process, acquiring inquiry voice data of a target duration, cutting the inquiry voice data according to voice direction information, acquiring a first voice fragment set belonging to a first voice direction and a second voice fragment set belonging to a second voice direction, performing voice recognition on the first voice fragment set and the second voice fragment set to generate a first text set and a second text set, performing semantic understanding on the first text set and the second text set, determining a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction, performing semantic analysis on the first text set and the second text set according to the first user identity and the second user identity to acquire structured data, and storing the inquiry voice data and the structured data in a preset database. Therefore, in the clinical inquiry process, more efficient information recording and original data storage are realized, data tracing and comparison can be carried out at any time according to needs, and the real validity of the data is ensured.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below; those skilled in the art can derive other drawings from these drawings without inventive effort.
Fig. 1 is a flowchart illustrating a medical inquiry data processing method according to an embodiment of the disclosure;
fig. 2 is a diagram illustrating a scenario of a medical inquiry data processing method according to an embodiment of the disclosure;
fig. 3 is an exemplary diagram of a speaker separation and identification process according to an embodiment of the disclosure;
FIG. 4 is a diagram illustrating an exemplary process of speech recognition according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating an exemplary process of semantic analysis according to an embodiment of the present disclosure;
FIG. 6 is a diagram illustrating an example of a structure of an information extraction model according to an embodiment of the present disclosure;
FIG. 7 is an exemplary diagram of model optimization according to an embodiment of the disclosure;
fig. 8 is a diagram illustrating an exemplary structure of a medical inquiry data processing device according to an embodiment of the disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Fig. 1 is a flowchart illustrating a medical inquiry data processing method according to an embodiment of the present disclosure.
Specifically, as shown in fig. 1, the method includes:
step 101, acquiring inquiry voice data of a target duration in a medical inquiry process.
In the embodiment of the present disclosure, the medical inquiry refers to a process of diagnosis and treatment of a patient by a doctor, and mainly learns patient information through conversation between the doctor and the patient.
In the embodiment of the present disclosure, a sound collection device including a microphone array, such as a mobile phone or a voice recorder, is provided in the medical inquiry scenario and is selected and configured according to the specific application scenario.
In the embodiment of the present disclosure, the target duration may be selected according to the application scenario, for example 1 minute or 2 minutes. The inquiry voice data of the target duration may cover a period in which the doctor manually starts the sound collection device when the inquiry begins and manually stops it when the inquiry is completed, or a period in which the device starts and stops collection automatically based on silence detection.
In the disclosed embodiment, the inquiry voice data refers to audio data of the conversation between the doctor and the patient during the medical inquiry.
And step 102, cutting inquiry voice data according to the voice direction information to obtain a first voice fragment set belonging to a first voice direction and a second voice fragment set belonging to a second voice direction.
In the embodiment of the present disclosure, while acquiring the inquiry voice data, the microphone array can judge the direction of the current speaker at every preset interval, for example every 10 milliseconds. Because the doctor and the patient sit in different positions, their voice directions differ, so the inquiry voice data can be segmented into two classes according to the voice direction information, namely the first voice segment set and the second voice segment set.
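As an illustrative sketch (not part of the patent), the direction-based cutting of step 102 can be reduced to grouping per-frame direction labels into contiguous segments; all names here are hypothetical:

```python
def cut_by_direction(frame_directions, frame_ms=10):
    """Group per-frame speech-direction labels (one label every frame_ms
    milliseconds, as the embodiment describes) into contiguous segments
    per direction. Returns {direction: [(start_ms, end_ms), ...]}."""
    segments = {}
    start = 0
    for i in range(1, len(frame_directions) + 1):
        # close a segment when the direction changes or the input ends
        if i == len(frame_directions) or frame_directions[i] != frame_directions[start]:
            d = frame_directions[start]
            segments.setdefault(d, []).append((start * frame_ms, i * frame_ms))
            start = i
    return segments

# e.g. doctor's voice arriving from direction "A", patient's from "B"
print(cut_by_direction(["A", "A", "B", "B", "B", "A"]))
# {'A': [(0, 20), (50, 60)], 'B': [(20, 50)]}
```

The two value lists correspond to the first and second voice segment sets of the method.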
Step 103, performing voice recognition on the first voice fragment set and the second voice fragment set to generate a first text set and a second text set.
In the embodiment of the present disclosure, the first speech segment set and the second speech segment set may be subjected to speech recognition by a speech recognition model or the like, so as to generate the first text set and the second text set.
As a possible implementation manner, feature extraction is performed on each first voice segment in the first voice segment set and each second voice segment in the second voice segment set to obtain a plurality of first acoustic features and a plurality of second acoustic features, and the plurality of first acoustic features and the plurality of second acoustic features are decoded and searched through a pre-trained acoustic model and a language model to obtain a first text set and a second text set.
And 104, performing semantic understanding on the first text set and the second text set, and determining a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction.
In the embodiment of the disclosure, the semantic understanding may be performed on the first text set and the second text set by a semantic understanding model, an algorithm, and the like, so as to determine the first user identity and the second user identity.
As a possible implementation manner, determining a first probability that each first text belongs to a first user identity and a second probability of a second user identity according to a semantic understanding result of each first text in the first text set; determining a third probability that each second text belongs to the first user identity and a fourth probability of the second user identity according to semantic understanding results of each second text in the second text set; determining a first total probability of the first user identity according to the plurality of first probabilities, determining a second total probability of the second user identity according to the plurality of second probabilities, and determining the first voice direction as the first user identity under the condition that the first total probability is larger than or equal to the second total probability; and determining a third total probability of the first user identity according to the plurality of third probabilities, determining a fourth total probability of the second user identity according to the plurality of fourth probabilities, and determining the second voice direction as the second user identity under the condition that the third total probability is smaller than the fourth total probability.
And 105, performing semantic analysis on the first text set and the second text set according to the first user identity and the second user identity to obtain structured data, and storing the inquiry voice data and the structured data in a preset database.
In one embodiment of the disclosure, multiple groups of question-and-answer texts are determined from the first text set and the second text set and classified by a pre-trained classifier to obtain the question-and-answer type of each group. A target text is determined according to the question-and-answer type, and information extraction is performed on it through a pre-trained information extraction model to obtain a plurality of keywords. The keywords are then normalized according to a preset dictionary and a mapping model to obtain target words, the structured data is generated from the target words, and the inquiry voice data and the structured data are stored in a preset database.
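A minimal sketch of this semantic-analysis step, with the pre-trained classifier and extraction model stubbed by toy rules (the dictionary entries, function names, and rules are illustrative assumptions, not the patent's models):

```python
# toy stand-in for the preset dictionary used in data standardization
SYNONYM_DICT = {"tummy ache": "abdominal pain", "high temperature": "fever"}

def normalize(keyword, dictionary=SYNONYM_DICT):
    # data-standardization step: map a raw keyword to its canonical term
    return dictionary.get(keyword, keyword)

def build_structured_data(qa_pairs, classify, extract):
    """classify: maps a question to a Q&A type; extract: pulls raw
    keywords out of the answer text. Both stand in for the pre-trained
    classifier and information extraction model of the embodiment."""
    structured = {}
    for question, answer in qa_pairs:
        qa_type = classify(question)           # e.g. "symptom", "history"
        keywords = extract(answer)             # raw keywords from the target text
        structured.setdefault(qa_type, []).extend(normalize(k) for k in keywords)
    return structured

# toy rules in place of trained models
classify = lambda q: "symptom" if "feel" in q else "history"
extract = lambda a: [a.lower()]
print(build_structured_data([("How do you feel?", "Tummy ache")], classify, extract))
# {'symptom': ['abdominal pain']}
```

The resulting dictionary plays the role of the structured data stored alongside the raw inquiry voice data.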
Therefore, during the medical inquiry, inquiry voice data of a target duration is collected and cut according to voice direction information to obtain a first voice segment set belonging to a first voice direction and a second voice segment set belonging to a second voice direction; voice recognition is performed on the two segment sets to generate a first text set and a second text set; semantic understanding is performed on the two text sets to determine a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction; and semantic analysis is performed on the two text sets according to the two user identities to obtain structured data, with the inquiry voice data and the structured data stored in a preset database. In this way, more efficient information recording and original-data preservation are achieved during clinical inquiry, data can be traced and compared at any time as needed, and the authenticity and validity of the data are ensured.
As a scenario example, as shown in fig. 2, voice data from a doctor's inquiry is acquired through the microphone array of an intelligent terminal. On one hand, the complete inquiry voice data is stored on a data platform; on the other hand, real-time artificial-intelligence analysis is performed on it, including speaker separation, voice recognition, and semantic understanding, to obtain corresponding structured data that is returned to the doctor for confirmation and editing, so the doctor needs only a small amount of time to confirm and edit the inquiry information for it to be recorded completely. In this way, the expert doctors' data-entry workload is reduced and their efficiency improved, while the original inquiry voice data is preserved so that data can be traced and compared at any time as needed, ensuring the authenticity and validity of the data.
In one possible implementation manner of the present disclosure, a first probability that each first text belongs to the first user identity and a second probability of the second user identity are determined according to a semantic understanding result of each first text in the first text set; a third probability that each second text belongs to the first user identity and a fourth probability of the second user identity are determined according to semantic understanding results of each second text in the second text set; a first total probability of the first user identity is determined according to the plurality of first probabilities, a second total probability of the second user identity is determined according to the plurality of second probabilities, and the first voice direction is determined as the first user identity if the first total probability is greater than or equal to the second total probability; and a third total probability of the first user identity is determined according to the plurality of third probabilities, a fourth total probability of the second user identity is determined according to the plurality of fourth probabilities, and the second voice direction is determined as the second user identity if the third total probability is smaller than the fourth total probability.
Specifically, before the inquiry speech is recognized and understood, speaker recognition is performed first, that is, determining which utterances were said by the doctor and which by the patient. As shown in fig. 3, the speaker separation and identification process mainly uses two kinds of information for separation, and finally determines the identity information of speaker A and speaker B, i.e., which is the doctor and which is the patient, by semantically understanding and classifying the text of each speaker's segments. The semantic classification uses a statistical classifier with two classes (doctor/patient): the doctor/patient probability is computed for each sentence of each speaker, the class probabilities of all sentences of each speaker are then accumulated, and finally it is judged whether speaker A and speaker B are the doctor or the patient, i.e., the first user identity and the second user identity described in this embodiment.
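The probability-accumulation rule above can be sketched as follows; the per-sentence probabilities would come from the two-class statistical classifier, which is stubbed here (function and variable names are illustrative):

```python
def assign_roles(probs_a, probs_b):
    """probs_x: list of (p_doctor, p_patient) per sentence for speaker X,
    as output by a two-class doctor/patient classifier. Accumulate the
    class probabilities over all sentences of each speaker and assign
    the role with the higher total, mirroring the comparison rules in
    the embodiment (ties on speaker A go to 'doctor')."""
    total = lambda probs: (sum(p for p, _ in probs), sum(q for _, q in probs))
    doc_a, pat_a = total(probs_a)
    doc_b, pat_b = total(probs_b)
    role_a = "doctor" if doc_a >= pat_a else "patient"
    role_b = "patient" if doc_b < pat_b else "doctor"
    return role_a, role_b

# speaker A's sentences lean doctor, speaker B's lean patient
print(assign_roles([(0.9, 0.1), (0.6, 0.4)], [(0.2, 0.8), (0.3, 0.7)]))
# ('doctor', 'patient')
```

Summing raw probabilities this way corresponds to the "accumulate the category probabilities of all sentences" step; a production system might sum log-probabilities instead for numerical stability.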
In one possible implementation of the present disclosure, feature extraction is performed on each first speech segment in the first speech segment set and each second speech segment in the second speech segment set to obtain a plurality of first acoustic features and a plurality of second acoustic features, and the first and second acoustic features are respectively decoded and searched through a pre-trained acoustic model and language model to obtain the first text set and the second text set.
Specifically, in a typical speaker-independent speech recognition system, the speech signal is first fed into a feature extraction and processing module to obtain the required acoustic features. Mathematical models then describe the statistical characteristics of the pronunciation of a large number of speech features and the statistical characteristics of a large amount of pronunciation text; the former model is generally called the acoustic model and the latter the language model. The speech signal to be recognized is decoded and searched through the acoustic model and language model produced from the training data to obtain the recognized text. The overall flow is shown in fig. 4.
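A minimal sketch of the decoding step, with toy score tables standing in for real acoustic and language models: the decoder picks the candidate text that maximizes the combined acoustic and language-model log score, which is how the language model can override an acoustically preferred but linguistically implausible hypothesis. All scores and names here are illustrative assumptions.

```python
ACOUSTIC_SCORES = {                # log P(features | text), stubbed
    "take two pills": -4.0,
    "take too pills": -3.8,        # acoustically slightly preferred
}
LM_SCORES = {                      # log P(text), stubbed
    "take two pills": -2.0,
    "take too pills": -6.0,        # linguistically implausible
}

def decode(candidates, lm_weight=1.0):
    """Search for the candidate maximizing acoustic + weighted LM log score."""
    return max(candidates,
               key=lambda t: ACOUSTIC_SCORES[t] + lm_weight * LM_SCORES[t])
```

With the language model included, the plausible text wins despite its slightly worse acoustic score; with `lm_weight=0` the acoustically preferred error would be chosen.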
In an embodiment of the present disclosure, labeled voice data samples are obtained and input to a neural network for training to obtain a basic model; labeled medical inquiry voice data is then input to the basic model for further training, and the model parameters of the basic model are adjusted to obtain the acoustic model.
Specifically, the parameters of the basic model are fine-tuned and optimized by collecting and labeling a small amount of medical inquiry voice data and applying model adaptation and transfer learning algorithms.
In an embodiment of the disclosure, professional text data in the medical field is acquired, the professional text data and general text data are mixed according to a preset weight, and the language model is trained on the mixture.
Specifically, the medical field contains a large number of specialized terms, covering diseases, symptoms, drugs, and so on, and many of them are easily misrecognized if the language model is not customized to the domain. A dedicated medical-field language model is therefore retrained by collecting a large amount of specialized medical text (including specialist books, inquiry dialogues, medical-record text, and the like) and mixing it with general text data according to a preset weight.
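The preset-weight mixing can be illustrated with a linearly interpolated unigram model. The disclosure retrains a full language model on the mixed corpus; the interpolation below only demonstrates the weighting idea, and the corpora and the 0.7 weight are illustrative assumptions.

```python
from collections import Counter

def unigram_probs(corpus):
    """Relative word frequencies over a list of sentences."""
    counts = Counter(word for sentence in corpus for word in sentence.split())
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def mixed_prob(word, medical_corpus, general_corpus, medical_weight=0.7):
    """Linearly interpolate domain and general unigram probabilities."""
    p_med = unigram_probs(medical_corpus).get(word, 0.0)
    p_gen = unigram_probs(general_corpus).get(word, 0.0)
    return medical_weight * p_med + (1 - medical_weight) * p_gen
```

Raising `medical_weight` boosts the probability of domain terms such as "cough" relative to a general corpus, which is the effect the preset mixing weight is meant to achieve.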
In one possible implementation of the disclosure, multiple groups of question-and-answer texts are determined from the first text set and the second text set, and are classified by a pre-trained classifier to obtain the question-and-answer type of each group. A target text is determined according to the question-and-answer type, and information extraction is performed on the target text by a pre-trained information extraction model to obtain a plurality of keywords. The keywords are normalized according to a preset dictionary and mapping model to obtain target words, the structured data is generated from the target words, and the inquiry voice data and the structured data are stored in a preset database.
Specifically, in the medical inquiry process, medical-history data is collected in stages, mainly comprising the chief complaint, present illness history, past medical history, personal history, family history, marital and childbearing history, and the like. Based on the text obtained from doctor-patient speech recognition, the semantic analysis module analyzes and understands the inquiry stage, the doctor's specific questions, and the user's answers, extracts the corresponding key information from the answers, normalizes the data, and forms the corresponding structured data; the specific flow is shown in fig. 5.
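A sketch of assembling the staged structured record, assuming the semantic analysis module has already produced (stage, keyword) pairs; the stage names mirror the history-taking stages listed above, and the input pairs are illustrative.

```python
STAGES = ["chief complaint", "present illness", "past history",
          "personal history", "family history", "marital/childbearing history"]

def build_record(classified_items):
    """Group extracted (stage, keyword) pairs into a staged structured record."""
    record = {stage: [] for stage in STAGES}
    for stage, keyword in classified_items:
        record[stage].append(keyword)
    return record

items = [("chief complaint", "cough"),
         ("chief complaint", "two weeks"),
         ("past history", "hypertension")]
```

The resulting dictionary is the kind of structured object that would be stored alongside the inquiry voice data and shown to the doctor for confirmation.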
The semantic analysis module mainly customizes and optimizes the corresponding algorithms and models. Specifically, classification algorithms and models are designed for the question-and-answer patterns of medical knowledge and diagnosis-and-treatment scenarios, and structured information extraction templates and algorithms are designed for the entity information to be extracted, such as information related to diseases, symptoms, and drugs.
Specifically, each group of doctor-patient question-and-answer must be classified, that is, assigned an inquiry stage and a specific information type; for example, "what symptoms have you had recently" is a symptom-type question in the chief-complaint stage. The classification depends not only on the current question text but also, strongly, on the context of the inquiry flow. For example, for "how long has it been": if the previous question concerned the chief-complaint symptom, the question is classified as symptom duration; if the previous question concerned past history, it is classified as the time period of the past history. The classifier's input therefore comprises the text of the current question and answer, the text of the previous round, and the previous round's classification decision, and its output is the classification decision for the current round. The classifier is based on the deep learning BERT model; starting from the pre-trained model, a large amount of doctor-patient inquiry dialogue text is used for fine-tuning the model parameters.
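The context dependence described above can be sketched with a toy rule table standing in for the fine-tuned BERT classifier; the class labels and rules are illustrative assumptions, chosen only to show how (current text, previous class) jointly determine the output.

```python
def classify_question(current_text, prev_class):
    """Toy stand-in for the context-aware question classifier."""
    text = current_text.lower()
    if "symptom" in text:
        return "chief_complaint/symptom"
    if "how long" in text:
        # Same surface question, different class depending on context.
        if prev_class == "chief_complaint/symptom":
            return "chief_complaint/duration"
        if prev_class == "past_history":
            return "past_history/time_period"
    return "other"
```

A real implementation would feed the current text, previous round text, and previous decision into the fine-tuned model instead of this rule table.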
In particular, the key information in the patient's answers, such as medical information covering diseases, symptoms, and drugs, has a large vocabulary, the distribution of nonstandard words has a pronounced long tail, and word boundaries are difficult to determine during information extraction. In the embodiment of the disclosure, by labeling the entity-word information in a doctor-patient inquiry dialogue database and using a deep transfer learning model with a conditional random field (CRF) layer (the model architecture is shown in fig. 6), word boundaries can be determined well and valid entity words are finally extracted.
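Once the CRF layer has emitted per-token BIO labels, word boundaries follow from the label sequence. Below is a sketch of that span-collection step; the tagger itself (deep transfer model plus CRF) is out of scope, so the tags are assumed given, and the tag names are illustrative.

```python
def extract_entities(tokens, tags):
    """Collect (entity_text, entity_type) spans from BIO-tagged tokens."""
    entities, current, etype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):           # a new entity begins
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)          # entity continues
        else:                              # "O" closes any open entity
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:                            # flush a trailing entity
        entities.append((" ".join(current), etype))
    return entities
```

The B-/I- boundaries are exactly the word boundaries the CRF is trained to resolve for long-tail, nonstandard vocabulary.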
Specifically, the patient's answers during an inquiry are markedly colloquial, and a general-purpose language model does not cover such spoken words well. In the embodiment of the disclosure, through statistical learning over a doctor-patient inquiry dialogue database, information entropy and text clustering algorithms are used to mine a large number of out-of-vocabulary words, such as the many ways of expressing a cough (coughing, coughing a little, coughing badly, coughing frequently, coughing occasionally, and the like); a support vector machine then classifies them according to their contextual statistics, high-quality spoken words are finally screened out, and a corresponding normalization dictionary and mapping model are formed.
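The resulting normalization dictionary can be sketched as a simple variant-to-canonical mapping; the entries below are illustrative assumptions, and the mining pipeline itself (information entropy, clustering, SVM screening) is not shown.

```python
# Hypothetical output of the mining pipeline: spoken variants -> canonical term.
NORMALIZATION_DICT = {
    "coughing a little": "cough",
    "coughing badly": "cough",
    "coughing frequently": "cough",
    "coughing occasionally": "cough",
}

def normalize(term):
    """Map a spoken variant to its canonical medical term, if known."""
    return NORMALIZATION_DICT.get(term.lower().strip(), term)
```

Unknown terms pass through unchanged, so normalization never destroys information that the dictionary does not cover.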
In the disclosure, after the inquiry voice data has been intelligently analyzed and structured and the doctor has edited and confirmed the information, the data and the confirmation information can be fed back to the system to retrain and optimize the artificial intelligence models, forming a closed loop of artificial intelligence model optimization.
In one possible implementation of the disclosure, the structured data is sent to a terminal for display, confirmation information or update information for the structured data is received, and the inquiry voice data together with the confirmed or updated text information is acquired to train the acoustic model and the language model.
Specifically, the inquiry speech data and the edited and confirmed text can on the one hand be used to retrain the ASR (Automatic Speech Recognition) model shared across the medical field, and on the other hand be used for adaptive training of a personalized ASR model dedicated to the individual doctor.
In one possible implementation of the present disclosure, the structured data is sent to a terminal for display, confirmation information for the structured data is received, the recognized inquiry dialogue text data is obtained, and the classifier is optimized according to the inquiry dialogue text data.
Specifically, after the inquiry dialogue text data has been edited and reviewed, a large amount of dialogue corpus is produced, which can be used to optimize the text classification algorithm.
In one possible implementation of the present disclosure, the structured data is sent to the terminal for display, an update instruction for the structured data is received, the structured data is updated according to the update instruction, and the updated structured data is labeled for use as a training sample in training the information extraction model.
Specifically, after labeling, the extracted structured data can be used to optimize the transfer learning model for key information extraction and similar tasks.
As shown in fig. 7, the data edited and confirmed by the doctor is fed back to be used for model optimization of speech recognition (ASR) and semantic analysis (NLP).
Specifically, the data edited and confirmed by the doctor fall into two classes: the first is data confirmed directly without modification, and the second is data confirmed after editing. Because the first class was analyzed correctly by the artificial intelligence (AI) model while the second class was processed with errors, in model training and optimization the disclosed embodiment can assign a larger weight to the second class, making the training more targeted and the performance improvement more efficient.
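The class-dependent weighting can be sketched as per-sample training weights; the 1.0/3.0 values are illustrative assumptions, not values from the disclosure.

```python
CONFIRMED_WEIGHT = 1.0   # class 1: confirmed as-is, the model was right
EDITED_WEIGHT = 3.0      # class 2: edited before confirming, the model erred

def sample_weights(samples):
    """samples: list of (data, was_edited) pairs -> per-sample training weights."""
    return [EDITED_WEIGHT if was_edited else CONFIRMED_WEIGHT
            for _, was_edited in samples]
```

These weights would then scale each sample's contribution to the loss during retraining, so the examples the model got wrong pull harder on the update.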
Through this model-optimization closed loop, the AI capability is continuously optimized and strengthened, and the optimized AI technology can in turn be used to recognize and analyze the data in the data platform, so that the data can be better mined for clinical research.
Based on the above description: the original voice data of the inquiry process is collected by the intelligent device and uploaded to the cloud in real time for storage, to be used for subsequent data analysis and history tracing; the collected original voice data is analyzed and structured by the customized and optimized AI technology, including speaker recognition, speech recognition, and semantic analysis; the structured data formed by AI analysis is fed back to the doctor in real time for confirmation, greatly improving the efficiency and accuracy of the doctor's information entry; and the data confirmed by the doctor is fed back for model training and optimization, forming a closed loop of model optimization.
Accordingly, the medical inquiry data processing method of the present disclosure has the following advantages. The original voice dialogue of the doctor's inquiry is recorded in full, whereas the traditional approach keeps no original data record and retains only a small amount of text transcribed by the doctor; this solves the information loss and frequent errors caused by the doctor or an assistant manually transcribing partial information from memory. The data is recognized and understood by the customized AI technology and then edited and confirmed by the doctor in real time, greatly improving the efficiency, completeness, and accuracy of data entry while saving the doctor's time. Through the AI model-optimization closed loop, the data confirmed by doctors and experts can be fully used for model training, so that the AI model becomes more intelligent in use and its recognition and understanding capability grows stronger. Finally, through speech recognition and voice retrieval technology, the original inquiry voice data can later be queried and analyzed directly to mine more valuable information and improve research efficiency, whereas the traditional approach provides no original data to trace for information mining.
Corresponding to the medical inquiry data processing method provided in the embodiments of fig. 1 to 7, the present disclosure also provides a medical inquiry data processing apparatus. Since the apparatus provided in the embodiments of the present disclosure corresponds to the method provided in the embodiments of fig. 1 to 7, the implementations of the method are also applicable to the apparatus and are not described in detail here.
Fig. 8 is a schematic structural diagram of a medical inquiry data processing device according to an embodiment of the disclosure.
As shown in fig. 8, the medical inquiry data processing apparatus includes: an acquisition module 801, a cutting module 802, a recognition module 803, a semantic understanding module 804, and an acquisition storage module 805.
The acquisition module 801 is configured to acquire inquiry voice data of a target duration in a medical inquiry process.
The cutting module 802 is configured to cut the inquiry speech data according to the speech direction information, and obtain a first speech segment set belonging to a first speech direction and a second speech segment set belonging to a second speech direction.
A recognition module 803, configured to perform speech recognition on the first speech segment set and the second speech segment set, and generate a first text set and a second text set.
A semantic understanding module 804, configured to perform semantic understanding on the first text set and the second text set, and determine a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction.
An obtaining and storing module 805, configured to perform semantic analysis on the first text set and the second text set according to the first user identity and the second user identity to obtain structured data, and store the inquiry speech data and the structured data in a preset database.
Thus, in the medical inquiry process, inquiry voice data of a target duration is collected; the inquiry voice data is cut according to voice direction information to obtain a first voice segment set belonging to a first voice direction and a second voice segment set belonging to a second voice direction; speech recognition is performed on the two segment sets to generate a first text set and a second text set; semantic understanding is performed on the two text sets to determine the first user identity corresponding to the first voice direction and the second user identity corresponding to the second voice direction; and semantic analysis is performed on the two text sets according to the first user identity and the second user identity to obtain structured data, with the inquiry voice data and the structured data stored in a preset database. In the clinical inquiry process, this achieves more efficient information recording and original data storage; the data can be traced and compared at any time as needed, ensuring its authenticity and validity.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A medical inquiry data processing method, comprising:
acquiring inquiry voice data of a target duration in a medical inquiry process;
cutting the inquiry voice data according to the voice direction information to obtain a first voice fragment set belonging to a first voice direction and a second voice fragment set belonging to a second voice direction;
performing voice recognition on the first voice fragment set and the second voice fragment set to generate a first text set and a second text set;
semantic understanding is carried out on the first text set and the second text set, and a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction are determined;
and performing semantic analysis on the first text set and the second text set according to the first user identity and the second user identity to obtain structured data, and storing the inquiry voice data and the structured data in a preset database.
2. The medical inquiry data processing method of claim 1, wherein the semantically understanding the first text set and the second text set, and obtaining a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction, comprises:
determining a first probability that each first text belongs to the first user identity and a second probability of the second user identity according to a semantic understanding result of each first text in the first text set;
determining a third probability that each second text belongs to the first user identity and a fourth probability of the second user identity according to semantic understanding results of each second text in the second text set;
determining a first total probability of the first user identity according to the plurality of first probabilities, determining a second total probability of the second user identity according to the plurality of second probabilities, and determining the first voice direction as the first user identity if the first total probability is greater than or equal to the second total probability;
and determining a third total probability of the first user identity according to the plurality of third probabilities, determining a fourth total probability of the second user identity according to the plurality of fourth probabilities, and determining the second voice direction as the second user identity if the third total probability is smaller than the fourth total probability.
3. The medical inquiry data processing method of claim 1, wherein the performing speech recognition on the first set of speech segments and the second set of speech segments to generate a first set of text and a second set of text comprises:
performing feature extraction on each first voice fragment in the first voice fragment set and each second voice fragment in the second voice fragment set to obtain a plurality of first acoustic features and a plurality of second acoustic features;
and respectively decoding and searching the first acoustic features and the second acoustic features through a pre-trained acoustic model and a language model to obtain the first text set and the second text set.
4. The medical inquiry data processing method of claim 3,
acquiring a marked voice data sample, inputting the voice data sample into a neural network for training, and acquiring a basic model;
and inputting the medical inquiry voice data into the basic model for training through the labeled medical inquiry voice data, and adjusting the model parameters of the basic model to obtain the acoustic model.
5. The medical inquiry data processing method of claim 3,
professional text data in the medical field are obtained, the professional text data and general text data are mixed according to preset weight, and the language model is trained.
6. The medical inquiry data processing method of claim 1, wherein the performing semantic analysis on the first text set and the second text set according to the first user identity and the second user identity to obtain structured data, and storing the inquiry voice data and the structured data in a preset database, comprises:
determining a plurality of groups of question and answer texts according to the first text set and the second text set, classifying the plurality of groups of question and answer texts through a pre-trained classifier, and acquiring a question and answer type corresponding to each group of question and answer texts;
determining a target text according to the question and answer type, and extracting information of the target text through a pre-trained information extraction model to obtain a plurality of keywords;
carrying out data standardization processing on the plurality of keywords according to a preset dictionary and a mapping model to obtain target words;
and generating the structured data according to the target words, and storing the inquiry voice data and the structured data in a preset database.
7. The medical inquiry data processing method of claim 1, further comprising:
sending the structured data to a terminal for display;
and receiving confirmation information or update information of the structured data, and acquiring the inquiry voice data and the confirmed or updated text information to train the acoustic model and the language model.
8. The medical inquiry data processing method of any of claims 1-6, further comprising:
sending the structured data to a terminal for display;
and receiving confirmation information of the structured data, acquiring the recognized inquiry dialogue text data, and optimizing a classifier according to the inquiry dialogue text data.
9. The medical inquiry data processing method of any of claims 1-6, further comprising:
sending the structured data to a terminal for display;
receiving an updating instruction of the structured data, and updating the structured data according to the updating instruction;
and marking the updated structured data as a training sample for training the information extraction model.
10. A medical inquiry data processing apparatus, comprising:
the acquisition module is used for acquiring inquiry voice data of a target duration in the medical inquiry process;
the cutting module is used for cutting the inquiry voice data according to the voice direction information to obtain a first voice fragment set belonging to a first voice direction and a second voice fragment set belonging to a second voice direction;
the recognition module is used for carrying out voice recognition on the first voice fragment set and the second voice fragment set to generate a first text set and a second text set;
a semantic understanding module, configured to perform semantic understanding on the first text set and the second text set, and determine a first user identity corresponding to the first voice direction and a second user identity corresponding to the second voice direction;
and the acquisition and storage module is used for performing semantic analysis on the first text set and the second text set according to the first user identity and the second user identity to obtain structured data, and storing the inquiry voice data and the structured data in a preset database.
CN202110601186.6A 2021-05-31 2021-05-31 Medical inquiry data processing method and device Pending CN113555133A (en)

Publications (1)

Publication Number Publication Date
CN113555133A true CN113555133A (en) 2021-10-26

Family

ID=78130244


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115579008A (en) * 2022-12-05 2023-01-06 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium
CN117253576A (en) * 2023-10-30 2023-12-19 来未来科技(浙江)有限公司 Outpatient electronic medical record generation method based on Chinese medical large model
CN117253576B (en) * 2023-10-30 2024-03-05 来未来科技(浙江)有限公司 Outpatient electronic medical record generation method based on Chinese medical large model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination