WO2018214663A1 - Voice-based data processing method and apparatus, and electronic device - Google Patents


Publication number
WO2018214663A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2018/082702
Other languages
French (fr)
Chinese (zh)
Inventor
李明修
银磊
卜海亮
Original Assignee
北京搜狗科技发展有限公司 (Beijing Sogou Technology Development Co., Ltd.)
Application filed by 北京搜狗科技发展有限公司 (Beijing Sogou Technology Development Co., Ltd.)
Publication of WO2018214663A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/26 — Speech to text systems
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 — Speaker identification or verification
    • G10L 17/02 — Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Definitions

  • The present invention relates to the technical field of data processing, and in particular, to a voice-based data processing method and apparatus, and an electronic device.
  • Speech recognition usually converts speech into text.
  • Traditional speech-recognition recording tools can only convert speech data into corresponding text; they cannot distinguish between speakers. Therefore, when multiple people speak, a record cannot be produced efficiently by speech recognition alone.
  • Embodiments of the present invention provide a voice-based data processing method to record a consultation process completely.
  • Embodiments of the present invention further provide a voice-based data processing apparatus, an electronic device, and a readable storage medium, to ensure the implementation and application of the foregoing method.
  • An embodiment of the present invention discloses a voice-based data processing method, including: obtaining consultation process data, where the consultation process data is determined according to voice data collected during the consultation process; performing identification according to the consultation process data, and acquiring corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining the consultation information according to the first text data and the second text data.
  • The consultation process data is voice data; the identifying according to the consultation process data and acquiring the corresponding first text data and second text data includes: separating first voice data and second voice data from the voice data according to voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
  • The separating of the first voice data and the second voice data from the voice data according to the voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to the voiceprint features.
  • The determining of the first voice data and the second voice data from the voice segments according to the voiceprint features includes: matching each voice segment against a reference voiceprint feature, where the reference voiceprint feature is a voiceprint feature of the target user; acquiring the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and acquiring the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  • The determining of the first voice data and the second voice data from the voice segments according to the voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature having the largest number of voice segments; generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
  • The performing of voice recognition on the first voice data and the second voice data respectively, and acquiring the corresponding first text data and second text data, includes: performing voice recognition on each voice segment in the first voice data and generating the first text data from the recognized text segments; and performing voice recognition on each voice segment in the second voice data and generating the second text data from the recognized text segments.
  • The obtaining of the consultation information from the first text data and the second text data includes: sorting the text segments according to the chronological order of the voice segments to which the text segments in the first text data and in the second text data respectively correspond, to obtain the consultation information.
  • The consultation process data is a text recognition result obtained by recognizing the voice data; the identifying according to the consultation process data and acquiring the corresponding first text data and second text data includes: performing feature recognition on the text recognition result, and separating the first text data and the second text data according to language features.
  • The performing of feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result into corresponding text segments; identifying each text segment using a preset model to determine its language feature, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text segments having the target-user language feature and the second text data from the text segments having the non-target-user language feature.
  • An embodiment of the invention further discloses a voice-based data processing apparatus, including: a data acquisition module, configured to acquire the consultation process data, where the consultation process data is determined according to voice data collected during the consultation process; a text recognition module, configured to perform identification according to the consultation process data and acquire corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and an information determining module, configured to obtain the consultation information according to the first text data and the second text data.
  • The consultation process data is voice data, and the text recognition module includes: a separation module, configured to separate the first voice data and the second voice data from the voice data according to the voiceprint features; and a voice recognition module, configured to perform voice recognition on the first voice data and the second voice data respectively and acquire the corresponding first text data and second text data.
  • The separation module is configured to divide the voice data into multiple voice segments, and to determine the first voice data and the second voice data from the voice segments according to the voiceprint features.
  • The separation module is configured to match each voice segment against a reference voiceprint feature, where the reference voiceprint feature is a voiceprint feature of the target user; to acquire the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and to acquire the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  • The separation module is configured to identify the voiceprint feature of each voice segment; to count the voice segments having the same voiceprint feature; to generate the first voice data from the largest group of voice segments, where the most frequent voiceprint feature is taken as the voiceprint feature of the target user; and to generate the second voice data from the remaining voice segments.
  • The voice recognition module is configured to perform voice recognition on each voice segment in the first voice data and generate the first text data from the recognized text segments, and to perform voice recognition on each voice segment in the second voice data and generate the second text data from the recognized text segments.
  • The information determining module is configured to sort the text segments according to the chronological order of the voice segments to which the text segments in the first text data and in the second text data respectively correspond, to obtain the consultation information.
  • The consultation process data is a text recognition result obtained by recognizing the voice data; the text recognition module is configured to perform feature recognition on the text recognition result and separate the first text data and the second text data according to language features.
  • The text recognition module includes: a segment dividing module, configured to divide the text recognition result into corresponding text segments; a segment identification module, configured to identify each text segment using a preset model and determine its language feature, the language features including a first language feature and a second language feature; and a text generation module, configured to generate the first text data from the text segments having the first language feature and the second text data from the text segments having the second language feature.
  • Embodiments of the present invention also disclose a readable storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the voice-based data processing method according to one or more embodiments of the present invention.
  • An electronic device includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: obtaining the consultation process data, the consultation process data being determined according to the voice data collected during the consultation process; performing identification according to the consultation process data, and acquiring corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining the consultation information according to the first text data and the second text data.
  • For consultation process data determined from voice collected during the consultation, the first text data and the second text data can be identified according to different users, where the first text data belongs to a target user and the second text data belongs to users other than the target user; that is, the doctor's and the patients' sentences can be distinguished automatically during the consultation. The consultation information is then obtained from the first text data and the second text data, so the consultation process can be recorded completely, medical records can be organized automatically, and the time spent organizing consultation records is saved.
  • FIG. 1 is a flow chart showing the steps of an embodiment of a voice-based data processing method of the present invention.
  • FIG. 2 is a flow chart showing the steps of another embodiment of the voice-based data processing method of the present invention.
  • FIG. 3 is a flow chart showing the steps of another embodiment of a voice-based data processing method of the present invention.
  • FIG. 4 is a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention.
  • FIG. 5 is a structural block diagram of another embodiment of a voice-based data processing apparatus of the present invention.
  • FIG. 6 is a structural block diagram of an electronic device for voice-based data processing according to an exemplary embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an electronic device for voice-based data processing according to another exemplary embodiment of the present invention.
  • Referring to FIG. 1, there is shown a flow chart of the steps of an embodiment of a voice-based data processing method of the present invention, which may include the following steps:
  • Step 102: Obtain the consultation process data, which is determined according to the voice data collected during the consultation process.
  • The consultation process can be recorded by various electronic devices, and the consultation process data can be obtained based on the collected voice data; that is, the consultation process data can be the collected voice data itself, or a text recognition result converted from the collected voice data.
  • Embodiments of the present invention can perform identification on data collected in various consultation processes.
  • Step 104: Perform identification according to the consultation process data, and obtain corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user.
  • The consultation process data can be identified using different identification methods for different data types: voice data can be processed using voiceprint features, voice recognition, and so on, while text data can be identified using text features, thereby distinguishing the first text data and the second text data by user.
  • A consultation involves at least two users communicating and interacting: one user is the doctor, and the other users are patients, family members, and the like. For example, a doctor's one-day clinic will include the doctor and multiple patients, and possibly one or more family members. The doctor can be taken as the target user, so the first text data is the doctor's consultation text, and the text data of the at least one other user, i.e. the patients' and family members' consultation text, is taken as the second text data.
  • Step 106: Obtain the consultation information according to the first text data and the second text data.
  • The first text data and the second text data may each consist of multiple text segments, so the consultation information may be assembled based on the time of each text segment and the corresponding user, for example: "Patient B: I am not comfortable with XXX."
  • Patient information can also be obtained in combination with the hospital's outpatient records, thereby distinguishing different patients in the consultation information.
  • For consultation process data determined by collecting voice during the consultation, the first text data and the second text data can be identified from the data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; that is, the doctor's and patients' sentences can be distinguished automatically during the consultation. The consultation information is then obtained from the first text data and the second text data, so the consultation process can be recorded completely, medical records can be organized automatically, and the time spent organizing consultation records is saved.
  • The consultation process data includes the voice data and/or a text recognition result obtained by recognizing the voice data. The identification methods differ for different types of consultation process data; therefore, the embodiments of the present invention discuss the processing of each type separately.
  • In this embodiment, the consultation process data is voice data, and the method may include the following steps:
  • Step 202: Obtain the consultation process data, where the consultation process data is the voice data collected during the consultation process.
  • Voice data can be collected through various electronic devices, for example by recording audio through a voice recorder, mobile phone, or computer, to obtain the voice data collected during the consultation. The voice data may be collected for a single outpatient visit, or for a doctor across multiple visits; this is not limited in the embodiments of the present invention. The voice data therefore includes the voice of a doctor and of at least one patient, and may also include the voice of at least one patient's family member.
  • Step 104, performing identification according to the consultation process data and obtaining the corresponding first text data and second text data, may include the following steps 204-206.
  • Step 204: Separate the first voice data and the second voice data from the voice data according to the voiceprint features.
  • Voiceprint refers to the spectrum of sound waves carrying speech information displayed by electroacoustic instruments. Voiceprints are characterized by specificity and stability. After adulthood, human voiceprints can remain relatively stable for a long time, so different people can be identified through voiceprints. Therefore, for the voice data, the voiceprint feature can be identified, and the voice segment corresponding to different users (voiceprint features) in the voice data is determined, thereby obtaining the first voice data of the target user and the second voice data of the other user.
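  • As a purely illustrative sketch (not part of the disclosure), a voiceprint feature could be approximated by the average magnitude spectrum of an audio signal and compared by cosine similarity; practical systems use far stronger representations such as MFCCs or learned speaker embeddings, and the frame length below is an assumed value:

```python
# Toy stand-in for voiceprint feature extraction: the average magnitude
# spectrum over fixed-size frames, compared by cosine similarity. This only
# illustrates the idea that the same voice yields similar spectra.
import numpy as np

def voiceprint(samples, frame_len=256):
    """Average magnitude spectrum over non-overlapping frames."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    spectra = [np.abs(np.fft.rfft(f)) for f in frames]
    return np.mean(spectra, axis=0)

def similarity(a, b):
    """Cosine similarity between two voiceprint feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In this toy setting, two recordings of the same tone score higher against each other than against a different tone, which is the property a real voiceprint matcher relies on.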
  • The separating of the first voice data and the second voice data from the voice data according to the voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to the voiceprint features.
  • The voice data can be divided into multiple voice segments according to a division rule, for example at the pause intervals in the audio, or according to voiceprint features, i.e., determining the voiceprint feature corresponding to each utterance and starting a new segment when the voiceprint feature changes. One piece of voice data can thus be divided into multiple voice segments with a definite order, and different segments can have the same or different voiceprint features. Whether each voice segment belongs to the first voice data or the second voice data is then determined based on its voiceprint feature: the segments having the target user's voiceprint feature constitute the first voice data, and the remaining segments constitute the second voice data.
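  • The pause-interval division mentioned above can be sketched as follows; the frame length, energy threshold, and minimum pause length are assumed tuning values, since the embodiment does not specify them:

```python
# Illustrative sketch of pause-based segmentation, assuming the audio is
# already available as a list of samples.

def split_on_pauses(samples, frame_len=160, energy_thresh=0.01, min_gap=3):
    """Split audio into voice segments at runs of low-energy (pause) frames.

    Returns a list of (start_frame, end_frame) pairs, end exclusive.
    """
    # Mark each frame as voiced (True) or silent (False) by mean energy.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    voiced = [sum(s * s for s in f) / max(len(f), 1) > energy_thresh for f in frames]

    segments, start, gap = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i          # a new segment begins at the first voiced frame
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # pause long enough: close the open segment
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:          # trailing segment without a closing pause
        segments.append((start, len(voiced) - gap))
    return segments
```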
  • Before collecting the voice data of the consultation, a piece of the doctor's (target user's) voice may first be collected as reference data, so that the doctor's voiceprint feature, i.e. the reference voiceprint feature, can be identified from the reference data.
  • A voice recognition model may also be set up: after the voice data is input into the model, the voice segments conforming to the reference voiceprint feature are separated from the voice segments with other voiceprint features, thereby obtaining the target user's voice segments and the other users' voice segments.
  • A consultation record usually involves only one doctor but possibly more than one patient, so a correspondingly large number of medical samples can be obtained for a specific doctor in the above manner.
  • The target user's voiceprint feature may be collected in advance as a reference voiceprint feature for dividing the voice data. That is, the determining of the first voice data and the second voice data from the voice segments according to the voiceprint features includes: matching each voice segment against the reference voiceprint feature, where the reference voiceprint feature is a voiceprint feature of the target user; acquiring the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and acquiring the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  • Voice data may be collected in advance to extract the target user's voiceprint feature as the reference voiceprint feature. For voice data containing the target user, each voice segment is matched against the reference voiceprint feature to determine whether its voiceprint feature is consistent with the reference. If it is, the voice segment is considered to match the reference voiceprint feature and is added to the first voice data (i.e., the target user's voice data); if it is not, the voice segment is added to the second voice data (i.e., the voice data of the non-target users). The first voice data and the second voice data are thus each composed of the corresponding voice segments, and each segment keeps its sequential position, which facilitates accurately determining the consultation information later.
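  • The matching against a reference voiceprint feature can be sketched as follows, assuming each segment already carries a precomputed feature vector; cosine similarity and the threshold value are illustrative stand-ins for whatever matcher an implementation uses:

```python
# Sketch of splitting ordered voice segments into the target user's data
# (first) and other users' data (second) by matching a reference voiceprint.
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def split_by_reference(segments, reference, threshold=0.8):
    """segments: ordered list of (timestamp, feature_vector).

    Order is preserved within each returned list, mirroring the sequential
    relationship the embodiment requires for later sorting.
    """
    first, second = [], []
    for seg in segments:
        (first if cosine(seg[1], reference) >= threshold else second).append(seg)
    return first, second
```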
  • The voice data may also be divided according to the number of voice segments corresponding to the same voiceprint feature. That is, the determining of the first voice data and the second voice data from the voice segments according to the voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; taking the voiceprint feature having the largest number of voice segments as the target user's voiceprint feature; generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
  • The consultation process data may be the recordings of a doctor's multiple outpatient visits. In this process the doctor usually occupies the most speaking time while communicating with different patients and their family members, so in the voice data the doctor (target user) has the largest number of voice segments. The target user can therefore be distinguished from the other users by the number of voice segments corresponding to each user, yielding the first voice data and the second voice data.
  • The voiceprint features in the voice segments are identified to determine the voiceprint feature contained in each segment, and the number of voice segments corresponding to each voiceprint feature is counted. The voiceprint feature having the largest number of voice segments is determined as the voiceprint feature of the target user, and the other voiceprint features are those of other users. The voice segments having the target user's voiceprint feature then constitute, in order, the first voice data, and the other voice segments, i.e., those not belonging to the first voice data, constitute the second voice data.
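  • The counting-based division can be sketched as follows, assuming each segment has already been assigned a voiceprint label (e.g., by clustering); the labels are hypothetical:

```python
# Sketch of the counting-based division: the voiceprint label with the most
# segments is taken to be the doctor / target user.
from collections import Counter

def split_by_majority(labeled_segments):
    """labeled_segments: ordered list of (voiceprint_label, segment).

    Returns (first_voice_data, second_voice_data), each preserving order.
    """
    counts = Counter(label for label, _ in labeled_segments)
    target = counts.most_common(1)[0][0]   # most frequent voiceprint label
    first = [seg for label, seg in labeled_segments if label == target]
    second = [seg for label, seg in labeled_segments if label != target]
    return first, second
```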
  • A voice segment may contain the voiceprint features of multiple users. When multiple voiceprint features are identified in one voice segment: if they are all voiceprint features of other users, the segment may be added to the second voice data; if they include both the target user's voiceprint feature and other users' voiceprint features, the segment may be further divided into sub-segments added to the corresponding voice data, or handled as required, for example assigned as a whole to the target user's segments (the first voice data), assigned as a whole to the other users' segments (the second voice data), or added to both users' voice data separately.
  • Step 206: Perform voice recognition on the first voice data and the second voice data respectively, and acquire the corresponding first text data and second text data.
  • The two sets of voice data can be recognized separately, thereby obtaining the first text data of the target user and the second text data of the other users.
  • The performing of voice recognition on the first voice data and the second voice data respectively, and acquiring the corresponding first text data and second text data, includes: performing voice recognition on each voice segment in the first voice data and generating the first text data from the recognized text segments; and performing voice recognition on each voice segment in the second voice data and generating the second text data from the recognized text segments.
  • By recognizing each voice segment in the first voice data, the text segment corresponding to each voice segment is obtained, and the first text data is formed according to the order of the voice segments; the second text data is obtained in the same way.
  • Step 208: Obtain the consultation information according to the first text data and the second text data.
  • The text segments in the first text data and in the second text data may be sorted in a corresponding order, such as chronological order, to obtain the corresponding consultation information. The consultation information can record the doctor's questions in a consultation and the corresponding patient's (or family member's) answers, as well as the doctor's diagnosis, medical advice, and other information.
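  • The chronological assembly of the consultation information can be sketched as follows; the timestamps and the "Doctor"/"Patient" labels are illustrative:

```python
# Sketch of assembling the consultation information: text segments from the
# doctor (first text data) and from other users (second text data) are merged
# in the chronological order of their source voice segments, each line
# labeled with its speaker role.

def build_consultation_info(first_text, second_text):
    """first_text / second_text: lists of (timestamp, text)."""
    labeled = [(t, "Doctor", s) for t, s in first_text] + \
              [(t, "Patient", s) for t, s in second_text]
    labeled.sort(key=lambda item: item[0])      # chronological order
    return "\n".join(f"{who}: {text}" for _, who, text in labeled)
```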
  • Step 210: Analyze the consultation information to obtain a corresponding analysis result; the analysis result is related to disease diagnosis.
  • the embodiment of the present invention can also analyze the consultation information according to requirements and obtain a corresponding analysis result. Since the consultation is related to disease diagnosis, the analysis result is also related to disease diagnosis and is determined based on the analysis requirements.
  • the consultation process data is a text recognition result obtained by recognizing the voice data, and the method may specifically include the following steps:
  • Step 302 Acquire a text recognition result obtained by the voice data identification.
  • the voice data is collected during the consultation process, and the collected voice data is converted into the text recognition result by voice recognition, and the text recognition result can be directly obtained.
  • step 104, performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data, may include the following step 304.
  • Step 304 Perform feature recognition on the text recognition result, and separate the first text data and the second text data according to the language feature.
  • the embodiment of the present invention recognizes the statements of different users from the text recognition result and organizes the consultation information. During the consultation, the doctor usually asks about symptoms, the user replies with symptoms, and the doctor then gives the diagnosed disease, the required examinations, the required medicines, and the like. Based on these language characteristics, the doctor's statements and the patient's statements can be separated from the text recognition result, thereby separating out the first text data and the second text data.
  • the embodiment of the present invention can collect the doctor's consultation texts and the patient's consultation texts in advance and analyze them, thereby determining the language characteristics of the doctor (i.e., the target user) and of the patient and family members.
  • a predetermined model can be established by determining the language features of different users by means of machine learning, probability statistics, and the like.
  • the embodiment of the present invention can obtain a large number of separated medical texts as training data, where the separated medical texts identify the statements of the target user and of other users, such as historically separated consultation texts.
  • the doctor content data (the first text data of the target user) and the patient content data (the second text data of other users) may be trained separately to obtain a doctor content model and a patient content model; of course, the two models may also be combined into one preset model, based on which the doctor's statements and the patient's statements are recognized.
  • for example, in the consultation information, the doctor's content is generally a question carrying symptom vocabulary, such as how you feel, what symptoms you have, what is uncomfortable, etc.; the patient's content is generally a question about a suspected disease, for example, is it a cold, is it XX disease, etc.
  • the doctor's content also usually includes statements with symptoms and medicines, for example, you have a cold, you can take XX medicine, and so on. Therefore, both the doctor's sentence content and the patient's sentence content have relatively significant language features, so the doctor content model and the patient content model can be trained from the separated medical case information.
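  • a toy sketch of such training, using simple word counts over invented separated texts as a stand-in for the doctor content model and patient content model (the real embodiments may use machine learning or probability statistics):

```python
from collections import Counter

def train_content_model(sentences):
    """Count word frequencies over separated medical texts; the
    counts serve as a toy stand-in for one role's content model."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts

doctor_model = train_content_model([
    "what symptoms do you have",
    "you have a cold take this medicine",
])
patient_model = train_content_model([
    "is it a cold",
    "i have a headache",
])
print(doctor_model["medicine"], patient_model["headache"])  # 1 1
```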
  • performing feature recognition on the text recognition result, and separating the first text data and the second text data according to the language features, includes: dividing the text recognition result to obtain corresponding text segments; identifying each text segment by using a preset model, and determining the language feature that the text segment has, the language features including a first language feature and a second language feature; and generating the first text data by using the text segments having the first language feature, and generating the second text data by using the text segments having the second language feature.
  • the text recognition result may first be divided, for example into sentences according to Chinese sentence features, or into multiple text segments according to other methods.
  • each text segment is sequentially input into a preset model, and the text segment is identified by the preset model, so that the language features of each text segment can be identified.
  • the preset model can also be set to attribute each text segment to a user based on the recognized language feature. Where the language feature of the target user is taken as the first language feature and the language feature of other users is taken as the second language feature, the preset model may be used to determine whether a text segment has the first language feature or the second language feature. The text segments having the first language feature may then be combined into the first text data according to the division order of the text segments, and the second text data may be generated by using the text segments having the second language feature.
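  • separating text segments with such a model might be sketched as below; the word-overlap scoring is a deliberate simplification of the preset model, and the models and sentences are invented:

```python
from collections import Counter

def classify_segments(segments, first_model, second_model):
    """Assign each text segment the language feature whose model
    scores it higher, then assemble the first and second text data
    in the original division order."""
    first_text, second_text = [], []
    for segment in segments:
        words = segment.lower().split()
        first_score = sum(first_model[w] for w in words)
        second_score = sum(second_model[w] for w in words)
        (first_text if first_score >= second_score else second_text).append(segment)
    return first_text, second_text

doctor_model = Counter({"symptoms": 2, "medicine": 2, "what": 1})
patient_model = Counter({"headache": 2, "fever": 2, "cold": 1})
first, second = classify_segments(
    ["what symptoms do you have", "I have a headache and fever"],
    doctor_model, patient_model)
print(first)   # ['what symptoms do you have']
print(second)  # ['I have a headache and fever']
```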
  • Step 306 Obtain consultation information according to the first text data and the second text data.
  • Step 308 analyzing the consultation information to obtain a corresponding analysis result, and the analysis result is related to the disease diagnosis.
  • each text segment in the first text data and each text segment in the second text data may be sorted according to a corresponding order, thereby obtaining corresponding consultation information.
  • the consultation information can record the doctor's question in a consultation and the answer of the corresponding patient (family), as well as the doctor's diagnosis, medical advice and other information.
  • the embodiment of the present invention can also analyze the consultation information according to requirements and obtain a corresponding analysis result. Since the consultation is related to disease diagnosis, the analysis result is also related to disease diagnosis and is determined based on the analysis requirements.
  • the communication process with the patient can be recorded by recording, after which the doctor's and the patient's statements are separated, distinguished, and arranged, and provided to the doctor in the form of a dialogue as a medical record, which can effectively reduce the time doctors spend on medical records.
  • FIG. 4 a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention is shown, which may specifically include the following modules:
  • the data acquisition module 402 is configured to obtain the data of the consultation process, and the data of the consultation process is determined according to the voice data collected during the consultation process.
  • the text identification module 404 is configured to perform identification according to the consultation process data, and obtain corresponding first text data and second text data, wherein the first text data belongs to a target user, and the second text data belongs to Other users than the target user.
  • the information determining module 406 is configured to obtain the consultation information according to the first text data and the second text data.
  • in the consultation process, at least two users communicate and interact: one user is the doctor, and the other users are patients, family members, and the like.
  • the doctor can be taken as the target user, so that the first text data is the doctor's consultation text data, and the text data of at least one other user is taken as the second text data, that is, the consultation text data corresponding to the patient and family members.
  • since a consultation is usually a question-and-answer process, the first text data and the second text data may each be composed of multiple text segments, so that the consultation information may be obtained based on the time of each text segment and the corresponding user.
  • the patient information can also be obtained in combination with the outpatient records of the hospital, thereby distinguishing different patients and the like in the consultation information.
  • the first text data and the second text data may be identified from the consultation process data according to different users, wherein the first text data belongs to the target user and the second text data belongs to users other than the target user; that is, the doctor's and the patient's sentences during the consultation can be distinguished automatically, and the consultation information is then obtained according to the first text data and the second text data.
  • this makes it possible to completely record the consultation process, automatically sort out the medical records, etc., saving the time of organizing consultation records.
  • FIG. 5 a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention is shown, which may specifically include the following modules:
  • the consultation process data includes voice data and/or a text recognition result obtained by voice data recognition.
  • the consultation process data is voice data;
  • the text recognition module 404 can include:
  • the separating module 40402 is configured to separate the first voice data and the second voice data from the voice data according to the voiceprint feature.
  • the voice recognition module 40404 is configured to separately perform voice recognition on the first voice data and the second voice data, and acquire corresponding first text data and second text data.
  • the separating module 40402 is configured to divide the voice data into a plurality of voice segments; and according to the voiceprint feature, the voice segment is used to determine the first voice data and the second voice data.
  • the separating module 40402 is configured to match each voice segment by using a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; acquire the voice segments matching the reference voiceprint feature to obtain the corresponding first voice data; and acquire the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
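  • matching voice segments against the reference voiceprint feature could, for illustration, be sketched with cosine similarity over feature vectors; the vectors and the 0.9 threshold are invented, and a real system would extract voiceprint features with a dedicated model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def split_by_reference(segments, reference, threshold=0.9):
    """Voice segments whose voiceprint feature matches the reference
    form the first voice data; the rest form the second voice data."""
    first, second = [], []
    for name, feature in segments:
        (first if cosine(feature, reference) >= threshold else second).append(name)
    return first, second

reference = [1.0, 0.0, 0.5]  # doctor's reference voiceprint feature
segments = [("seg1", [0.9, 0.1, 0.45]), ("seg2", [0.0, 1.0, 0.2])]
print(split_by_reference(segments, reference))  # (['seg1'], ['seg2'])
```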
  • before collecting the voice data of the consultation process, a piece of the doctor's (target user's) voice may first be collected as reference data, so that the doctor's voiceprint feature, that is, the reference voiceprint feature, is identified from the reference data.
  • a voice recognition model may also be set: after the voice data is input into the model, the voice segments conforming to the reference voiceprint feature may be separated from the voice segments with other voiceprint features, thereby obtaining the voice segments of the target user and the voice segments of other users.
  • a consultation record usually involves only one doctor, while there may be more than one patient, so a correspondingly large number of medical samples can be obtained for a specific doctor in the above manner.
  • the separating module 40402 is configured to identify the voiceprint feature of each voice segment; count the number of voice segments corresponding to each voiceprint feature, and determine the voiceprint feature having the largest number of voice segments, the voiceprint feature with the largest number being the voiceprint feature of the target user; generate the first voice data by using the voice segments corresponding to that voiceprint feature; and generate the second voice data by using the voice segments that do not belong to the first voice data.
  • the consultation process data may be the recorded data of a doctor's multiple outpatient sessions. In this process, the doctor usually occupies the most time communicating with different patients and their families, that is, the doctor (target user) has the largest number of voice segments in the voice data, so the target user and other users can be distinguished according to the number of voice segments corresponding to different users, and the first voice data and the second voice data are obtained.
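  • a sketch of selecting the target user as the voiceprint with the most segments; the segment and voiceprint labels are invented for illustration:

```python
from collections import Counter

def split_by_majority(segment_voiceprints):
    """Take the voiceprint with the largest number of voice segments
    as the target user's; its segments form the first voice data,
    all remaining segments form the second voice data."""
    counts = Counter(vp for _, vp in segment_voiceprints)
    target_vp, _ = counts.most_common(1)[0]
    first = [seg for seg, vp in segment_voiceprints if vp == target_vp]
    second = [seg for seg, vp in segment_voiceprints if vp != target_vp]
    return first, second

segments = [("s1", "vpA"), ("s2", "vpB"), ("s3", "vpA"),
            ("s4", "vpC"), ("s5", "vpA")]
print(split_by_majority(segments))  # (['s1', 's3', 's5'], ['s2', 's4'])
```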
  • a voice segment may include voiceprint features of a plurality of users.
  • the separating module 40402 may perform the following processing when multiple voiceprint features are identified from one voice segment, the different voiceprint features occurring at different times: if the voiceprint features are all voiceprint features of other users, the voice segment may be added to the second voice data; if the voiceprint features include the voiceprint feature of the target user and voiceprint features of other users, the voice segment may be subdivided into sub-segments and added to the corresponding voice data.
  • if the voiceprint features include the voiceprint feature of the target user and voiceprint features of other users, the voice segment may be divided according to requirements, for example, classified as a voice segment of the target user to obtain the first voice data, or classified as a voice segment of other users to obtain the second voice data, or subdivided and added separately to the voice data of both users.
  • the voice recognition module 40404 is configured to perform voice recognition on each voice segment in the first voice data and generate the first text data by using the recognized text segments; and to perform voice recognition on each voice segment in the second voice data and generate the second text data by using the recognized text segments.
  • the information determining module 406 is configured to sort the text segments according to the time sequence of each text segment in the first text data and each text segment in the second text data, and obtain the consultation information.
  • where the consultation process data is a text recognition result obtained by voice data recognition, the text recognition module 404 is configured to perform feature recognition on the text recognition result and separate the first text data and the second text data according to the language features.
  • the text recognition module 404 includes:
  • the segment dividing module 40406 is configured to divide the text recognition result to obtain a corresponding text segment.
  • the segment identification module 40408 is configured to identify the text segment by using a preset model, and determine a language feature that the text segment has, the language feature including a first language feature and a second language feature.
  • the embodiment of the present invention can obtain a large number of separated medical texts as training data, where the separated medical texts identify the statements of the target user and of other users, such as historically separated consultation texts.
  • the doctor content data (the first text data of the target user) and the patient content data (the second text data of other users) may be trained separately to obtain a doctor content model and a patient content model; of course, the two models may also be combined into one preset model, based on which the doctor's statements and the patient's statements are recognized. For example, in the consultation information, the doctor's content is generally a question carrying symptom vocabulary, such as how you feel, what symptoms you have, what is uncomfortable, etc.; the patient's content is generally a question about a suspected disease.
  • the text generating module 40410 is configured to generate the first text data by using the text segment having the first language feature, and generate the second text data by using the text segment having the second language feature.
  • the device further includes: an analysis module 408, configured to analyze the consultation information, and obtain a corresponding analysis result, where the analysis result is related to a disease diagnosis.
  • each text segment in the first text data and each text segment in the second text data may be sorted according to a corresponding order, thereby obtaining corresponding consultation information.
  • the consultation information can record the doctor's question in a consultation and the answer of the corresponding patient (family), as well as the doctor's diagnosis, medical advice and other information.
  • the embodiment of the present invention can also analyze the consultation information according to requirements and obtain a corresponding analysis result. Since the consultation is related to disease diagnosis, the analysis result is also related to disease diagnosis and is determined based on the analysis requirements.
  • the communication process with the patient can be recorded by recording, after which the doctor's and the patient's statements are separated, distinguished, and arranged, and provided to the doctor in the form of a dialogue as a medical record, which can effectively reduce the time doctors spend on medical records.
  • for the apparatus embodiments, the description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments.
  • FIG. 6 is a structural block diagram of an electronic device 600 for voice-based data processing, according to an exemplary embodiment.
  • the electronic device 600 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc., or a server-side device such as a server.
  • the electronic device 600 can include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, and a sensor component 614. And communication component 616.
  • Processing component 602 typically controls the overall operation of electronic device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • Processing component 602 can include one or more processors 620 to execute instructions to perform all or part of the steps described above.
  • processing component 602 can include one or more modules to facilitate interaction between processing component 602 and other components.
  • processing component 602 can include a multimedia module to facilitate interaction between multimedia component 608 and processing component 602.
  • Memory 604 is configured to store various types of data to support operation at device 600. Examples of such data include instructions for any application or method operating on electronic device 600, contact data, phone book data, messages, pictures, videos, and the like.
  • the memory 604 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Disk or Optical Disk.
  • Power component 606 provides power to various components of electronic device 600.
  • Power component 606 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 600.
  • the multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and a user.
  • the screen can include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may sense not only the boundary of the touch or sliding action, but also the duration and pressure associated with the touch or slide operation.
  • the multimedia component 608 includes a front camera and/or a rear camera. When the electronic device 600 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 610 is configured to output and/or input an audio signal.
  • the audio component 610 includes a microphone (MIC) that is configured to receive an external audio signal when the electronic device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode.
  • the received audio signal may be further stored in memory 604 or transmitted via communication component 616.
  • audio component 610 also includes a speaker for outputting an audio signal.
  • the I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
  • Sensor assembly 614 includes one or more sensors for providing electronic device 600 with a status assessment of various aspects.
  • sensor component 614 can detect an open/closed state of electronic device 600 and the relative positioning of components, such as the display and keypad of electronic device 600; sensor component 614 can also detect a change in position of electronic device 600 or a component of electronic device 600, the presence or absence of user contact with electronic device 600, the orientation or acceleration/deceleration of electronic device 600, and a change in temperature of electronic device 600.
  • Sensor assembly 614 can include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 614 can also include an acceleration sensor, a gyro sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 616 is configured to facilitate wired or wireless communication between electronic device 600 and other devices.
  • the electronic device 600 can access a wireless network based on a communication standard such as WiFi, 2G, or 3G, or a combination thereof.
  • the communication component 616 receives broadcast signals or broadcast associated information from an external broadcast management system via a broadcast channel.
  • the communication component 616 also includes a near field communication (NFC) module to facilitate short range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • electronic device 600 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above methods.
  • a non-transitory computer readable storage medium comprising instructions is also provided, such as the memory 604 comprising instructions executable by the processor 620 of the electronic device 600 to perform the above method.
  • the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a non-transitory computer readable storage medium, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform a voice-based data processing method, the method comprising: obtaining consultation process data, the consultation process data being determined according to voice data collected during the consultation process; performing recognition according to the consultation process data, and acquiring corresponding first text data and second text data, wherein the first text data belongs to a target user, and the second text data belongs to users other than the target user; and obtaining the consultation information according to the first text data and the second text data.
  • the consultation process data includes voice data and/or a text recognition result obtained by voice data recognition.
  • where the consultation process data is voice data, the performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: separating the first voice data and the second voice data from the voice data according to the voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
  • the separating the first voice data and the second voice data from the voice data according to the voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data by using the voice segments according to the voiceprint features.
  • the determining the first voice data and the second voice data by using the voice segments according to the voiceprint features includes: matching each voice segment by using a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; acquiring the voice segments matching the reference voiceprint feature to obtain the corresponding first voice data; and acquiring the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  • the determining the first voice data and the second voice data by using the voice segments according to the voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature having the largest number of voice segments; generating the first voice data by using the voice segments corresponding to that voiceprint feature; and generating the second voice data by using the voice segments not belonging to the first voice data.
  • the performing voice recognition on the first voice data and the second voice data to obtain the corresponding first text data and second text data includes: performing voice recognition on each voice segment in the first voice data, and generating the first text data by using the recognized text segments; and performing voice recognition on each voice segment in the second voice data, and generating the second text data by using the recognized text segments.
  • where the consultation process data is a text recognition result obtained by voice data recognition, the performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: performing feature recognition on the text recognition result, and separating the first text data and the second text data according to the language features.
  • the performing feature recognition on the text recognition result and separating the first text data and the second text data according to the language features includes: dividing the text recognition result to obtain corresponding text segments; identifying each text segment by using a preset model, and determining the language feature of the text segment, the language features including a first language feature and a second language feature; and generating the first text data by using the text segments having the first language feature, and generating the second text data by using the text segments having the second language feature.
  • the method further includes: analyzing the consultation information, and obtaining a corresponding analysis result, where the analysis result is related to the disease diagnosis.
  • FIG. 7 is a schematic structural diagram of an electronic device 700 for voice-based data processing according to another exemplary embodiment of the present invention.
  • the electronic device 700 can be a server, which may vary considerably in configuration or performance, and can include one or more central processing units (CPUs) 722 (e.g., one or more processors), memory 732, and one or more storage media 730 (e.g., one or more mass storage devices) storing application 742 or data 744.
  • the memory 732 and the storage medium 730 may be short-term storage or persistent storage.
  • the program stored on storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations in the server.
  • central processor 722 can be configured to communicate with storage medium 730, executing a series of instruction operations in storage medium 730 on the server.
  • the server may also include one or more power sources 726, one or more wired or wireless network interfaces 750, one or more input and output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, For example, Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the server is configured such that one or more central processors 722 execute one or more programs including instructions for: obtaining consultation process data, the consultation process data being determined according to voice data collected during the consultation process; performing recognition according to the consultation process data, and acquiring corresponding first text data and second text data, wherein the first text data belongs to a target user, and the second text data belongs to users other than the target user; and obtaining the consultation information according to the first text data and the second text data.
  • the consultation process data includes voice data and/or a text recognition result obtained by voice data recognition.
  • where the consultation process data is voice data, the performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: separating the first voice data and the second voice data from the voice data according to the voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
  • the separating, according to the voiceprint feature, the first voice data and the second voice data from the voice data including: dividing the voice data into multiple voice segments; The speech segment determines the first speech data and the second speech data.
  • the determining, according to the voiceprint feature, the first voice data and the second voice data by using the voice segment comprising: respectively matching each voice segment by using a reference voiceprint feature, wherein the reference voiceprint feature a voiceprint feature of the target user; acquiring a voice segment corresponding to the reference voiceprint feature to obtain a corresponding first voice data; acquiring a voice segment that does not match the reference voiceprint feature, and obtaining a corresponding second voice data .
  • the determining, according to the voiceprint feature, the first voice data and the second voice data by using the voice segment including: identifying voiceprint features of each voice segment; and counting the number of voice segments corresponding to each voiceprint feature Determining the voiceprint feature having the largest number of voice segments, generating the first voice data using the voice segment corresponding to the voiceprint feature, and generating the second voice data using the voice segment not belonging to the first voice data.
  • performing voice recognition on the first voice data and the second voice data to obtain corresponding first text data and second text data including: performing voice separately on each voice segment in the first voice data Identifying, generating the first text data by using the recognized text segment; separately performing speech recognition on each of the second speech data segments, and generating the second text data by using the recognized text segment.
  • the inquiry process data is a text recognition result obtained by the voice data identification; the identifying according to the consultation process data, acquiring corresponding first text data and second text data, including: The text recognition result performs feature recognition, and the first text data and the second text data are separated according to the language feature.
  • performing feature recognition on the text recognition result, and separating the first text data and the second text data according to the language feature including: dividing the text recognition result to obtain a corresponding text segment; using a preset model Identifying the text segment, determining a language feature of the text segment, the language feature comprising a first language feature and a second language feature; generating a first text data using the text segment having the first language feature, and The second text data is generated using the text segment having the second language feature.
  • the one or more programs further include instructions for performing the following operation: analyzing the consultation information to obtain a corresponding analysis result, the analysis result being related to the diagnosis of a disease.
  • embodiments of the invention may be provided as a method, an apparatus, or a computer program product.
  • embodiments of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • embodiments of the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • Embodiments of the invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Abstract

A voice-based data processing method and apparatus, and an electronic device, used for completely recording a diagnostic inquiry process. The method comprises: acquiring diagnostic inquiry process data, the diagnostic inquiry process data being determined according to voice data collected in a diagnostic inquiry process (102); performing recognition according to the diagnostic inquiry process data to acquire corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to a user other than the target user (104); and obtaining diagnostic inquiry information according to the first text data and the second text data (106). By means of the method, the apparatus and the electronic device, the statements of a doctor and a patient in a diagnostic inquiry process can be automatically distinguished, and the diagnostic inquiry process can be completely recorded and automatically organised to obtain content such as a medical record, saving time spent organising diagnostic inquiry records.

Description

Voice-based data processing method, apparatus and electronic device
This application claims priority to Chinese patent application No. 201710384412.3, filed on May 26, 2017 and entitled "Voice-based data processing method, apparatus and electronic device".
Technical field
The present invention relates to the technical field, and in particular to a voice-based data processing method, apparatus, and electronic device.
Background
Speech recognition typically converts speech into text. Traditional speech recognition and recording tools can only convert speech data into the corresponding text and cannot distinguish between speakers. Therefore, when multiple people are speaking, speech recognition alone cannot produce an effective record.
For example, during an actual consultation in a hospital, at least two people communicate: at a minimum a doctor and a patient, and sometimes the patient's family members as well. Existing speech recognition tools cannot attribute the collected consultation speech to the respective speakers, so the whole consultation process cannot be recorded comprehensively.
Summary of the invention
Embodiments of the present invention provide a voice-based data processing method for completely recording a consultation process.
Correspondingly, embodiments of the present invention further provide a voice-based data processing apparatus, an electronic device, and a readable storage medium to ensure the implementation and application of the above method.
To solve the above problem, an embodiment of the present invention discloses a voice-based data processing method, including: acquiring consultation process data, the consultation process data being determined according to voice data collected during a consultation process; performing recognition according to the consultation process data to acquire corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data.
Optionally, the consultation process data is voice data; performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: separating first voice data and second voice data from the voice data according to voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to the voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to the voiceprint features.
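The division into voice segments can be sketched as below. This is a minimal illustration only: the frame size, the energy threshold, and the use of plain frame energy as a silence test are assumptions for the example; a production system would more likely use a trained voice activity detector.

```python
# Split voice data into voice segments at silent stretches (energy-based sketch).
def split_into_segments(samples, frame_size=4, threshold=0.1):
    """Return (start, end) sample ranges of speech.

    A frame whose mean absolute amplitude is below `threshold` is treated
    as silence; consecutive non-silent frames form one voice segment.
    """
    segments = []
    start = None
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy >= threshold:
            if start is None:
                start = i                    # a segment begins at this frame
        elif start is not None:
            segments.append((start, i))      # silence closes the segment
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments

# Two bursts of speech separated by silence yield two segments.
audio = [0.0] * 4 + [0.5, 0.6, 0.5, 0.4] + [0.0] * 4 + [0.3, 0.4, 0.5, 0.3]
print(split_into_segments(audio))  # → [(4, 8), (12, 16)]
```

Each resulting segment can then be attributed to a speaker by its voiceprint feature, as described in the following paragraphs.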
Optionally, determining the first voice data and the second voice data from the voice segments according to the voiceprint features includes: matching each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
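The matching against a reference voiceprint can be sketched as follows. Representing a voiceprint feature as a fixed-length vector, using cosine similarity as the match test, and the 0.8 threshold are illustrative assumptions, not the prescribed implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two voiceprint feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def separate_by_reference(segments, reference, threshold=0.8):
    """Segments matching the reference voiceprint form the first voice data
    (the target user, e.g. the doctor); the rest form the second voice data."""
    first, second = [], []
    for seg in segments:
        target = cosine(seg["feature"], reference) >= threshold
        (first if target else second).append(seg)
    return first, second

reference = [1.0, 0.0, 0.0]                  # the target user's enrolled voiceprint
segments = [
    {"id": 0, "feature": [0.9, 0.1, 0.0]},   # close to the reference
    {"id": 1, "feature": [0.0, 1.0, 0.2]},   # a different speaker
]
doctor, others = separate_by_reference(segments, reference)
print([s["id"] for s in doctor], [s["id"] for s in others])  # → [0] [1]
```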
Optionally, determining the first voice data and the second voice data from the voice segments according to the voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments, and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
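The count-based variant can be sketched as below, assuming each voice segment already carries a voiceprint label (for example, obtained by clustering the segments' voiceprint features); in a one-doctor, many-patients clinic, the doctor typically speaks in the largest number of segments.

```python
from collections import Counter

def separate_by_majority(segments):
    """The voiceprint with the most segments is taken as the target user."""
    counts = Counter(seg["voiceprint"] for seg in segments)
    target = counts.most_common(1)[0][0]      # most frequent voiceprint
    first = [s for s in segments if s["voiceprint"] == target]
    second = [s for s in segments if s["voiceprint"] != target]
    return first, second

segments = [
    {"t": 0, "voiceprint": "A"}, {"t": 1, "voiceprint": "B"},
    {"t": 2, "voiceprint": "A"}, {"t": 3, "voiceprint": "C"},
    {"t": 4, "voiceprint": "A"},
]
first, second = separate_by_majority(segments)
print(len(first), len(second))  # → 3 2
```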
Optionally, performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data includes: performing voice recognition on each voice segment in the first voice data and generating the first text data from the recognized text segments; and performing voice recognition on each voice segment in the second voice data and generating the second text data from the recognized text segments. Accordingly, obtaining the consultation information according to the first text data and the second text data includes: sorting the text segments according to the chronological order of the voice segments corresponding to the text segments in the first text data and the second text data, to obtain the consultation information.
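The time-ordered assembly of the consultation information can be sketched as follows; that each recognized text segment keeps the start time of its source voice segment, and the "Doctor"/"Patient" labels, are illustrative assumptions for the example.

```python
def build_consultation(first_text, second_text):
    """first_text/second_text: lists of (start_time, text) for each speaker.

    Interleave both speakers' text segments by the start time of the voice
    segment each was recognized from, restoring the original dialogue order.
    """
    merged = [(t, "Doctor", s) for t, s in first_text] + \
             [(t, "Patient", s) for t, s in second_text]
    merged.sort(key=lambda item: item[0])
    return "\n".join(f"{speaker}: {text}" for _, speaker, text in merged)

doctor = [(0.0, "What are your symptoms?"), (8.5, "Is there XXX?")]
patient = [(4.2, "My XXX is uncomfortable."), (11.0, "Yes.")]
print(build_consultation(doctor, patient))
```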
Optionally, the consultation process data is a text recognition result obtained by recognizing the voice data; performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features.
Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result into corresponding text segments; identifying each text segment with a preset model to determine the language feature of the text segment, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text segments having the target-user language feature and the second text data from the text segments having the non-target-user language feature.
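Separation by language features can be sketched as below. The "preset model" is stood in for by a simple keyword rule (question-like wording is treated as the target-user language feature), and the "|" separator used to divide the recognition result into text segments is an assumption for the example; a real system would use a trained text classifier and a proper segmenter.

```python
# Illustrative cue words standing in for a trained language-feature model.
DOCTOR_CUES = ("what", "any", "how long", "do you", "?")

def classify_segment(segment):
    """Label a text segment 'first' (target-user feature) or 'second'."""
    s = segment.lower()
    return "first" if any(cue in s for cue in DOCTOR_CUES) else "second"

def separate_text(recognition_result):
    """Divide the text recognition result into segments and separate them."""
    segments = [s.strip() for s in recognition_result.split("|") if s.strip()]
    first = [s for s in segments if classify_segment(s) == "first"]
    second = [s for s in segments if classify_segment(s) == "second"]
    return first, second

result = "What are your symptoms? | My stomach hurts. | Do you have a fever? | No."
first, second = separate_text(result)
print(first)   # → ['What are your symptoms?', 'Do you have a fever?']
print(second)  # → ['My stomach hurts.', 'No.']
```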
An embodiment of the present invention further discloses a voice-based data processing apparatus, including: a data acquisition module configured to acquire consultation process data, the consultation process data being determined according to voice data collected during a consultation process; a text recognition module configured to perform recognition according to the consultation process data and acquire corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and an information determination module configured to obtain consultation information according to the first text data and the second text data.
Optionally, the consultation process data is voice data; the text recognition module includes: a separation module configured to separate first voice data and second voice data from the voice data according to voiceprint features; and a voice recognition module configured to perform voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
Optionally, the separation module is configured to divide the voice data into multiple voice segments, and to determine the first voice data and the second voice data from the voice segments according to the voiceprint features.
Optionally, the separation module is configured to match each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; to collect the audio segments that match the reference voiceprint feature to obtain the corresponding first voice data; and to collect the audio segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, the separation module is configured to identify the voiceprint feature of each voice segment; to count the voice segments having the same voiceprint feature, the voiceprint feature with the largest number of segments being the voiceprint feature of the target user, and to generate the first voice data from those segments; and to generate the second voice data from the remaining voice segments.
Optionally, the voice recognition module is configured to perform voice recognition on each voice segment in the first voice data and generate the first text data from the recognized text segments, and to perform voice recognition on each voice segment in the second voice data and generate the second text data from the recognized text segments; the information determination module is configured to sort the text segments according to the chronological order of the voice segments corresponding to the text segments in the first text data and the second text data, to obtain the consultation information.
Optionally, the consultation process data is a text recognition result obtained by recognizing the voice data; the text recognition module is configured to perform feature recognition on the text recognition result and separate the first text data and the second text data according to language features.
Optionally, the text recognition module includes: a segment division module configured to divide the text recognition result into corresponding text segments; a segment identification module configured to identify each text segment with a preset model and determine the language feature of the text segment, the language features including a first language feature and a second language feature; and a text generation module configured to generate the first text data from the text segments having the first language feature and the second text data from the text segments having the second language feature.
An embodiment of the present invention further discloses a readable storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the voice-based data processing method according to one or more embodiments of the present invention.
Optionally, an electronic device includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for: acquiring consultation process data, the consultation process data being determined according to voice data collected during a consultation process; performing recognition according to the consultation process data to acquire corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data.
Optionally, the consultation process data is voice data; performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: separating first voice data and second voice data from the voice data according to voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to the voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to the voiceprint features.
Optionally, determining the first voice data and the second voice data from the voice segments according to the voiceprint features includes: matching each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, determining the first voice data and the second voice data from the voice segments according to the voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments, and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
Optionally, performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data includes: performing voice recognition on each voice segment in the first voice data and generating the first text data from the recognized text segments; and performing voice recognition on each voice segment in the second voice data and generating the second text data from the recognized text segments; accordingly, obtaining the consultation information according to the first text data and the second text data includes: sorting the text segments according to the chronological order of the voice segments corresponding to the text segments in the first text data and the second text data, to obtain the consultation information.
Optionally, the consultation process data is a text recognition result obtained by recognizing the voice data; performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features.
Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result into corresponding text segments; identifying each text segment with a preset model to determine the language feature of the text segment, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text segments having the target-user language feature and the second text data from the text segments having the non-target-user language feature.
Embodiments of the present invention include the following advantages:
In the embodiments of the present invention, for consultation process data determined by collecting voice during a consultation, the first text data and the second text data can be identified from the consultation process data according to different users, where the first text data belongs to a target user and the second text data belongs to users other than the target user; that is, the statements of the doctor and the patient during the consultation can be distinguished automatically. The consultation information is then obtained according to the first text data and the second text data, so the consultation process can be recorded completely and automatically organized into content such as medical records, saving the time needed to organize consultation records.
Brief description of the drawings
FIG. 1 is a flowchart of the steps of an embodiment of a voice-based data processing method of the present invention;
FIG. 2 is a flowchart of the steps of another embodiment of a voice-based data processing method of the present invention;
FIG. 3 is a flowchart of the steps of yet another embodiment of a voice-based data processing method of the present invention;
FIG. 4 is a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention;
FIG. 5 is a structural block diagram of another embodiment of a voice-based data processing apparatus of the present invention;
FIG. 6 is a structural block diagram of an electronic device for voice-based data processing according to an exemplary embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device for voice-based data processing according to another exemplary embodiment of the present invention.
Detailed description
To make the above objects, features, and advantages of the present invention more apparent and understandable, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, a flowchart of the steps of an embodiment of a voice-based data processing method of the present invention is shown, which may specifically include the following steps:
Step 102: Acquire consultation process data, the consultation process data being determined according to voice data collected during a consultation process.
During the consultation, the consultation process can be recorded by various electronic devices, and the consultation process data is obtained based on the collected voice data; that is, the consultation process data may be the collected voice data itself, or a text recognition result converted from the collected voice data. The embodiments of the present invention can therefore perform recognition on data collected in various consultation processes.
Step 104: Perform recognition according to the consultation process data to acquire corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user.
The consultation process data can be recognized with different methods depending on the data type. For example, voice data can be processed by means of voiceprint features and voice recognition, while text data can be recognized by text features, thereby obtaining the first text data and the second text data distinguished by user. The consultation may involve at least two users communicating with each other: one user is the doctor and the other users are the patient, the patient's family members, and so on. For example, if the data is collected from a doctor's one-day clinic, it will include one doctor and multiple patients, and possibly one or more family members. Therefore, for the consultation record, the doctor can be taken as the target user, so that the first text data is the doctor's consultation text data, and the text data of at least one other user serves as the second text data, i.e., the consultation text data of the patients and their family members.
Step 106: Obtain consultation information according to the first text data and the second text data.
Since a consultation is usually a question-and-answer process, the first text data and the second text data may each consist of multiple text segments, so the consultation information can be obtained based on the time of each text segment and its corresponding user.
An example of the consultation information is as follows:
2017-4-23 10:23 AM
Doctor A: What are your symptoms?
Patient B: My XXX is uncomfortable.
Doctor A: Is there XXX?
Patient B: Yes.
......
In actual processing, patient information can also be obtained in combination with the hospital's outpatient records, so that different patients can be distinguished in the consultation information.
In summary, for consultation process data determined by collecting voice during a consultation, the first text data and the second text data can be identified from the consultation process data according to different users, where the first text data belongs to a target user and the second text data belongs to users other than the target user; that is, the statements of the doctor and the patient during the consultation can be distinguished automatically. The consultation information is then obtained according to the first text data and the second text data, so the consultation process can be recorded completely and automatically organized into content such as medical records, saving the time needed to organize consultation records.
In the embodiments of the present invention, the consultation process data includes voice data and/or a text recognition result obtained by recognizing the voice data. Different types of consultation process data are recognized with different methods; therefore, the embodiments of the present invention discuss the processing of each type of consultation process data separately.
Referring to FIG. 2, a flow chart of the steps of another embodiment of the voice-based data processing method of the present invention is shown. In this embodiment, the consultation process data is voice data. The method may specifically include the following steps:
Step 202: Acquire consultation process data, where the consultation process data is voice data collected during the consultation process.
During the consultation, voice data can be collected with various electronic devices, for example by recording audio with a voice recorder, mobile phone, or computer, to obtain the voice data collected in the consultation process. The voice data may be collected in a single outpatient visit, or collected by one doctor over multiple visits; the embodiments of the present invention place no limit on this. The voice data therefore includes the voice data of one doctor and of at least one patient, and may further include the voice data of at least one family member of a patient.
The above step 104 of performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data may include the following steps 204-206.
Step 204: Separate first voice data and second voice data from the voice data according to voiceprint features.
A voiceprint is the spectrum of the sound waves carrying speech information, as displayed by an electroacoustic instrument. Voiceprints are both distinctive and stable: after adulthood, a person's voiceprint remains relatively stable over the long term, so different people can be identified by their voiceprints. Therefore, the voice data can be recognized by voiceprint features to determine the voice segments corresponding to different users (voiceprint features), yielding the first voice data of the target user and the second voice data of the other users.
Separating the first voice data and the second voice data from the voice data according to voiceprint features includes: dividing the voice data into a plurality of voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Specifically, the voice data can be divided into a plurality of voice segments. The division may follow voice division rules, such as the pause intervals between sound segments; it may also be based on voiceprint features, i.e., the voiceprint feature of each voice is determined and the segments are divided according to the different voiceprint features. One piece of voice data can thus be divided into multiple voice segments in sequential order, and different segments may have the same or different voiceprint features. It is then determined, based on the voiceprint features, whether each voice segment belongs to the first voice data or the second voice data: the voiceprint feature of each segment is determined, the segments bearing the target user's voiceprint feature form the first voice data, and the remaining segments form the second voice data.
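The pause-interval division mentioned above can be sketched with a simple energy-based rule. This is a minimal illustration, not the patented implementation: real systems use more robust voice-activity detection, and the frame length, energy threshold, and minimum pause duration here are arbitrary placeholders.

```python
import numpy as np

def split_by_pauses(samples, rate, frame_ms=30, silence_thresh=0.01, min_pause_ms=300):
    """Split a mono waveform into voice segments at pause intervals.

    A pause is a run of low-energy frames at least min_pause_ms long.
    Returns a list of (start_sample, end_sample) pairs in time order.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    # Per-frame RMS energy; frames below the threshold count as silence.
    energies = [np.sqrt(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2))
                for i in range(n_frames)]
    voiced = [e >= silence_thresh for e in energies]

    min_pause_frames = max(1, min_pause_ms // frame_ms)
    segments, start, silence_run = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_pause_frames:
                # Close the segment at the first silent frame of the pause.
                segments.append((start * frame_len, (i - silence_run + 1) * frame_len))
                start, silence_run = None, 0
    if start is not None:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

The resulting ordered segment list is what the subsequent voiceprint matching operates on.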
In the embodiments of the present invention, before the voice data of the consultation is collected, the doctor (target user) may first record a piece of speech as reference data, so that the doctor's voiceprint feature, i.e., the reference voiceprint feature, can be identified from it. A voice recognition model may also be provided; after the voice data is fed into this model, the voice segments matching the reference voiceprint can be separated from the segments with other voiceprint features, yielding the target user's voice segments and the other users' voice segments. In a doctor's outpatient work, the resulting medical case information usually involves only one doctor but possibly many patients, so a large number of medical case samples can be obtained for a specific doctor in this way.
In an optional embodiment of the present invention, the target user's voiceprint feature may be collected in advance as the reference voiceprint feature and used to divide the voice data. That is, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: matching each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; obtaining the voice segments that match the reference voiceprint feature as the corresponding first voice data; and obtaining the voice segments that do not match the reference voiceprint feature as the corresponding second voice data. For a target user such as a doctor, voice data may be collected in advance to extract a voiceprint feature, which serves as the reference. For voice data containing the target user, each voice segment is then matched against this reference to decide whether its voiceprint feature is consistent with it: if consistent, the segment matches the reference voiceprint feature and is added to the first voice data (the target user's voice data); if not, the segment does not match and is added to the second voice data (the non-target users' voice data). Both the first voice data and the second voice data are thus composed of voice segments, and the segments retain their sequential relationship, which facilitates the subsequent accurate determination of the consultation information.
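The reference-voiceprint matching above can be sketched as follows. This is a hedged illustration, not the patented implementation: it assumes voiceprint features are available as fixed-length embedding vectors (e.g., produced by an external speaker-verification model), and the cosine-similarity threshold of 0.75 is an invented placeholder.

```python
import numpy as np

def assign_by_reference(segment_embeddings, reference_embedding, threshold=0.75):
    """Split voice segments into target-user and other-user streams by
    comparing each segment's voiceprint embedding to the doctor's reference.

    segment_embeddings: list of (segment_id, embedding) in time order.
    Returns (first_voice, second_voice) as ordered lists of segment_ids.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    first_voice, second_voice = [], []
    for seg_id, emb in segment_embeddings:
        if cosine(emb, reference_embedding) >= threshold:
            first_voice.append(seg_id)   # consistent with the reference voiceprint
        else:
            second_voice.append(seg_id)  # any non-matching speaker
    return first_voice, second_voice
```

Because the input list is traversed in time order, both output streams keep the sequential relationship the surrounding text relies on.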
In another optional embodiment of the present invention, the voice data may also be divided by the number of voice segments corresponding to each voiceprint feature. That is, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments per voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from the voice segments of that feature, where the voiceprint feature with the largest number is the target user's voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data. Given the nature of a consultation, the consultation process data may be the recordings of one doctor over many outpatient visits; in that process the doctor spends much of the time talking with different patients and their families, so the doctor (target user) has the most utterances in the voice data. The target user and the other users can therefore be distinguished by the number of voice segments per user, yielding the first voice data and the second voice data. The voiceprint feature contained in each voice segment is identified, the number of segments corresponding to each voiceprint feature is counted, and the feature with the largest count is determined to be the target user's voiceprint feature, the other features belonging to other users. The segments bearing the target user's voiceprint feature then form the first voice data in order, and the other segments (those not belonging to the first voice data) form the second voice data in order.
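The majority-count selection above can be sketched as follows, assuming each segment has already been assigned a voiceprint cluster label by an upstream speaker-identification step; the tuple format is an illustrative assumption.

```python
from collections import Counter

def split_by_majority_speaker(labeled_segments):
    """Pick the voiceprint with the most segments as the target user (the
    doctor speaks in every visit, so their voiceprint dominates the data).

    labeled_segments: list of (segment_id, voiceprint_label) in time order.
    Returns (first_voice, second_voice) preserving segment order.
    """
    counts = Counter(label for _, label in labeled_segments)
    target_label, _ = counts.most_common(1)[0]
    first_voice = [s for s, lab in labeled_segments if lab == target_label]
    second_voice = [s for s, lab in labeled_segments if lab != target_label]
    return first_voice, second_voice
```

All segments whose label is not the majority label fall into the second stream, matching the "not belonging to the first voice data" rule.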
In the embodiments of the present invention, since the voice data is collected in a multi-person conversation, one voice segment may contain the voiceprint features of several users. When multiple voiceprint features are identified in one voice segment: if the different features occur at different times, and all of them belong to other users, the segment can be added to the second voice data; if they include both the target user's feature and other users' features, the segment can be further divided into sub-segments that are added to the corresponding voice data. If the different features occur at the same time, i.e., at least two users are speaking simultaneously, then if all features belong to other users the segment can be added to the second voice data; if they include the target user's feature as well as other users' features, the segment can be assigned as required, for example treated as the target user's segment to obtain the first voice data, treated as the other users' segment to obtain the second voice data, or added to both users' voice data.
Step 206: Perform speech recognition on the first voice data and the second voice data respectively, and acquire the corresponding first text data and second text data.
After the first voice data and the second voice data are acquired, the two can be recognized separately to obtain the first text data of the target user and the second text data of the other users.
In an optional embodiment, performing speech recognition on the first voice data and the second voice data respectively and acquiring the corresponding first text data and second text data includes: performing speech recognition on each voice segment of the first voice data and generating the first text data from the recognized text segments; and performing speech recognition on each voice segment of the second voice data and generating the second text data from the recognized text segments. By recognizing each voice segment of the first voice data, the text corresponding to each segment is obtained, and the first text data is assembled in the order of the voice segments; the second text data is obtained in the same way. Since the doctor's questions and the patient's answers in a consultation are ordered, the time order is recorded when the voice data is divided into segments, so the resulting first and second text data also retain that order, facilitating the subsequent accurate compilation of the consultation information.
Step 208: Obtain the consultation information according to the first text data and the second text data.
According to the time order of the voice segments corresponding to the first text data and the second text data, the text segments of both can be sorted in the corresponding order, such as chronological order, to obtain the consultation information. The consultation information can record the doctor's questions in a consultation and the corresponding patient's (or family member's) answers, as well as the doctor's diagnosis, medical advice, and other information.
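The chronological merge described in step 208 can be sketched as follows. The (timestamp, text) tuple representation and the Doctor/Patient labels are illustrative assumptions, not the patent's data format.

```python
def build_consultation_record(first_text, second_text):
    """Interleave doctor and patient text segments by time into a dialogue.

    first_text / second_text: lists of (timestamp, text) for the target user
    (doctor) and the other users (patients/family), respectively.
    Returns the consultation information as one ordered list of lines.
    """
    tagged = [(t, "Doctor", s) for t, s in first_text] + \
             [(t, "Patient", s) for t, s in second_text]
    tagged.sort(key=lambda item: item[0])  # restore the original time order
    return ["%s: %s" % (role, text) for _, role, text in tagged]
```

Sorting on the recorded timestamps is what reproduces the question-answer alternation shown in the earlier example.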
Step 210: Analyze the consultation information to obtain a corresponding analysis result, where the analysis result is related to disease diagnosis.
After the consultation information has been compiled, the embodiments of the present invention may further analyze it as required to obtain a corresponding analysis result; since a consultation concerns disease diagnosis, the analysis result is likewise related to disease diagnosis, as determined by the analysis requirements.
For example, the doctor's common questions for each disease can be tallied and provided to less experienced doctors as a reference; the consultation information can be analyzed to develop an artificial-intelligence question-answering system for traditional Chinese (or Western) medicine; and the symptoms, treatments, and so on corresponding to each disease can be determined by statistics and analysis.
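One of the statistics mentioned above, tallying a doctor's common questions per disease, can be sketched with a simple counter. The record format is an invented assumption for illustration only.

```python
from collections import Counter, defaultdict

def common_questions_by_disease(records, top_n=3):
    """Tally the doctor's questions per diagnosed disease across many
    consultation records, as a reference for less experienced doctors.

    records: list of (disease, [doctor_question, ...]) tuples.
    Returns {disease: [(question, count), ...]} with the top_n questions.
    """
    tallies = defaultdict(Counter)
    for disease, questions in records:
        tallies[disease].update(questions)
    return {d: c.most_common(top_n) for d, c in tallies.items()}
```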
Referring to FIG. 3, a flow chart of the steps of yet another embodiment of the voice-based data processing method of the present invention is shown. In this embodiment, the consultation process data is a text recognition result obtained by recognizing voice data. The method may specifically include the following steps:
Step 302: Acquire a text recognition result obtained by recognizing voice data.
The voice data is collected during the consultation process and converted into the text recognition result by speech recognition, so the text recognition result can be acquired directly.
The above step 104 of performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data may include the following step 304.
Step 304: Perform feature recognition on the text recognition result, and separate the first text data and the second text data according to language features.
For data already recognized as text, it is unknown who spoke each passage, so it cannot serve directly as consultation information. The embodiments of the present invention therefore identify the different users' utterances in the text recognition result and compile the consultation information. During a consultation, the doctor usually asks about symptoms, the patient describes them, and the doctor states the diagnosis, the examinations needed, the medicines required, and so on; based on these features, the doctor's and the patient's sentences can be identified in the text recognition result, separating the first text data from the second text data.
That is, the embodiments of the present invention may collect in advance the text of doctors' questions and of patients' statements, and collect the consultation information produced by each analysis, so as to derive the language features of doctors (the target user) and of patients and their families (the other users) and build a corresponding model for distinguishing the different users' text by those features. The language features of the different users may be determined and a preset model built by machine learning, probability statistics, and similar means.
A large amount of already-separated medical case text may be obtained as training data; such text labels the consultation utterances of the target user and the other users, for example consultation information historically obtained by recognition. The doctor content data (the target user's first text data) and the patient content data (the other users' second text data) it contains can be trained separately to obtain a doctor content model and a patient content model; the two models may of course be combined into one preset model, with which the doctor's sentences and the patient's sentences can be identified.
For example, in the medical case information obtained from consultations, the doctor's content is mostly questions with symptom-related vocabulary, such as "How do you feel?", "What symptoms do you have?", or "Where does it hurt?"; the patient's content is mostly questions describing symptoms or asking about diseases, such as "Do I have a cold?" or "Is it XX disease?"; and the doctor's content also includes declarative sentences with symptoms and medicines, such as "You have a viral cold; you can take some XX medicine." Thus the doctor's sentence content and the patient's sentence content each have fairly distinctive language features, so a doctor content model and a patient content model can be trained on separated medical case information.
Performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result to obtain corresponding text segments; recognizing the text segments with a preset model to determine the language features each segment has, the language features including a first language feature and a second language feature; and generating the first text data from the text segments having the first language feature, and the second text data from the text segments having the second language feature. The text recognition result may first be divided into sentences according to Chinese sentence features, or into multiple text segments in other ways. Each text segment is then fed in turn into the preset model, which recognizes the language features of each segment; the preset model may of course also be configured to assign each segment to a user based on the recognized features. Taking the target user's language features as the first language feature and the other users' language features as the second, the preset model determines whether a text segment has the first or the second language feature. The segments with the first language feature then generate the first text data, and those with the second language feature generate the second text data, in the order in which the segments were divided.
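The language-feature separation above can be sketched with a cue-word scorer, as a stand-in for the trained doctor/patient content models described above; the cue lists and the tie-breaking rule are invented placeholders, not the patent's model.

```python
def split_by_language_features(text_segments, doctor_cues, patient_cues):
    """Classify each text segment as doctor or patient speech by counting
    cue words, standing in for the trained content models.

    Returns (first_text, second_text) preserving segment order; ties fall
    to the patient side, a simplification of a real model's decision.
    """
    first_text, second_text = [], []
    for seg in text_segments:
        doctor_score = sum(cue in seg for cue in doctor_cues)
        patient_score = sum(cue in seg for cue in patient_cues)
        if doctor_score > patient_score:
            first_text.append(seg)   # first language feature (target user)
        else:
            second_text.append(seg)  # second language feature (other users)
    return first_text, second_text
```

A production system would replace the cue counts with the probability scores of the doctor and patient content models.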
Step 306: Obtain the consultation information according to the first text data and the second text data.
Step 308: Analyze the consultation information to obtain a corresponding analysis result, where the analysis result is related to disease diagnosis.
According to the order of the voice segments corresponding to the first text data and the second text data, the text segments of both can be sorted in the corresponding order to obtain the consultation information, which can record the doctor's questions in a consultation and the corresponding patient's (or family member's) answers, as well as the doctor's diagnosis, medical advice, and other information.
After the consultation information has been compiled, the embodiments of the present invention may further analyze it as required to obtain a corresponding analysis result; since a consultation concerns disease diagnosis, the analysis result is likewise related to disease diagnosis, as determined by the analysis requirements.
For example, the doctor's common questions for each disease can be tallied and provided to less experienced doctors as a reference; the consultation information can be analyzed to develop an artificial-intelligence question-answering system for traditional Chinese (or Western) medicine; and the symptoms, treatments, and so on corresponding to each disease can be determined by statistics and analysis.
Regarding a doctor's habit of and need for recording medical cases, the above scheme records the exchange with the patient by audio recording, then separates, distinguishes, and organizes the doctor's and the patient's utterances, and provides them to the doctor in dialogue form as a medical case record, effectively reducing the time the doctor spends compiling such records.
It should be noted that, for simplicity of description, the method embodiments are all expressed as series of action combinations; however, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention certain steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to FIG. 4, a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention is shown, which may specifically include the following modules:
A data acquisition module 402, configured to acquire consultation process data, where the consultation process data is determined from voice data collected during the consultation process.
A text recognition module 404, configured to perform recognition according to the consultation process data and acquire corresponding first text data and second text data, where the first text data belongs to one target user and the second text data belongs to users other than the target user.
An information determination module 406, configured to obtain the consultation information according to the first text data and the second text data.
At least two users may communicate and interact during the consultation: one user is the doctor, and the other users are patients, patients' family members, and the like. If the data is collected over a doctor's full day of outpatient work, for example, it includes one doctor and multiple patients, and possibly one or more family members. For the consultation record, the doctor may therefore be taken as the target user, so that the first text data is the doctor's consultation text, and the text data of at least one other user, i.e., the patients' and family members' consultation text, is the second text data. Since a consultation is typically a question-and-answer process, the first and second text data may each be composed of a plurality of text segments, so that the consultation information may be obtained based on the time of each text segment and the corresponding user.
An example of the consultation information is as follows:
2017-4-23 10:23AM Doctor A: What are your symptoms? Patient B: My XXX feels unwell. Doctor A: Do you have XXX? Patient B: Yes. ……
In actual processing, patient information may also be obtained in combination with the hospital's outpatient records, so that different patients can be distinguished in the consultation information.
In summary, for consultation process data determined by collection during a consultation, first text data and second text data can be identified from the consultation process data according to different users, where the first text data belongs to one target user and the second text data belongs to users other than the target user. That is, the doctor's and the patients' utterances in the consultation can be distinguished automatically, and the consultation information is then obtained from the first text data and the second text data. The consultation process can thus be recorded in full and content such as medical case records compiled automatically, saving the time otherwise spent organizing consultation records.
Referring to FIG. 5, a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention is shown, which may specifically include the following modules:
The consultation process data includes voice data and/or a text recognition result obtained by recognizing the voice data.
The consultation process data is voice data; the text recognition module 404 may include:
A separation module 40402, configured to separate first voice data and second voice data from the voice data according to voiceprint features.
A speech recognition module 40404, configured to perform speech recognition on the first voice data and the second voice data respectively, and acquire corresponding first text data and second text data.
The separation module 40402 is configured to divide the voice data into a plurality of voice segments, and to determine the first voice data and the second voice data from the voice segments according to voiceprint features.
Preferably, the separation module 40402 is configured to match each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; to obtain the voice segments matching the reference voiceprint feature as the corresponding first voice data; and to obtain the voice segments not matching the reference voiceprint feature as the corresponding second voice data.
In this embodiment of the present invention, before the voice data of the consultation is collected, the doctor (the target user) may first record a piece of speech as reference data, so that the doctor's voiceprint feature, that is, the reference voiceprint feature, can be extracted from it. A voice recognition model may also be provided; after the voice data is fed into this model, the voice segments matching the reference voiceprint feature can be separated from the voice segments with other voiceprint features, yielding the target user's voice segments on the one hand and the other users' voice segments on the other. In an outpatient setting, the resulting medical record information usually involves only one doctor but possibly several patients, so a large number of medical record samples can be collected for a specific doctor in this way.

Preferably, the separation module 40402 is configured to identify the voiceprint feature of each voice segment; to count the number of voice segments corresponding to each voiceprint feature; to determine the voiceprint feature with the largest number of voice segments and generate the first voice data from the voice segments of that voiceprint feature, the voiceprint feature with the largest count being taken as the target user's voiceprint feature; and to generate the second voice data from the voice segments that do not belong to the first voice data.

Given the characteristics of the consultation process, the consultation process data may be the recorded data of several of one doctor's outpatient visits. Throughout these visits the doctor spends the most time speaking, with different patients and their families in turn, so the doctor (the target user) accounts for the largest share of speech in the voice data. The target user can therefore be distinguished from the other users, and the first voice data and second voice data obtained, according to the number of voice segments belonging to each user.
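This majority-speaker heuristic can be sketched as follows, assuming the voice segments have already been labeled per speaker by voiceprint clustering; the pair structure and label names are illustrative, not taken from the embodiment.

```python
from collections import Counter

def split_by_majority(labeled_segments):
    """Split (segment_id, speaker_label) pairs produced by voiceprint
    clustering: the most frequent speaker is taken to be the target
    user (the doctor, present across every visit), and the remaining
    segments form the second voice data."""
    counts = Counter(label for _, label in labeled_segments)
    target_label = counts.most_common(1)[0][0]
    first_voice = [seg for seg, label in labeled_segments if label == target_label]
    second_voice = [seg for seg, label in labeled_segments if label != target_label]
    return target_label, first_voice, second_voice
```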
In this embodiment of the present invention, because the voice data is collected in a multi-speaker conversation, a single voice segment may contain the voiceprint features of several users. When the separation module 40402 identifies multiple voiceprint features in one voice segment, it may proceed as follows. If the different voiceprint features occur at different times within the segment: when all of them are other users' voiceprint features, the segment may be added to the second voice data; when they include both the target user's voiceprint feature and other users' voiceprint features, the segment may be further divided into sub-segments, each added to the corresponding voice data. If the different voiceprint features occur at the same time, that is, at least two users are speaking simultaneously: when all of them are other users' voiceprint features, the segment may be added to the second voice data; when they include both the target user's voiceprint feature and other users' voiceprint features, the segment may be assigned as required, for example classified as the target user's segment to obtain the first voice data, classified as another user's segment to obtain the second voice data, or added to both users' voice data.
Preferably, the voice recognition module 40404 is configured to perform voice recognition on each voice segment of the first voice data, generating the first text data from the recognized text fragments, and to perform voice recognition on each voice segment of the second voice data, generating the second text data from the recognized text fragments. The information determining module 406 is then configured to sort the text fragments of the first text data and of the second text data according to the time order of their corresponding voice segments, to obtain the consultation information.
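The chronological merge performed by the information determining module 406 can be sketched as follows, assuming each recognized text fragment carries the start time of its source voice segment; the "Doctor"/"Patient" role labels are illustrative stand-ins for the target user and the other users.

```python
def merge_dialogue(first_fragments, second_fragments):
    """Interleave the target user's and the other users' recognized
    text fragments by the start time of their source voice segments,
    yielding a dialogue-style consultation transcript. Each fragment
    is a (start_time, text) tuple."""
    tagged = [(t, "Doctor", text) for t, text in first_fragments]
    tagged += [(t, "Patient", text) for t, text in second_fragments]
    tagged.sort(key=lambda item: item[0])  # chronological order
    return [f"{role}: {text}" for _, role, text in tagged]
```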
Preferably, the consultation process data is a text recognition result obtained by recognizing voice data; the text recognition module 404 is configured to perform feature recognition on the text recognition result and to separate the first text data and the second text data according to language features.

The text recognition module 404 includes:

a fragment division module 40406, configured to divide the text recognition result into corresponding text fragments; and

a fragment recognition module 40408, configured to recognize the text fragments using a preset model and determine the language feature each fragment has, the language features including a first language feature and a second language feature.
In this embodiment of the present invention, a large number of already-separated medical record texts may be obtained as training data; in these texts the target user's and the other users' consultation statements are already labeled, for example consultation information obtained from past recognition. The doctor content data (the target user's first text data) and the patient content data (the other users' second text data) contained in them can be trained on separately to obtain a doctor content model and a patient content model; the two models may of course be combined into one preset model, on the basis of which the doctor's sentences and the patients' sentences can be recognized. For example, in the consultation information obtained from a visit, the doctor's content is typically questions containing symptom-related vocabulary, such as "How do you feel?", "What symptoms do you have?", or "Where does it hurt?"; the patient's content is typically questions describing symptoms or naming diseases, such as "Do I have a cold?" or "Is it XX disease?"; and the doctor's content also includes statements containing symptoms and medications, such as "You have a viral cold; you can take some XX medicine." Since both the doctor's sentences and the patients' sentences have fairly distinctive language features, the doctor content model and the patient content model can be trained from the separated medical record information.
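A crude stand-in for the trained doctor/patient content models can be sketched as follows: each text fragment is scored against cue phrases typical of each role, echoing the examples above. The cue lists and scoring rule are illustrative only; the embodiment envisages models trained on separated medical record texts, not a fixed keyword list.

```python
DOCTOR_CUES = ("how do you feel", "what symptoms", "where does it hurt", "you can take")
PATIENT_CUES = ("do i have", "is it", "i feel")

def classify_fragment(fragment):
    """Score a text fragment against cue phrases typical of doctor
    speech (symptom questions, medication statements) and of patient
    speech (symptom descriptions, disease questions), returning the
    higher-scoring role label; ties default to 'doctor'."""
    text = fragment.lower()
    doctor_score = sum(cue in text for cue in DOCTOR_CUES)
    patient_score = sum(cue in text for cue in PATIENT_CUES)
    return "doctor" if doctor_score >= patient_score else "patient"
```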
Preferably, a text generation module 40410 is configured to generate the first text data from the text fragments having the first language feature and to generate the second text data from the text fragments having the second language feature.

Preferably, the apparatus further includes an analysis module 408, configured to analyze the consultation information and obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
According to the order of the voice segments corresponding to the first text data and the second text data, the text fragments of the first text data and those of the second text data may be sorted accordingly to obtain the consultation information, which may record the doctor's questions during a consultation, the answers of the patient (or family members), and the doctor's diagnosis, medical advice, and other information.

After the consultation information has been compiled, this embodiment of the present invention may further analyze it as required to obtain a corresponding analysis result; since the consultation relates to disease diagnosis, the analysis result is likewise related to disease diagnosis, the specifics being determined by the analysis requirements.

For example, the doctor's common questions for each disease may be compiled statistically and provided to less experienced doctors as a reference; the consultation information may be analyzed to develop a traditional Chinese medicine (or Western medicine) artificial-intelligence question-answering system; and the symptoms, treatments, and so on corresponding to each disease may be determined through statistics, analysis, and similar means.
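The first of these analyses, compiling the doctor's most common questions per disease, can be sketched as follows; the record structure (a dict with a "disease" label and a list of doctor "questions") is a hypothetical shape for the compiled consultation information, not one defined by the embodiment.

```python
from collections import Counter, defaultdict

def common_questions_by_disease(consultations, top_n=3):
    """For each disease, count the doctor's questions across compiled
    consultation records and keep the most frequent ones as a
    reference for less experienced doctors."""
    per_disease = defaultdict(Counter)
    for record in consultations:
        per_disease[record["disease"]].update(record["questions"])
    return {disease: [q for q, _ in counts.most_common(top_n)]
            for disease, counts in per_disease.items()}
```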
Given doctors' habits and needs in keeping medical records, the above scheme makes it possible to capture the conversation with the patient by recording, then separate the doctor's and the patient's sentences, distinguish and organize them, and provide the result to the doctor in dialogue form as a medical record, effectively reducing the time the doctor spends compiling medical records.

As for the apparatus embodiment, since it is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.
FIG. 6 is a structural block diagram of an electronic device 600 for voice-based data processing according to an exemplary embodiment. For example, the electronic device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like; it may also be a server-side device, such as a server.

Referring to FIG. 6, the electronic device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.

The processing component 602 generally controls the overall operation of the electronic device 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 to execute instructions, to perform all or part of the steps of the above methods. In addition, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and the other components; for example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is configured to store various types of data to support operation of the device 600. Examples of such data include instructions for any application or method operating on the electronic device 600, contact data, phone book data, messages, pictures, videos, and the like. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc.
The power component 606 provides power to the various components of the electronic device 600, and may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 600.
The multimedia component 608 includes a screen providing an output interface between the electronic device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the panel; a touch sensor may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the electronic device 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) configured to receive external audio signals when the electronic device 600 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, the audio component 610 also includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the electronic device 600. For example, the sensor component 614 may detect the open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the electronic device 600; it may also detect a change in position of the electronic device 600 or one of its components, the presence or absence of user contact with the electronic device 600, the orientation or acceleration/deceleration of the electronic device 600, and a change in its temperature. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the electronic device 600 and other devices. The electronic device 600 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 also includes a near field communication (NFC) module to facilitate short-range communication; for example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 604 including instructions executable by the processor 620 of the electronic device 600 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided such that, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device can perform a voice-based data processing method, the method including: acquiring consultation process data, the consultation process data being determined according to voice data collected during a consultation; performing recognition according to the consultation process data to obtain corresponding first text data and second text data, where the first text data belongs to one target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data.

Optionally, the consultation process data includes voice data and/or a text recognition result obtained by recognizing voice data.

Optionally, the consultation process data is voice data, and performing recognition according to the consultation process data to obtain corresponding first text data and second text data includes: separating first voice data and second voice data from the voice data according to voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.

Optionally, separating first voice data and second voice data from the voice data according to voiceprint features includes: dividing the voice data into a plurality of voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.

Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: matching each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature into the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature into the corresponding second voice data.

Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from the voice segments of that voiceprint feature; and generating the second voice data from the voice segments that do not belong to the first voice data.

Optionally, performing voice recognition on the first voice data and the second voice data respectively to obtain corresponding first text data and second text data includes: performing voice recognition on each voice segment of the first voice data and generating the first text data from the recognized text fragments; and performing voice recognition on each voice segment of the second voice data and generating the second text data from the recognized text fragments.

Optionally, the consultation process data is a text recognition result obtained by recognizing voice data, and performing recognition according to the consultation process data to obtain corresponding first text data and second text data includes: performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features.

Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result into corresponding text fragments; recognizing the text fragments using a preset model and determining the language feature each fragment has, the language features including a first language feature and a second language feature; and generating the first text data from the text fragments having the first language feature and the second text data from the text fragments having the second language feature.

Optionally, the method further includes: analyzing the consultation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
FIG. 7 is a schematic structural diagram of an electronic device 700 for voice-based data processing according to another exemplary embodiment of the present invention. The electronic device 700 may be a server, which may differ considerably depending on configuration or performance and may include one or more central processing units (CPUs) 722 (for example, one or more processors), a memory 732, and one or more storage media 730 (for example, one or more mass storage devices) storing an application 742 or data 744. The memory 732 and the storage medium 730 may be transient or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 722 may be configured to communicate with the storage medium 730 and execute, on the server, the series of instruction operations in the storage medium 730.

The server may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

In an exemplary embodiment, the server is configured such that one or more central processing units 722 execute one or more programs including instructions for the following operations: acquiring consultation process data, the consultation process data being determined according to voice data collected during a consultation; performing recognition according to the consultation process data to obtain corresponding first text data and second text data, where the first text data belongs to one target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data.
可选的,所述问诊过程数据包括语音数据和/或语音数据识别得到的文本识别结果。Optionally, the consultation process data includes a text recognition result obtained by the voice data and/or the voice data identification.
可选的,所述问诊过程数据为语音数据;所述依据所述问诊过程数据进行识别,获取对应的第一文本数据和第二文本数据,包括:依据声纹特征,从所述语音数据中分离出第一语音数据和第二语音数据;对所述第一语音数据和第二语音数据分别进行语音识别,获取对应的第一文本数据和第二文本数据。Optionally, the consultation process data is voice data; the identifying according to the consultation process data, acquiring corresponding first text data and second text data, including: according to the voiceprint feature, from the voice Separating the first voice data and the second voice data from the data; respectively performing voice recognition on the first voice data and the second voice data, and acquiring corresponding first text data and second text data.
可选的,所述依据声纹特征,从所述语音数据中分离出第一语音数据和第二语音数据,包括:将所述语音数据划分为多个语音片段;依据声纹特征,采用所述语音片段确定第一语音数据和第二语音数据。Optionally, the separating, according to the voiceprint feature, the first voice data and the second voice data from the voice data, including: dividing the voice data into multiple voice segments; The speech segment determines the first speech data and the second speech data.
可选的,所述依据声纹特征,采用所述语音片段确定第一语音数据和第二语音数据,包括:采用基准声纹特征对各语音片段分别进行匹配,其中,所述基准声纹特征为目标用户的声纹特征;获取与所述基准声纹特征相符的语音片段,得到对应的第一语音数据;获取与所述基准声纹特征不相符的语音片段,得到对应的第二语音数据。Optionally, the determining, according to the voiceprint feature, the first voice data and the second voice data by using the voice segment, comprising: respectively matching each voice segment by using a reference voiceprint feature, wherein the reference voiceprint feature a voiceprint feature of the target user; acquiring a voice segment corresponding to the reference voiceprint feature to obtain a corresponding first voice data; acquiring a voice segment that does not match the reference voiceprint feature, and obtaining a corresponding second voice data .
可选的,所述依据声纹特征,采用所述语音片段确定第一语音数据和第二语音数据,包括:对各语音片段的声纹特征进行识别;统计各声纹特征对应语音片段的数量;确定具有语音片段的数量最大的声纹特征,采用所述声纹特征对应的语音片段生成第一语音数据;采用不属于第一语音数据的语音片段生成第二语音数据。Optionally, the determining, according to the voiceprint feature, the first voice data and the second voice data by using the voice segment, including: identifying voiceprint features of each voice segment; and counting the number of voice segments corresponding to each voiceprint feature Determining the voiceprint feature having the largest number of voice segments, generating the first voice data using the voice segment corresponding to the voiceprint feature, and generating the second voice data using the voice segment not belonging to the first voice data.
可选的,对所述第一语音数据和第二语音数据分别进行语音识别,获取对应的第一文本数据和第二文本数据,包括:对所述第一语音数据中各语音片段分别进行语音识别,采用识别得到的文本片段生成第一文本数据;对所述第二语音数据中各语音片段分别进行语音识别,采用识别得到的文本片段生成第二文本数据。Optionally, performing voice recognition on the first voice data and the second voice data to obtain corresponding first text data and second text data, including: performing voice separately on each voice segment in the first voice data Identifying, generating the first text data by using the recognized text segment; separately performing speech recognition on each of the second speech data segments, and generating the second text data by using the recognized text segment.
可选的,所述问诊过程数据为语音数据识别得到的文本识别结果;所述依据所述问诊过程数据进行识别,获取对应的第一文本数据和第二文本数据,包括:对所述文本识别结果进行特征识别,依据语言特征分离出第一文本数据和第二文本数据。Optionally, the inquiry process data is a text recognition result obtained by the voice data identification; the identifying according to the consultation process data, acquiring corresponding first text data and second text data, including: The text recognition result performs feature recognition, and the first text data and the second text data are separated according to the language feature.
Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result to obtain corresponding text segments; identifying the text segments with a preset model to determine the language feature of each text segment, the language features including a first language feature and a second language feature; and generating the first text data from the text segments having the first language feature, and generating the second text data from the text segments having the second language feature.
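The text-side separation can be sketched as below. The keyword-cue classifier is a toy stand-in for the preset model mentioned above, which would in practice be a trained classifier; the cue list and the assignment of question-like segments to the first language feature are purely illustrative assumptions:

```python
QUESTION_CUES = ("?", "how long", "where", "any")  # toy cues, not a trained model

def classify(text_segment):
    """Toy stand-in for the preset model: segments that look like questions
    are assigned the first (e.g. physician's) language feature."""
    t = text_segment.lower()
    return "first" if any(cue in t for cue in QUESTION_CUES) else "second"

def split_by_language_feature(text_segments):
    """Separate text segments into first and second text data according to
    the language feature each segment is classified with."""
    first = [t for t in text_segments if classify(t) == "first"]
    second = [t for t in text_segments if classify(t) == "second"]
    return first, second
```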
Optionally, in the server, the one or more programs executed by the one or more processors 522 further include instructions for: analyzing the consultation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar among the embodiments, reference may be made to one another.
Those skilled in the art will appreciate that the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should also be noted that in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or also includes elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The voice-based data processing method, voice-based data processing apparatus, and electronic device provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is intended only to help understand the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (25)

  1. A voice-based data processing method, comprising:
    obtaining consultation process data, the consultation process data being determined according to voice data collected during a consultation process;
    performing recognition according to the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user;
    obtaining consultation information according to the first text data and the second text data.
  2. The method according to claim 1, wherein the consultation process data is voice data; and
    performing recognition according to the consultation process data to obtain the corresponding first text data and second text data comprises:
    separating first voice data and second voice data from the voice data according to voiceprint features;
    performing voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
  3. The method according to claim 2, wherein separating the first voice data and the second voice data from the voice data according to voiceprint features comprises:
    dividing the voice data into a plurality of voice segments;
    determining the first voice data and the second voice data from the voice segments according to voiceprint features.
  4. The method according to claim 3, wherein determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises:
    matching each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is a voiceprint feature of the target user;
    acquiring the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data;
    acquiring the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  5. The method according to claim 3, wherein determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises:
    identifying the voiceprint feature of each voice segment;
    counting the number of voice segments corresponding to each voiceprint feature;
    determining the voiceprint feature with the largest number of voice segments, and generating the first voice data from the voice segments corresponding to that voiceprint feature;
    generating the second voice data from the voice segments that do not belong to the first voice data.
  6. The method according to claim 2, wherein performing voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises:
    performing voice recognition on each voice segment in the first voice data, and generating the first text data from the recognized text segments;
    performing voice recognition on each voice segment in the second voice data, and generating the second text data from the recognized text segments;
    and wherein obtaining the consultation information according to the first text data and the second text data comprises:
    sorting the text segments of the first text data and the second text data according to the time order of their corresponding voice segments, to obtain the consultation information.
  7. The method according to claim 1, wherein the consultation process data is a text recognition result obtained by recognizing voice data; and
    performing recognition according to the consultation process data to obtain the corresponding first text data and second text data comprises:
    performing feature recognition on the text recognition result, and separating the first text data and the second text data according to language features.
  8. The method according to claim 7, wherein performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features comprises:
    dividing the text recognition result to obtain corresponding text segments;
    identifying the text segments with a preset model to determine the language feature of each text segment, the language features comprising a target-user language feature and a non-target-user language feature;
    generating the first text data from the text segments having the target-user language feature, and generating the second text data from the text segments having the non-target-user language feature.
  9. A voice-based data processing apparatus, comprising:
    a data acquisition module, configured to obtain consultation process data, the consultation process data being determined according to voice data collected during a consultation process;
    a text recognition module, configured to perform recognition according to the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user;
    an information determination module, configured to obtain consultation information according to the first text data and the second text data.
  10. The apparatus according to claim 9, wherein the consultation process data is voice data, and the text recognition module comprises:
    a separation module, configured to separate first voice data and second voice data from the voice data according to voiceprint features;
    a voice recognition module, configured to perform voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
  11. The apparatus according to claim 10, wherein
    the separation module is configured to divide the voice data into a plurality of voice segments, and to determine the first voice data and the second voice data from the voice segments according to voiceprint features.
  12. The apparatus according to claim 11, wherein
    the separation module is configured to match each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is a voiceprint feature of the target user; to acquire the audio segments that match the reference voiceprint feature to obtain the corresponding first voice data; and to acquire the audio segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  13. The apparatus according to claim 11, wherein
    the separation module is configured to identify the voiceprint feature of each voice segment; to count the voice segments having the same voiceprint feature and their numbers, and to generate the first voice data from the voice segments whose number is the largest, wherein the voiceprint feature with the largest number of voice segments is the voiceprint feature of the target user; and to generate the second voice data from the remaining voice segments.
  14. The apparatus according to claim 10, wherein
    the voice recognition module is configured to perform voice recognition on each voice segment in the first voice data, generating the first text data from the recognized text segments, and to perform voice recognition on each voice segment in the second voice data, generating the second text data from the recognized text segments;
    the information determination module is configured to sort the text segments of the first text data and the second text data according to the time order of their corresponding voice segments, to obtain the consultation information.
  15. The apparatus according to claim 9, wherein the consultation process data is a text recognition result obtained by recognizing voice data; and
    the text recognition module is configured to perform feature recognition on the text recognition result, and to separate the first text data and the second text data according to language features.
  16. The apparatus according to claim 15, wherein the text recognition module comprises:
    a segment division module, configured to divide the text recognition result to obtain corresponding text segments;
    a segment recognition module, configured to identify the text segments with a preset model and determine the language feature of each text segment, the language features comprising a first language feature and a second language feature;
    a text generation module, configured to generate the first text data from the text segments having the first language feature, and to generate the second text data from the text segments having the second language feature.
  17. A readable storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the voice-based data processing method according to one or more of claims 1-8.
  18. An electronic device, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
    obtaining consultation process data, the consultation process data being determined according to voice data collected during a consultation process;
    performing recognition according to the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user;
    obtaining consultation information according to the first text data and the second text data.
  19. The electronic device according to claim 18, wherein the consultation process data is voice data; and
    performing recognition according to the consultation process data to obtain the corresponding first text data and second text data comprises:
    separating first voice data and second voice data from the voice data according to voiceprint features;
    performing voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
  20. The electronic device according to claim 19, wherein separating the first voice data and the second voice data from the voice data according to voiceprint features comprises:
    dividing the voice data into a plurality of voice segments;
    determining the first voice data and the second voice data from the voice segments according to voiceprint features.
  21. The electronic device according to claim 20, wherein determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises:
    matching each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is a voiceprint feature of the target user;
    acquiring the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data;
    acquiring the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  22. The electronic device according to claim 20, wherein determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises:
    identifying the voiceprint feature of each voice segment;
    counting the number of voice segments corresponding to each voiceprint feature;
    determining the voiceprint feature with the largest number of voice segments, and generating the first voice data from the voice segments corresponding to that voiceprint feature;
    generating the second voice data from the voice segments that do not belong to the first voice data.
  23. The electronic device according to claim 19, wherein performing voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises:
    performing voice recognition on each voice segment in the first voice data, and generating the first text data from the recognized text segments;
    performing voice recognition on each voice segment in the second voice data, and generating the second text data from the recognized text segments;
    and wherein obtaining the consultation information according to the first text data and the second text data comprises:
    sorting the text segments of the first text data and the second text data according to the time order of their corresponding voice segments, to obtain the consultation information.
  24. The electronic device according to claim 18, wherein the consultation process data is a text recognition result obtained by recognizing voice data; and
    performing recognition according to the consultation process data to obtain the corresponding first text data and second text data comprises:
    performing feature recognition on the text recognition result, and separating the first text data and the second text data according to language features.
  25. The electronic device according to claim 24, wherein performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features comprises:
    dividing the text recognition result to obtain corresponding text segments;
    identifying the text segments with a preset model to determine the language feature of each text segment, the language features comprising a target-user language feature and a non-target-user language feature;
    generating the first text data from the text segments having the target-user language feature, and generating the second text data from the text segments having the non-target-user language feature.
PCT/CN2018/082702 2017-05-26 2018-04-11 Voice-based data processing method and apparatus, and electronic device WO2018214663A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710384412.3A CN108962253A (en) 2017-05-26 2017-05-26 A kind of voice-based data processing method, device and electronic equipment
CN201710384412.3 2017-05-26

Publications (1)

Publication Number Publication Date
WO2018214663A1 true WO2018214663A1 (en) 2018-11-29

Family

ID=64395285

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/082702 WO2018214663A1 (en) 2017-05-26 2018-04-11 Voice-based data processing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN108962253A (en)
WO (1) WO2018214663A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582708A (en) * 2020-04-30 2020-08-25 北京声智科技有限公司 Medical information detection method, system, electronic device and computer-readable storage medium
CN112118415B (en) * 2020-09-18 2023-02-10 瑞然(天津)科技有限公司 Remote diagnosis and treatment method and device, patient side terminal and doctor side terminal
CN114520062B (en) * 2022-04-20 2022-07-22 杭州马兰头医学科技有限公司 Medical cloud communication system based on AI and letter creation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268279A (en) * 2014-10-16 2015-01-07 魔方天空科技(北京)有限公司 Query method and device of corpus data
CN104427292A (en) * 2013-08-22 2015-03-18 中兴通讯股份有限公司 Method and device for extracting a conference summary
CN105469790A (en) * 2014-08-29 2016-04-06 上海联影医疗科技有限公司 Consultation information processing method and device
CN106326640A (en) * 2016-08-12 2017-01-11 上海交通大学医学院附属瑞金医院卢湾分院 Medical speech control system and control method thereof
CN106328124A (en) * 2016-08-24 2017-01-11 安徽咪鼠科技有限公司 Voice recognition method based on user behavior characteristics


Also Published As

Publication number Publication date
CN108962253A (en) 2018-12-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18806292

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18806292

Country of ref document: EP

Kind code of ref document: A1