WO2018214663A1 - Voice-based data processing method and apparatus, and electronic device - Google Patents


Publication number
WO2018214663A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2018/082702
Other languages
French (fr)
Chinese (zh)
Inventor
李明修
银磊
卜海亮
Original Assignee
北京搜狗科技发展有限公司 (Beijing Sogou Technology Development Co., Ltd.)
Application filed by 北京搜狗科技发展有限公司 (Beijing Sogou Technology Development Co., Ltd.)
Publication of WO2018214663A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/26 — Speech to text systems
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 — Speaker identification or verification
    • G10L 17/02 — Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Definitions

  • The present invention relates to the technical field of data processing, and in particular, to a voice-based data processing method and apparatus, and an electronic device.
  • Speech recognition usually converts speech into text.
  • Traditional speech-recognition recording tools can only convert speech data into corresponding text; they cannot distinguish between speakers. Therefore, when multiple people speak, a record cannot be produced efficiently by speech recognition alone.
  • Embodiments of the present invention provide a voice-based data processing method to record a consultation process completely.
  • Embodiments of the present invention further provide a voice-based data processing apparatus, an electronic device, and a readable storage medium, to ensure the implementation and application of the foregoing method.
  • An embodiment of the present invention discloses a voice-based data processing method, including: obtaining consultation process data, where the consultation process data is determined according to voice data collected during the consultation process; performing identification according to the consultation process data, and acquiring corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining the consultation information according to the first text data and the second text data.
  • The consultation process data is voice data; the identifying according to the consultation process data and acquiring the corresponding first text data and second text data includes: separating first voice data and second voice data from the voice data according to voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
  • The separating of the first voice data and the second voice data from the voice data according to the voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to the voiceprint features.
  • The determining of the first voice data and the second voice data from the voice segments according to the voiceprint features includes: matching each voice segment against a reference voiceprint feature, where the reference voiceprint feature is a voiceprint feature of the target user; acquiring the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and acquiring the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  • The determining of the first voice data and the second voice data from the voice segments according to the voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature having the largest number of voice segments; generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
  • The performing of voice recognition on the first voice data and the second voice data respectively, and acquiring the corresponding first text data and second text data, includes: performing voice recognition on each voice segment in the first voice data and generating the first text data from the recognized text segments; and performing voice recognition on each voice segment in the second voice data and generating the second text data from the recognized text segments.
  • The obtaining of the consultation information from the first text data and the second text data includes: sorting the text segments according to the chronological order of the voice segments to which the text segments in the first text data and in the second text data respectively correspond, to obtain the consultation information.
  • The consultation process data is a text recognition result obtained by recognizing the voice data; the identifying according to the consultation process data and acquiring the corresponding first text data and second text data includes: performing feature recognition on the text recognition result, and separating the first text data and the second text data according to language features.
  • The performing of feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result into corresponding text segments; identifying each text segment using a preset model to determine its language feature, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text segments having the target-user language feature and the second text data from the text segments having the non-target-user language feature.
  • An embodiment of the invention further discloses a voice-based data processing apparatus, including: a data acquisition module, configured to acquire the consultation process data, where the consultation process data is determined according to voice data collected during the consultation process; a text recognition module, configured to perform identification according to the consultation process data and acquire corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and an information determining module, configured to obtain the consultation information according to the first text data and the second text data.
  • The consultation process data is voice data, and the text recognition module includes: a separation module, configured to separate the first voice data and the second voice data from the voice data according to the voiceprint features; and a voice recognition module, configured to perform voice recognition on the first voice data and the second voice data respectively and acquire the corresponding first text data and second text data.
  • The separation module is configured to divide the voice data into multiple voice segments, and to determine the first voice data and the second voice data from the voice segments according to the voiceprint features.
  • The separation module is configured to match each voice segment against a reference voiceprint feature, where the reference voiceprint feature is a voiceprint feature of the target user; to acquire the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and to acquire the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  • The separation module is configured to identify the voiceprint feature of each voice segment; to count the voice segments having the same voiceprint feature; to generate the first voice data from the largest group of voice segments, where the most frequent voiceprint feature is taken as the voiceprint feature of the target user; and to generate the second voice data from the remaining voice segments.
  • The voice recognition module is configured to perform voice recognition on each voice segment in the first voice data and generate the first text data from the recognized text segments, and to perform voice recognition on each voice segment in the second voice data and generate the second text data from the recognized text segments.
  • The information determining module is configured to sort the text segments according to the chronological order of the voice segments to which the text segments in the first text data and in the second text data respectively correspond, to obtain the consultation information.
  • The consultation process data is a text recognition result obtained by recognizing the voice data; the text recognition module is configured to perform feature recognition on the text recognition result and separate the first text data and the second text data according to language features.
  • The text recognition module includes: a segment dividing module, configured to divide the text recognition result into corresponding text segments; a segment identification module, configured to identify each text segment using a preset model and determine its language feature, the language features including a first language feature and a second language feature; and a text generation module, configured to generate the first text data from the text segments having the first language feature and the second text data from the text segments having the second language feature.
  • Embodiments of the present invention also disclose a readable storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the voice-based data processing method according to one or more embodiments of the present invention.
  • An electronic device includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: obtaining the consultation process data, the consultation process data being determined according to the voice data collected during the consultation process; performing identification according to the consultation process data, and acquiring corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining the consultation information according to the first text data and the second text data.
  • For consultation process data determined from voice collected during the consultation, the first text data and the second text data can be identified according to different users, where the first text data belongs to a target user and the second text data belongs to users other than the target user; that is, the doctor's and the patients' sentences can be distinguished automatically during the consultation. The consultation information is then obtained from the first text data and the second text data, so the consultation process can be recorded completely, medical records can be organized automatically, and the time spent organizing consultation records is saved.
  • FIG. 1 is a flow chart showing the steps of an embodiment of a voice-based data processing method of the present invention.
  • FIG. 2 is a flow chart showing the steps of another embodiment of the voice-based data processing method of the present invention.
  • FIG. 3 is a flow chart showing the steps of another embodiment of a voice-based data processing method of the present invention.
  • FIG. 4 is a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention.
  • FIG. 5 is a structural block diagram of another embodiment of a voice-based data processing apparatus of the present invention.
  • FIG. 6 is a structural block diagram of an electronic device for voice-based data processing according to an exemplary embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an electronic device for voice-based data processing according to another exemplary embodiment of the present invention.
  • Referring to FIG. 1, there is shown a flow chart of the steps of an embodiment of a voice-based data processing method of the present invention, which may include the following steps:
  • Step 102: Obtain the consultation process data, which is determined according to the voice data collected during the consultation process.
  • The consultation process can be recorded by various electronic devices, and the consultation process data can be obtained based on the collected voice data; that is, the consultation process data can be the collected voice data itself, or a text recognition result converted from the collected voice data.
  • Embodiments of the present invention can perform identification on data collected in various consultation processes.
  • Step 104: Perform identification according to the consultation process data, and obtain corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user.
  • The consultation process data can be identified using different identification methods for different data types: voice data can be processed using voiceprint features, voice recognition, and so on, while text data can be identified using text features, thereby distinguishing the first text data and the second text data by user.
  • A consultation involves at least two users communicating and interacting: one user is the doctor, and the other users are patients, family members, and the like. For example, a doctor's one-day clinic will include the doctor and multiple patients, and possibly one or more family members. The doctor can be taken as the target user, so the first text data is the doctor's consultation text, and the text data of the at least one other user, i.e. the patients' and family members' consultation text, is taken as the second text data.
  • Step 106: Obtain the consultation information according to the first text data and the second text data.
  • The first text data and the second text data may each consist of multiple text segments, so the consultation information may be assembled based on the time of each text segment and the corresponding user, for example: "Patient B: I am not comfortable with XXX."
  • Patient information can also be obtained in combination with the hospital's outpatient records, thereby distinguishing different patients in the consultation information.
  • For consultation process data determined by collecting voice during the consultation, the first text data and the second text data can be identified from the data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; that is, the doctor's and patients' sentences can be distinguished automatically during the consultation. The consultation information is then obtained from the first text data and the second text data, so the consultation process can be recorded completely, medical records can be organized automatically, and the time spent organizing consultation records is saved.
  • The consultation process data includes the voice data and/or a text recognition result obtained by recognizing the voice data. The identification methods differ for different types of consultation process data; therefore, the embodiments of the present invention discuss the processing of each type separately.
  • In this embodiment, the consultation process data is voice data, and the method may include the following steps:
  • Step 202: Obtain the consultation process data, where the consultation process data is the voice data collected during the consultation process.
  • Voice data can be collected through various electronic devices, for example by recording audio through a voice recorder, mobile phone, or computer, to obtain the voice data collected during the consultation. The voice data may be collected for a single outpatient visit, or for a doctor across multiple visits; this is not limited in the embodiments of the present invention. The voice data therefore includes the voice of a doctor and of at least one patient, and may also include the voice of at least one patient's family member.
  • Step 104, performing identification according to the consultation process data and obtaining the corresponding first text data and second text data, may include the following steps 204-206.
  • Step 204: Separate the first voice data and the second voice data from the voice data according to the voiceprint features.
  • Voiceprint refers to the spectrum of sound waves carrying speech information displayed by electroacoustic instruments. Voiceprints are characterized by specificity and stability. After adulthood, human voiceprints can remain relatively stable for a long time, so different people can be identified through voiceprints. Therefore, for the voice data, the voiceprint feature can be identified, and the voice segment corresponding to different users (voiceprint features) in the voice data is determined, thereby obtaining the first voice data of the target user and the second voice data of the other user.
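  • As a purely illustrative sketch (not part of the disclosure), a voiceprint feature could be approximated by the average magnitude spectrum of an audio signal and compared by cosine similarity; practical systems use far stronger representations such as MFCCs or learned speaker embeddings, and the frame length below is an assumed value:

```python
# Toy stand-in for voiceprint feature extraction: the average magnitude
# spectrum over fixed-size frames, compared by cosine similarity. This only
# illustrates the idea that the same voice yields similar spectra.
import numpy as np

def voiceprint(samples, frame_len=256):
    """Average magnitude spectrum over non-overlapping frames."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    spectra = [np.abs(np.fft.rfft(f)) for f in frames]
    return np.mean(spectra, axis=0)

def similarity(a, b):
    """Cosine similarity between two voiceprint feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In this toy setting, two recordings of the same tone score higher against each other than against a different tone, which is the property a real voiceprint matcher relies on.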
  • The separating of the first voice data and the second voice data from the voice data according to the voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to the voiceprint features.
  • The voice data can be divided into multiple voice segments according to a division rule, for example at the pause intervals in the audio, or according to voiceprint features, i.e., determining the voiceprint feature corresponding to each utterance and starting a new segment when the voiceprint feature changes. One piece of voice data can thus be divided into multiple voice segments with a definite order, and different segments can have the same or different voiceprint features. Whether each voice segment belongs to the first voice data or the second voice data is then determined based on its voiceprint feature: the segments having the target user's voiceprint feature constitute the first voice data, and the remaining segments constitute the second voice data.
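  • The pause-interval division mentioned above can be sketched as follows; the frame length, energy threshold, and minimum pause length are assumed tuning values, since the embodiment does not specify them:

```python
# Illustrative sketch of pause-based segmentation, assuming the audio is
# already available as a list of samples.

def split_on_pauses(samples, frame_len=160, energy_thresh=0.01, min_gap=3):
    """Split audio into voice segments at runs of low-energy (pause) frames.

    Returns a list of (start_frame, end_frame) pairs, end exclusive.
    """
    # Mark each frame as voiced (True) or silent (False) by mean energy.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    voiced = [sum(s * s for s in f) / max(len(f), 1) > energy_thresh for f in frames]

    segments, start, gap = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i          # a new segment begins at the first voiced frame
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # pause long enough: close the open segment
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:          # trailing segment without a closing pause
        segments.append((start, len(voiced) - gap))
    return segments
```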
  • Before collecting the voice data of the consultation, a piece of the doctor's (target user's) voice may first be collected as reference data, so that the doctor's voiceprint feature, i.e. the reference voiceprint feature, can be identified from the reference data.
  • A voice recognition model may also be set up: after the voice data is input into the model, the voice segments conforming to the reference voiceprint feature are separated from the voice segments with other voiceprint features, thereby obtaining the target user's voice segments and the other users' voice segments.
  • A consultation record usually involves only one doctor but possibly more than one patient, so a correspondingly large number of medical samples can be obtained for a specific doctor in the above manner.
  • The target user's voiceprint feature may be collected in advance as a reference voiceprint feature for dividing the voice data. That is, the determining of the first voice data and the second voice data from the voice segments according to the voiceprint features includes: matching each voice segment against the reference voiceprint feature, where the reference voiceprint feature is a voiceprint feature of the target user; acquiring the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and acquiring the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  • Voice data may be collected in advance to extract the target user's voiceprint feature as the reference voiceprint feature. For voice data containing the target user, each voice segment is matched against the reference voiceprint feature to determine whether its voiceprint feature is consistent with the reference. If it is, the voice segment is considered to match the reference voiceprint feature and is added to the first voice data (i.e., the target user's voice data); if it is not, the voice segment is added to the second voice data (i.e., the voice data of the non-target users). The first voice data and the second voice data are thus each composed of the corresponding voice segments, and each segment keeps its sequential position, which facilitates accurately determining the consultation information later.
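  • The matching against a reference voiceprint feature can be sketched as follows, assuming each segment already carries a precomputed feature vector; cosine similarity and the threshold value are illustrative stand-ins for whatever matcher an implementation uses:

```python
# Sketch of splitting ordered voice segments into the target user's data
# (first) and other users' data (second) by matching a reference voiceprint.
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def split_by_reference(segments, reference, threshold=0.8):
    """segments: ordered list of (timestamp, feature_vector).

    Order is preserved within each returned list, mirroring the sequential
    relationship the embodiment requires for later sorting.
    """
    first, second = [], []
    for seg in segments:
        (first if cosine(seg[1], reference) >= threshold else second).append(seg)
    return first, second
```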
  • The voice data may also be divided according to the number of voice segments corresponding to the same voiceprint feature. That is, the determining of the first voice data and the second voice data from the voice segments according to the voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; taking the voiceprint feature having the largest number of voice segments as the target user's voiceprint feature; generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
  • The consultation process data may be the recordings of a doctor's multiple outpatient visits. In this process the doctor usually occupies the most speaking time while communicating with different patients and their family members, so in the voice data the doctor (target user) has the largest number of voice segments. The target user can therefore be distinguished from the other users by the number of voice segments corresponding to each user, yielding the first voice data and the second voice data.
  • The voiceprint features in the voice segments are identified to determine the voiceprint feature contained in each segment, and the number of voice segments corresponding to each voiceprint feature is counted. The voiceprint feature having the largest number of voice segments is determined as the voiceprint feature of the target user, and the other voiceprint features are those of other users. The voice segments having the target user's voiceprint feature then constitute, in order, the first voice data, and the other voice segments, i.e., those not belonging to the first voice data, constitute the second voice data.
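  • The counting-based division can be sketched as follows, assuming each segment has already been assigned a voiceprint label (e.g., by clustering); the labels are hypothetical:

```python
# Sketch of the counting-based division: the voiceprint label with the most
# segments is taken to be the doctor / target user.
from collections import Counter

def split_by_majority(labeled_segments):
    """labeled_segments: ordered list of (voiceprint_label, segment).

    Returns (first_voice_data, second_voice_data), each preserving order.
    """
    counts = Counter(label for label, _ in labeled_segments)
    target = counts.most_common(1)[0][0]   # most frequent voiceprint label
    first = [seg for label, seg in labeled_segments if label == target]
    second = [seg for label, seg in labeled_segments if label != target]
    return first, second
```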
  • A voice segment may contain the voiceprint features of multiple users. When multiple voiceprint features are identified in one voice segment: if they are all voiceprint features of other users, the segment may be added to the second voice data; if they include both the target user's voiceprint feature and other users' voiceprint features, the segment may be further divided into sub-segments added to the corresponding voice data, or handled as required, for example assigned as a whole to the target user's segments (the first voice data), assigned as a whole to the other users' segments (the second voice data), or added to both users' voice data separately.
  • Step 206: Perform voice recognition on the first voice data and the second voice data respectively, and acquire the corresponding first text data and second text data.
  • The two sets of voice data can be recognized separately, thereby obtaining the first text data of the target user and the second text data of the other users.
  • The performing of voice recognition on the first voice data and the second voice data respectively, and acquiring the corresponding first text data and second text data, includes: performing voice recognition on each voice segment in the first voice data and generating the first text data from the recognized text segments; and performing voice recognition on each voice segment in the second voice data and generating the second text data from the recognized text segments.
  • By recognizing each voice segment in the first voice data, the text segment corresponding to each voice segment is obtained, and the first text data is formed according to the order of the voice segments; the second text data is obtained in the same way.
  • Step 208: Obtain the consultation information according to the first text data and the second text data.
  • The text segments in the first text data and in the second text data may be sorted in a corresponding order, such as chronological order, to obtain the corresponding consultation information. The consultation information can record the doctor's questions in a consultation and the corresponding patient's (or family member's) answers, as well as the doctor's diagnosis, medical advice, and other information.
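  • The chronological assembly of the consultation information can be sketched as follows; the timestamps and the "Doctor"/"Patient" labels are illustrative:

```python
# Sketch of assembling the consultation information: text segments from the
# doctor (first text data) and from other users (second text data) are merged
# in the chronological order of their source voice segments, each line
# labeled with its speaker role.

def build_consultation_info(first_text, second_text):
    """first_text / second_text: lists of (timestamp, text)."""
    labeled = [(t, "Doctor", s) for t, s in first_text] + \
              [(t, "Patient", s) for t, s in second_text]
    labeled.sort(key=lambda item: item[0])      # chronological order
    return "\n".join(f"{who}: {text}" for _, who, text in labeled)
```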
  • Step 210: Analyze the consultation information to obtain a corresponding analysis result; the analysis result is related to disease diagnosis.
  • the embodiment of the present invention can also analyze the consultation information according to requirements and obtain a corresponding analysis result. Since the consultation is related to disease diagnosis, the analysis result is also related to disease diagnosis and is determined based on the analysis requirements.
  • the consultation process data is a text recognition result obtained by recognizing the voice data, and the method may specifically include the following steps:
  • Step 302 Acquire a text recognition result obtained by the voice data identification.
  • the voice data is collected during the consultation process, and the collected voice data is converted into the text recognition result by voice recognition, and the text recognition result can be directly obtained.
  • step 104, performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data, may include the following step 304.
  • Step 304 Perform feature recognition on the text recognition result, and separate the first text data and the second text data according to the language feature.
  • the embodiment of the present invention recognizes the statements of different users from the text recognition result and organizes the consultation information. During the consultation, the doctor usually asks about symptoms, the user replies with symptoms, and the doctor then gives the diagnosed disease, the required examinations, the required medicines, and the like. Based on these language characteristics, the doctor's statements and the patient's statements can be separated from the text recognition result, thereby separating out the first text data and the second text data.
  • the embodiment of the present invention can collect the doctor's consultation texts and the patient's consultation texts in advance and analyze them, thereby determining the language characteristics of the doctor (i.e., the target user) and of the patient and family members.
  • a predetermined model can be established by determining the language features of different users by means of machine learning, probability statistics, and the like.
  • the embodiment of the present invention can obtain a large number of separated medical texts as training data, where the separated medical texts identify the statements of the target user and of other users, such as historically separated consultation texts.
  • the doctor content data (the first text data of the target user) and the patient content data (the second text data of other users) may be trained separately to obtain a doctor content model and a patient content model; of course, the two models may also be combined into one preset model, based on which the doctor's statements and the patient's statements are recognized.
  • for example, in the consultation information, the doctor's content is generally a question carrying symptom vocabulary, such as how you feel, what symptoms you have, what is uncomfortable, etc.; the patient's content is generally a question about a suspected disease, for example, is it a cold, is it XX disease, etc.
  • the doctor's content also usually includes statements with symptoms and medicines, for example, you have a cold, you can take XX medicine, and so on. Therefore, both the doctor's sentence content and the patient's sentence content have relatively significant language features, so the doctor content model and the patient content model can be trained from the separated medical case information.
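  • a toy sketch of such training, using simple word counts over invented separated texts as a stand-in for the doctor content model and patient content model (the real embodiments may use machine learning or probability statistics):

```python
from collections import Counter

def train_content_model(sentences):
    """Count word frequencies over separated medical texts; the
    counts serve as a toy stand-in for one role's content model."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts

doctor_model = train_content_model([
    "what symptoms do you have",
    "you have a cold take this medicine",
])
patient_model = train_content_model([
    "is it a cold",
    "i have a headache",
])
print(doctor_model["medicine"], patient_model["headache"])  # 1 1
```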
  • performing feature recognition on the text recognition result, and separating the first text data and the second text data according to the language features, includes: dividing the text recognition result to obtain corresponding text segments; identifying each text segment by using a preset model, and determining the language feature that the text segment has, the language features including a first language feature and a second language feature; and generating the first text data by using the text segments having the first language feature, and generating the second text data by using the text segments having the second language feature.
  • the text recognition result may first be divided, for example into sentences according to Chinese sentence features, or into multiple text segments according to other methods.
  • each text segment is sequentially input into a preset model, and the text segment is identified by the preset model, so that the language features of each text segment can be identified.
  • the preset model can also be set to attribute each text segment to a user based on the recognized language feature. Where the language feature of the target user is taken as the first language feature and the language feature of other users is taken as the second language feature, the preset model may be used to determine whether a text segment has the first language feature or the second language feature. The text segments having the first language feature may then be combined into the first text data according to the division order of the text segments, and the second text data may be generated by using the text segments having the second language feature.
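  • separating text segments with such a model might be sketched as below; the word-overlap scoring is a deliberate simplification of the preset model, and the models and sentences are invented:

```python
from collections import Counter

def classify_segments(segments, first_model, second_model):
    """Assign each text segment the language feature whose model
    scores it higher, then assemble the first and second text data
    in the original division order."""
    first_text, second_text = [], []
    for segment in segments:
        words = segment.lower().split()
        first_score = sum(first_model[w] for w in words)
        second_score = sum(second_model[w] for w in words)
        (first_text if first_score >= second_score else second_text).append(segment)
    return first_text, second_text

doctor_model = Counter({"symptoms": 2, "medicine": 2, "what": 1})
patient_model = Counter({"headache": 2, "fever": 2, "cold": 1})
first, second = classify_segments(
    ["what symptoms do you have", "I have a headache and fever"],
    doctor_model, patient_model)
print(first)   # ['what symptoms do you have']
print(second)  # ['I have a headache and fever']
```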
  • Step 306 Obtain consultation information according to the first text data and the second text data.
  • Step 308 analyzing the consultation information to obtain a corresponding analysis result, and the analysis result is related to the disease diagnosis.
  • each text segment in the first text data and each text segment in the second text data may be sorted according to a corresponding order, thereby obtaining corresponding consultation information.
  • the consultation information can record the doctor's question in a consultation and the answer of the corresponding patient (family), as well as the doctor's diagnosis, medical advice and other information.
  • the embodiment of the present invention can also analyze the consultation information according to requirements and obtain a corresponding analysis result. Since the consultation is related to disease diagnosis, the analysis result is also related to disease diagnosis and is determined based on the analysis requirements.
  • the communication process with the patient can be recorded by recording, after which the doctor's and the patient's statements are separated, distinguished, and arranged, and provided to the doctor in the form of a dialogue as a medical record, which can effectively reduce the time doctors spend on medical records.
  • FIG. 4 a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention is shown, which may specifically include the following modules:
  • the data acquisition module 402 is configured to obtain the data of the consultation process, and the data of the consultation process is determined according to the voice data collected during the consultation process.
  • the text identification module 404 is configured to perform identification according to the consultation process data, and obtain corresponding first text data and second text data, wherein the first text data belongs to a target user, and the second text data belongs to Other users than the target user.
  • the information determining module 406 is configured to obtain the consultation information according to the first text data and the second text data.
  • in the consultation process, at least two users communicate and interact: one user is the doctor, and the other users are patients, family members, and the like.
  • the doctor can be taken as the target user, so that the first text data is the doctor's consultation text data, and the text data of at least one other user is taken as the second text data, that is, the consultation text data corresponding to the patient and family members.
  • since a consultation is usually a question-and-answer process, the first text data and the second text data may each be composed of multiple text segments, so that the consultation information may be obtained based on the time of each text segment and the corresponding user.
  • the patient information can also be obtained in combination with the outpatient records of the hospital, thereby distinguishing different patients and the like in the consultation information.
  • the first text data and the second text data may be identified from the consultation process data according to different users, wherein the first text data belongs to the target user and the second text data belongs to users other than the target user; that is, the doctor's and the patient's sentences during the consultation can be distinguished automatically, and the consultation information is then obtained according to the first text data and the second text data.
  • this makes it possible to completely record the consultation process, automatically sort out the medical records, etc., saving the time of organizing consultation records.
  • FIG. 5 a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention is shown, which may specifically include the following modules:
  • the consultation process data includes voice data and/or a text recognition result obtained by voice data recognition.
  • the consultation process data is voice data;
  • the text recognition module 404 can include:
  • the separating module 40402 is configured to separate the first voice data and the second voice data from the voice data according to the voiceprint feature.
  • the voice recognition module 40404 is configured to separately perform voice recognition on the first voice data and the second voice data, and acquire corresponding first text data and second text data.
  • the separating module 40402 is configured to divide the voice data into a plurality of voice segments; and according to the voiceprint feature, the voice segment is used to determine the first voice data and the second voice data.
  • the separating module 40402 is configured to match each voice segment by using a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; acquire the voice segments matching the reference voiceprint feature to obtain the corresponding first voice data; and acquire the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
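  • matching voice segments against the reference voiceprint feature could, for illustration, be sketched with cosine similarity over feature vectors; the vectors and the 0.9 threshold are invented, and a real system would extract voiceprint features with a dedicated model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def split_by_reference(segments, reference, threshold=0.9):
    """Voice segments whose voiceprint feature matches the reference
    form the first voice data; the rest form the second voice data."""
    first, second = [], []
    for name, feature in segments:
        (first if cosine(feature, reference) >= threshold else second).append(name)
    return first, second

reference = [1.0, 0.0, 0.5]  # doctor's reference voiceprint feature
segments = [("seg1", [0.9, 0.1, 0.45]), ("seg2", [0.0, 1.0, 0.2])]
print(split_by_reference(segments, reference))  # (['seg1'], ['seg2'])
```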
  • before collecting the voice data of the consultation process, a piece of the doctor's (target user's) voice may first be collected as reference data, so that the doctor's voiceprint feature, that is, the reference voiceprint feature, is identified from the reference data.
  • a voice recognition model may also be set: after the voice data is input into the model, the voice segments conforming to the reference voiceprint feature may be separated from the voice segments with other voiceprint features, thereby obtaining the voice segments of the target user and the voice segments of other users.
  • a consultation record usually involves only one doctor, while there may be more than one patient, so a correspondingly large number of medical samples can be obtained for a specific doctor in the above manner.
  • the separating module 40402 is configured to identify the voiceprint feature of each voice segment; count the number of voice segments corresponding to each voiceprint feature, and determine the voiceprint feature having the largest number of voice segments, the voiceprint feature with the largest number being the voiceprint feature of the target user; generate the first voice data by using the voice segments corresponding to that voiceprint feature; and generate the second voice data by using the voice segments that do not belong to the first voice data.
  • the consultation process data may be the recorded data of a doctor's multiple outpatient sessions. In this process, the doctor usually occupies the most time communicating with different patients and their families, that is, the doctor (target user) has the largest number of voice segments in the voice data, so the target user and other users can be distinguished according to the number of voice segments corresponding to different users, and the first voice data and the second voice data are obtained.
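  • a sketch of selecting the target user as the voiceprint with the most segments; the segment and voiceprint labels are invented for illustration:

```python
from collections import Counter

def split_by_majority(segment_voiceprints):
    """Take the voiceprint with the largest number of voice segments
    as the target user's; its segments form the first voice data,
    all remaining segments form the second voice data."""
    counts = Counter(vp for _, vp in segment_voiceprints)
    target_vp, _ = counts.most_common(1)[0]
    first = [seg for seg, vp in segment_voiceprints if vp == target_vp]
    second = [seg for seg, vp in segment_voiceprints if vp != target_vp]
    return first, second

segments = [("s1", "vpA"), ("s2", "vpB"), ("s3", "vpA"),
            ("s4", "vpC"), ("s5", "vpA")]
print(split_by_majority(segments))  # (['s1', 's3', 's5'], ['s2', 's4'])
```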
  • a voice segment may include voiceprint features of a plurality of users.
  • the separating module 40402 may perform the following processing when multiple voiceprint features are identified from one voice segment, the different voiceprint features occurring at different times: if the voiceprint features are all voiceprint features of other users, the voice segment may be added to the second voice data; if the voiceprint features include the voiceprint feature of the target user and voiceprint features of other users, the voice segment may be subdivided into sub-segments and added to the corresponding voice data.
  • if the voiceprint features include the voiceprint feature of the target user and voiceprint features of other users, the voice segment may be divided according to requirements, for example, classified as a voice segment of the target user to obtain the first voice data, or classified as a voice segment of other users to obtain the second voice data, or subdivided and added separately to the voice data of both users.
  • the voice recognition module 40404 is configured to perform voice recognition on each voice segment in the first voice data and generate the first text data by using the recognized text segments; and to perform voice recognition on each voice segment in the second voice data and generate the second text data by using the recognized text segments.
  • the information determining module 406 is configured to sort the text segments according to the time sequence of each text segment in the first text data and each text segment in the second text data, and obtain the consultation information.
  • where the consultation process data is a text recognition result obtained by voice data recognition, the text recognition module 404 is configured to perform feature recognition on the text recognition result and separate the first text data and the second text data according to the language features.
  • the text recognition module 404 includes:
  • the segment dividing module 40406 is configured to divide the text recognition result to obtain a corresponding text segment.
  • the segment identification module 40408 is configured to identify the text segment by using a preset model, and determine a language feature that the text segment has, the language feature including a first language feature and a second language feature.
  • the embodiment of the present invention can obtain a large number of separated medical texts as training data, where the separated medical texts identify the statements of the target user and of other users, such as historically separated consultation texts.
  • the doctor content data (the first text data of the target user) and the patient content data (the second text data of other users) may be trained separately to obtain a doctor content model and a patient content model; of course, the two models may also be combined into one preset model, based on which the doctor's statements and the patient's statements are recognized. For example, in the consultation information, the doctor's content is generally a question carrying symptom vocabulary, such as how you feel, what symptoms you have, what is uncomfortable, etc.; the patient's content is generally a question about a suspected disease.
  • the text generating module 40410 is configured to generate the first text data by using the text segment having the first language feature, and generate the second text data by using the text segment having the second language feature.
  • the device further includes: an analysis module 408, configured to analyze the consultation information, and obtain a corresponding analysis result, where the analysis result is related to a disease diagnosis.
  • each text segment in the first text data and each text segment in the second text data may be sorted according to a corresponding order, thereby obtaining corresponding consultation information.
  • the consultation information can record the doctor's question in a consultation and the answer of the corresponding patient (family), as well as the doctor's diagnosis, medical advice and other information.
  • the embodiment of the present invention can also analyze the consultation information according to requirements and obtain a corresponding analysis result. Since the consultation is related to disease diagnosis, the analysis result is also related to disease diagnosis and is determined based on the analysis requirements.
  • the communication process with the patient can be recorded by recording, after which the doctor's and the patient's statements are separated, distinguished, and arranged, and provided to the doctor in the form of a dialogue as a medical record, which can effectively reduce the time doctors spend on medical records.
  • for the apparatus embodiments, the description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments.
  • FIG. 6 is a structural block diagram of an electronic device 600 for voice-based data processing, according to an exemplary embodiment.
  • the electronic device 600 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc., or a server-side device such as a server.
  • the electronic device 600 can include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, and a sensor component 614. And communication component 616.
  • Processing component 602 typically controls the overall operation of electronic device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • Processing component 602 can include one or more processors 620 to execute instructions to perform all or part of the steps described above.
  • processing component 602 can include one or more modules to facilitate interaction between processing component 602 and other components.
  • processing component 602 can include a multimedia module to facilitate interaction between multimedia component 608 and processing component 602.
  • Memory 604 is configured to store various types of data to support operation at device 600. Examples of such data include instructions for any application or method operating on electronic device 600, contact data, phone book data, messages, pictures, videos, and the like.
  • the memory 604 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Disk or Optical Disk.
  • Power component 606 provides power to various components of electronic device 600.
  • Power component 606 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 600.
  • the multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and a user.
  • the screen can include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may sense not only the boundary of the touch or sliding action, but also the duration and pressure associated with the touch or slide operation.
  • the multimedia component 608 includes a front camera and/or a rear camera. When the electronic device 600 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 610 is configured to output and/or input an audio signal.
  • the audio component 610 includes a microphone (MIC) that is configured to receive an external audio signal when the electronic device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode.
  • the received audio signal may be further stored in memory 604 or transmitted via communication component 616.
  • audio component 610 also includes a speaker for outputting an audio signal.
  • the I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
  • Sensor assembly 614 includes one or more sensors for providing electronic device 600 with a status assessment of various aspects.
  • sensor component 614 can detect an open/closed state of electronic device 600 and the relative positioning of components, such as the display and keypad of electronic device 600; sensor component 614 can also detect a change in position of electronic device 600 or a component of electronic device 600, the presence or absence of user contact with electronic device 600, the orientation or acceleration/deceleration of electronic device 600, and a change in temperature of electronic device 600.
  • Sensor assembly 614 can include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 614 can also include an acceleration sensor, a gyro sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 616 is configured to facilitate wired or wireless communication between electronic device 600 and other devices.
  • the electronic device 600 can access a wireless network based on a communication standard such as WiFi, 2G, or 3G, or a combination thereof.
  • the communication component 616 receives broadcast signals or broadcast associated information from an external broadcast management system via a broadcast channel.
  • the communication component 616 also includes a near field communication (NFC) module to facilitate short range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • electronic device 600 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above methods.
  • a non-transitory computer readable storage medium comprising instructions is also provided, such as the memory 604 comprising instructions executable by the processor 620 of the electronic device 600 to perform the above method.
  • the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a non-transitory computer readable storage medium, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform a voice-based data processing method, the method comprising: obtaining consultation process data, the consultation process data being determined according to voice data collected during the consultation process; performing recognition according to the consultation process data, and acquiring corresponding first text data and second text data, wherein the first text data belongs to a target user, and the second text data belongs to users other than the target user; and obtaining the consultation information according to the first text data and the second text data.
  • the consultation process data includes voice data and/or a text recognition result obtained by voice data recognition.
  • where the consultation process data is voice data, the performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: separating the first voice data and the second voice data from the voice data according to the voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
  • the separating the first voice data and the second voice data from the voice data according to the voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data by using the voice segments according to the voiceprint features.
  • the determining the first voice data and the second voice data by using the voice segments according to the voiceprint features includes: matching each voice segment by using a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; acquiring the voice segments matching the reference voiceprint feature to obtain the corresponding first voice data; and acquiring the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  • the determining the first voice data and the second voice data by using the voice segments according to the voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature having the largest number of voice segments; generating the first voice data by using the voice segments corresponding to that voiceprint feature; and generating the second voice data by using the voice segments not belonging to the first voice data.
  • the performing voice recognition on the first voice data and the second voice data to obtain the corresponding first text data and second text data includes: performing voice recognition on each voice segment in the first voice data, and generating the first text data by using the recognized text segments; and performing voice recognition on each voice segment in the second voice data, and generating the second text data by using the recognized text segments.
  • where the consultation process data is a text recognition result obtained by voice data recognition, the performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: performing feature recognition on the text recognition result, and separating the first text data and the second text data according to the language features.
  • the performing feature recognition on the text recognition result and separating the first text data and the second text data according to the language features includes: dividing the text recognition result to obtain corresponding text segments; identifying each text segment by using a preset model, and determining the language feature of the text segment, the language features including a first language feature and a second language feature; and generating the first text data by using the text segments having the first language feature, and generating the second text data by using the text segments having the second language feature.
  • the method further includes: analyzing the consultation information, and obtaining a corresponding analysis result, where the analysis result is related to the disease diagnosis.
  • FIG. 7 is a schematic structural diagram of an electronic device 700 for voice-based data processing according to another exemplary embodiment of the present invention.
  • the electronic device 700 can be a server, which may vary considerably in configuration or performance, and can include one or more central processing units (CPUs) 722 (e.g., one or more processors), memory 732, and one or more storage media 730 (e.g., one or more mass storage devices) storing application 742 or data 744.
  • the memory 732 and the storage medium 730 may be short-term storage or persistent storage.
  • the program stored on storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations in the server.
  • central processor 722 can be configured to communicate with storage medium 730, executing a series of instruction operations in storage medium 730 on the server.
  • the server may also include one or more power sources 726, one or more wired or wireless network interfaces 750, one or more input and output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, For example, Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the server is configured such that one or more central processors 722 execute one or more programs including instructions for: obtaining consultation process data, the consultation process data being determined according to voice data collected during the consultation process; performing recognition according to the consultation process data, and acquiring corresponding first text data and second text data, wherein the first text data belongs to a target user, and the second text data belongs to users other than the target user; and obtaining the consultation information according to the first text data and the second text data.
  • the consultation process data includes voice data and/or a text recognition result obtained by voice data recognition.
  • where the consultation process data is voice data, the performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: separating the first voice data and the second voice data from the voice data according to the voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
  • the separating, according to the voiceprint feature, the first voice data and the second voice data from the voice data including: dividing the voice data into multiple voice segments; The speech segment determines the first speech data and the second speech data.
  • the determining, according to the voiceprint feature, the first voice data and the second voice data by using the voice segment comprising: respectively matching each voice segment by using a reference voiceprint feature, wherein the reference voiceprint feature a voiceprint feature of the target user; acquiring a voice segment corresponding to the reference voiceprint feature to obtain a corresponding first voice data; acquiring a voice segment that does not match the reference voiceprint feature, and obtaining a corresponding second voice data .
  • the determining, according to the voiceprint feature, the first voice data and the second voice data by using the voice segment including: identifying voiceprint features of each voice segment; and counting the number of voice segments corresponding to each voiceprint feature Determining the voiceprint feature having the largest number of voice segments, generating the first voice data using the voice segment corresponding to the voiceprint feature, and generating the second voice data using the voice segment not belonging to the first voice data.
  • performing voice recognition on the first voice data and the second voice data to obtain corresponding first text data and second text data including: performing voice separately on each voice segment in the first voice data Identifying, generating the first text data by using the recognized text segment; separately performing speech recognition on each of the second speech data segments, and generating the second text data by using the recognized text segment.
  • the inquiry process data is a text recognition result obtained by the voice data identification; the identifying according to the consultation process data, acquiring corresponding first text data and second text data, including: The text recognition result performs feature recognition, and the first text data and the second text data are separated according to the language feature.
  • performing feature recognition on the text recognition result, and separating the first text data and the second text data according to the language feature including: dividing the text recognition result to obtain a corresponding text segment; using a preset model Identifying the text segment, determining a language feature of the text segment, the language feature comprising a first language feature and a second language feature; generating a first text data using the text segment having the first language feature, and The second text data is generated using the text segment having the second language feature.
  • the one or more programs further include instructions for performing the following operation: analyzing the consultation information to obtain a corresponding analysis result, the analysis result being related to the diagnosis of a disease.
  • embodiments of the invention may be provided as a method, an apparatus, or a computer program product.
  • embodiments of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • embodiments of the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • Embodiments of the invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Abstract

A voice-based data processing method and apparatus, and an electronic device, used for completely recording a diagnostic inquiry process. The method comprises: acquiring diagnostic inquiry process data, the diagnostic inquiry process data being determined according to voice data collected in a diagnostic inquiry process (102); performing recognition according to the diagnostic inquiry process data to acquire corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to a user other than the target user (104); and obtaining diagnostic inquiry information according to the first text data and the second text data (106). By means of the method, the apparatus and the electronic device, the statements of a doctor and a patient in a diagnostic inquiry process can be automatically distinguished, and the diagnostic inquiry process can be completely recorded and automatically organised to obtain content such as a medical record, saving time spent organising diagnostic inquiry records.

Description

Voice-based data processing method, apparatus and electronic device
This application claims priority to Chinese patent application No. 201710384412.3, filed on May 26, 2017 and entitled "Voice-based data processing method, apparatus and electronic device".
Technical field
The present invention relates to the technical field, and in particular to a voice-based data processing method, apparatus, and electronic device.
Background
Speech recognition typically converts speech into text. Traditional speech recognition and recording tools can only convert speech data into the corresponding text and cannot distinguish between speakers. Therefore, when multiple people are speaking, speech recognition alone cannot produce an effective record.
For example, during an actual consultation in a hospital, at least two people communicate: at a minimum a doctor and a patient, and sometimes the patient's family members as well. Existing speech recognition tools cannot attribute the collected consultation speech to the respective speakers, so the whole consultation process cannot be recorded comprehensively.
Summary of the invention
Embodiments of the present invention provide a voice-based data processing method for completely recording a consultation process.
Correspondingly, embodiments of the present invention further provide a voice-based data processing apparatus, an electronic device, and a readable storage medium to ensure the implementation and application of the above method.
To solve the above problem, an embodiment of the present invention discloses a voice-based data processing method, including: acquiring consultation process data, the consultation process data being determined according to voice data collected during a consultation process; performing recognition according to the consultation process data to acquire corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data.
Optionally, the consultation process data is voice data; performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: separating first voice data and second voice data from the voice data according to voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to the voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to the voiceprint features.
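The division into voice segments can be sketched as below. This is a minimal illustration only: the frame size, the energy threshold, and the use of plain frame energy as a silence test are assumptions for the example; a production system would more likely use a trained voice activity detector.

```python
# Split voice data into voice segments at silent stretches (energy-based sketch).
def split_into_segments(samples, frame_size=4, threshold=0.1):
    """Return (start, end) sample ranges of speech.

    A frame whose mean absolute amplitude is below `threshold` is treated
    as silence; consecutive non-silent frames form one voice segment.
    """
    segments = []
    start = None
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy >= threshold:
            if start is None:
                start = i                    # a segment begins at this frame
        elif start is not None:
            segments.append((start, i))      # silence closes the segment
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments

# Two bursts of speech separated by silence yield two segments.
audio = [0.0] * 4 + [0.5, 0.6, 0.5, 0.4] + [0.0] * 4 + [0.3, 0.4, 0.5, 0.3]
print(split_into_segments(audio))  # → [(4, 8), (12, 16)]
```

Each resulting segment can then be attributed to a speaker by its voiceprint feature, as described in the following paragraphs.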
Optionally, determining the first voice data and the second voice data from the voice segments according to the voiceprint features includes: matching each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
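The matching against a reference voiceprint can be sketched as follows. Representing a voiceprint feature as a fixed-length vector, using cosine similarity as the match test, and the 0.8 threshold are illustrative assumptions, not the prescribed implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two voiceprint feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def separate_by_reference(segments, reference, threshold=0.8):
    """Segments matching the reference voiceprint form the first voice data
    (the target user, e.g. the doctor); the rest form the second voice data."""
    first, second = [], []
    for seg in segments:
        target = cosine(seg["feature"], reference) >= threshold
        (first if target else second).append(seg)
    return first, second

reference = [1.0, 0.0, 0.0]                  # the target user's enrolled voiceprint
segments = [
    {"id": 0, "feature": [0.9, 0.1, 0.0]},   # close to the reference
    {"id": 1, "feature": [0.0, 1.0, 0.2]},   # a different speaker
]
doctor, others = separate_by_reference(segments, reference)
print([s["id"] for s in doctor], [s["id"] for s in others])  # → [0] [1]
```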
Optionally, determining the first voice data and the second voice data from the voice segments according to the voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments, and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
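The count-based variant can be sketched as below, assuming each voice segment already carries a voiceprint label (for example, obtained by clustering the segments' voiceprint features); in a one-doctor, many-patients clinic, the doctor typically speaks in the largest number of segments.

```python
from collections import Counter

def separate_by_majority(segments):
    """The voiceprint with the most segments is taken as the target user."""
    counts = Counter(seg["voiceprint"] for seg in segments)
    target = counts.most_common(1)[0][0]      # most frequent voiceprint
    first = [s for s in segments if s["voiceprint"] == target]
    second = [s for s in segments if s["voiceprint"] != target]
    return first, second

segments = [
    {"t": 0, "voiceprint": "A"}, {"t": 1, "voiceprint": "B"},
    {"t": 2, "voiceprint": "A"}, {"t": 3, "voiceprint": "C"},
    {"t": 4, "voiceprint": "A"},
]
first, second = separate_by_majority(segments)
print(len(first), len(second))  # → 3 2
```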
Optionally, performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data includes: performing voice recognition on each voice segment in the first voice data and generating the first text data from the recognized text segments; and performing voice recognition on each voice segment in the second voice data and generating the second text data from the recognized text segments. Accordingly, obtaining the consultation information according to the first text data and the second text data includes: sorting the text segments according to the chronological order of the voice segments corresponding to the text segments in the first text data and the second text data, to obtain the consultation information.
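The time-ordered assembly of the consultation information can be sketched as follows; that each recognized text segment keeps the start time of its source voice segment, and the "Doctor"/"Patient" labels, are illustrative assumptions for the example.

```python
def build_consultation(first_text, second_text):
    """first_text/second_text: lists of (start_time, text) for each speaker.

    Interleave both speakers' text segments by the start time of the voice
    segment each was recognized from, restoring the original dialogue order.
    """
    merged = [(t, "Doctor", s) for t, s in first_text] + \
             [(t, "Patient", s) for t, s in second_text]
    merged.sort(key=lambda item: item[0])
    return "\n".join(f"{speaker}: {text}" for _, speaker, text in merged)

doctor = [(0.0, "What are your symptoms?"), (8.5, "Is there XXX?")]
patient = [(4.2, "My XXX is uncomfortable."), (11.0, "Yes.")]
print(build_consultation(doctor, patient))
```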
Optionally, the consultation process data is a text recognition result obtained by recognizing the voice data; performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features.
Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result into corresponding text segments; identifying each text segment with a preset model to determine the language feature of the text segment, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text segments having the target-user language feature and the second text data from the text segments having the non-target-user language feature.
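Separation by language features can be sketched as below. The "preset model" is stood in for by a simple keyword rule (question-like wording is treated as the target-user language feature), and the "|" separator used to divide the recognition result into text segments is an assumption for the example; a real system would use a trained text classifier and a proper segmenter.

```python
# Illustrative cue words standing in for a trained language-feature model.
DOCTOR_CUES = ("what", "any", "how long", "do you", "?")

def classify_segment(segment):
    """Label a text segment 'first' (target-user feature) or 'second'."""
    s = segment.lower()
    return "first" if any(cue in s for cue in DOCTOR_CUES) else "second"

def separate_text(recognition_result):
    """Divide the text recognition result into segments and separate them."""
    segments = [s.strip() for s in recognition_result.split("|") if s.strip()]
    first = [s for s in segments if classify_segment(s) == "first"]
    second = [s for s in segments if classify_segment(s) == "second"]
    return first, second

result = "What are your symptoms? | My stomach hurts. | Do you have a fever? | No."
first, second = separate_text(result)
print(first)   # → ['What are your symptoms?', 'Do you have a fever?']
print(second)  # → ['My stomach hurts.', 'No.']
```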
An embodiment of the present invention further discloses a voice-based data processing apparatus, including: a data acquisition module configured to acquire consultation process data, the consultation process data being determined according to voice data collected during a consultation process; a text recognition module configured to perform recognition according to the consultation process data and acquire corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and an information determination module configured to obtain consultation information according to the first text data and the second text data.
Optionally, the consultation process data is voice data; the text recognition module includes: a separation module configured to separate first voice data and second voice data from the voice data according to voiceprint features; and a voice recognition module configured to perform voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
Optionally, the separation module is configured to divide the voice data into multiple voice segments, and to determine the first voice data and the second voice data from the voice segments according to the voiceprint features.
Optionally, the separation module is configured to match each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; to collect the audio segments that match the reference voiceprint feature to obtain the corresponding first voice data; and to collect the audio segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, the separation module is configured to identify the voiceprint feature of each voice segment; to count the voice segments having the same voiceprint feature, the voiceprint feature with the largest number of segments being the voiceprint feature of the target user, and to generate the first voice data from those segments; and to generate the second voice data from the remaining voice segments.
Optionally, the voice recognition module is configured to perform voice recognition on each voice segment in the first voice data and generate the first text data from the recognized text segments, and to perform voice recognition on each voice segment in the second voice data and generate the second text data from the recognized text segments; the information determination module is configured to sort the text segments according to the chronological order of the voice segments corresponding to the text segments in the first text data and the second text data, to obtain the consultation information.
Optionally, the consultation process data is a text recognition result obtained by recognizing the voice data; the text recognition module is configured to perform feature recognition on the text recognition result and separate the first text data and the second text data according to language features.
Optionally, the text recognition module includes: a segment division module configured to divide the text recognition result into corresponding text segments; a segment identification module configured to identify each text segment with a preset model and determine the language feature of the text segment, the language features including a first language feature and a second language feature; and a text generation module configured to generate the first text data from the text segments having the first language feature and the second text data from the text segments having the second language feature.
An embodiment of the present invention further discloses a readable storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the voice-based data processing method according to one or more embodiments of the present invention.
Optionally, an electronic device includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for: acquiring consultation process data, the consultation process data being determined according to voice data collected during a consultation process; performing recognition according to the consultation process data to acquire corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data.
Optionally, the consultation process data is voice data; performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: separating first voice data and second voice data from the voice data according to voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to the voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to the voiceprint features.
Optionally, determining the first voice data and the second voice data from the voice segments according to the voiceprint features includes: matching each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, determining the first voice data and the second voice data from the voice segments according to the voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments, and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
Optionally, performing voice recognition on the first voice data and the second voice data respectively to acquire the corresponding first text data and second text data includes: performing voice recognition on each voice segment in the first voice data and generating the first text data from the recognized text segments; and performing voice recognition on each voice segment in the second voice data and generating the second text data from the recognized text segments; accordingly, obtaining the consultation information according to the first text data and the second text data includes: sorting the text segments according to the chronological order of the voice segments corresponding to the text segments in the first text data and the second text data, to obtain the consultation information.
Optionally, the consultation process data is a text recognition result obtained by recognizing the voice data; performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data includes: performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features.
Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result into corresponding text segments; identifying each text segment with a preset model to determine the language feature of the text segment, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text segments having the target-user language feature and the second text data from the text segments having the non-target-user language feature.
Embodiments of the present invention include the following advantages:
In the embodiments of the present invention, for consultation process data determined by collecting voice during a consultation, the first text data and the second text data can be identified from the consultation process data according to different users, where the first text data belongs to a target user and the second text data belongs to users other than the target user; that is, the statements of the doctor and the patient during the consultation can be distinguished automatically. The consultation information is then obtained according to the first text data and the second text data, so the consultation process can be recorded completely and automatically organized into content such as medical records, saving the time needed to organize consultation records.
Brief description of the drawings
FIG. 1 is a flowchart of the steps of an embodiment of a voice-based data processing method of the present invention;
FIG. 2 is a flowchart of the steps of another embodiment of a voice-based data processing method of the present invention;
FIG. 3 is a flowchart of the steps of yet another embodiment of a voice-based data processing method of the present invention;
FIG. 4 is a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention;
FIG. 5 is a structural block diagram of another embodiment of a voice-based data processing apparatus of the present invention;
FIG. 6 is a structural block diagram of an electronic device for voice-based data processing according to an exemplary embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device for voice-based data processing according to another exemplary embodiment of the present invention.
Detailed description
To make the above objects, features, and advantages of the present invention more apparent and understandable, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, a flowchart of the steps of an embodiment of a voice-based data processing method of the present invention is shown, which may specifically include the following steps:
Step 102: Acquire consultation process data, the consultation process data being determined according to voice data collected during a consultation process.
During the consultation, the consultation process can be recorded by various electronic devices, and the consultation process data is obtained based on the collected voice data; that is, the consultation process data may be the collected voice data itself, or a text recognition result converted from the collected voice data. The embodiments of the present invention can therefore perform recognition on data collected in various consultation processes.
Step 104: Perform recognition according to the consultation process data to acquire corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to users other than the target user.
The consultation process data can be recognized with different methods depending on the data type. For example, voice data can be processed by means of voiceprint features and voice recognition, while text data can be recognized by text features, thereby obtaining the first text data and the second text data distinguished by user. The consultation may involve at least two users communicating with each other: one user is the doctor and the other users are the patient, the patient's family members, and so on. For example, if the data is collected from a doctor's one-day clinic, it will include one doctor and multiple patients, and possibly one or more family members. Therefore, for the consultation record, the doctor can be taken as the target user, so that the first text data is the doctor's consultation text data, and the text data of at least one other user serves as the second text data, i.e., the consultation text data of the patients and their family members.
Step 106: Obtain consultation information according to the first text data and the second text data.
Since a consultation is usually a question-and-answer process, the first text data and the second text data may each consist of multiple text segments, so the consultation information can be obtained based on the time of each text segment and its corresponding user.
An example of the consultation information is as follows:
2017-4-23 10:23 AM
Doctor A: What are your symptoms?
Patient B: My XXX is uncomfortable.
Doctor A: Is there XXX?
Patient B: Yes.
......
In actual processing, patient information can also be obtained in combination with the hospital's outpatient records, so that different patients can be distinguished in the consultation information.
In summary, for consultation process data determined by collecting voice during a consultation, the first text data and the second text data can be identified from the consultation process data according to different users, where the first text data belongs to a target user and the second text data belongs to users other than the target user; that is, the statements of the doctor and the patient during the consultation can be distinguished automatically. The consultation information is then obtained according to the first text data and the second text data, so the consultation process can be recorded completely and automatically organized into content such as medical records, saving the time needed to organize consultation records.
In the embodiments of the present invention, the consultation process data includes voice data and/or a text recognition result obtained by recognizing the voice data. Different types of consultation process data are recognized with different methods; therefore, the embodiments of the present invention discuss the processing of each type of consultation process data separately.
Referring to FIG. 2, a flow chart of the steps of another embodiment of the voice-based data processing method of the present invention is shown. In this embodiment, the consultation process data is voice data. The method may specifically include the following steps:
Step 202: Acquire consultation process data, where the consultation process data is voice data collected during the consultation process.
During the consultation, voice data can be collected with various electronic devices, for example by recording audio with a voice recorder, mobile phone, or computer, to obtain the voice data collected in the consultation process. The voice data may be collected in a single outpatient visit, or collected by one doctor over multiple visits; the embodiments of the present invention place no limit on this. The voice data therefore includes the voice data of one doctor and of at least one patient, and may further include the voice data of at least one family member of a patient.
The above step 104 of performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data may include the following steps 204-206.
Step 204: Separate first voice data and second voice data from the voice data according to voiceprint features.
A voiceprint is the spectrum of the sound waves carrying speech information, as displayed by an electroacoustic instrument. Voiceprints are both distinctive and stable: after adulthood, a person's voiceprint remains relatively stable over the long term, so different people can be identified by their voiceprints. Therefore, the voice data can be recognized by voiceprint features to determine the voice segments corresponding to different users (voiceprint features), yielding the first voice data of the target user and the second voice data of the other users.
Separating the first voice data and the second voice data from the voice data according to voiceprint features includes: dividing the voice data into a plurality of voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Specifically, the voice data can be divided into a plurality of voice segments. The division may follow voice division rules, such as the pause intervals between sound segments; it may also be based on voiceprint features, i.e., the voiceprint feature of each voice is determined and the segments are divided according to the different voiceprint features. One piece of voice data can thus be divided into multiple voice segments in sequential order, and different segments may have the same or different voiceprint features. It is then determined, based on the voiceprint features, whether each voice segment belongs to the first voice data or the second voice data: the voiceprint feature of each segment is determined, the segments bearing the target user's voiceprint feature form the first voice data, and the remaining segments form the second voice data.
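The pause-interval division mentioned above can be sketched with a simple energy-based rule. This is a minimal illustration, not the patented implementation: real systems use more robust voice-activity detection, and the frame length, energy threshold, and minimum pause duration here are arbitrary placeholders.

```python
import numpy as np

def split_by_pauses(samples, rate, frame_ms=30, silence_thresh=0.01, min_pause_ms=300):
    """Split a mono waveform into voice segments at pause intervals.

    A pause is a run of low-energy frames at least min_pause_ms long.
    Returns a list of (start_sample, end_sample) pairs in time order.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    # Per-frame RMS energy; frames below the threshold count as silence.
    energies = [np.sqrt(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2))
                for i in range(n_frames)]
    voiced = [e >= silence_thresh for e in energies]

    min_pause_frames = max(1, min_pause_ms // frame_ms)
    segments, start, silence_run = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_pause_frames:
                # Close the segment at the first silent frame of the pause.
                segments.append((start * frame_len, (i - silence_run + 1) * frame_len))
                start, silence_run = None, 0
    if start is not None:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

The resulting ordered segment list is what the subsequent voiceprint matching operates on.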
In the embodiments of the present invention, before the voice data of the consultation is collected, the doctor (target user) may first record a piece of speech as reference data, so that the doctor's voiceprint feature, i.e., the reference voiceprint feature, can be identified from it. A voice recognition model may also be provided; after the voice data is fed into this model, the voice segments matching the reference voiceprint can be separated from the segments with other voiceprint features, yielding the target user's voice segments and the other users' voice segments. In a doctor's outpatient work, the resulting medical case information usually involves only one doctor but possibly many patients, so a large number of medical case samples can be obtained for a specific doctor in this way.
In an optional embodiment of the present invention, the target user's voiceprint feature may be collected in advance as the reference voiceprint feature and used to divide the voice data. That is, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: matching each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; obtaining the voice segments that match the reference voiceprint feature as the corresponding first voice data; and obtaining the voice segments that do not match the reference voiceprint feature as the corresponding second voice data. For a target user such as a doctor, voice data may be collected in advance to extract a voiceprint feature, which serves as the reference. For voice data containing the target user, each voice segment is then matched against this reference to decide whether its voiceprint feature is consistent with it: if consistent, the segment matches the reference voiceprint feature and is added to the first voice data (the target user's voice data); if not, the segment does not match and is added to the second voice data (the non-target users' voice data). Both the first voice data and the second voice data are thus composed of voice segments, and the segments retain their sequential relationship, which facilitates the subsequent accurate determination of the consultation information.
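The reference-voiceprint matching above can be sketched as follows. This is a hedged illustration, not the patented implementation: it assumes voiceprint features are available as fixed-length embedding vectors (e.g., produced by an external speaker-verification model), and the cosine-similarity threshold of 0.75 is an invented placeholder.

```python
import numpy as np

def assign_by_reference(segment_embeddings, reference_embedding, threshold=0.75):
    """Split voice segments into target-user and other-user streams by
    comparing each segment's voiceprint embedding to the doctor's reference.

    segment_embeddings: list of (segment_id, embedding) in time order.
    Returns (first_voice, second_voice) as ordered lists of segment_ids.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    first_voice, second_voice = [], []
    for seg_id, emb in segment_embeddings:
        if cosine(emb, reference_embedding) >= threshold:
            first_voice.append(seg_id)   # consistent with the reference voiceprint
        else:
            second_voice.append(seg_id)  # any non-matching speaker
    return first_voice, second_voice
```

Because the input list is traversed in time order, both output streams keep the sequential relationship the surrounding text relies on.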
In another optional embodiment of the present invention, the voice data may also be divided by the number of voice segments corresponding to each voiceprint feature. That is, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments per voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from the voice segments of that feature, where the voiceprint feature with the largest number is the target user's voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data. Given the nature of a consultation, the consultation process data may be the recordings of one doctor over many outpatient visits; in that process the doctor spends much of the time talking with different patients and their families, so the doctor (target user) has the most utterances in the voice data. The target user and the other users can therefore be distinguished by the number of voice segments per user, yielding the first voice data and the second voice data. The voiceprint feature contained in each voice segment is identified, the number of segments corresponding to each voiceprint feature is counted, and the feature with the largest count is determined to be the target user's voiceprint feature, the other features belonging to other users. The segments bearing the target user's voiceprint feature then form the first voice data in order, and the other segments (those not belonging to the first voice data) form the second voice data in order.
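The majority-count selection above can be sketched as follows, assuming each segment has already been assigned a voiceprint cluster label by an upstream speaker-identification step; the tuple format is an illustrative assumption.

```python
from collections import Counter

def split_by_majority_speaker(labeled_segments):
    """Pick the voiceprint with the most segments as the target user (the
    doctor speaks in every visit, so their voiceprint dominates the data).

    labeled_segments: list of (segment_id, voiceprint_label) in time order.
    Returns (first_voice, second_voice) preserving segment order.
    """
    counts = Counter(label for _, label in labeled_segments)
    target_label, _ = counts.most_common(1)[0]
    first_voice = [s for s, lab in labeled_segments if lab == target_label]
    second_voice = [s for s, lab in labeled_segments if lab != target_label]
    return first_voice, second_voice
```

All segments whose label is not the majority label fall into the second stream, matching the "not belonging to the first voice data" rule.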
In the embodiments of the present invention, since the voice data is collected in a multi-person conversation, one voice segment may contain the voiceprint features of several users. When multiple voiceprint features are identified in one voice segment: if the different features occur at different times, and all of them belong to other users, the segment can be added to the second voice data; if they include both the target user's feature and other users' features, the segment can be further divided into sub-segments that are added to the corresponding voice data. If the different features occur at the same time, i.e., at least two users are speaking simultaneously, then if all features belong to other users the segment can be added to the second voice data; if they include the target user's feature as well as other users' features, the segment can be assigned as required, for example treated as the target user's segment to obtain the first voice data, treated as the other users' segment to obtain the second voice data, or added to both users' voice data.
Step 206: Perform speech recognition on the first voice data and the second voice data respectively, and acquire the corresponding first text data and second text data.
After the first voice data and the second voice data are acquired, the two can be recognized separately to obtain the first text data of the target user and the second text data of the other users.
In an optional embodiment, performing speech recognition on the first voice data and the second voice data respectively and acquiring the corresponding first text data and second text data includes: performing speech recognition on each voice segment of the first voice data and generating the first text data from the recognized text segments; and performing speech recognition on each voice segment of the second voice data and generating the second text data from the recognized text segments. By recognizing each voice segment of the first voice data, the text corresponding to each segment is obtained, and the first text data is assembled in the order of the voice segments; the second text data is obtained in the same way. Since the doctor's questions and the patient's answers in a consultation are ordered, the time order is recorded when the voice data is divided into segments, so the resulting first and second text data also retain that order, facilitating the subsequent accurate compilation of the consultation information.
Step 208: Obtain the consultation information according to the first text data and the second text data.
According to the time order of the voice segments corresponding to the first text data and the second text data, the text segments of both can be sorted in the corresponding order, such as chronological order, to obtain the consultation information. The consultation information can record the doctor's questions in a consultation and the corresponding patient's (or family member's) answers, as well as the doctor's diagnosis, medical advice, and other information.
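The chronological merge described in step 208 can be sketched as follows. The (timestamp, text) tuple representation and the Doctor/Patient labels are illustrative assumptions, not the patent's data format.

```python
def build_consultation_record(first_text, second_text):
    """Interleave doctor and patient text segments by time into a dialogue.

    first_text / second_text: lists of (timestamp, text) for the target user
    (doctor) and the other users (patients/family), respectively.
    Returns the consultation information as one ordered list of lines.
    """
    tagged = [(t, "Doctor", s) for t, s in first_text] + \
             [(t, "Patient", s) for t, s in second_text]
    tagged.sort(key=lambda item: item[0])  # restore the original time order
    return ["%s: %s" % (role, text) for _, role, text in tagged]
```

Sorting on the recorded timestamps is what reproduces the question-answer alternation shown in the earlier example.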
Step 210: Analyze the consultation information to obtain a corresponding analysis result, where the analysis result is related to disease diagnosis.
After the consultation information has been compiled, the embodiments of the present invention may further analyze it as required to obtain a corresponding analysis result; since a consultation concerns disease diagnosis, the analysis result is likewise related to disease diagnosis, as determined by the analysis requirements.
For example, the doctor's common questions for each disease can be tallied and provided to less experienced doctors as a reference; the consultation information can be analyzed to develop an artificial-intelligence question-answering system for traditional Chinese (or Western) medicine; and the symptoms, treatments, and so on corresponding to each disease can be determined by statistics and analysis.
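One of the statistics mentioned above, tallying a doctor's common questions per disease, can be sketched with a simple counter. The record format is an invented assumption for illustration only.

```python
from collections import Counter, defaultdict

def common_questions_by_disease(records, top_n=3):
    """Tally the doctor's questions per diagnosed disease across many
    consultation records, as a reference for less experienced doctors.

    records: list of (disease, [doctor_question, ...]) tuples.
    Returns {disease: [(question, count), ...]} with the top_n questions.
    """
    tallies = defaultdict(Counter)
    for disease, questions in records:
        tallies[disease].update(questions)
    return {d: c.most_common(top_n) for d, c in tallies.items()}
```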
Referring to FIG. 3, a flow chart of the steps of yet another embodiment of the voice-based data processing method of the present invention is shown. In this embodiment, the consultation process data is a text recognition result obtained by recognizing voice data. The method may specifically include the following steps:
Step 302: Acquire a text recognition result obtained by recognizing voice data.
The voice data is collected during the consultation process and converted into the text recognition result by speech recognition, so the text recognition result can be acquired directly.
The above step 104 of performing recognition according to the consultation process data and acquiring the corresponding first text data and second text data may include the following step 304.
Step 304: Perform feature recognition on the text recognition result, and separate the first text data and the second text data according to language features.
For data already recognized as text, it is unknown who spoke each passage, so it cannot serve directly as consultation information. The embodiments of the present invention therefore identify the different users' utterances in the text recognition result and compile the consultation information. During a consultation, the doctor usually asks about symptoms, the patient describes them, and the doctor states the diagnosis, the examinations needed, the medicines required, and so on; based on these features, the doctor's and the patient's sentences can be identified in the text recognition result, separating the first text data from the second text data.
That is, the embodiments of the present invention may collect in advance the text of doctors' questions and of patients' statements, and collect the consultation information produced by each analysis, so as to derive the language features of doctors (the target user) and of patients and their families (the other users) and build a corresponding model for distinguishing the different users' text by those features. The language features of the different users may be determined and a preset model built by machine learning, probability statistics, and similar means.
A large amount of already-separated medical case text may be obtained as training data; such text labels the consultation utterances of the target user and the other users, for example consultation information historically obtained by recognition. The doctor content data (the target user's first text data) and the patient content data (the other users' second text data) it contains can be trained separately to obtain a doctor content model and a patient content model; the two models may of course be combined into one preset model, with which the doctor's sentences and the patient's sentences can be identified.
For example, in the medical case information obtained from consultations, the doctor's content is mostly questions with symptom-related vocabulary, such as "How do you feel?", "What symptoms do you have?", or "Where does it hurt?"; the patient's content is mostly questions describing symptoms or asking about diseases, such as "Do I have a cold?" or "Is it XX disease?"; and the doctor's content also includes declarative sentences with symptoms and medicines, such as "You have a viral cold; you can take some XX medicine." Thus the doctor's sentence content and the patient's sentence content each have fairly distinctive language features, so a doctor content model and a patient content model can be trained on separated medical case information.
Performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result to obtain corresponding text segments; recognizing the text segments with a preset model to determine the language features each segment has, the language features including a first language feature and a second language feature; and generating the first text data from the text segments having the first language feature, and the second text data from the text segments having the second language feature. The text recognition result may first be divided into sentences according to Chinese sentence features, or into multiple text segments in other ways. Each text segment is then fed in turn into the preset model, which recognizes the language features of each segment; the preset model may of course also be configured to assign each segment to a user based on the recognized features. Taking the target user's language features as the first language feature and the other users' language features as the second, the preset model determines whether a text segment has the first or the second language feature. The segments with the first language feature then generate the first text data, and those with the second language feature generate the second text data, in the order in which the segments were divided.
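The language-feature separation above can be sketched with a cue-word scorer, as a stand-in for the trained doctor/patient content models described above; the cue lists and the tie-breaking rule are invented placeholders, not the patent's model.

```python
def split_by_language_features(text_segments, doctor_cues, patient_cues):
    """Classify each text segment as doctor or patient speech by counting
    cue words, standing in for the trained content models.

    Returns (first_text, second_text) preserving segment order; ties fall
    to the patient side, a simplification of a real model's decision.
    """
    first_text, second_text = [], []
    for seg in text_segments:
        doctor_score = sum(cue in seg for cue in doctor_cues)
        patient_score = sum(cue in seg for cue in patient_cues)
        if doctor_score > patient_score:
            first_text.append(seg)   # first language feature (target user)
        else:
            second_text.append(seg)  # second language feature (other users)
    return first_text, second_text
```

A production system would replace the cue counts with the probability scores of the doctor and patient content models.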
Step 306: Obtain the consultation information according to the first text data and the second text data.
Step 308: Analyze the consultation information to obtain a corresponding analysis result, where the analysis result is related to disease diagnosis.
According to the order of the voice segments corresponding to the first text data and the second text data, the text segments of both can be sorted in the corresponding order to obtain the consultation information, which can record the doctor's questions in a consultation and the corresponding patient's (or family member's) answers, as well as the doctor's diagnosis, medical advice, and other information.
After the consultation information has been compiled, the embodiments of the present invention may further analyze it as required to obtain a corresponding analysis result; since a consultation concerns disease diagnosis, the analysis result is likewise related to disease diagnosis, as determined by the analysis requirements.
For example, the doctor's common questions for each disease can be tallied and provided to less experienced doctors as a reference; the consultation information can be analyzed to develop an artificial-intelligence question-answering system for traditional Chinese (or Western) medicine; and the symptoms, treatments, and so on corresponding to each disease can be determined by statistics and analysis.
Regarding a doctor's habit of and need for recording medical cases, the above scheme records the exchange with the patient by audio recording, then separates, distinguishes, and organizes the doctor's and the patient's utterances, and provides them to the doctor in dialogue form as a medical case record, effectively reducing the time the doctor spends compiling such records.
It should be noted that, for simplicity of description, the method embodiments are all expressed as series of action combinations; however, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention certain steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to FIG. 4, a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention is shown, which may specifically include the following modules:
A data acquisition module 402, configured to acquire consultation process data, where the consultation process data is determined from voice data collected during the consultation process.
A text recognition module 404, configured to perform recognition according to the consultation process data and acquire corresponding first text data and second text data, where the first text data belongs to one target user and the second text data belongs to users other than the target user.
An information determination module 406, configured to obtain the consultation information according to the first text data and the second text data.
At least two users may communicate and interact during the consultation: one user is the doctor, and the other users are patients, patients' family members, and the like. If the data is collected over a doctor's full day of outpatient work, for example, it includes one doctor and multiple patients, and possibly one or more family members. For the consultation record, the doctor may therefore be taken as the target user, so that the first text data is the doctor's consultation text, and the text data of at least one other user, i.e., the patients' and family members' consultation text, is the second text data. Since a consultation is typically a question-and-answer process, the first and second text data may each be composed of a plurality of text segments, so that the consultation information may be obtained based on the time of each text segment and the corresponding user.
An example of the consultation information is as follows:
2017-4-23 10:23AM Doctor A: What are your symptoms? Patient B: My XXX feels unwell. Doctor A: Do you have XXX? Patient B: Yes. ……
In actual processing, patient information may also be obtained in combination with the hospital's outpatient records, so that different patients can be distinguished in the consultation information.
In summary, for consultation process data determined by collection during a consultation, first text data and second text data can be identified from the consultation process data according to different users, where the first text data belongs to one target user and the second text data belongs to users other than the target user. That is, the doctor's and the patients' utterances in the consultation can be distinguished automatically, and the consultation information is then obtained from the first text data and the second text data. The consultation process can thus be recorded in full and content such as medical case records compiled automatically, saving the time otherwise spent organizing consultation records.
Referring to FIG. 5, a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention is shown, which may specifically include the following modules:
The consultation process data includes voice data and/or a text recognition result obtained by recognizing the voice data.
The consultation process data is voice data; the text recognition module 404 may include:
A separation module 40402, configured to separate first voice data and second voice data from the voice data according to voiceprint features.
A speech recognition module 40404, configured to perform speech recognition on the first voice data and the second voice data respectively, and acquire corresponding first text data and second text data.
The separation module 40402 is configured to divide the voice data into a plurality of voice segments, and to determine the first voice data and the second voice data from the voice segments according to voiceprint features.
Preferably, the separation module 40402 is configured to match each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; to obtain the voice segments matching the reference voiceprint feature as the corresponding first voice data; and to obtain the voice segments not matching the reference voiceprint feature as the corresponding second voice data.
In this embodiment of the present invention, before the voice data of the consultation is collected, the doctor (the target user) may first record a piece of speech as reference data, so that the doctor's voiceprint feature, that is, the reference voiceprint feature, can be extracted from it. A voice recognition model may also be provided; after the voice data is fed into this model, the voice segments matching the reference voiceprint feature can be separated from the voice segments with other voiceprint features, yielding the target user's voice segments on the one hand and the other users' voice segments on the other. In an outpatient setting, the resulting medical record information usually involves only one doctor but possibly several patients, so a large number of medical record samples can be collected for a specific doctor in this way.

Preferably, the separation module 40402 is configured to identify the voiceprint feature of each voice segment; to count the number of voice segments corresponding to each voiceprint feature; to determine the voiceprint feature with the largest number of voice segments and generate the first voice data from the voice segments of that voiceprint feature, the voiceprint feature with the largest count being taken as the target user's voiceprint feature; and to generate the second voice data from the voice segments that do not belong to the first voice data.

Given the characteristics of the consultation process, the consultation process data may be the recorded data of several of one doctor's outpatient visits. Throughout these visits the doctor spends the most time speaking, with different patients and their families in turn, so the doctor (the target user) accounts for the largest share of speech in the voice data. The target user can therefore be distinguished from the other users, and the first voice data and second voice data obtained, according to the number of voice segments belonging to each user.
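This majority-speaker heuristic can be sketched as follows, assuming the voice segments have already been labeled per speaker by voiceprint clustering; the pair structure and label names are illustrative, not taken from the embodiment.

```python
from collections import Counter

def split_by_majority(labeled_segments):
    """Split (segment_id, speaker_label) pairs produced by voiceprint
    clustering: the most frequent speaker is taken to be the target
    user (the doctor, present across every visit), and the remaining
    segments form the second voice data."""
    counts = Counter(label for _, label in labeled_segments)
    target_label = counts.most_common(1)[0][0]
    first_voice = [seg for seg, label in labeled_segments if label == target_label]
    second_voice = [seg for seg, label in labeled_segments if label != target_label]
    return target_label, first_voice, second_voice
```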
In this embodiment of the present invention, because the voice data is collected in a multi-speaker conversation, a single voice segment may contain the voiceprint features of several users. When the separation module 40402 identifies multiple voiceprint features in one voice segment, it may proceed as follows. If the different voiceprint features occur at different times within the segment: when all of them are other users' voiceprint features, the segment may be added to the second voice data; when they include both the target user's voiceprint feature and other users' voiceprint features, the segment may be further divided into sub-segments, each added to the corresponding voice data. If the different voiceprint features occur at the same time, that is, at least two users are speaking simultaneously: when all of them are other users' voiceprint features, the segment may be added to the second voice data; when they include both the target user's voiceprint feature and other users' voiceprint features, the segment may be assigned as required, for example classified as the target user's segment to obtain the first voice data, classified as another user's segment to obtain the second voice data, or added to both users' voice data.
Preferably, the voice recognition module 40404 is configured to perform voice recognition on each voice segment of the first voice data, generating the first text data from the recognized text fragments, and to perform voice recognition on each voice segment of the second voice data, generating the second text data from the recognized text fragments. The information determining module 406 is then configured to sort the text fragments of the first text data and of the second text data according to the time order of their corresponding voice segments, to obtain the consultation information.
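The chronological merge performed by the information determining module 406 can be sketched as follows, assuming each recognized text fragment carries the start time of its source voice segment; the "Doctor"/"Patient" role labels are illustrative stand-ins for the target user and the other users.

```python
def merge_dialogue(first_fragments, second_fragments):
    """Interleave the target user's and the other users' recognized
    text fragments by the start time of their source voice segments,
    yielding a dialogue-style consultation transcript. Each fragment
    is a (start_time, text) tuple."""
    tagged = [(t, "Doctor", text) for t, text in first_fragments]
    tagged += [(t, "Patient", text) for t, text in second_fragments]
    tagged.sort(key=lambda item: item[0])  # chronological order
    return [f"{role}: {text}" for _, role, text in tagged]
```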
Preferably, the consultation process data is a text recognition result obtained by recognizing voice data; the text recognition module 404 is configured to perform feature recognition on the text recognition result and to separate the first text data and the second text data according to language features.

The text recognition module 404 includes:

a fragment division module 40406, configured to divide the text recognition result into corresponding text fragments; and

a fragment recognition module 40408, configured to recognize the text fragments using a preset model and determine the language feature each fragment has, the language features including a first language feature and a second language feature.
In this embodiment of the present invention, a large number of already-separated medical record texts may be obtained as training data; in these texts the target user's and the other users' consultation statements are already labeled, for example consultation information obtained from past recognition. The doctor content data (the target user's first text data) and the patient content data (the other users' second text data) contained in them can be trained on separately to obtain a doctor content model and a patient content model; the two models may of course be combined into one preset model, on the basis of which the doctor's sentences and the patients' sentences can be recognized. For example, in the consultation information obtained from a visit, the doctor's content is typically questions containing symptom-related vocabulary, such as "How do you feel?", "What symptoms do you have?", or "Where does it hurt?"; the patient's content is typically questions describing symptoms or naming diseases, such as "Do I have a cold?" or "Is it XX disease?"; and the doctor's content also includes statements containing symptoms and medications, such as "You have a viral cold; you can take some XX medicine." Since both the doctor's sentences and the patients' sentences have fairly distinctive language features, the doctor content model and the patient content model can be trained from the separated medical record information.
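A crude stand-in for the trained doctor/patient content models can be sketched as follows: each text fragment is scored against cue phrases typical of each role, echoing the examples above. The cue lists and scoring rule are illustrative only; the embodiment envisages models trained on separated medical record texts, not a fixed keyword list.

```python
DOCTOR_CUES = ("how do you feel", "what symptoms", "where does it hurt", "you can take")
PATIENT_CUES = ("do i have", "is it", "i feel")

def classify_fragment(fragment):
    """Score a text fragment against cue phrases typical of doctor
    speech (symptom questions, medication statements) and of patient
    speech (symptom descriptions, disease questions), returning the
    higher-scoring role label; ties default to 'doctor'."""
    text = fragment.lower()
    doctor_score = sum(cue in text for cue in DOCTOR_CUES)
    patient_score = sum(cue in text for cue in PATIENT_CUES)
    return "doctor" if doctor_score >= patient_score else "patient"
```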
Preferably, a text generation module 40410 is configured to generate the first text data from the text fragments having the first language feature and to generate the second text data from the text fragments having the second language feature.

Preferably, the apparatus further includes an analysis module 408, configured to analyze the consultation information and obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
According to the order of the voice segments corresponding to the first text data and the second text data, the text fragments of the first text data and those of the second text data may be sorted accordingly to obtain the consultation information, which may record the doctor's questions during a consultation, the answers of the patient (or family members), and the doctor's diagnosis, medical advice, and other information.

After the consultation information has been compiled, this embodiment of the present invention may further analyze it as required to obtain a corresponding analysis result; since the consultation relates to disease diagnosis, the analysis result is likewise related to disease diagnosis, the specifics being determined by the analysis requirements.

For example, the doctor's common questions for each disease may be compiled statistically and provided to less experienced doctors as a reference; the consultation information may be analyzed to develop a traditional Chinese medicine (or Western medicine) artificial-intelligence question-answering system; and the symptoms, treatments, and so on corresponding to each disease may be determined through statistics, analysis, and similar means.
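The first of these analyses, compiling the doctor's most common questions per disease, can be sketched as follows; the record structure (a dict with a "disease" label and a list of doctor "questions") is a hypothetical shape for the compiled consultation information, not one defined by the embodiment.

```python
from collections import Counter, defaultdict

def common_questions_by_disease(consultations, top_n=3):
    """For each disease, count the doctor's questions across compiled
    consultation records and keep the most frequent ones as a
    reference for less experienced doctors."""
    per_disease = defaultdict(Counter)
    for record in consultations:
        per_disease[record["disease"]].update(record["questions"])
    return {disease: [q for q, _ in counts.most_common(top_n)]
            for disease, counts in per_disease.items()}
```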
Given doctors' habits and needs in keeping medical records, the above scheme makes it possible to capture the conversation with the patient by recording, then separate the doctor's and the patient's sentences, distinguish and organize them, and provide the result to the doctor in dialogue form as a medical record, effectively reducing the time the doctor spends compiling medical records.

As for the apparatus embodiment, since it is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.
FIG. 6 is a structural block diagram of an electronic device 600 for voice-based data processing according to an exemplary embodiment. For example, the electronic device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like; it may also be a server-side device, such as a server.

Referring to FIG. 6, the electronic device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.

The processing component 602 generally controls the overall operation of the electronic device 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 to execute instructions, to perform all or part of the steps of the above methods. In addition, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and the other components; for example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is configured to store various types of data to support operation of the device 600. Examples of such data include instructions for any application or method operating on the electronic device 600, contact data, phone book data, messages, pictures, videos, and the like. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc.
The power component 606 provides power to the various components of the electronic device 600, and may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 600.
The multimedia component 608 includes a screen providing an output interface between the electronic device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the panel; a touch sensor may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the electronic device 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) configured to receive external audio signals when the electronic device 600 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, the audio component 610 also includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the electronic device 600. For example, the sensor component 614 may detect the open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the electronic device 600; it may also detect a change in position of the electronic device 600 or one of its components, the presence or absence of user contact with the electronic device 600, the orientation or acceleration/deceleration of the electronic device 600, and a change in its temperature. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the electronic device 600 and other devices. The electronic device 600 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 also includes a near field communication (NFC) module to facilitate short-range communication; for example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 604 including instructions executable by the processor 620 of the electronic device 600 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided such that, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device can perform a voice-based data processing method, the method including: acquiring consultation process data, the consultation process data being determined according to voice data collected during a consultation; performing recognition according to the consultation process data to obtain corresponding first text data and second text data, where the first text data belongs to one target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data.

Optionally, the consultation process data includes voice data and/or a text recognition result obtained by recognizing voice data.

Optionally, the consultation process data is voice data, and performing recognition according to the consultation process data to obtain corresponding first text data and second text data includes: separating first voice data and second voice data from the voice data according to voiceprint features; and performing voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.

Optionally, separating first voice data and second voice data from the voice data according to voiceprint features includes: dividing the voice data into a plurality of voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.

Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: matching each voice segment against a reference voiceprint feature, where the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature into the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature into the corresponding second voice data.

Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from the voice segments of that voiceprint feature; and generating the second voice data from the voice segments that do not belong to the first voice data.

Optionally, performing voice recognition on the first voice data and the second voice data respectively to obtain corresponding first text data and second text data includes: performing voice recognition on each voice segment of the first voice data and generating the first text data from the recognized text fragments; and performing voice recognition on each voice segment of the second voice data and generating the second text data from the recognized text fragments.

Optionally, the consultation process data is a text recognition result obtained by recognizing voice data, and performing recognition according to the consultation process data to obtain corresponding first text data and second text data includes: performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features.

Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result into corresponding text fragments; recognizing the text fragments using a preset model and determining the language feature each fragment has, the language features including a first language feature and a second language feature; and generating the first text data from the text fragments having the first language feature and the second text data from the text fragments having the second language feature.

Optionally, the method further includes: analyzing the consultation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
FIG. 7 is a schematic structural diagram of an electronic device 700 for voice-based data processing according to another exemplary embodiment of the present invention. The electronic device 700 may be a server, which may differ considerably depending on configuration or performance and may include one or more central processing units (CPUs) 722 (for example, one or more processors), a memory 732, and one or more storage media 730 (for example, one or more mass storage devices) storing an application 742 or data 744. The memory 732 and the storage medium 730 may be transient or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 722 may be configured to communicate with the storage medium 730 and execute, on the server, the series of instruction operations in the storage medium 730.

The server may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

In an exemplary embodiment, the server is configured such that one or more central processing units 722 execute one or more programs including instructions for the following operations: acquiring consultation process data, the consultation process data being determined according to voice data collected during a consultation; performing recognition according to the consultation process data to obtain corresponding first text data and second text data, where the first text data belongs to one target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data.
可选的,所述问诊过程数据包括语音数据和/或语音数据识别得到的文本识别结果。Optionally, the consultation process data includes a text recognition result obtained by the voice data and/or the voice data identification.
可选的,所述问诊过程数据为语音数据;所述依据所述问诊过程数据进行识别,获取对应的第一文本数据和第二文本数据,包括:依据声纹特征,从所述语音数据中分离出第一语音数据和第二语音数据;对所述第一语音数据和第二语音数据分别进行语音识别,获取对应的第一文本数据和第二文本数据。Optionally, the consultation process data is voice data; the identifying according to the consultation process data, acquiring corresponding first text data and second text data, including: according to the voiceprint feature, from the voice Separating the first voice data and the second voice data from the data; respectively performing voice recognition on the first voice data and the second voice data, and acquiring corresponding first text data and second text data.
可选的,所述依据声纹特征,从所述语音数据中分离出第一语音数据和第二语音数据,包括:将所述语音数据划分为多个语音片段;依据声纹特征,采用所述语音片段确定第一语音数据和第二语音数据。Optionally, the separating, according to the voiceprint feature, the first voice data and the second voice data from the voice data, including: dividing the voice data into multiple voice segments; The speech segment determines the first speech data and the second speech data.
可选的,所述依据声纹特征,采用所述语音片段确定第一语音数据和第二语音数据,包括:采用基准声纹特征对各语音片段分别进行匹配,其中,所述基准声纹特征为目标用户的声纹特征;获取与所述基准声纹特征相符的语音片段,得到对应的第一语音数据;获取与所述基准声纹特征不相符的语音片段,得到对应的第二语音数据。Optionally, the determining, according to the voiceprint feature, the first voice data and the second voice data by using the voice segment, comprising: respectively matching each voice segment by using a reference voiceprint feature, wherein the reference voiceprint feature a voiceprint feature of the target user; acquiring a voice segment corresponding to the reference voiceprint feature to obtain a corresponding first voice data; acquiring a voice segment that does not match the reference voiceprint feature, and obtaining a corresponding second voice data .
可选的,所述依据声纹特征,采用所述语音片段确定第一语音数据和第二语音数据,包括:对各语音片段的声纹特征进行识别;统计各声纹特征对应语音片段的数量;确定具有语音片段的数量最大的声纹特征,采用所述声纹特征对应的语音片段生成第一语音数据;采用不属于第一语音数据的语音片段生成第二语音数据。Optionally, the determining, according to the voiceprint feature, the first voice data and the second voice data by using the voice segment, including: identifying voiceprint features of each voice segment; and counting the number of voice segments corresponding to each voiceprint feature Determining the voiceprint feature having the largest number of voice segments, generating the first voice data using the voice segment corresponding to the voiceprint feature, and generating the second voice data using the voice segment not belonging to the first voice data.
可选的,对所述第一语音数据和第二语音数据分别进行语音识别,获取对应的第一文本数据和第二文本数据,包括:对所述第一语音数据中各语音片段分别进行语音识别,采用识别得到的文本片段生成第一文本数据;对所述第二语音数据中各语音片段分别进行语音识别,采用识别得到的文本片段生成第二文本数据。Optionally, performing voice recognition on the first voice data and the second voice data to obtain corresponding first text data and second text data, including: performing voice separately on each voice segment in the first voice data Identifying, generating the first text data by using the recognized text segment; separately performing speech recognition on each of the second speech data segments, and generating the second text data by using the recognized text segment.
可选的,所述问诊过程数据为语音数据识别得到的文本识别结果;所述依据所述问诊过程数据进行识别,获取对应的第一文本数据和第二文本数据,包括:对所述文本识别结果进行特征识别,依据语言特征分离出第一文本数据和第二文本数据。Optionally, the inquiry process data is a text recognition result obtained by the voice data identification; the identifying according to the consultation process data, acquiring corresponding first text data and second text data, including: The text recognition result performs feature recognition, and the first text data and the second text data are separated according to the language feature.
Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result to obtain corresponding text segments; identifying the text segments with a preset model to determine the language feature of each text segment, the language features including a first language feature and a second language feature; and generating the first text data from the text segments having the first language feature, and generating the second text data from the text segments having the second language feature.
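The text-side separation can be sketched as below. The keyword-cue classifier is a toy stand-in for the preset model mentioned above, which would in practice be a trained classifier; the cue list and the assignment of question-like segments to the first language feature are purely illustrative assumptions:

```python
QUESTION_CUES = ("?", "how long", "where", "any")  # toy cues, not a trained model

def classify(text_segment):
    """Toy stand-in for the preset model: segments that look like questions
    are assigned the first (e.g. physician's) language feature."""
    t = text_segment.lower()
    return "first" if any(cue in t for cue in QUESTION_CUES) else "second"

def split_by_language_feature(text_segments):
    """Separate text segments into first and second text data according to
    the language feature each segment is classified with."""
    first = [t for t in text_segments if classify(t) == "first"]
    second = [t for t in text_segments if classify(t) == "second"]
    return first, second
```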
Optionally, in the server, the one or more programs executed by the one or more processors 522 further include instructions for: analyzing the consultation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar among the embodiments, reference may be made to one another.
Those skilled in the art will appreciate that the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should also be noted that in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or also includes elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The voice-based data processing method, voice-based data processing apparatus, and electronic device provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is intended only to help understand the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (25)

  1. A voice-based data processing method, comprising:
    obtaining consultation process data, the consultation process data being determined according to voice data collected during a consultation process;
    performing recognition according to the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user;
    obtaining consultation information according to the first text data and the second text data.
  2. The method according to claim 1, wherein the consultation process data is voice data; and
    performing recognition according to the consultation process data to obtain the corresponding first text data and second text data comprises:
    separating first voice data and second voice data from the voice data according to voiceprint features;
    performing voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
  3. The method according to claim 2, wherein separating the first voice data and the second voice data from the voice data according to voiceprint features comprises:
    dividing the voice data into a plurality of voice segments;
    determining the first voice data and the second voice data from the voice segments according to voiceprint features.
  4. The method according to claim 3, wherein determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises:
    matching each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is a voiceprint feature of the target user;
    acquiring the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data;
    acquiring the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  5. The method according to claim 3, wherein determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises:
    identifying the voiceprint feature of each voice segment;
    counting the number of voice segments corresponding to each voiceprint feature;
    determining the voiceprint feature with the largest number of voice segments, and generating the first voice data from the voice segments corresponding to that voiceprint feature;
    generating the second voice data from the voice segments that do not belong to the first voice data.
  6. The method according to claim 2, wherein performing voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises:
    performing voice recognition on each voice segment in the first voice data, and generating the first text data from the recognized text segments;
    performing voice recognition on each voice segment in the second voice data, and generating the second text data from the recognized text segments;
    and wherein obtaining the consultation information according to the first text data and the second text data comprises:
    sorting the text segments of the first text data and the second text data according to the time order of their corresponding voice segments, to obtain the consultation information.
  7. The method according to claim 1, wherein the consultation process data is a text recognition result obtained by recognizing voice data; and
    performing recognition according to the consultation process data to obtain the corresponding first text data and second text data comprises:
    performing feature recognition on the text recognition result, and separating the first text data and the second text data according to language features.
  8. The method according to claim 7, wherein performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features comprises:
    dividing the text recognition result to obtain corresponding text segments;
    identifying the text segments with a preset model to determine the language feature of each text segment, the language features comprising a target-user language feature and a non-target-user language feature;
    generating the first text data from the text segments having the target-user language feature, and generating the second text data from the text segments having the non-target-user language feature.
  9. A voice-based data processing apparatus, comprising:
    a data acquisition module, configured to obtain consultation process data, the consultation process data being determined according to voice data collected during a consultation process;
    a text recognition module, configured to perform recognition according to the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user;
    an information determination module, configured to obtain consultation information according to the first text data and the second text data.
  10. The apparatus according to claim 9, wherein the consultation process data is voice data, and the text recognition module comprises:
    a separation module, configured to separate first voice data and second voice data from the voice data according to voiceprint features;
    a voice recognition module, configured to perform voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
  11. The apparatus according to claim 10, wherein
    the separation module is configured to divide the voice data into a plurality of voice segments, and to determine the first voice data and the second voice data from the voice segments according to voiceprint features.
  12. The apparatus according to claim 11, wherein
    the separation module is configured to match each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is a voiceprint feature of the target user; to acquire the audio segments that match the reference voiceprint feature to obtain the corresponding first voice data; and to acquire the audio segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  13. The apparatus according to claim 11, wherein
    the separation module is configured to identify the voiceprint feature of each voice segment; to count the voice segments having the same voiceprint feature and their numbers, and to generate the first voice data from the voice segments whose number is the largest, wherein the voiceprint feature with the largest number of voice segments is the voiceprint feature of the target user; and to generate the second voice data from the remaining voice segments.
  14. The apparatus according to claim 10, wherein
    the voice recognition module is configured to perform voice recognition on each voice segment in the first voice data, generating the first text data from the recognized text segments, and to perform voice recognition on each voice segment in the second voice data, generating the second text data from the recognized text segments;
    the information determination module is configured to sort the text segments of the first text data and the second text data according to the time order of their corresponding voice segments, to obtain the consultation information.
  15. The apparatus according to claim 9, wherein the consultation process data is a text recognition result obtained by recognizing voice data; and
    the text recognition module is configured to perform feature recognition on the text recognition result, and to separate the first text data and the second text data according to language features.
  16. The apparatus according to claim 15, wherein the text recognition module comprises:
    a segment division module, configured to divide the text recognition result to obtain corresponding text segments;
    a segment recognition module, configured to identify the text segments with a preset model and determine the language feature of each text segment, the language features comprising a first language feature and a second language feature;
    a text generation module, configured to generate the first text data from the text segments having the first language feature, and to generate the second text data from the text segments having the second language feature.
  17. A readable storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the voice-based data processing method according to one or more of claims 1-8.
  18. An electronic device, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
    obtaining consultation process data, the consultation process data being determined according to voice data collected during a consultation process;
    performing recognition according to the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user;
    obtaining consultation information according to the first text data and the second text data.
  19. The electronic device according to claim 18, wherein the consultation process data is voice data; and
    performing recognition according to the consultation process data to obtain the corresponding first text data and second text data comprises:
    separating first voice data and second voice data from the voice data according to voiceprint features;
    performing voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
  20. The electronic device according to claim 19, wherein separating the first voice data and the second voice data from the voice data according to voiceprint features comprises:
    dividing the voice data into a plurality of voice segments;
    determining the first voice data and the second voice data from the voice segments according to voiceprint features.
  21. The electronic device according to claim 20, wherein determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises:
    matching each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is a voiceprint feature of the target user;
    acquiring the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data;
    acquiring the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
  22. The electronic device according to claim 20, wherein determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises:
    identifying the voiceprint feature of each voice segment;
    counting the number of voice segments corresponding to each voiceprint feature;
    determining the voiceprint feature with the largest number of voice segments, and generating the first voice data from the voice segments corresponding to that voiceprint feature;
    generating the second voice data from the voice segments that do not belong to the first voice data.
  23. The electronic device according to claim 19, wherein performing voice recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises:
    performing voice recognition on each voice segment in the first voice data, and generating the first text data from the recognized text segments;
    performing voice recognition on each voice segment in the second voice data, and generating the second text data from the recognized text segments;
    and wherein obtaining the consultation information according to the first text data and the second text data comprises:
    sorting the text segments of the first text data and the second text data according to the time order of their corresponding voice segments, to obtain the consultation information.
  24. The electronic device according to claim 18, wherein the consultation process data is a text recognition result obtained by recognizing voice data; and
    performing recognition according to the consultation process data to obtain the corresponding first text data and second text data comprises:
    performing feature recognition on the text recognition result, and separating the first text data and the second text data according to language features.
  25. The electronic device according to claim 24, wherein performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features comprises:
    dividing the text recognition result to obtain corresponding text segments;
    identifying the text segments with a preset model to determine the language feature of each text segment, the language features comprising a target-user language feature and a non-target-user language feature;
    generating the first text data from the text segments having the target-user language feature, and generating the second text data from the text segments having the non-target-user language feature.
PCT/CN2018/082702 2017-05-26 2018-04-11 Voice-based data processing method and apparatus, and electronic device WO2018214663A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710384412.3A CN108962253A (en) 2017-05-26 2017-05-26 A kind of voice-based data processing method, device and electronic equipment
CN201710384412.3 2017-05-26

Publications (1)

Publication Number Publication Date
WO2018214663A1 true WO2018214663A1 (en) 2018-11-29

Family

ID=64395285

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/082702 WO2018214663A1 (en) 2017-05-26 2018-04-11 Voice-based data processing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN108962253A (en)
WO (1) WO2018214663A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582708A (en) * 2020-04-30 2020-08-25 北京声智科技有限公司 Medical information detection method, system, electronic device and computer-readable storage medium
CN112118415B (en) * 2020-09-18 2023-02-10 瑞然(天津)科技有限公司 Remote diagnosis and treatment method and device, patient side terminal and doctor side terminal
CN114520062B (en) * 2022-04-20 2022-07-22 杭州马兰头医学科技有限公司 Medical cloud communication system based on AI and letter creation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268279A (en) * 2014-10-16 2015-01-07 魔方天空科技(北京)有限公司 Query method and device of corpus data
CN104427292A (en) * 2013-08-22 2015-03-18 中兴通讯股份有限公司 Method and device for extracting a conference summary
CN105469790A (en) * 2014-08-29 2016-04-06 上海联影医疗科技有限公司 Consultation information processing method and device
CN106326640A (en) * 2016-08-12 2017-01-11 上海交通大学医学院附属瑞金医院卢湾分院 Medical speech control system and control method thereof
CN106328124A (en) * 2016-08-24 2017-01-11 安徽咪鼠科技有限公司 Voice recognition method based on user behavior characteristics


Also Published As

Publication number Publication date
CN108962253A (en) 2018-12-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18806292

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18806292

Country of ref document: EP

Kind code of ref document: A1