CN108962253A - Voice-based data processing method, apparatus, and electronic device - Google Patents

Voice-based data processing method, apparatus, and electronic device

Info

Publication number
CN108962253A
CN108962253A
Authority
CN
China
Prior art keywords
data
text
voice
consultation
text data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710384412.3A
Other languages
Chinese (zh)
Inventor
李明修
银磊
卜海亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201710384412.3A priority Critical patent/CN108962253A/en
Priority to PCT/CN2018/082702 priority patent/WO2018214663A1/en
Publication of CN108962253A publication Critical patent/CN108962253A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present invention provide a voice-based data processing method, apparatus, and electronic device for completely recording a consultation process. The method includes: obtaining consultation process data, the consultation process data being determined from voice data collected during a consultation; performing recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining consultation information from the first text data and the second text data. With the embodiments of the present invention, the utterances of the doctor and the patient during a consultation can be automatically distinguished, the consultation process can be completely recorded, and content such as case records can be automatically organized, saving the time spent compiling consultation records.

Description

Voice-based data processing method, apparatus, and electronic device
Technical field
The present invention relates to the field of voice technology, and more particularly to a voice-based data processing method, apparatus, and electronic device.
Background technique
Speech recognition usually converts speech into text. Conventional recording and speech recognition devices can only convert voice data into the corresponding text; they cannot distinguish between speakers. Consequently, when multiple people are speaking, an effective record cannot be produced by speech recognition alone.
For example, in the actual diagnosis and treatment process of a hospital, at least two people communicate, i.e., at least a doctor and a patient, and sometimes the patient's family members may be present as well. Existing speech recognition devices cannot attribute the collected consultation voice to the corresponding speakers, and therefore cannot comprehensively record the entire consultation process.
Summary of the invention
The embodiments of the present invention provide a voice-based data processing method for completely recording a consultation process.
Correspondingly, the embodiments of the present invention further provide a voice-based data processing apparatus, an electronic device, and a readable storage medium, to guarantee the implementation and application of the above method.
To solve the above problems, an embodiment of the present invention discloses a voice-based data processing method, comprising: obtaining consultation process data, the consultation process data being determined from voice data collected during a consultation; performing recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining consultation information from the first text data and the second text data.
Optionally, the consultation process data is voice data, and performing recognition on the consultation process data to obtain the corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to voiceprint features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to voiceprint features comprises: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: matching each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments that do not belong to the first voice data.
Optionally, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each voice segment in the first voice data and generating the first text data from the recognized text fragments; and performing speech recognition on each voice segment in the second voice data and generating the second text data from the recognized text fragments. Accordingly, obtaining consultation information from the first text data and the second text data comprises: sorting the text fragments in the first text data and the second text data according to the chronological order of their corresponding voice segments, to obtain the consultation information.
Optionally, the consultation process data is a text recognition result obtained by recognizing voice data, and performing recognition on the consultation process data to obtain the corresponding first text data and second text data comprises: performing feature identification on the text recognition result, and separating out the first text data and the second text data according to language features.
Optionally, performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features comprises: dividing the text recognition result to obtain corresponding text fragments; identifying the text fragments with a preset model to determine the language features the text fragments possess, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text fragments having the target-user language feature, and the second text data from the text fragments having the non-target-user language feature.
An embodiment of the present invention further discloses a voice-based data processing apparatus, comprising: a data acquisition module, configured to obtain consultation process data, the consultation process data being determined from voice data collected during a consultation; a text recognition module, configured to perform recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user; and an information determination module, configured to obtain consultation information from the first text data and the second text data.
Optionally, the consultation process data is voice data, and the text recognition module comprises: a separation submodule, configured to separate first voice data and second voice data from the voice data according to voiceprint features; and a speech recognition submodule, configured to perform speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, the separation submodule is configured to divide the voice data into multiple voice segments, and to determine the first voice data and the second voice data from the voice segments according to voiceprint features.
Optionally, the separation submodule is configured to match each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; to collect the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and to collect the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, the separation submodule is configured to identify the voiceprint feature of each voice segment; to count the voice segments having the same voiceprint feature; to generate the first voice data from the voice segments of the voiceprint feature with the largest count, wherein the voiceprint feature with the largest count is the voiceprint feature of the target user; and to generate the second voice data from the remaining voice segments.
Optionally, the speech recognition submodule is configured to perform speech recognition on each voice segment in the first voice data and generate the first text data from the recognized text fragments, and to perform speech recognition on each voice segment in the second voice data and generate the second text data from the recognized text fragments; and the information determination module is configured to sort the text fragments in the first text data and the second text data according to the chronological order of their corresponding voice segments, to obtain the consultation information.
Optionally, the consultation process data is a text recognition result obtained by recognizing voice data, and the text recognition module is configured to perform feature identification on the text recognition result and to separate out the first text data and the second text data according to language features.
Optionally, the text recognition module comprises: a fragment division submodule, configured to divide the text recognition result to obtain corresponding text fragments; a fragment identification submodule, configured to identify the text fragments with a preset model and determine the language features the text fragments possess, the language features including a first language feature and a second language feature; and a text generation submodule, configured to generate the first text data from the text fragments having the first language feature, and the second text data from the text fragments having the second language feature.
An embodiment of the present invention further discloses a readable storage medium. When the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the voice-based data processing method described in one or more of the embodiments of the present invention.
Optionally, an electronic device comprises a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: obtaining consultation process data, the consultation process data being determined from voice data collected during a consultation; performing recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining consultation information from the first text data and the second text data.
Optionally, the consultation process data is voice data, and performing recognition on the consultation process data to obtain the corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to voiceprint features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to voiceprint features comprises: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: matching each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments that do not belong to the first voice data.
Optionally, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each voice segment in the first voice data and generating the first text data from the recognized text fragments; and performing speech recognition on each voice segment in the second voice data and generating the second text data from the recognized text fragments. Accordingly, obtaining consultation information from the first text data and the second text data comprises: sorting the text fragments in the first text data and the second text data according to the chronological order of their corresponding voice segments, to obtain the consultation information.
Optionally, the consultation process data is a text recognition result obtained by recognizing voice data, and performing recognition on the consultation process data to obtain the corresponding first text data and second text data comprises: performing feature identification on the text recognition result, and separating out the first text data and the second text data according to language features.
Optionally, performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features comprises: dividing the text recognition result to obtain corresponding text fragments; identifying the text fragments with a preset model to determine the language features the text fragments possess, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text fragments having the target-user language feature, and the second text data from the text fragments having the non-target-user language feature.
The embodiments of the present invention include the following advantages:
From consultation process data determined from voice collected during a consultation, the embodiments of the present invention can recognize first text data and second text data according to different users, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user, so that the utterances of the doctor and the patient during the consultation are automatically distinguished. Consultation information is then obtained from the first text data and the second text data, so that the consultation process can be completely recorded and content such as case records can be automatically organized, saving the time spent compiling consultation records.
Detailed description of the invention
Fig. 1 is a flow chart of the steps of an embodiment of a voice-based data processing method of the present invention;
Fig. 2 is a flow chart of the steps of another embodiment of a voice-based data processing method of the present invention;
Fig. 3 is a flow chart of the steps of yet another embodiment of a voice-based data processing method of the present invention;
Fig. 4 is a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention;
Fig. 5 is a structural block diagram of another embodiment of a voice-based data processing apparatus of the present invention;
Fig. 6 is a structural block diagram of an electronic device for voice-based data processing according to an exemplary embodiment of the present invention;
Fig. 7 is a structural schematic diagram of an electronic device for voice-based data processing according to another exemplary embodiment of the present invention.
Specific embodiment
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flow chart of the steps of an embodiment of a voice-based data processing method of the present invention is shown. The method may specifically include the following steps:
Step 102: obtain consultation process data, the consultation process data being determined from voice data collected during a consultation.
During a consultation, voice can be collected from the consultation process by various electronic devices, and consultation process data is obtained based on the collected voice data. That is, the consultation process data may be the collected voice data itself, or it may be the text recognition result converted from the collected voice data. The embodiments of the present invention can thus perform recognition on data collected from various consultation processes.
Step 104: perform recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user.
The consultation process data can be recognized with different recognition methods according to its data type: voice data can be processed by means such as voiceprint features and speech recognition, while text data can be identified by text features, so as to obtain the first text data and the second text data distinguished by user. At least two users communicate during the consultation: one user is the doctor, and the other users are the patient, the patient's family members, and so on. For example, if the data is collected over one day of a doctor's outpatient service, it will contain one doctor and several patients, and possibly one or more family members of the patients. For the consultation record, the doctor can therefore be taken as the target user, so that the first text data is the doctor's consultation text data, and the text data of the at least one other user, i.e., the consultation text data of the patients and family members, is taken as the second text data.
Step 106: obtain consultation information from the first text data and the second text data.
Since a consultation is usually a question-and-answer process, the first text data and the second text data may each be composed of multiple text fragments, so the consultation information can be obtained from the times of the text fragments and their corresponding users.
For example, one example of consultation information is as follows:
2017-4-23 10:23 AM
Doctor A: What symptoms do you have?
Patient B: My XXX is uncomfortable.
Doctor A: Do you have XXX?
Patient B: Yes.
……
In actual processing, patient information may also be obtained in combination with the hospital's outpatient records and the like, so that different patients can be distinguished in the consultation information.
In conclusion, from consultation process data determined from voice collected during a consultation, first text data and second text data can be recognized according to different users, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user, so that the utterances of the doctor and the patient during the consultation are automatically distinguished. Consultation information is then obtained from the first text data and the second text data, so that the consultation process can be completely recorded and content such as case records can be automatically organized, saving the time spent compiling consultation records.
In the embodiments of the present invention, the consultation process data includes voice data and/or the text recognition result obtained by recognizing voice data. Different types of consultation process data are recognized in different ways, so the embodiments of the present invention discuss the processing of each type of consultation process data in turn.
Referring to Fig. 2, a flow chart of the steps of another embodiment of a voice-based data processing method of the present invention is shown. In this embodiment, the consultation process data is voice data. The method may specifically include the following steps:
Step 202: obtain consultation process data, the consultation process data being voice data collected during a consultation.
During a consultation, voice data can be collected from the consultation process by various electronic devices, for example by recording audio with devices such as a voice recorder, mobile phone, or computer, to obtain the voice data collected during the consultation. The voice data may be collected from a single outpatient visit or from multiple outpatient visits of one doctor; the embodiments of the present invention place no restriction on this. The voice data therefore contains the voice data of one doctor and of at least one patient, and may also contain the voice data of at least one family member of a patient.
Performing recognition on the consultation process data to obtain the corresponding first text data and second text data in the above step 104 may include the following steps 204-206.
Step 204: separate first voice data and second voice data from the voice data according to voiceprint features.
A voiceprint is the spectrum of sound waves carrying verbal information, as displayed by an electro-acoustic instrument. Voiceprints are specific and stable: after adulthood, a person's voiceprint remains relatively stable for a long time, so different people can be identified by their voiceprints. Voice data can therefore be identified by voiceprint features, determining the voice segments corresponding to each user (voiceprint feature) in the voice data, so as to obtain the first voice data of the target user and the second voice data of the other users.
Here, separating the first voice data and the second voice data from the voice data according to voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Specifically, the voice data can be divided into multiple voice segments. The division may follow voice division rules, for example dividing at the pauses between sounds; alternatively, the voiceprint feature corresponding to each sound can be determined, so that the voice segments are divided according to the different voiceprint features. One piece of voice data can thus be divided into multiple voice segments with a sequential order, where different voice segments may have the same or different voiceprint features. Based on the voiceprint features it is also determined whether each voice segment belongs to the first voice data or the second voice data: the voiceprint feature possessed by each voice segment is determined, the voice segments having the voiceprint feature of the target user then form the first voice data, and the other remaining voice segments form the second voice data.
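Below is a minimal sketch of the pause-based division described above, assuming a 16 kHz mono waveform normalized to [-1, 1]; the energy threshold and pause length are illustrative values, since the patent does not prescribe a concrete segmentation algorithm:

```python
import numpy as np

def split_on_pauses(samples: np.ndarray, rate: int = 16000, frame_ms: int = 30,
                    silence_db: float = -40.0, min_pause_frames: int = 10):
    """Divide a mono waveform into voice segments at silent pauses.

    Returns (start_sample, end_sample) pairs in time order, so later steps
    can keep the sequential relations between segments.
    """
    frame_len = rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Per-frame RMS energy in dB; frames above the threshold count as voiced.
    rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-10
    voiced = 20 * np.log10(rms) > silence_db

    segments, start, silent_run = [], None, 0
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:  # pause long enough: close segment
                segments.append((start * frame_len, (i - silent_run + 1) * frame_len))
                start, silent_run = None, 0
    if start is not None:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```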
In the embodiments of the present invention, before voice data is collected during the consultation, one passage of the doctor's (target user's) voice can first be collected as reference data, so that the doctor's voiceprint feature, i.e., the reference voiceprint feature, can be identified from the reference data. A speech recognition model can also be provided in the embodiments of the present invention; after the voice data is input into the model, the voice segments that match the reference voiceprint data can be separated from the voice segments with other voiceprint features, so as to obtain the voice segments of the target user and the voice segments of the other users. In a doctor's outpatient procedure, the compiled case information usually contains only one doctor while there may be multiple patients, so a large number of case samples for a particular doctor can be obtained in the above manner.
In an optional embodiment of the present invention, the voiceprint feature of the target user can be collected in advance as the reference voiceprint feature for dividing the voice data. That is, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: matching each voice segment against the reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data. In other words, for a target user such as a doctor, voice data can be collected in advance to extract a voiceprint feature, and the voiceprint feature of the target user is used as the reference voiceprint feature, so that each voice segment of the consultation voice data can be matched against the reference voiceprint feature to decide whether its voiceprint feature is consistent with the reference. If consistent, the voice segment is considered to match the reference voiceprint feature and is added to the first voice data (the voice data corresponding to the target user); if inconsistent, the voice segment does not match the reference voiceprint feature and is added to the second voice data (the voice data corresponding to non-target users). The first voice data and the second voice data are thus each composed of their corresponding voice segments, and the voice segments retain their ordinal relations, which facilitates accurately determining the consultation information later.
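As a sketch of this reference-matching variant, the following assumes a hypothetical `embed` function that maps a voice segment to a voiceprint embedding (for example, an x-vector model) and a similarity threshold tuned for that model; neither is specified by the patent:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.75  # illustrative value, tuned per voiceprint model

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def split_by_reference(segments, embed, reference_embedding):
    """Assign segments matching the doctor's reference voiceprint to the
    first voice data, and all remaining segments to the second voice data.

    `embed` and `reference_embedding` are assumed inputs: the reference
    embedding would be extracted beforehand from a short recording of the
    target user (the doctor), as the surrounding text describes.
    """
    first, second = [], []
    for segment in segments:
        if cosine(embed(segment), reference_embedding) >= SIMILARITY_THRESHOLD:
            first.append(segment)   # consistent with the reference voiceprint
        else:
            second.append(segment)  # other speakers (patients, family members)
    return first, second
```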
In another optional embodiment of the present invention, the voice data can also be divided according to the number of voice segments corresponding to the same voiceprint feature in the voice data. That is, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments that do not belong to the first voice data. Given the nature of the consultation process, the consultation process data may be the recorded data of a doctor's multiple outpatient visits; in that process the doctor usually spends the most time exchanging with different patients and their family members, i.e., the doctor (target user) has the most voice in the voice data. The target user and the other users can therefore be distinguished by the number of voice segments corresponding to each user, so as to obtain the first voice data and the second voice data. Concretely, the voiceprint features in the voice segments are identified to determine the voiceprint feature contained in each voice segment; the number of voice segments corresponding to each voiceprint feature is then counted; the voiceprint feature with the largest number of voice segments is determined and taken as the voiceprint feature of the target user, the other voiceprint features being those of the other users; the voice segments having the target user's voiceprint feature then form the first voice data in order, and the other voice segments (those not belonging to the first voice data) form the second voice data in order.
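A sketch of this majority-count variant follows, assuming a hypothetical `identify_voiceprint` function that maps each segment to a speaker label (for example, by clustering voiceprint embeddings); the patent only specifies that the label with the most segments is taken to be the target user:

```python
from collections import Counter

def split_by_majority(segments, identify_voiceprint):
    """Take the most frequent voiceprint as the target user (the doctor)."""
    labels = [identify_voiceprint(segment) for segment in segments]
    doctor_label, _ = Counter(labels).most_common(1)[0]
    # Segment order is preserved, so both streams stay in time order.
    first = [s for s, label in zip(segments, labels) if label == doctor_label]
    second = [s for s, label in zip(segments, labels) if label != doctor_label]
    return first, second
```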
In the embodiments of the present invention, since the voice data is collected in a scene where multiple people are talking, one voice segment may contain the voiceprint features of multiple users. For the case where multiple voiceprint features are identified in one voice segment: when the different voiceprint features occur at different times, then if the voiceprint features are all of other users, the voice segment can be added to the second voice data; and if the voiceprint features include the voiceprint feature of the target user as well as those of other users, the voice segment can be subdivided into sub-segments which are added to the corresponding voice data. When the different voiceprint features occur at the same time, i.e., at least two users are speaking simultaneously, then if the voiceprint features are all of other users, the voice segment can be added to the second voice data; and if the voiceprint features include the voiceprint feature of the target user as well as those of other users, the segment can be assigned as needed, for example classified as a voice segment of the target user to obtain the first voice data, or classified as a voice segment of the other users to obtain the second voice data, or added to the voice data of both kinds of users.
Step 206: perform speech recognition on the first voice data and the second voice data respectively, to obtain the corresponding first text data and second text data.
After the first voice data and the second voice data are obtained, the two kinds of voice data can be recognized separately, so as to obtain the first text data of the target user and the second text data of the other users.
In an optional embodiment, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data includes: performing speech recognition on each voice segment in the first voice data and generating the first text data from the recognized text fragments; and performing speech recognition on each voice segment in the second voice data and generating the second text data from the recognized text fragments. By recognizing each voice segment of the first voice data, the text data corresponding to that voice segment is obtained, so that the first text data is composed according to the order of the voice segments; the second text data can be obtained in the corresponding way. Since the doctor's questions and the patient's answers during a consultation are all sequential, the corresponding time order is recorded when the voice data is divided into voice segments, so the resulting first text data and second text data also retain their sequential relations, which facilitates accurately organizing the consultation information later.
Step 208: obtain consultation information from the first text data and the second text data.
According to the time order of the voice segments corresponding to the first text data and the second text data, the text fragments in the first text data and the text fragments in the second text data can be sorted in their corresponding order, such as time order, so as to obtain the corresponding consultation information. The consultation information can record various information such as the doctor's questions during the consultation, the corresponding answers of the patient (or family members), and the doctor's diagnosis and medical advice.
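The per-segment recognition and chronological merge of steps 206-208 might look like the following sketch, where `recognize` is a hypothetical stand-in for an ASR call and each segment is assumed to carry the start time recorded at the division step:

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    start: float   # segment start time in seconds, kept from the division step
    speaker: str
    text: str

def build_consultation_record(first_segments, second_segments, recognize) -> str:
    """Recognize each voice segment and interleave the fragments by time,
    yielding a dialog-form consultation record."""
    fragments = (
        [Fragment(s.start, "Doctor", recognize(s)) for s in first_segments]
        + [Fragment(s.start, "Patient", recognize(s)) for s in second_segments]
    )
    fragments.sort(key=lambda f: f.start)  # restore question/answer order
    return "\n".join(f"{f.speaker}: {f.text}" for f in fragments)
```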
Step 210: analyze the consultation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
After the consultation information is organized, the embodiments of the present invention can also analyze the consultation information as needed to obtain a corresponding analysis result. Since a consultation is related to disease diagnosis, the analysis result is also related to disease diagnosis and is determined according to the specific analysis demand.
For example, the questions doctors commonly ask for each disease can be counted and provided as a reference for less experienced doctors; the consultation information can be analyzed to develop an artificial-intelligence question answering system for traditional Chinese medicine (or Western medicine); and the symptoms, treatment methods, and so on corresponding to each disease can also be determined by means such as statistics and analysis.
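One such statistic, the questions a doctor most often asks for each disease, could be computed as in this sketch, assuming the consultation records have already been organized into (disease, doctor lines) pairs; the grouping format is an assumption, not from the patent:

```python
from collections import Counter, defaultdict

def common_doctor_questions(records, top_n: int = 5):
    """Count the doctor questions seen for each diagnosed disease.

    `records` is an assumed iterable of (disease, doctor_lines) pairs taken
    from organized consultation information.
    """
    stats = defaultdict(Counter)
    for disease, doctor_lines in records:
        for line in doctor_lines:
            line = line.strip()
            if line.endswith("?") or line.endswith("？"):  # keep questions only
                stats[disease][line] += 1
    return {disease: counts.most_common(top_n) for disease, counts in stats.items()}
```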
Referring to Fig. 3, a flow chart of the steps of yet another embodiment of a voice-based data processing method of the present invention is shown. In this embodiment, the consultation process data is the text recognition result obtained by recognizing voice data. The method may specifically include the following steps:
Step 302: obtain the text recognition result obtained by recognizing voice data.
The voice data is collected during a consultation, and the collected voice data is converted into the text recognition result by speech recognition; the text recognition result can be acquired directly.
Performing recognition on the consultation process data to obtain the corresponding first text data and second text data in the above step 104 may include the following step 304.
Step 304: perform feature identification on the text recognition result, and separate out the first text data and the second text data according to language features.
For data already recognized as text, it is unknown which person said each passage, so the text cannot serve directly as consultation information. The embodiments of the present invention therefore identify the different users from the text recognition result and organize the consultation information. During a consultation, the doctor usually asks about symptoms, the user replies with the symptoms they have, and the doctor diagnoses the corresponding disease and states the examinations to be done, the drugs needed, and so on. Based on these features, the sentences of the doctor and of the patient can be identified from the text recognition result, and the first text data and the second text data can then be separated out.
That is, the embodiments of the present invention can collect in advance the text of doctors' consultations and the text of patients' consultations, as well as consultation information that has already been analyzed, so as to derive the language features of the doctor (i.e., the target user) and of the patients and their family members (i.e., the other users), and build a corresponding model, which makes it possible to distinguish the text of different users based on these language features. The language features of different users can be determined and the preset model built by means such as machine learning and probability statistics.
Here, the embodiments of the present invention can obtain a large amount of already-separated case text as training data; separated case text is consultation information in which the target user and the other users have been identified, such as historically obtained text that has been verified as correct. The doctor content data it contains (the first text data of the target user) and the patient content data (the second text data of the other users) can be trained on separately to obtain a doctor content model and a patient content model; the two models can of course also be combined into one preset model, based on which the doctor's sentences and the patient's sentences can be identified.
For example, in the case information obtained from consultations, the doctor's content mostly consists of questions containing symptom-class vocabulary, such as "How do you feel?", "What symptoms do you have?", "Where is it uncomfortable?"; the patient's content mostly consists of questions carrying symptoms and diseases, such as "Do I have a cold?", "Is it XX disease?"; and the doctor's content also includes statements with symptoms and drugs, such as "You have viral influenza", "Take some XX medicine". The sentence content of doctors and the sentence content of patients thus both have fairly distinctive language features, so a doctor content model and a patient content model can be obtained by training on separated case information.
Performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features includes: dividing the text recognition result to obtain corresponding text fragments; identifying the text fragments with the preset model to determine the language features the text fragments possess, the language features including a first language feature and a second language feature; and generating the first text data from the text fragments having the first language feature, and the second text data from the text fragments having the second language feature. Concretely, the text recognition result can first be divided, for example into sentences according to the sentence features of Chinese, or into multiple text fragments in other ways. Each text fragment is then input into the preset model in order, and the preset model identifies the text fragments, so that the language feature possessed by each text fragment is recognized. The preset model may of course also be configured, based on the identified language feature, to assign an owning user to each text fragment. With the language feature of the target user as the first language feature and the language feature of the other users as the second language feature, the preset model can determine whether a text fragment has the first language feature or the second language feature. The text fragments having the first language feature can then generate the first text data according to the fragment order, and the text fragments having the second language feature generate the second text data.
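A minimal sketch of such a preset model is shown below. It uses a TF-IDF Naive Bayes classifier from scikit-learn as one plausible model choice (the patent does not fix the model type), and the training sentences are illustrative stand-ins for the separated case text described above; real Chinese case text would also need a Chinese tokenizer:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative training sentences; in practice these come from a large
# corpus of already-separated case text, as described above.
doctor_sentences = ["What symptoms do you have?", "Take some XX medicine."]
patient_sentences = ["My throat hurts.", "Do I have a cold?"]

preset_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
preset_model.fit(
    doctor_sentences + patient_sentences,
    ["doctor"] * len(doctor_sentences) + ["patient"] * len(patient_sentences),
)

def separate_text(fragments):
    """Split text fragments into first (doctor) and second (patient) text data,
    preserving fragment order within each stream."""
    labels = preset_model.predict(fragments)
    first = [f for f, label in zip(fragments, labels) if label == "doctor"]
    second = [f for f, label in zip(fragments, labels) if label == "patient"]
    return first, second
```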
Step 306: obtain consultation information from the first text data and the second text data.
Step 308: analyze the consultation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
According to the order in which the first text data and the second text data correspond to the voice segments, the text fragments in the first text data and the text fragments in the second text data can be sorted in their corresponding order to obtain the corresponding consultation information. The consultation information can record various information such as the doctor's questions during the consultation, the corresponding answers of the patient (or family members), and the doctor's diagnosis and medical advice.
After the consultation information is organized, the embodiments of the present invention can also analyze it as needed to obtain a corresponding analysis result. Since a consultation is related to disease diagnosis, the analysis result is also related to disease diagnosis and is determined according to the specific analysis demand.
For example, the questions doctors commonly ask for each disease can be counted and provided as a reference for less experienced doctors; the consultation information can be analyzed to develop an artificial-intelligence question answering system for traditional Chinese medicine (or Western medicine); and the symptoms, treatment methods, and so on corresponding to each disease can also be determined by means such as statistics and analysis.
Addressing doctors' habits and needs in recording cases, the above scheme makes it possible to record the communication process with a patient by means of an audio recording, then separate out the sentences of the doctor and of the patient, distinguish and organize them, and provide them to the doctor in dialog form as a case record, which can effectively reduce the time the doctor spends organizing case records.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 4, a structural block diagram of an embodiment of a voice-based data processing apparatus of the present invention is shown. The apparatus may specifically include the following modules:
A data acquisition module 402, configured to obtain consultation process data, the consultation process data being determined from voice data collected during a consultation.
A text recognition module 404, configured to perform recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user.
An information determination module 406, configured to obtain consultation information from the first text data and the second text data.
At least two users can communicate during the consultation: one user is the doctor, and the other users are the patient, the patient's family members, and so on. For example, if the data is collected over one day of a doctor's outpatient service, it will contain one doctor and several patients, and possibly one or more family members of the patients. For the consultation record, the doctor can therefore be taken as the target user, so that the first text data is the doctor's consultation text data, and the text data of the at least one other user, i.e., the consultation text data of the patients and family members, is taken as the second text data. Since a consultation is usually a question-and-answer process, the first text data and the second text data may each be composed of multiple text fragments, so the consultation information can be obtained from the times of the text fragments and their corresponding users.
For example, one example of consultation information is as follows:
2017-4-23 10:23 AM
Doctor A: What symptoms do you have?
Patient B: My XXX is uncomfortable.
Doctor A: Do you have XXX?
Patient B: Yes.
……
In actual processing, patient information may also be obtained in combination with the hospital's outpatient records and the like, so that different patients can be distinguished in the consultation information.
In conclusion, from consultation process data determined by collection during a consultation, first text data and second text data can be recognized according to different users, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user, so that the utterances of the doctor and the patient during the consultation are automatically distinguished. Consultation information is then obtained from the first text data and the second text data, so that the consultation process can be completely recorded and content such as case records can be automatically organized, saving the time spent compiling consultation records.
Referring to Fig. 5, a structural block diagram of another embodiment of a voice-based data processing apparatus of the present invention is shown. The apparatus may specifically include the following modules:
Here, the consultation process data includes voice data and/or the text recognition result obtained by recognizing voice data.
When the consultation process data is voice data, the text recognition module 404 may include:
A separation submodule 40402, configured to separate first voice data and second voice data from the voice data according to voiceprint features.
A speech recognition submodule 40404, configured to perform speech recognition on the first voice data and the second voice data respectively, to obtain the corresponding first text data and second text data.
Here, the separation submodule 40402 is configured to divide the voice data into multiple voice segments, and to determine the first voice data and the second voice data from the voice segments according to voiceprint features.
Preferably, the separation submodule 40402 is configured to match each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; to collect the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and to collect the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
In the embodiments of the present invention, before voice data is collected during the consultation, one passage of the doctor's (target user's) voice can first be collected as reference data, so that the doctor's voiceprint feature, i.e., the reference voiceprint feature, can be identified from the reference data. A speech recognition model can also be provided in the embodiments of the present invention; after the voice data is input into the model, the voice segments that match the reference voiceprint data can be separated from the voice segments with other voiceprint features, so as to obtain the voice segments of the target user and the voice segments of the other users. In a doctor's outpatient procedure, the compiled case information usually contains only one doctor while there may be multiple patients, so a large number of case samples for a particular doctor can be obtained in the above manner.
Preferably, the separation submodule 40402 is configured to identify the voiceprint feature of each voice segment; to count the number of voice segments corresponding to each voiceprint feature; to determine the voiceprint feature with the largest number of voice segments and generate the first voice data from the voice segments corresponding to that voiceprint feature, wherein the voiceprint feature with the largest count is the voiceprint feature of the target user; and to generate the second voice data from the voice segments that do not belong to the first voice data.
Given the nature of the consultation process, the consultation process data may be the recorded data of a doctor's multiple outpatient visits. In that process the doctor usually spends the most time exchanging with different patients and their family members, i.e., the doctor (target user) has the most voice in the voice data, so the target user and the other users can be distinguished by the number of voice segments corresponding to each user, so as to obtain the first voice data and the second voice data.
In the embodiments of the present invention, since the voice data is collected in a scene where multiple people are talking, one voice segment may contain the voiceprint features of multiple users. For the case where multiple voiceprint features are identified in one voice segment, the separation submodule 40402 can perform the following processing: when the different voiceprint features occur at different times, then if the voiceprint features are all of other users, the voice segment can be added to the second voice data; and if the voiceprint features include the voiceprint feature of the target user as well as those of other users, the voice segment can be subdivided into sub-segments which are added to the corresponding voice data. When the different voiceprint features occur at the same time, i.e., at least two users are speaking simultaneously, then if the voiceprint features are all of other users, the voice segment can be added to the second voice data; and if the voiceprint features include the voiceprint feature of the target user as well as those of other users, the segment can be assigned as needed, for example classified as a voice segment of the target user to obtain the first voice data, or classified as a voice segment of the other users to obtain the second voice data, or added to the voice data of both kinds of users.
Preferably, the speech recognition submodule 40404 is configured to perform speech recognition on each voice segment in the first voice data and generate the first text data from the recognized text fragments, and to perform speech recognition on each voice segment in the second voice data and generate the second text data from the recognized text fragments. The information determination module 406 is then configured to sort the text fragments in the first text data and the second text data according to the chronological order of their corresponding voice segments, to obtain the consultation information.
Preferably, the interrogation process data is the text identification result that voice data identifies;The text identification Module 404 isolates the first text data and the according to language feature for carrying out feature identification to the text identification result Two text datas.
The text identification module 404, comprising:
Segment changes molecular modules 40406, for dividing to the text identification result, obtains corresponding text piece Section.
Segment identifies submodule 40408, for identifying using preset model to the text fragments, determines the text The language feature that this segment has, the language feature include first language feature and second language feature.
Wherein, the embodiment of the present invention can obtain a large amount of separated case text as training data, separated doctor Case text is the interrogation information for identifying target user and other users, the positive information of text such as obtained in history according to identification.It can To including doctor's content-data (the first text data of target user) and patient content's data (second of other users Text data) it is trained respectively, doctor's content model and patient content's model are obtained, both certain models can synthesize one Preset model may recognize that the sentence of doctor and the sentence of patient based on the preset model.For example, the case information that interrogation obtains In, it is the question sentence with symptom class vocabulary that doctor's content is generally mostly, such as you feel how, have what symptom, where do not relax Clothes etc.;And it is the question sentence with Symptoms, epidemic disease class that patient content is generally mostly, such as whether I catch a cold, and is XX disease Deng;It is the declarative sentence with symptom and drug that doctor's content is generally mostly, such as you are viral influenza, you can have some XX medicine etc. Deng.To which, the sentence content of doctor and the sentence content of patient all have the significant language feature of comparison, therefore can be according to having divided From case information training obtain doctor's content model and patient content's model.
Preferably, text generation submodule 40410, for generating first using the text fragments with first language feature Text data, and, the second text data is generated using the text fragments with second language feature.
Preferably, the device further includes an analysis module 408, configured to analyze the interrogation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
According to the order of the speech segments to which the first text data and the second text data respectively correspond, the text fragments in the first text data and in the second text data can be sorted into the corresponding sequence, yielding the interrogation information. The interrogation information can thus record the doctor's questions during the interrogation, the corresponding answers of the patient (or family members), the doctor's diagnosis, medical advice and other such information.
After the interrogation information has been compiled, this embodiment of the present invention can further analyze it as required to obtain a corresponding analysis result. Since interrogation is related to disease diagnosis, the analysis result is likewise related to disease diagnosis, the specifics being determined by the analysis requirements.
For example, the doctor's most common questions for each disease can be counted and provided to less experienced doctors for reference; the interrogation information can be analyzed to develop artificial-intelligence question-answering systems for traditional Chinese medicine or Western medicine; and the symptoms, treatment methods and so on corresponding to each disease can be determined through counting, analysis and similar means.
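As a small illustration of the counting use case (the record layout below is an assumption, not a format defined by this embodiment), the doctor's most frequent questions per disease could be tallied as follows:

```python
from collections import Counter, defaultdict

# Hypothetical archive of compiled interrogation records.
consultations = [
    {"disease": "influenza",
     "doctor_lines": ["How long have you had the fever?", "Any cough?"]},
    {"disease": "influenza",
     "doctor_lines": ["Any cough?"]},
]

# Count each doctor question per diagnosed disease.
questions_by_disease = defaultdict(Counter)
for record in consultations:
    questions_by_disease[record["disease"]].update(record["doctor_lines"])

for disease, counter in questions_by_disease.items():
    print(disease, counter.most_common(3))
```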
With regard to doctors' habits and needs in recording cases, the above scheme makes it possible to record the communication with the patient, separate out the doctor's and the patient's sentences, organize them accordingly, and provide them to the doctor in dialog form as a case record, effectively reducing the time a doctor spends organizing cases.
As for the device embodiments, since they are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiments.
Fig. 6 is a structural block diagram of an electronic device 600 for voice-based data processing according to an exemplary embodiment. For example, the electronic device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment or a personal digital assistant, or it may be a server-side device such as a server.
Referring to Fig. 6, the electronic device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614 and a communication component 616.
The processing component 602 generally controls the overall operation of the electronic device 600, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 602 may include one or more processors 620 to execute instructions, so as to perform all or part of the steps of the methods described above. In addition, the processing component 602 may include one or more modules to facilitate interaction between the processing component 602 and other components; for example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation on the device 600. Examples of such data include instructions for any application or method operated on the electronic device 600, contact data, phonebook data, messages, pictures, video, and so on. The memory 604 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
The power component 606 provides power to the various components of the electronic device 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device 600.
The multimedia component 608 includes a screen providing an output interface between the electronic device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the electronic device 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) configured to receive external audio signals when the electronic device 600 is in an operating mode, such as a call mode, a recording mode or a voice recognition mode. The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, the audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the electronic device 600. For example, the sensor component 614 can detect the open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the electronic device 600; the sensor component 614 can also detect a change in the position of the electronic device 600 or of one of its components, the presence or absence of user contact with the electronic device 600, the orientation or acceleration/deceleration of the electronic device 600, and a change in its temperature. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the electronic device 600 and other devices. The electronic device 600 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the electronic device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the above methods.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 604 including instructions, executable by the processor 620 of the electronic device 600 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device and the like.
A non-transitory computer-readable storage medium is provided such that, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform a voice-based data processing method, the method comprising: obtaining interrogation process data, the interrogation process data being determined according to voice data collected during the interrogation; performing identification according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to one target user and the second text data belongs to the other users apart from the target user; and obtaining interrogation information according to the first text data and the second text data.
Optionally, the interrogation process data includes the voice data and/or a text recognition result obtained by recognizing the voice data.
Optionally, the interrogation process data is the voice data, and the performing identification according to the interrogation process data to obtain corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to voiceprint features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, the separating first voice data and second voice data from the voice data according to voiceprint features comprises: dividing the voice data into multiple speech segments; and determining the first voice data and the second voice data from the speech segments according to voiceprint features.
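The embodiment does not fix how the voice data is divided into speech segments; one common approach is to cut at silent gaps. A minimal sketch under that assumption, with illustrative frame size and energy threshold values, is:

```python
import numpy as np

def split_on_silence(samples: np.ndarray, rate: int,
                     frame_ms: int = 30, energy_thresh: float = 1e-4) -> list:
    """Cut a mono recording (float samples in [-1, 1]) into voiced segments
    at silent gaps, using mean frame energy as a crude voice activity test."""
    frame = int(rate * frame_ms / 1000)
    n = len(samples) // frame
    energies = (samples[:n * frame].reshape(n, frame) ** 2).mean(axis=1)
    voiced = energies > energy_thresh
    segments, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i * frame                          # segment begins
        elif not v and start is not None:
            segments.append(samples[start:i * frame])  # segment ends at silence
            start = None
    if start is not None:
        segments.append(samples[start:])               # trailing voiced audio
    return segments
```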
Optionally, the determining the first voice data and the second voice data from the speech segments according to voiceprint features comprises: matching each speech segment against a reference voiceprint feature, the reference voiceprint feature being the voiceprint feature of the target user; obtaining the speech segments that match the reference voiceprint feature to obtain the corresponding first voice data; and obtaining the speech segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
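A minimal sketch of this reference-voiceprint matching, assuming each speech segment already carries a voiceprint embedding vector and using cosine similarity with an illustrative threshold (neither the embedding source nor the threshold is specified by the embodiment):

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def split_by_reference(segments: List[Dict], reference_print: Sequence[float],
                       threshold: float = 0.75) -> Tuple[List[Dict], List[Dict]]:
    """Segments whose embedding matches the target user's reference voiceprint
    form the first voice data; all remaining segments form the second."""
    first, second = [], []
    for seg in segments:
        if cosine(seg["embedding"], reference_print) >= threshold:
            first.append(seg)
        else:
            second.append(seg)
    return first, second
```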
Optionally, the determining the first voice data and the second voice data from the speech segments according to voiceprint features comprises: identifying the voiceprint feature of each speech segment; counting the number of speech segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of speech segments and generating the first voice data from its corresponding speech segments; and generating the second voice data from the speech segments that do not belong to the first voice data.
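A minimal sketch of this counting strategy follows; the speaker_id field is assumed to be the output of upstream voiceprint identification:

```python
from collections import Counter
from typing import Dict, List, Tuple

def split_by_majority(segments: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Take the voiceprint owning the most segments as the target user
    (e.g. the doctor, who speaks throughout the interrogation) and split on it."""
    counts = Counter(seg["speaker_id"] for seg in segments)
    target_id, _ = counts.most_common(1)[0]   # most frequent voiceprint
    first = [s for s in segments if s["speaker_id"] == target_id]
    second = [s for s in segments if s["speaker_id"] != target_id]
    return first, second
```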
Optionally, the performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each speech segment in the first voice data and generating the first text data from the recognized text fragments; and performing speech recognition on each speech segment in the second voice data and generating the second text data from the recognized text fragments.
Optionally, the interrogation process data is a text recognition result obtained by recognizing the voice data, and the performing identification according to the interrogation process data to obtain corresponding first text data and second text data comprises: performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features.
Optionally, the performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features comprises: dividing the text recognition result to obtain corresponding text fragments; identifying the text fragments using a preset model and determining the language features that the text fragments have, the language features including a first language feature and a second language feature; and generating the first text data from the text fragments having the first language feature and generating the second text data from the text fragments having the second language feature.
Optionally, the method further comprises: analyzing the interrogation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
Fig. 7 is a structural schematic diagram of an electronic device 700 for voice-based data processing according to another exemplary embodiment of the present invention. The electronic device 700 may be a server, and servers may differ considerably depending on configuration or performance. The server may include one or more central processing units (CPUs) 722 (for example, one or more processors), a memory 732, and one or more storage media 730 (such as one or more mass storage devices) storing application programs 742 or data 744. The memory 732 and the storage medium 730 may provide transient or persistent storage. The programs stored on the storage medium 730 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations on the server. Further, the central processing unit 722 may be configured to communicate with the storage medium 730 and execute, on the server, the series of instruction operations in the storage medium 730.
The server may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and the like.
In an exemplary embodiment, the server is configured such that the one or more central processing units 722 execute one or more programs containing instructions for the following operations: obtaining interrogation process data, the interrogation process data being determined according to voice data collected during the interrogation; performing identification according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to one target user and the second text data belongs to the other users apart from the target user; and obtaining interrogation information according to the first text data and the second text data.
Optionally, the interrogation process data includes the voice data and/or a text recognition result obtained by recognizing the voice data.
Optionally, the interrogation process data is the voice data, and the performing identification according to the interrogation process data to obtain corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to voiceprint features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, the separating first voice data and second voice data from the voice data according to voiceprint features comprises: dividing the voice data into multiple speech segments; and determining the first voice data and the second voice data from the speech segments according to voiceprint features.
Optionally, the determining the first voice data and the second voice data from the speech segments according to voiceprint features comprises: matching each speech segment against a reference voiceprint feature, the reference voiceprint feature being the voiceprint feature of the target user; obtaining the speech segments that match the reference voiceprint feature to obtain the corresponding first voice data; and obtaining the speech segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, the determining the first voice data and the second voice data from the speech segments according to voiceprint features comprises: identifying the voiceprint feature of each speech segment; counting the number of speech segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of speech segments and generating the first voice data from its corresponding speech segments; and generating the second voice data from the speech segments that do not belong to the first voice data.
Optionally, the performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each speech segment in the first voice data and generating the first text data from the recognized text fragments; and performing speech recognition on each speech segment in the second voice data and generating the second text data from the recognized text fragments.
Optionally, the interrogation process data is a text recognition result obtained by recognizing the voice data, and the performing identification according to the interrogation process data to obtain corresponding first text data and second text data comprises: performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features.
Optionally, the performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features comprises: dividing the text recognition result to obtain corresponding text fragments; identifying the text fragments using a preset model and determining the language features that the text fragments have, the language features including a first language feature and a second language feature; and generating the first text data from the text fragments having the first language feature and generating the second text data from the text fragments having the second language feature.
Optionally, the one or more programs executed by the server's one or more processors 722 further contain instructions for the following operation: analyzing the interrogation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a device or a computer program product. Therefore, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage and the like) containing computer-usable program code.
Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems) and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, such that a series of operational steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once apprised of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise", or any other variant thereof, are intended to cover non-exclusive inclusion, such that a process, method, article or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or terminal device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or terminal device that includes that element.
The voice-based data processing method, voice-based data processing device and electronic device provided by the present invention have been described in detail above. Specific examples have been used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art, following the idea of the present invention, may make changes in both the specific implementation and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (11)

1. A voice-based data processing method, characterized by comprising:
obtaining interrogation process data, the interrogation process data being determined according to voice data collected during the interrogation;
performing identification according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to one target user and the second text data belongs to the other users apart from the target user;
obtaining interrogation information according to the first text data and the second text data.
2. The method according to claim 1, characterized in that the interrogation process data is the voice data;
the performing identification according to the interrogation process data to obtain corresponding first text data and second text data comprises:
separating first voice data and second voice data from the voice data according to voiceprint features;
performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
3. The method according to claim 2, characterized in that the separating first voice data and second voice data from the voice data according to voiceprint features comprises:
dividing the voice data into multiple speech segments;
determining the first voice data and the second voice data from the speech segments according to voiceprint features.
4. The method according to claim 3, characterized in that the determining the first voice data and the second voice data from the speech segments according to voiceprint features comprises:
matching each speech segment against a reference voiceprint feature, the reference voiceprint feature being the voiceprint feature of the target user;
obtaining the speech segments that match the reference voiceprint feature to obtain the corresponding first voice data;
obtaining the speech segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
5. The method according to claim 3, characterized in that the determining the first voice data and the second voice data from the speech segments according to voiceprint features comprises:
identifying the voiceprint feature of each speech segment;
counting the number of speech segments corresponding to each voiceprint feature;
determining the voiceprint feature with the largest number of speech segments, and generating the first voice data from the speech segments corresponding to that voiceprint feature;
generating the second voice data from the speech segments that do not belong to the first voice data.
6. The method according to claim 2, characterized in that the performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises:
performing speech recognition on each speech segment in the first voice data, and generating the first text data from the text fragments obtained by recognition;
performing speech recognition on each speech segment in the second voice data, and generating the second text data from the text fragments obtained by recognition;
wherein the obtaining interrogation information according to the first text data and the second text data comprises:
sorting the text fragments according to the time order of the speech segments to which the text fragments in the first text data and in the second text data respectively correspond, to obtain the interrogation information.
7. The method according to claim 1, characterized in that the interrogation process data is a text recognition result obtained by recognizing the voice data;
the performing identification according to the interrogation process data to obtain corresponding first text data and second text data comprises:
performing feature identification on the text recognition result, and separating out the first text data and the second text data according to language features.
8. The method according to claim 7, characterized in that the performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features comprises:
dividing the text recognition result to obtain corresponding text fragments;
identifying the text fragments using a preset model, and determining the language features that the text fragments have, the language features including a target-user language feature and a non-target-user language feature;
generating the first text data from the text fragments having the target-user language feature, and generating the second text data from the text fragments having the non-target-user language feature.
9. A voice-based data processing device, characterized by comprising:
a data acquisition module, configured to obtain interrogation process data, the interrogation process data being determined according to voice data collected during the interrogation;
a text identification module, configured to perform identification according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to one target user and the second text data belongs to the other users apart from the target user;
an information determination module, configured to obtain interrogation information according to the first text data and the second text data.
10. A readable storage medium, characterized in that, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the voice-based data processing method according to one or more of method claims 1-8.
11. An electronic device, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for the following operations:
obtaining interrogation process data, the interrogation process data being determined according to voice data collected during the interrogation;
performing identification according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to one target user and the second text data belongs to the other users apart from the target user;
obtaining interrogation information according to the first text data and the second text data.
CN201710384412.3A 2017-05-26 2017-05-26 A kind of voice-based data processing method, device and electronic equipment Pending CN108962253A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710384412.3A CN108962253A (en) 2017-05-26 2017-05-26 A kind of voice-based data processing method, device and electronic equipment
PCT/CN2018/082702 WO2018214663A1 (en) 2017-05-26 2018-04-11 Voice-based data processing method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710384412.3A CN108962253A (en) 2017-05-26 2017-05-26 A kind of voice-based data processing method, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN108962253A true CN108962253A (en) 2018-12-07

Family

ID=64395285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710384412.3A Pending CN108962253A (en) 2017-05-26 2017-05-26 A kind of voice-based data processing method, device and electronic equipment

Country Status (2)

Country Link
CN (1) CN108962253A (en)
WO (1) WO2018214663A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582708A (en) * 2020-04-30 2020-08-25 北京声智科技有限公司 Medical information detection method, system, electronic device and computer-readable storage medium
CN112118415A (en) * 2020-09-18 2020-12-22 瑞然(天津)科技有限公司 Remote diagnosis and treatment method and device, patient side terminal and doctor side terminal
CN113555133A (en) * 2021-05-31 2021-10-26 北京易康医疗科技有限公司 Medical inquiry data processing method and device
CN114520062A (en) * 2022-04-20 2022-05-20 杭州马兰头医学科技有限公司 Medical cloud communication system based on AI and letter creation
CN118486440A (en) * 2024-06-03 2024-08-13 江苏苏桦技术股份有限公司 Intelligent diagnosis guiding system and method for medical self-help machine

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268279A (en) * 2014-10-16 2015-01-07 魔方天空科技(北京)有限公司 Query method and device of corpus data
CN104427292A (en) * 2013-08-22 2015-03-18 中兴通讯股份有限公司 Method and device for extracting a conference summary
CN105469790A (en) * 2014-08-29 2016-04-06 上海联影医疗科技有限公司 Consultation information processing method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326640A (en) * 2016-08-12 2017-01-11 上海交通大学医学院附属瑞金医院卢湾分院 Medical speech control system and control method thereof
CN106328124A (en) * 2016-08-24 2017-01-11 安徽咪鼠科技有限公司 Voice recognition method based on user behavior characteristics

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104427292A (en) * 2013-08-22 2015-03-18 中兴通讯股份有限公司 Method and device for extracting a conference summary
CN105469790A (en) * 2014-08-29 2016-04-06 上海联影医疗科技有限公司 Consultation information processing method and device
CN104268279A (en) * 2014-10-16 2015-01-07 魔方天空科技(北京)有限公司 Query method and device of corpus data

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582708A (en) * 2020-04-30 2020-08-25 北京声智科技有限公司 Medical information detection method, system, electronic device and computer-readable storage medium
CN112118415A (en) * 2020-09-18 2020-12-22 瑞然(天津)科技有限公司 Remote diagnosis and treatment method and device, patient side terminal and doctor side terminal
CN113555133A (en) * 2021-05-31 2021-10-26 北京易康医疗科技有限公司 Medical inquiry data processing method and device
CN114520062A (en) * 2022-04-20 2022-05-20 杭州马兰头医学科技有限公司 Medical cloud communication system based on AI and letter creation
CN114520062B (en) * 2022-04-20 2022-07-22 杭州马兰头医学科技有限公司 Medical cloud communication system based on AI and letter creation
CN118486440A (en) * 2024-06-03 2024-08-13 江苏苏桦技术股份有限公司 Intelligent diagnosis guiding system and method for medical self-help machine

Also Published As

Publication number Publication date
WO2018214663A1 (en) 2018-11-29

Similar Documents

Publication Publication Date Title
CN108962253A (en) A kind of voice-based data processing method, device and electronic equipment
CN108899037B (en) Animal voiceprint feature extraction method and device and electronic equipment
CN104457955B (en) Body weight information acquisition method, apparatus and system
CN105389304B (en) Event Distillation method and device
CN109002184B (en) Association method and device for candidate words of input method
CN110634472B (en) Speech recognition method, server and computer readable storage medium
CN107146631B (en) Music identification method, note identification model establishment method, device and electronic equipment
CN109558599A (en) A kind of conversion method, device and electronic equipment
CN106406562A (en) Data processing method and device
WO2018120447A1 (en) Method, device and equipment for processing medical record information
CN108665889A (en) The Method of Speech Endpoint Detection, device, equipment and storage medium
CN106919629A (en) The method and device of information sifting is realized in group chat
CN108628819A (en) Treating method and apparatus, the device for processing
CN111739535A (en) Voice recognition method and device and electronic equipment
CN106777016A (en) The method and device of information recommendation is carried out based on instant messaging
CN110491384B (en) Voice data processing method and device
CN109036404A (en) Voice interactive method and device
CN112820300B (en) Audio processing method and device, terminal and storage medium
CN109102813B (en) Voiceprint recognition method and device, electronic equipment and storage medium
CN110634570A (en) Diagnostic simulation method and related device
CN108268667A (en) Audio file clustering method and device
CN107247794A (en) Topic bootstrap technique, live broadcast device and terminal device in live
CN109145151B (en) Video emotion classification acquisition method and device
CN105930522A (en) Intelligent music recommendation method, system and device
CN109102812B (en) Voiceprint recognition method and system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181207

RJ01 Rejection of invention patent application after publication