CN108962253A - Voice-based data processing method, apparatus and electronic device - Google Patents
Voice-based data processing method, apparatus and electronic device
- Publication number: CN108962253A
- Application number: CN201710384412.3A
- Authority
- CN
- China
- Prior art keywords
- data
- text
- voice
- consultation
- text data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
Abstract
Embodiments of the present invention provide a voice-based data processing method, apparatus, and electronic device for completely recording a consultation process. The method includes: obtaining consultation process data, the consultation process data being determined from voice data collected during a consultation; performing recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data. With embodiments of the present invention, the utterances of the doctor and the patient can be distinguished automatically during a consultation, the consultation process can be recorded completely, and content such as case records can be compiled automatically, saving the time spent organizing consultation records.
Description
Technical field
The present invention relates to the technical field of data processing, and more particularly to a voice-based data processing method, apparatus, and electronic device.
Background art
Speech recognition usually converts speech into text. Conventional recording and speech recognition devices can only convert voice data into the corresponding text and cannot distinguish between speakers, so when several people are speaking, speech recognition alone cannot produce an effective record.
For example, in the actual diagnosis and treatment process in a hospital, at least two people converse, namely at least a doctor and a patient, and sometimes the patient's family members as well. Existing speech recognition devices cannot attribute the collected consultation speech to the person who uttered it, and therefore cannot record the entire consultation process comprehensively.
Summary of the invention
Embodiments of the present invention provide a voice-based data processing method for completely recording a consultation process. Correspondingly, embodiments of the present invention also provide a voice-based data processing apparatus, an electronic device, and a readable storage medium, to guarantee the implementation and application of the above method.
To solve the above problems, an embodiment of the invention discloses a voice-based data processing method, comprising: obtaining consultation process data, the consultation process data being determined from voice data collected during a consultation; performing recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data.
Optionally, the consultation process data is voice data, and performing recognition on the consultation process data to obtain the corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to voiceprint features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to voiceprint features comprises: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: matching each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
Optionally, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each voice segment in the first voice data and generating the first text data from the resulting text fragments; and performing speech recognition on each voice segment in the second voice data and generating the second text data from the resulting text fragments. Obtaining the consultation information according to the first text data and the second text data then comprises: sorting the text fragments of the first text data and of the second text data according to the time order of their corresponding voice segments to obtain the consultation information.
Optionally, the consultation process data is a text recognition result recognized from the voice data, and performing recognition on the consultation process data to obtain the corresponding first text data and second text data comprises: performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features.
Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features comprises: dividing the text recognition result to obtain corresponding text fragments; recognizing the text fragments with a preset model to determine the language feature each text fragment has, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text fragments having the target-user language feature, and the second text data from the text fragments having the non-target-user language feature.
An embodiment of the invention also discloses a voice-based data processing apparatus, comprising: a data obtaining module for obtaining consultation process data, the consultation process data being determined from voice data collected during a consultation; a text recognition module for performing recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user; and an information determining module for obtaining consultation information according to the first text data and the second text data.
Optionally, the interrogation process data is voice data;The text identification module, comprising: separation submodule is used
According to vocal print feature, the first voice data and second speech data are isolated from the voice data;Speech recognition submodule
Block obtains corresponding first textual data for carrying out speech recognition respectively to first voice data and second speech data
According to the second text data.
Optionally, the separation submodule is configured to divide the voice data into multiple voice segments and to determine the first voice data and the second voice data from the voice segments according to voiceprint features.
Optionally, the separation submodule is configured to match each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; to collect the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and to collect the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, the separation submodule is configured to identify the voiceprint feature of each voice segment; to count, for each voiceprint feature, the voice segments having that feature; to generate the first voice data from the voice segments of the voiceprint feature with the largest count, wherein the voiceprint feature with the largest count is the voiceprint feature of the target user; and to generate the second voice data from the remaining voice segments.
Optionally, the speech recognition submodule is configured to perform speech recognition on each voice segment in the first voice data and generate the first text data from the resulting text fragments, and to perform speech recognition on each voice segment in the second voice data and generate the second text data from the resulting text fragments. The information determining module is configured to sort the text fragments of the first text data and of the second text data according to the time order of their corresponding voice segments to obtain the consultation information.
Optionally, the consultation process data is a text recognition result recognized from the voice data, and the text recognition module is configured to perform feature recognition on the text recognition result and to separate the first text data and the second text data according to language features.
Optionally, the text recognition module comprises: a fragment dividing submodule for dividing the text recognition result to obtain corresponding text fragments; a fragment recognition submodule for recognizing the text fragments with a preset model to determine the language feature each text fragment has, the language features including a first language feature and a second language feature; and a text generation submodule for generating the first text data from the text fragments having the first language feature and the second text data from the text fragments having the second language feature.
An embodiment of the invention also discloses a readable storage medium. When the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the voice-based data processing method described in one or more of the embodiments of the present invention.
Optionally, an electronic device comprises a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: obtaining consultation process data, the consultation process data being determined from voice data collected during a consultation; performing recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user; and obtaining consultation information according to the first text data and the second text data.
Optionally, the consultation process data is voice data, and performing recognition on the consultation process data to obtain the corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to voiceprint features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to voiceprint features comprises: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: matching each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and collecting the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
Optionally, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each voice segment in the first voice data and generating the first text data from the resulting text fragments; and performing speech recognition on each voice segment in the second voice data and generating the second text data from the resulting text fragments. Obtaining the consultation information according to the first text data and the second text data then comprises: sorting the text fragments of the first text data and of the second text data according to the time order of their corresponding voice segments to obtain the consultation information.
Optionally, the consultation process data is a text recognition result recognized from the voice data, and performing recognition on the consultation process data to obtain the corresponding first text data and second text data comprises: performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features.
Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features comprises: dividing the text recognition result to obtain corresponding text fragments; recognizing the text fragments with a preset model to determine the language feature each text fragment has, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text fragments having the target-user language feature, and the second text data from the text fragments having the non-target-user language feature.
Embodiments of the present invention include the following advantages:
From consultation process data determined from voice collected during a consultation, the first text data and the second text data can be recognized according to different users, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user, so the utterances of the doctor and the patient during the consultation can be distinguished automatically. Consultation information is then obtained according to the first text data and the second text data, so the consultation process can be recorded completely and content such as case records can be compiled automatically, saving the time spent organizing consultation records.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a voice-based data processing method embodiment of the present invention;
Fig. 2 is a flow chart of the steps of another voice-based data processing method embodiment of the present invention;
Fig. 3 is a flow chart of the steps of yet another voice-based data processing method embodiment of the present invention;
Fig. 4 is a structural block diagram of a voice-based data processing apparatus embodiment of the present invention;
Fig. 5 is a structural block diagram of another voice-based data processing apparatus embodiment of the present invention;
Fig. 6 is a structural block diagram of an electronic device for voice-based data processing according to an exemplary embodiment of the present invention;
Fig. 7 is a structural schematic diagram of an electronic device for voice-based data processing according to another exemplary embodiment of the present invention.
Specific embodiments
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flow chart of the steps of a voice-based data processing method embodiment of the present invention is shown. The method may specifically include the following steps:
Step 102: obtain consultation process data, the consultation process data being determined from voice data collected during a consultation.
During a consultation, voice can be collected from the consultation process by various electronic devices, and the consultation process data is obtained based on the collected voice data; that is, the consultation process data may be the collected voice data itself, or the text recognition result converted from the collected voice data. Embodiments of the present invention can therefore perform recognition on data collected from various consultation processes.
Step 104: perform recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user.
The consultation process data can be recognized with different methods according to its data type; for example, voice data can be processed by means such as voiceprint features and speech recognition, and text data can be recognized by text features, so as to obtain the first text data and the second text data distinguished by user. During the consultation, at least two users interact: one user is the doctor, and the other users are the patient, the patient's family members, and so on. For example, if the data is collected from a doctor's outpatient service over one day, it will include one doctor and several patients, and possibly one or several family members. For the consultation record, the doctor can therefore be taken as the target user, so that the first text data is the doctor's consultation text data, and the text data of the at least one other user, i.e. the consultation text data of patients and family members, is taken as the second text data.
Step 106: obtain consultation information according to the first text data and the second text data.
Since a consultation is usually a question-and-answer process, the first text data and the second text data may each be composed of multiple text fragments, so the consultation information can be obtained from the times of the text fragments and their corresponding users.
For example, one example of consultation information is as follows:
2017-4-23 10:23 AM
Doctor A: What symptoms do you have?
Patient B: My XXX is uncomfortable.
Doctor A: Any XXX?
Patient B: Yes.
……
In actual processing, the hospital's outpatient records and the like may also be combined to obtain patient information, so that different patients can be distinguished in the consultation information.
In conclusion for the interrogation process data determined during interrogation by acquisition voice, it can be from interrogation process
The first text data and the second text data are identified according to different user in data, wherein first text data belongs to
One target user, second text data belong to the other users in addition to the target user, can automatic distinguishing
The sentence of doctor, patient during interrogation, then according to first text data and the second text data, interrogation information is obtained,
Interrogation process can be completely recorded, automatic arranging obtains the contents such as case, saves the finishing time of interrogation record.
In embodiments of the present invention, the consultation process data includes voice data and/or the text recognition result recognized from the voice data. Different types of consultation process data are recognized with different methods, so the embodiments below discuss the processing of each type in turn.
Referring to Fig. 2, a flow chart of the steps of another voice-based data processing method embodiment of the present invention is shown. In this embodiment, the consultation process data is voice data. The method may specifically include the following steps:
Step 202: obtain consultation process data, the consultation process data being the voice data collected during a consultation.
During a consultation, voice data can be collected from the consultation process by various electronic devices, for example by recording audio with devices such as a voice recorder, a mobile phone, or a computer. The collected voice data may come from a single outpatient session or from several outpatient sessions of one doctor; embodiments of the present invention place no restriction on this. The voice data therefore includes the voice data of one doctor and of at least one patient, and may also include the voice data of at least one family member.
The above step 104 of performing recognition on the consultation process data to obtain the corresponding first text data and second text data may include the following steps 204-206.
Step 204: separate first voice data and second voice data from the voice data according to voiceprint features.
A voiceprint is the spectrum of a sound wave carrying verbal information, as displayed by an electro-acoustic instrument. Voiceprints are specific and stable: after adulthood, a person's voiceprint remains relatively stable for a long time, so different people can be identified by their voiceprints. Voice data can therefore be recognized by voiceprint features to determine the voice segments corresponding to the different users (voiceprint features) in the voice data, yielding the first voice data of the target user and the second voice data of the other users.
Separating the first voice data and the second voice data from the voice data according to voiceprint features includes: dividing the voice data into multiple voice segments, and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Specifically, the voice data can be divided into multiple voice segments, either by a voice division rule, such as the pause intervals between segments, or by determining the voiceprint feature corresponding to each sound and dividing according to the different voiceprint features. One piece of voice data can thus be divided into multiple voice segments with a sequential order between them, and different segments may have identical or different voiceprint features. Based on its voiceprint feature, each voice segment is then assigned to the first voice data or the second voice data: the voiceprint feature of each segment is determined, the segments carrying the target user's voiceprint feature form the first voice data, and the remaining segments form the second voice data. A sketch of the pause-based division is given below.
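As a minimal illustration of the pause-based division, the following sketch splits a mono waveform at long silences; the sampling rate, frame size, and thresholds are assumptions for the example, not values from the patent:

```python
import numpy as np

def split_on_pauses(samples: np.ndarray, rate: int = 16000,
                    frame_ms: int = 30, silence_rms: float = 0.01,
                    min_pause_ms: int = 300):
    """Divide a mono waveform into voice segments at long pauses."""
    frame = int(rate * frame_ms / 1000)
    # RMS energy per frame; frames below the threshold count as silence.
    rms = np.array([np.sqrt(np.mean(samples[i:i + frame] ** 2))
                    for i in range(0, len(samples) - frame, frame)])
    silent = rms < silence_rms
    min_pause = min_pause_ms // frame_ms
    segments, start, run = [], None, 0
    for idx, is_silent in enumerate(silent):
        if not is_silent:
            if start is None:
                start = idx       # a new segment begins here
            run = 0
        elif start is not None:
            run += 1
            if run >= min_pause:  # pause long enough: close the segment
                segments.append((start * frame, (idx - run + 1) * frame))
                start, run = None, 0
    if start is not None:
        segments.append((start * frame, len(samples)))
    return segments  # (begin_sample, end_sample) pairs in time order
```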
In embodiments of the present invention, before voice data is collected during the consultation, a passage of the doctor's (target user's) voice can first be collected as reference data, so that the doctor's voiceprint feature, i.e. the reference voiceprint feature, can be recognized from the reference data. A speech recognition model can also be provided in embodiments of the present invention: after the voice data is input into the model, the voice segments matching the reference voiceprint data can be separated from the voice segments with other voiceprint features, yielding the voice segments of the target user and the voice segments of the other users. In a doctor's outpatient procedure, the compiled case information usually contains only one doctor while there may be many patients, so a large number of case samples corresponding to a particular doctor can be obtained in this way.
In an alternative embodiment of the invention, the voiceprint feature of the target user can be collected in advance as the reference voiceprint feature for dividing the voice data. That is, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: matching each voice segment against the reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; collecting the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and collecting the voice segments that do not match to obtain the corresponding second voice data. In other words, for a target user such as a doctor, voice data can be collected in advance and a voiceprint feature extracted from it to serve as the reference voiceprint feature; each voice segment of the consultation voice data is then matched against this reference to determine whether its voiceprint feature is consistent with it. If they are consistent, the segment matches the reference voiceprint feature and is added to the first voice data (the target user's corresponding voice data); if they are inconsistent, the segment does not match and is added to the second voice data (the non-target users' corresponding voice data). The first voice data and the second voice data are thus each composed of their corresponding voice segments, and the segments retain their sequential relation, which makes it easy to determine the consultation information accurately later. A matching sketch is given below.
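A minimal sketch of the reference matching under stated assumptions: `embed` stands in for any voiceprint (speaker-embedding) extractor, `reference` is the embedding of the doctor's enrolled sample, and the cosine threshold is illustrative:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def split_by_reference(segments, embed, reference, threshold=0.75):
    """Route each voice segment to the doctor or the non-doctor stream."""
    first, second = [], []
    for segment in segments:                 # segments arrive in time order
        if cosine(embed(segment), reference) >= threshold:
            first.append(segment)            # matches the reference voiceprint
        else:
            second.append(segment)           # patient, family member, etc.
    return first, second
```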
In another alternative embodiment of the invention, the voice data can also be divided according to the number of voice segments corresponding to the same voiceprint feature. That is, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from its corresponding voice segments, wherein the voiceprint feature with the largest count is the voiceprint feature of the target user; and generating the second voice data from the voice segments not belonging to the first voice data. This rests on a characteristic of the consultation process: the consultation process data may record several outpatient sessions of one doctor, during which the doctor occupies more of the time than any single patient or family member, so the doctor (target user) accounts for the largest amount of speech in the voice data. The target user and the other users can therefore be distinguished by the number of voice segments corresponding to each user, yielding the first voice data and the second voice data: the voiceprint features in the voice segments are identified, the number of segments per voiceprint feature is counted, the voiceprint feature with the largest count is determined to be the target user's voiceprint feature and the other voiceprint features to be those of the other users; the segments with the target user's voiceprint feature then form the first voice data in sequence, while the other segments (those not belonging to the first voice data) form the second voice data in sequence. A sketch of this majority heuristic follows.
In embodiments of the present invention, since the voice data is collected in a scene where several people converse, a single voice segment may contain the voiceprint features of multiple users. When multiple voiceprint features are recognized in one voice segment, the cases are handled as follows. If the different voiceprint features occur at different times: when the voiceprint features all belong to other users, the segment is added to the second voice data; when they include both the target user's voiceprint feature and other users' voiceprint features, the segment is further divided into sub-segments, which are added to the corresponding voice data. If the different voiceprint features occur at the same time, i.e. at least two users are speaking simultaneously: when the voiceprint features all belong to other users, the segment is added to the second voice data; when they include both the target user's and other users' voiceprint features, the segment can be assigned as required, for example classified as the target user's segment and added to the first voice data, classified as the other users' segment and added to the second voice data, or added to both users' voice data.
Step 206: perform speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
After the first voice data and the second voice data are obtained, the two kinds of voice data can be recognized separately, yielding the first text data of the target user and the second text data of the other users.
In an optional embodiment, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data includes: performing speech recognition on each voice segment in the first voice data and generating the first text data from the resulting text fragments; and performing speech recognition on each voice segment in the second voice data and generating the second text data from the resulting text fragments. Recognizing each voice segment of the first voice data yields the text data corresponding to that segment, so the first text data is assembled in the order of the voice segments; the second text data is obtained in the same way. Since the doctor's questions and the patient's answers during a consultation are all sequential, the time order recorded when the voice data was divided into segments carries over: the resulting first text data and second text data also retain their sequential relation, which makes it easy to compile the consultation information accurately later, as the sketch below shows.
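A minimal sketch of the per-segment recognition, with `recognize` standing in for any speech-to-text call; keeping each segment's start time is what preserves the question-answer order:

```python
def transcribe_segments(segments, recognize):
    """segments: time-ordered (start_time, audio) pairs.

    Returns (start_time, text) fragments so that ordering survives
    into the first and second text data.
    """
    return [(start, recognize(audio)) for start, audio in segments]
```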
Step 208: obtain consultation information according to the first text data and the second text data.
According to the time order of the voice segments corresponding to the first text data and the second text data, the text fragments of the first text data and of the second text data can be sorted by their corresponding order, such as time order, to obtain the corresponding consultation information. The consultation information can record various information such as the doctor's questions during the consultation, the answers of the corresponding patient (or family members), and the doctor's diagnosis and medical advice. A merge sketch is given below.
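A minimal merge sketch; the fragment format matches the transcription sketch above, and the speaker labels are illustrative:

```python
def merge_dialog(first_fragments, second_fragments):
    """Interleave two time-stamped fragment lists into one transcript.

    Each input is a list of (start_time, text) pairs in time order.
    """
    tagged = ([(t, "Doctor", text) for t, text in first_fragments] +
              [(t, "Patient", text) for t, text in second_fragments])
    # Sorting by start time restores the original question-answer order.
    return [f"{speaker}: {text}" for _, speaker, text in sorted(tagged)]
```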
Step 210: analyze the consultation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
After the consultation information is compiled, embodiments of the present invention can also analyze it as required to obtain a corresponding analysis result. Since a consultation is related to disease diagnosis, the analysis result is also related to disease diagnosis, as determined by the specific analysis requirement.
For example, the doctor's common questions for each kind of disease can be counted and supplied as a reference to less experienced doctors; the consultation information can be analyzed to develop an artificial intelligence question answering system for traditional Chinese medicine (or Western medicine); and the symptoms, treatment methods, and so on corresponding to each kind of disease can be determined by means such as statistics and analysis. One way to realize the per-disease question statistic is sketched below.
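One possible realization of the per-disease question statistic, assuming consultation records have already been tagged with a diagnosis; the record format is an assumption for the example:

```python
from collections import Counter, defaultdict

def common_questions(records, top=5):
    """records: (diagnosis, doctor_questions) pairs, one per consultation,
    where doctor_questions lists the question sentences from the
    doctor's first text data.
    """
    per_disease = defaultdict(Counter)
    for diagnosis, questions in records:
        per_disease[diagnosis].update(questions)
    # The most frequent questions per disease, e.g. as a reference
    # for less experienced doctors.
    return {d: [q for q, _ in c.most_common(top)]
            for d, c in per_disease.items()}
```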
Referring to Fig. 3, a flow chart of the steps of yet another voice-based data processing method embodiment of the present invention is shown. In this embodiment, the consultation process data is the text recognition result recognized from the voice data. The method may specifically include the following steps:
Step 302: obtain the text recognition result recognized from the voice data.
The voice data is collected during the consultation, and the collected voice data is converted by speech recognition into the text recognition result, which can be obtained directly.
The above step 104 of performing recognition on the consultation process data to obtain the corresponding first text data and second text data may include the following step 304.
Step 304: perform feature recognition on the text recognition result, and separate the first text data and the second text data according to language features.
For data already recognized as text, it is unknown which person said each passage, so the text cannot serve directly as consultation information. Embodiments of the present invention therefore identify the different users from the text recognition result and compile the consultation information. During a consultation, the doctor usually asks about symptoms, the user replies with the symptoms they have, and the doctor states the corresponding disease, the examinations that need to be done, the drugs needed, and so on. Based on these features, the doctor's and the patient's sentences can be identified from the text recognition result, and the first text data and the second text data separated out.
That is, embodiments of the present invention can collect texts of doctors' consultations and texts of patients' consultations in advance, together with consultation information that has already been analyzed, so as to derive the language features of doctors (i.e. the target user) and of patients and their family members (i.e. the other users) and build a corresponding model, making it convenient to distinguish the texts of different users based on these language features. The language features of different users can be determined, and the preset model built, by means such as machine learning and probability statistics.
Embodiments of the present invention can obtain a large number of already-separated case texts as training data; a separated case text is consultation information in which the target user and the other users have been identified, such as consultation records obtained in the past. The doctor content data (the target user's first text data) and the patient content data (the other users' second text data) can be trained on separately to obtain a doctor content model and a patient content model; the two models can of course also be combined into one preset model, based on which the doctor's sentences and the patient's sentences can be recognized.
For example, in the case information obtained from consultations, doctor content is generally a question containing symptom-class vocabulary, such as 'How do you feel?', 'What symptoms do you have?', or 'Where is it uncomfortable?'; patient content is generally a question containing symptom- or disease-class vocabulary, such as 'Do I have a cold?' or 'Is it XX disease?'; and doctor content also includes declarative sentences containing symptoms and drugs, such as 'You have viral influenza; take some XX medicine.' The doctor's sentence content and the patient's sentence content thus both carry fairly distinctive language features, so the doctor content model and the patient content model can be trained from the separated case information, for instance as sketched below.
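A minimal training sketch using a bag-of-words naive Bayes classifier as one possible form of the combined preset model; the sample sentences are illustrative stand-ins for a real corpus of separated case texts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative samples of already-separated case text; a real
# system would train on a large corpus of historical consultations.
doctor_texts = ["How do you feel?",
                "What symptoms do you have?",
                "You have viral influenza, take some XX medicine."]
patient_texts = ["Do I have a cold?",
                 "Is it XX disease?",
                 "My throat is uncomfortable."]

texts = doctor_texts + patient_texts
labels = (["doctor"] * len(doctor_texts) +
          ["patient"] * len(patient_texts))

# One combined "preset model": bag-of-words features + naive Bayes.
preset_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
preset_model.fit(texts, labels)
```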
Performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result to obtain corresponding text fragments; recognizing the text fragments with the preset model to determine the language feature each text fragment has, the language features including a first language feature and a second language feature; and generating the first text data from the text fragments having the first language feature and the second text data from the text fragments having the second language feature. The text recognition result is first divided, for example into sentences according to the features of Chinese sentences, or into multiple text fragments in other ways. Each text fragment is then input into the preset model in turn, and the model recognizes the language feature each text fragment has; the model may equally be set up to assign an owning user to each fragment based on the recognized language feature. With the target user's language feature taken as the first language feature and the other users' language feature as the second language feature, the preset model determines whether each text fragment has the first or the second language feature. The text fragments with the first language feature then generate the first text data, and the text fragments with the second language feature generate the second text data, in the fragments' original dividing order. A classification sketch follows.
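A minimal classification sketch that applies such a preset model to the divided fragments; the "doctor" label follows the training sketch above and is an assumption of this example:

```python
def separate_text(fragments, preset_model):
    """Split text fragments into first/second text data, keeping order.

    fragments: the divided text recognition result (e.g. sentences);
    preset_model: any classifier with a scikit-learn style predict().
    """
    first, second = [], []
    for index, fragment in enumerate(fragments):
        label = preset_model.predict([fragment])[0]
        (first if label == "doctor" else second).append((index, fragment))
    return first, second   # indices preserve the original fragment order
```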
Step 306: obtain consultation information according to the first text data and the second text data.
Step 308: analyze the consultation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
According to the order of the voice segments corresponding to the first text data and the second text data, the text fragments of the first text data and of the second text data can be sorted by their corresponding order to obtain the corresponding consultation information, which can record various information such as the doctor's questions during the consultation, the answers of the corresponding patient (or family members), and the doctor's diagnosis and medical advice.
After the consultation information is compiled, embodiments of the present invention can also analyze it as required to obtain a corresponding analysis result. Since a consultation is related to disease diagnosis, the analysis result is also related to disease diagnosis, as determined by the specific analysis requirement.
For example, the doctor's common questions for each kind of disease can be counted and supplied as a reference to less experienced doctors; the consultation information can be analyzed to develop an artificial intelligence question answering system for traditional Chinese medicine (or Western medicine); and the symptoms, treatment methods, and so on corresponding to each kind of disease can be determined by means such as statistics and analysis.
Given doctors' habits and needs in recording cases, the above scheme can record the communication process with the patient by means of audio recording, then separate out the sentences of the doctor and the patient, distinguish and organize them, and supply them to the doctor in dialog form as a case record, which can effectively reduce the time doctors spend on organizing cases.
It should be noted that, for simplicity of description, the method embodiments are stated as a series of action combinations, but those skilled in the art should understand that embodiments of the present invention are not limited by the described sequence of actions, because according to embodiments of the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by embodiments of the present invention.
Referring to Fig. 4, a structural block diagram of a voice-based data processing apparatus embodiment of the present invention is shown, which may specifically include the following modules:
Data obtaining module 402, for obtaining consultation process data, the consultation process data being determined from voice data collected during a consultation.
Text recognition module 404, for performing recognition on the consultation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to users other than the target user.
Information determining module 406, for obtaining consultation information according to the first text data and the second text data.
At least two users interact during the consultation: one user is the doctor, and the other users are the patient, the patient's family members, and so on. For example, if the data is collected from a doctor's outpatient service over one day, it will include one doctor and several patients, and possibly one or several family members. For the consultation record, the doctor can therefore be taken as the target user, so that the first text data is the doctor's consultation text data, and the text data of the at least one other user, i.e. the consultation text data of patients and family members, is taken as the second text data. Since a consultation is usually a question-and-answer process, the first text data and the second text data may each be composed of multiple text fragments, so the consultation information can be obtained from the times of the text fragments and their corresponding users.
For example, one example of consultation information is as follows:
2017-4-23 10:23 AM
Doctor A: What symptoms do you have?
Patient B: My XXX is uncomfortable.
Doctor A: Any XXX?
Patient B: Yes.
……
In actual processing, the hospital's outpatient records and the like may also be combined to obtain patient information, so that different patients can be distinguished in the consultation information.
In conclusion for passing through acquisition determining interrogation process data during interrogation, it can be from interrogation process data
According to different user identify the first text data and the second text data, wherein first text data belongs to one
Target user, second text data belong to the other users in addition to the target user, can automatic distinguishing interrogation
The sentence of doctor, patient in the process, then according to first text data and the second text data, interrogation information is obtained, it can
Complete record interrogation process, automatic arranging obtain the contents such as case, save the finishing time of interrogation record.
Referring to Fig. 5, a structural block diagram of another voice-based data processing apparatus embodiment of the present invention is shown, which may specifically include the following modules:
The consultation process data includes voice data and/or the text recognition result recognized from the voice data. Where the consultation process data is voice data, the text recognition module 404 may include:
Separation submodule 40402, for separating first voice data and second voice data from the voice data according to voiceprint features.
Speech recognition submodule 40404, for performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
The separation submodule 40402 is configured to divide the voice data into multiple voice segments and to determine the first voice data and the second voice data from the voice segments according to voiceprint features.
Preferably, the separation submodule 40402 is configured to match each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; to collect the voice segments that match the reference voiceprint feature to obtain the corresponding first voice data; and to collect the voice segments that do not match the reference voiceprint feature to obtain the corresponding second voice data.
In embodiments of the present invention, before voice data is collected during the consultation, a passage of the doctor's (target user's) voice can first be collected as reference data, so that the doctor's voiceprint feature, i.e. the reference voiceprint feature, can be recognized from the reference data. A speech recognition model can also be provided: after the voice data is input into the model, the voice segments matching the reference voiceprint data can be separated from the voice segments with other voiceprint features, yielding the voice segments of the target user and the voice segments of the other users. In a doctor's outpatient procedure, the compiled case information usually contains only one doctor while there may be many patients, so a large number of case samples corresponding to a particular doctor can be obtained in this way.
Preferably, the separation submodule 40402 is configured to identify the voiceprint feature of each voice segment; to count the number of voice segments corresponding to each voiceprint feature; to determine the voiceprint feature with the largest number of voice segments and generate the first voice data from its corresponding voice segments, wherein the voiceprint feature with the largest count is the voiceprint feature of the target user; and to generate the second voice data from the voice segments not belonging to the first voice data.
This rests on a characteristic of the consultation process: the consultation process data may record several outpatient sessions of one doctor, during which the doctor occupies more of the time than any single patient or family member in the consultation exchanges, so the doctor (target user) accounts for the largest amount of speech in the voice data. The target user and the other users can therefore be distinguished by the number of voice segments corresponding to each user, yielding the first voice data and the second voice data.
In embodiments of the present invention, since the voice data is collected in a scene where several people converse, a single voice segment may contain the voiceprint features of multiple users. When the separation submodule 40402 recognizes multiple voiceprint features in one voice segment, it can perform the following processing. If the different voiceprint features occur at different times: when the voiceprint features all belong to other users, the segment is added to the second voice data; when they include both the target user's voiceprint feature and other users' voiceprint features, the segment is further divided into sub-segments, which are added to the corresponding voice data. If the different voiceprint features occur at the same time, i.e. at least two users are speaking simultaneously: when the voiceprint features all belong to other users, the segment is added to the second voice data; when they include both the target user's and other users' voiceprint features, the segment can be assigned as required, for example classified as the target user's segment to obtain the first voice data, classified as the other users' segment to obtain the second voice data, or added to both users' voice data.
Preferably, the speech recognition submodule 40404 is configured to perform speech recognition on each voice segment in the first voice data and generate the first text data from the resulting text fragments, and to perform speech recognition on each voice segment in the second voice data and generate the second text data from the resulting text fragments. The information determining module 406 is then configured to sort the text fragments of the first text data and of the second text data according to the time order of their corresponding voice segments to obtain the consultation information.
Preferably, the interrogation process data is a text recognition result obtained by recognizing the voice data. The text identification module 404 is configured to perform feature identification on the text recognition result and to separate out the first text data and the second text data according to language features.
The text identification module 404 comprises:
A segment division submodule 40406, configured to divide the text recognition result to obtain corresponding text fragments.
A segment identification submodule 40408, configured to identify the text fragments using a preset model and determine the language feature each text fragment has, the language features including a first language feature and a second language feature.
In the embodiment of the present invention, a large amount of already-separated case text can be obtained as training data, separated case text being interrogation information in which the target user and the other users have been identified, for example text obtained from historical identification results. The doctor content data (the first text data, of the target user) and the patient content data (the second text data, of the other users) can be trained on separately to obtain a doctor content model and a patient content model; the two models can also be combined into a single preset model. Based on the preset model, the sentences of the doctor and the sentences of the patient can be recognized. For example, in the case information obtained from an interrogation, the doctor's content mostly consists of interrogative sentences carrying symptom-related vocabulary, such as 'How do you feel?', 'What symptoms do you have?' or 'Where does it hurt?'; the patient's content mostly consists of interrogative sentences carrying symptom or disease vocabulary, such as 'Do I have a cold?' or 'Is it XX disease?'; and the doctor's content also includes declarative sentences carrying symptoms and drugs, such as 'You have viral influenza' or 'You can take some XX medicine'. The sentence content of the doctor and the sentence content of the patient thus both exhibit quite distinctive language features, so the doctor content model and the patient content model can be trained from the separated case information.
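A minimal sketch of such a preset model, using a bag-of-words Naive Bayes classifier as a stand-in (the disclosure does not specify a model type) and a toy training set in place of the separated historical case text:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy illustrative training set; real training data would be the separated
# historical case text described above.
doctor_sentences = ["How do you feel?", "What symptoms do you have?",
                    "You have viral influenza, you can take some medicine."]
patient_sentences = ["Do I have a cold?", "My head hurts.",
                     "Is this a serious disease?"]

preset_model = make_pipeline(CountVectorizer(), MultinomialNB())
preset_model.fit(doctor_sentences + patient_sentences,
                 ["doctor"] * len(doctor_sentences)
                 + ["patient"] * len(patient_sentences))

print(preset_model.predict(["Where does it hurt?"]))  # e.g. ['doctor']
```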
Preferably, the text generation submodule 40410 is configured to generate the first text data from the text fragments having the first language feature, and to generate the second text data from the text fragments having the second language feature.
Preferably, the device further includes an analysis module 408, configured to analyze the interrogation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
According to the order of the speech segments to which the first text data and the second text data respectively correspond, the text fragments in the first text data and the text fragments in the second text data can be sorted into the corresponding sequence, so as to obtain the corresponding interrogation information. The interrogation information can record the questions asked by the doctor during the interrogation and the answers of the corresponding patient (or family members), as well as the doctor's diagnosis, medical advice and other such information.
After the interrogation information has been sorted out, the embodiment of the present invention can further analyze the interrogation information as required to obtain a corresponding analysis result. Since an interrogation is related to disease diagnosis, the analysis result is likewise related to disease diagnosis, the specifics being determined by the analysis requirements.
For example, the questions a doctor commonly asks for each kind of disease can be counted and provided to less experienced doctors for reference; the interrogation information can be analyzed to develop an artificial intelligence question-answering system for traditional Chinese medicine (or Western medicine); and the symptoms, treatment methods and the like corresponding to each kind of disease can be determined by statistical and analytical means.
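For instance, counting the questions a doctor most frequently asks per disease could be sketched as below; the (disease, questions) record shape is an assumption of the example, not a format defined by the disclosure.

```python
from collections import Counter, defaultdict

def common_questions_by_disease(records, top_n=3):
    """Count the questions a doctor most often asks for each diagnosed disease.

    records: iterable of (disease, doctor_questions) pairs extracted from
    interrogation information.
    """
    counts = defaultdict(Counter)
    for disease, questions in records:
        counts[disease].update(questions)
    return {d: [q for q, _ in c.most_common(top_n)] for d, c in counts.items()}
```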
As for doctors' habits of, and need for, recording cases, based on the above scheme the communication with a patient can be captured by recording; the sentences of the doctor and of the patient are then separated out, distinguished and arranged, and provided to the doctor in the form of a dialog for use as a case record, which can effectively reduce the time the doctor spends arranging cases.
As the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
Fig. 6 is a structural block diagram of an electronic equipment 600 for voice-based data processing according to an exemplary embodiment. For example, the electronic equipment 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant or the like; it may also be a server-side device, such as a server.
Referring to Fig. 6, the electronic equipment 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the electronic equipment 600, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 602 may include one or more processors 620 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 602 may include one or more modules to facilitate interaction between the processing component 602 and the other components; for example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation on the equipment 600. Examples of such data include instructions of any application or method operated on the electronic equipment 600, contact data, phone book data, messages, pictures, video and so on. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
The power component 606 provides power for the various components of the electronic equipment 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic equipment 600.
The multimedia component 608 includes a screen providing an output interface between the electronic equipment 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the electronic equipment 600 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC); when the electronic equipment 600 is in an operation mode such as a call mode, a recording mode or a voice recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 604 or sent via the communication component 616. In some embodiments, the audio component 610 further includes a loudspeaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons and the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button and a lock button.
The sensor component 614 includes one or more sensors for providing state assessments of various aspects of the electronic equipment 600. For example, the sensor component 614 can detect the open/closed state of the equipment 600 and the relative positioning of components, such as the display and the keypad of the electronic equipment 600; the sensor component 614 can also detect a change in position of the electronic equipment 600 or of one of its components, the presence or absence of user contact with the electronic equipment 600, the orientation or acceleration/deceleration of the electronic equipment 600, and a change in the temperature of the electronic equipment 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the electronic equipment 600 and other devices. The electronic equipment 600 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the electronic equipment 600 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for performing the above methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 604 including instructions, which can be executed by the processor 620 of the electronic equipment 600 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device or the like.
A non-transitory computer-readable storage medium is provided such that, when the instructions in the storage medium are executed by a processor of an electronic equipment, the electronic equipment is enabled to perform a voice-based data processing method, the method comprising: obtaining interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation; performing identification according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to one target user and the second text data belongs to other users except the target user; and obtaining interrogation information according to the first text data and the second text data.
Optionally, the interrogation process data includes voice data and/or a text recognition result obtained by recognizing the voice data.
Optionally, the interrogation process data is voice data; performing identification according to the interrogation process data to obtain the corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to vocal print features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to vocal print features comprises: dividing the voice data into multiple speech segments; and determining the first voice data and the second voice data from the speech segments according to vocal print features.
Optionally, determining the first voice data and the second voice data from the speech segments according to vocal print features comprises: matching each speech segment against a benchmark vocal print feature, wherein the benchmark vocal print feature is the vocal print feature of the target user; obtaining the speech segments that match the benchmark vocal print feature to obtain the corresponding first voice data; and obtaining the speech segments that do not match the benchmark vocal print feature to obtain the corresponding second voice data.
Optionally, determining the first voice data and the second voice data from the speech segments according to vocal print features comprises: identifying the vocal print feature of each speech segment; counting the number of speech segments corresponding to each vocal print feature; determining the vocal print feature with the largest number of speech segments, and generating the first voice data from the speech segments corresponding to that vocal print feature; and generating the second voice data from the speech segments not belonging to the first voice data.
Optionally, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each speech segment in the first voice data, and generating the first text data from the resulting text fragments; and performing speech recognition on each speech segment in the second voice data, and generating the second text data from the resulting text fragments.
Optionally, the interrogation process data is a text recognition result obtained by recognizing the voice data; performing identification according to the interrogation process data to obtain the corresponding first text data and second text data comprises: performing feature identification on the text recognition result, and separating out the first text data and the second text data according to language features.
Optionally, performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features comprises: dividing the text recognition result to obtain corresponding text fragments; identifying the text fragments using a preset model, and determining the language feature each text fragment has, the language features including a first language feature and a second language feature; and generating the first text data from the text fragments having the first language feature, and generating the second text data from the text fragments having the second language feature.
Optionally, the method further comprises: analyzing the interrogation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
Fig. 7 is a structural schematic diagram of an electronic equipment 700 for voice-based data processing according to another exemplary embodiment of the present invention. The electronic equipment 700 may be a server, and the server may vary considerably depending on configuration or performance; it may include one or more central processing units (CPU) 722 (for example, one or more processors), a memory 732, and one or more storage media 730 (such as one or more mass storage devices) storing application programs 742 or data 744. The memory 732 and the storage medium 730 may provide transient or persistent storage. The programs stored in the storage medium 730 may include one or more modules (not marked in the figure), each of which may include a series of instruction operations for the server. Further, the central processing unit 722 may be configured to communicate with the storage medium 730 and to execute, on the server, the series of instruction operations in the storage medium 730.
The server may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and the like.
In an exemplary embodiment, the server is configured such that one or more central processing units 722 execute one or more programs including instructions for performing the following operations: obtaining interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation; performing identification according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to one target user and the second text data belongs to other users except the target user; and obtaining interrogation information according to the first text data and the second text data.
Optionally, the interrogation process data includes voice data and/or a text recognition result obtained by recognizing the voice data.
Optionally, the interrogation process data is voice data; performing identification according to the interrogation process data to obtain the corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to vocal print features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to vocal print features comprises: dividing the voice data into multiple speech segments; and determining the first voice data and the second voice data from the speech segments according to vocal print features.
Optionally, determining the first voice data and the second voice data from the speech segments according to vocal print features comprises: matching each speech segment against a benchmark vocal print feature, wherein the benchmark vocal print feature is the vocal print feature of the target user; obtaining the speech segments that match the benchmark vocal print feature to obtain the corresponding first voice data; and obtaining the speech segments that do not match the benchmark vocal print feature to obtain the corresponding second voice data.
Optionally, determining the first voice data and the second voice data from the speech segments according to vocal print features comprises: identifying the vocal print feature of each speech segment; counting the number of speech segments corresponding to each vocal print feature; determining the vocal print feature with the largest number of speech segments, and generating the first voice data from the speech segments corresponding to that vocal print feature; and generating the second voice data from the speech segments not belonging to the first voice data.
Optionally, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each speech segment in the first voice data, and generating the first text data from the resulting text fragments; and performing speech recognition on each speech segment in the second voice data, and generating the second text data from the resulting text fragments.
Optionally, the interrogation process data is a text recognition result obtained by recognizing the voice data; performing identification according to the interrogation process data to obtain the corresponding first text data and second text data comprises: performing feature identification on the text recognition result, and separating out the first text data and the second text data according to language features.
Optionally, performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features comprises: dividing the text recognition result to obtain corresponding text fragments; identifying the text fragments using a preset model, and determining the language feature each text fragment has, the language features including a first language feature and a second language feature; and generating the first text data from the text fragments having the first language feature, and generating the second text data from the text fragments having the second language feature.
Optionally, the one or more programs executed by the one or more central processing units 722 of the server further include instructions for performing the following operation: analyzing the interrogation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts between the embodiments, reference may be made to one another.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, an apparatus or a computer program product. Therefore, an embodiment of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, an embodiment of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory and the like) containing computer-usable program code.
Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce a device for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operational steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, can make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should be noted that, herein, relational terms such as 'first' and 'second' are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms 'include', 'comprise' and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or terminal device including a series of elements includes not only those elements, but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or terminal device. In the absence of further limitation, an element defined by the phrase 'including a ...' does not exclude the presence of other identical elements in the process, method, article or terminal device including that element.
A voice-based data processing method, a voice-based data processing device and an electronic equipment provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. At the same time, those of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific implementation and the scope of application. In conclusion, the contents of this specification should not be construed as limiting the invention.
Claims (11)
1. A voice-based data processing method, characterized by comprising:
obtaining interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation;
performing identification according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to one target user and the second text data belongs to other users except the target user;
obtaining interrogation information according to the first text data and the second text data.
2. The method according to claim 1, characterized in that the interrogation process data is voice data;
performing identification according to the interrogation process data to obtain the corresponding first text data and second text data comprises:
separating first voice data and second voice data from the voice data according to vocal print features;
performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
3. The method according to claim 2, characterized in that separating the first voice data and the second voice data from the voice data according to vocal print features comprises:
dividing the voice data into multiple speech segments;
determining the first voice data and the second voice data from the speech segments according to vocal print features.
4. The method according to claim 3, characterized in that determining the first voice data and the second voice data from the speech segments according to vocal print features comprises:
matching each speech segment against a benchmark vocal print feature, wherein the benchmark vocal print feature is the vocal print feature of the target user;
obtaining the speech segments that match the benchmark vocal print feature to obtain the corresponding first voice data;
obtaining the speech segments that do not match the benchmark vocal print feature to obtain the corresponding second voice data.
5. The method according to claim 3, characterized in that determining the first voice data and the second voice data from the speech segments according to vocal print features comprises:
identifying the vocal print feature of each speech segment;
counting the number of speech segments corresponding to each vocal print feature;
determining the vocal print feature with the largest number of speech segments, and generating the first voice data from the speech segments corresponding to that vocal print feature;
generating the second voice data from the speech segments not belonging to the first voice data.
6. The method according to claim 2, characterized in that performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises:
performing speech recognition on each speech segment in the first voice data, and generating the first text data from the resulting text fragments;
performing speech recognition on each speech segment in the second voice data, and generating the second text data from the resulting text fragments;
and obtaining interrogation information according to the first text data and the second text data comprises:
sorting the text fragments according to the time order of the speech segments to which the text fragments in the first text data and the text fragments in the second text data respectively correspond, to obtain the interrogation information.
7. The method according to claim 1, characterized in that the interrogation process data is a text recognition result obtained by recognizing the voice data;
performing identification according to the interrogation process data to obtain the corresponding first text data and second text data comprises:
performing feature identification on the text recognition result, and separating out the first text data and the second text data according to language features.
8. The method according to claim 7, characterized in that performing feature identification on the text recognition result and separating out the first text data and the second text data according to language features comprises:
dividing the text recognition result to obtain corresponding text fragments;
identifying the text fragments using a preset model, and determining the language feature each text fragment has, the language features including a target user language feature and a non-target user language feature;
generating the first text data from the text fragments having the target user language feature, and generating the second text data from the text fragments having the non-target user language feature.
9. A voice-based data processing device, characterized by comprising:
a data acquisition module, configured to obtain interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation;
a text identification module, configured to perform identification according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to one target user and the second text data belongs to other users except the target user;
an information determination module, configured to obtain interrogation information according to the first text data and the second text data.
10. A readable storage medium, characterized in that, when the instructions in the storage medium are executed by a processor of an electronic equipment, the electronic equipment is enabled to perform the voice-based data processing method according to one or more of method claims 1-8.
11. An electronic equipment, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory, and execution of the one or more programs by one or more processors is configured to include instructions for performing the following operations:
obtaining interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation;
performing identification according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to one target user and the second text data belongs to other users except the target user;
obtaining interrogation information according to the first text data and the second text data.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710384412.3A CN108962253A (en) | 2017-05-26 | 2017-05-26 | A kind of voice-based data processing method, device and electronic equipment |
PCT/CN2018/082702 WO2018214663A1 (en) | 2017-05-26 | 2018-04-11 | Voice-based data processing method and apparatus, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710384412.3A CN108962253A (en) | 2017-05-26 | 2017-05-26 | A kind of voice-based data processing method, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108962253A true CN108962253A (en) | 2018-12-07 |
Family
ID=64395285
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710384412.3A Pending CN108962253A (en) | 2017-05-26 | 2017-05-26 | A kind of voice-based data processing method, device and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108962253A (en) |
WO (1) | WO2018214663A1 (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106326640A (en) * | 2016-08-12 | 2017-01-11 | 上海交通大学医学院附属瑞金医院卢湾分院 | Medical speech control system and control method thereof |
CN106328124A (en) * | 2016-08-24 | 2017-01-11 | 安徽咪鼠科技有限公司 | Voice recognition method based on user behavior characteristics |
- 2017-05-26: Application CN201710384412.3A filed (CN), published as CN108962253A; status: Pending
- 2018-04-11: Application PCT/CN2018/082702 filed (WO), published as WO2018214663A1; status: Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104427292A (en) * | 2013-08-22 | 2015-03-18 | 中兴通讯股份有限公司 | Method and device for extracting a conference summary |
CN105469790A (en) * | 2014-08-29 | 2016-04-06 | 上海联影医疗科技有限公司 | Consultation information processing method and device |
CN104268279A (en) * | 2014-10-16 | 2015-01-07 | 魔方天空科技(北京)有限公司 | Query method and device of corpus data |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111582708A (en) * | 2020-04-30 | 2020-08-25 | 北京声智科技有限公司 | Medical information detection method, system, electronic device and computer-readable storage medium |
CN112118415A (en) * | 2020-09-18 | 2020-12-22 | 瑞然(天津)科技有限公司 | Remote diagnosis and treatment method and device, patient side terminal and doctor side terminal |
CN113555133A (en) * | 2021-05-31 | 2021-10-26 | 北京易康医疗科技有限公司 | Medical inquiry data processing method and device |
CN114520062A (en) * | 2022-04-20 | 2022-05-20 | 杭州马兰头医学科技有限公司 | Medical cloud communication system based on AI and letter creation |
CN114520062B (en) * | 2022-04-20 | 2022-07-22 | 杭州马兰头医学科技有限公司 | Medical cloud communication system based on AI and letter creation |
CN118486440A (en) * | 2024-06-03 | 2024-08-13 | 江苏苏桦技术股份有限公司 | Intelligent diagnosis guiding system and method for medical self-help machine |
Also Published As
Publication number | Publication date |
---|---|
WO2018214663A1 (en) | 2018-11-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108962253A (en) | A kind of voice-based data processing method, device and electronic equipment | |
CN108899037B (en) | Animal voiceprint feature extraction method and device and electronic equipment | |
CN104457955B (en) | Body weight information acquisition method, apparatus and system | |
CN105389304B (en) | Event Distillation method and device | |
CN109002184B (en) | Association method and device for candidate words of input method | |
CN110634472B (en) | Speech recognition method, server and computer readable storage medium | |
CN107146631B (en) | Music identification method, note identification model establishment method, device and electronic equipment | |
CN109558599A (en) | A kind of conversion method, device and electronic equipment | |
CN106406562A (en) | Data processing method and device | |
WO2018120447A1 (en) | Method, device and equipment for processing medical record information | |
CN108665889A (en) | The Method of Speech Endpoint Detection, device, equipment and storage medium | |
CN106919629A (en) | The method and device of information sifting is realized in group chat | |
CN108628819A (en) | Treating method and apparatus, the device for processing | |
CN111739535A (en) | Voice recognition method and device and electronic equipment | |
CN106777016A (en) | The method and device of information recommendation is carried out based on instant messaging | |
CN110491384B (en) | Voice data processing method and device | |
CN109036404A (en) | Voice interactive method and device | |
CN112820300B (en) | Audio processing method and device, terminal and storage medium | |
CN109102813B (en) | Voiceprint recognition method and device, electronic equipment and storage medium | |
CN110634570A (en) | Diagnostic simulation method and related device | |
CN108268667A (en) | Audio file clustering method and device | |
CN107247794A (en) | Topic bootstrap technique, live broadcast device and terminal device in live | |
CN109145151B (en) | Video emotion classification acquisition method and device | |
CN105930522A (en) | Intelligent music recommendation method, system and device | |
CN109102812B (en) | Voiceprint recognition method and system and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181207 |