CN110491385A - Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium - Google Patents
- Publication number
- CN110491385A (application number CN201910669971.8A)
- Authority
- CN
- China
- Prior art keywords
- sound pick-up
- voice
- sound source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/326—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R29/00—Monitoring arrangements; Testing arrangements
- H04R29/004—Monitoring arrangements; Testing arrangements for microphones
Abstract
The embodiments of the invention disclose a simultaneous interpretation method in the field of voice processing technology. The method comprises: picking up voice through multiple sound pick-ups; determining the direction of the picked-up voice source, and determining the user corresponding to that direction; performing signal reinforcement processing on the voice to obtain a target voice; and enabling the translation process corresponding to the user to translate the target voice and display the translation content. The embodiments of the invention also disclose a simultaneous interpretation apparatus and a computer readable storage medium, which can improve the fluency of translation.
Description
Technical field
The invention belongs to the field of voice processing technology, and more particularly relates to a simultaneous interpretation method, apparatus, electronic device and computer readable storage medium.
Background technique
Existing simultaneous interpretation equipment cannot accurately locate the person who is speaking, so most of it still works in a "turn by turn" mode: when user A starts speaking, user B cannot interrupt or respond, otherwise the translation of the simultaneous interpretation equipment becomes inaccurate; only after user A finishes speaking may user B speak, so that the translation is accurate and smooth. This mode of translation is inconvenient, makes interpersonal exchange very mechanical, and does not conform to the natural pattern of person-to-person conversation.
Summary of the invention
The present invention provides a simultaneous interpretation method, intended to solve the problem that, when several people speak, the person speaking cannot be accurately located, so that the translation is halting and the exchange is mechanical.
The embodiments of the invention provide a simultaneous interpretation method, comprising:
picking up voice through multiple sound pick-ups;
determining the direction of the picked-up voice source, and determining the user corresponding to the direction of the voice source;
performing signal reinforcement processing on the voice to obtain a target voice;
enabling the translation process corresponding to the user to translate the target voice, and displaying the translation content.
The embodiments of the invention also provide a simultaneous interpretation apparatus, comprising:
a pickup module, configured to pick up voice through multiple sound pick-ups;
a determining module, configured to determine the direction of the picked-up voice source, and to determine the user corresponding to the direction of the voice source;
a voice reinforcing module, configured to perform signal reinforcement processing on the voice to obtain a target voice;
a translation module, configured to enable the translation process corresponding to the user to translate the target voice, and to display the translation content.
The embodiments of the invention also provide an electronic device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the computer program, the simultaneous interpretation method shown in the above embodiments is realized.
The embodiments of the invention also provide a computer readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the simultaneous interpretation method shown in the above embodiments is realized.
From the embodiments of the present invention it can be seen that voice is picked up through multiple sound pick-ups, the direction of the picked-up voice source is determined, and the user corresponding to that direction is determined. Combined with hardware-based sound source localization, the direction of the speaking user is distinguished; people speaking from different directions do not interfere with each other. The voice undergoes signal reinforcement processing to obtain a target voice, the translation process corresponding to the user is enabled to translate the target voice, and the translation content is displayed. The speaker in each direction has an independent translation channel, and each channel independently performs speech recognition and translation for that user and presents the text, so translation is smooth and efficient.
Detailed description of the invention
In order to explain the embodiments of the invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description are only some embodiments of the invention.
Fig. 1 is a schematic diagram of an application scenario of the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 2 is a flow diagram of the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 3 is a diagram of the positional relationship between the sound source and the sound pick-ups in the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the time delay from the sound source to each sound pick-up in the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 5 is a diagram of the appearance of a unidirectional microphone in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the pickup range of a unidirectional microphone in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the delay compensation process from the sound source to each sound pick-up in the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 8 is a schematic diagram of a meeting room scenario in the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 9 is a schematic diagram of a scenario in which users are divided into multiple directions in the simultaneous interpretation method provided in an embodiment of the present invention;
Figure 10 is a structural schematic diagram of the simultaneous interpretation apparatus provided in an embodiment of the present invention.
Specific embodiment
In order to make the purpose, features and advantages of the invention more obvious and easy to understand, the technical solutions in the embodiments of the invention are described clearly and completely below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an application scenario of the simultaneous interpretation method provided in an embodiment of the present invention. Two users speak face to face, and the translation machine equipment picks up the voices of the two users from two opposite sides. Relative to the translation machine equipment, the sound source opening angle of one user is 0~180° and that of the other user is 180°~360°. The translation machine equipment determines the direction of the picked-up voice, and thus which user is speaking, and then performs signal reinforcement on the picked-up voice, so that the voice is clearer, the volume is larger, and the accuracy of the translation is higher. Each user corresponds to an independent translation process. The translation machine equipment starts the translation process corresponding to the speaking user; the translation process translates the picked-up voice from one language into a second language according to the preset translation languages, and the translation result is displayed on the screen of the translation machine equipment for the other user to read. Alternatively, the translation machine equipment can cast to an external screen, by wireless screen casting or by connecting a video line such as HDMI, for more users to view.
Referring to Fig. 2, Fig. 2 is a flow diagram of the simultaneous interpretation method provided in an embodiment of the present invention. The method can be applied in the scenario shown in Fig. 1, and realized by translation machine equipment, such as a handheld translation machine or a desktop translation machine, or by other computer equipment with speech recognition and translation functions, such as a smart phone or a tablet computer. The translation machine equipment has a built-in microphone and multiple sound pick-ups, or can be connected to an external microphone and multiple external sound pick-ups through a data line or a wireless network. The method mainly includes the following steps:
S101, picking up voice through multiple sound pick-ups;
The quantity and arrangement of the sound pick-ups can be matched to the number and arrangement of the on-site users.
Specifically, there may be 4 sound pick-ups, picking up the voices of 4, 6 or more users, arranged in a rectangular array, for example located at the four vertices of the translation machine equipment, or on the sides where the opposite users sit; or there may be 2 sound pick-ups, picking up the voices of 2 users, each located at the center of the side where the opposite user sits.
A sound pick-up may specifically be a microphone.
S102, determining the direction of the picked-up voice source, and determining the user corresponding to the direction of the voice source;
Optionally, there are the following two ways of determining the direction of the picked-up voice source.
The first way takes four sound pick-ups arranged in a rectangular array as an example, where the shorter side of the rectangle is longer than 5 centimetres.
Referring to Fig. 3, the sound pick-ups include a first sound pick-up 10, a second sound pick-up 20, a third sound pick-up 30 and a fourth sound pick-up 40. The conversation between the user on the side of the first sound pick-up 10 and the second sound pick-up 20 and the user on the side of the third sound pick-up 30 and the fourth sound pick-up 40 is a near-field dialogue, so the direction of the sound source can be judged using a near-field sound model, treating the sound source 11 as a spherical wave.
A three-dimensional coordinate system is established: the positive direction of the x-axis is the direction of the line from the first sound pick-up 10 to the third sound pick-up 30; the positive direction of the y-axis is the direction of the line from the first sound pick-up 10 to the second sound pick-up 20; and the positive direction of the z-axis is the direction from the xy-plane to the sound source 11 of the picked-up voice, where the xy-plane is the plane in which the x-axis and y-axis lie.
The time differences between the picked-up voice's arrival at the first sound pick-up 10 and its arrival at the other three of the four sound pick-ups are detected; the detected time differences are τ12, τ13 and τ14 respectively. From these time differences and the speed of sound v (the speed at which sound propagates in air, v = 340 metres per second), the distances from the sound source 11 to the four sound pick-ups are calculated:
Specifically, the difference between the distance from the sound source 11 to the first sound pick-up 10 and the distance from the sound source 11 to the second sound pick-up 20 equals the time difference τ12 between the sound source's arrival at the first sound pick-up 10 and its arrival at the second sound pick-up 20, multiplied by the speed of sound v:
r1 − r2 = τ12·v
where r1 is the distance from the sound source 11 of the picked-up voice to the first sound pick-up 10; r2 is the distance from the sound source 11 to the second sound pick-up 20; m is the distance from the projection of the sound source 11 on the xy-plane to the line between the second sound pick-up 20 and the fourth sound pick-up 40; n is the distance from the projection of the sound source 11 on the xy-plane to the line between the first sound pick-up 10 and the second sound pick-up 20; θ1 is the angle between the z-axis and the line connecting the sound source 11 and the first sound pick-up 10; θ2 is the angle between the z-axis and the line connecting the sound source 11 and the second sound pick-up 20.
Further, the difference between the distance from the sound source 11 to the first sound pick-up 10 and the distance from the sound source 11 to the third sound pick-up 30 equals the time difference τ13 between the sound source's arrival at the first sound pick-up 10 and its arrival at the third sound pick-up 30, multiplied by the speed of sound v:
r1 − r3 = τ13·v
where r3 is the distance from the sound source 11 to the third sound pick-up 30; L is the distance between the first sound pick-up 10 and the second sound pick-up 20, which is also the distance between the third sound pick-up 30 and the fourth sound pick-up 40; D is the distance between the first sound pick-up and the third sound pick-up, which is also the distance between the second sound pick-up and the fourth sound pick-up; θ3 is the angle between the z-axis and the line connecting the sound source 11 and the third sound pick-up 30 (not shown); and m, n and θ1 are as defined above.
Further, the difference between the distance from the sound source 11 to the first sound pick-up 10 and the distance to the fourth sound pick-up 40 equals the time difference τ14 between the sound source's arrival at the first sound pick-up 10 and its arrival at the fourth sound pick-up 40, multiplied by the speed of sound v:
r1 − r4 = τ14·v
Further, the difference between the distance from the sound source 11 to the third sound pick-up 30 and the distance to the second sound pick-up 20 equals the time difference τ23 between the sound source's arrival at the second sound pick-up 20 and its arrival at the third sound pick-up 30, multiplied by the speed of sound v:
r3 − r2 = τ23·v
Further, the difference between the distance from the sound source 11 to the fourth sound pick-up 40 and the distance to the second sound pick-up 20 equals the time difference τ42 between the sound source's arrival at the second sound pick-up 20 and its arrival at the fourth sound pick-up 40, multiplied by the speed of sound v:
r4 − r2 = τ42·v
In addition:
m² + n² = r2² − (r1·cosθ1)² = (r1 − τ12·v)² − (r1·cosθ1)²
m − n = [(r1 − τ13·v)² − (r1 − τ12·v)² − L² − D²] / [2(L − D)]
Solving the above equations yields the following formulas for m and n:
n = [(r1 − τ14·v)² − (r1 − τ12·v)²] / (−2D)
m = [(r1 − τ13·v)² − (r1 − τ12·v)² − L² − D²] / [2(L − D)] + [(r1 − τ14·v)² − (r1 − τ12·v)²] / (−2D)
Here r4 is the distance from the sound source 11 to the fourth sound pick-up 40, and θ4 is the angle between the z-axis and the line connecting the sound source 11 and the fourth sound pick-up 40 (not shown); the remaining symbols are as defined above.
By jointly solving the above formulas, m, n, θ1, θ2, θ3, θ4, r1, r2, r3 and r4 can be obtained. Only once the angles θ1, θ2, θ3 and θ4 have been calculated can the exact values of r1, r2, r3 and r4 be obtained.
Further, dividing r1, r2, r3 and r4 by the speed of sound v gives the times at which the voice emitted by the sound source 11 reaches the four sound pick-ups: the time of arrival at the first sound pick-up 10 is t1, at the second sound pick-up 20 is t2, at the third sound pick-up 30 is t3, and at the fourth sound pick-up 40 is t4. That is, the sound source reaches the four sound pick-ups with different time delays; see the time delay schematic diagram in Fig. 4.
Finally, the direction of the voice source can be determined from the calculated time delays; for example, the direction of the sound pick-up with the shortest arrival time is determined as the direction of the picked-up voice source.
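The geometric relationship used above — that each pairwise arrival-time difference times the speed of sound gives a range difference, and that the source lies toward the first-reached sound pick-up — can be sketched as a minimal, idealized illustration. The array geometry, microphone names and source position below are invented for the example and are not taken from the patent.

```python
# Near-field TDOA sketch: arrival times at a 2x2 rectangular array
# (assumed side lengths L = 0.10 m, D = 0.06 m), then the patent's
# final rule: the source direction is that of the earliest-reached mic.
import math

V_SOUND = 340.0  # speed of sound in air, m/s (value used in the text)

def arrival_times(source, mics):
    """Time of flight from a source position to each microphone."""
    return {name: math.dist(source, pos) / V_SOUND for name, pos in mics.items()}

def source_direction(times):
    """The source lies toward the microphone the wavefront reaches first."""
    return min(times, key=times.get)

mics = {  # rectangle in the xy-plane; positions are illustrative
    "mic1": (0.00, 0.00, 0.0),
    "mic2": (0.00, 0.10, 0.0),
    "mic3": (0.06, 0.00, 0.0),
    "mic4": (0.06, 0.10, 0.0),
}
source = (-0.30, 0.02, 0.20)  # a talker on the mic1/mic2 side

t = arrival_times(source, mics)
tau12 = t["mic1"] - t["mic2"]  # pairwise delay, as in tau_12
# r1 - r2 = tau12 * v, the range-difference relation from the text:
assert abs((math.dist(source, mics["mic1"]) - math.dist(source, mics["mic2"]))
           - tau12 * V_SOUND) < 1e-9
print(source_direction(t))  # -> mic1
```

In a real device the τ values would be measured from the microphone signals rather than derived from known geometry; the closed-form m and n formulas above then recover the source position from those measurements.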
The second way uses unidirectional microphones as the sound pick-ups.
Referring to Fig. 5 and Fig. 6, Fig. 5 is a diagram of the appearance of a unidirectional microphone, and Fig. 6 is a schematic diagram of the pickup range of a unidirectional microphone. Owing to the acoustic cavity of the microphone, each microphone only picks up sound sources within the directional range on its own side. It should be understood that Fig. 5 is only illustrative; practical applications are not limited thereto.
Specifically, a unidirectional microphone is mainly designed on the pressure gradient principle: the diaphragm in the opening behind the cavity of the unidirectional microphone detects the pressure on its two faces, and the direction of the face subject to the greater pressure is determined as the direction of the picked-up voice source.
In this way, the volume and information of the sound of user 1 received by the first sound pick-up 10 and the second sound pick-up 20 on the same side as user 1 is far greater than the sound and information from user 2 on the other side; similarly, the volume and information of the sound of user 2 received by the third sound pick-up 30 and the fourth sound pick-up 40 on the same side as user 2 is far greater than the sound and information from user 1.
S103, performing signal reinforcement processing on the voice to obtain a target voice;
The voice is enhanced by weighted delay-and-sum beamforming (DSBF).
Specifically, the voice emitted by the sound source reaches the four sound pick-ups at different times, including the time t1 of arrival at the first sound pick-up 10, the time t2 of arrival at the second sound pick-up 20, the time t3 of arrival at the third sound pick-up 30 and the time t4 of arrival at the fourth sound pick-up 40; that is, the time delay with which the voice reaches each sound pick-up differs. Delay compensation is therefore applied to the signal received by each sound pick-up, so that the received copies of the voice of the speaking user become synchronous, while the noise received by each sound pick-up, and the voice emitted by the user (i.e. sound source) on the opposite side, remain asynchronous. The delay-compensated voice signals of the sound pick-ups are then weighted and averaged: because the voices picked up by the microphones are synchronous, the voice signal is fully retained after the weighted accumulation and averaging, while the asynchronous interfering noise is weakened by the accumulation and averaging, thereby improving the signal-to-noise ratio, as shown in Fig. 7.
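The delay-compensate-then-average step can be sketched as follows. This is a minimal integer-sample DSBF illustration with synthetic signals; real systems work with fractional delays and sampled audio, which this sketch does not attempt.

```python
# Delay-and-sum beamforming sketch: advance each channel by its delay so
# the target talker's copies line up, then take a weighted average.
# Aligned speech is preserved; anything unaligned averages down.
def delay_and_sum(channels, delays, weights=None):
    """channels: equal-length sample lists; delays: samples to advance
    each channel by; weights: per-channel gains (default 1/n each)."""
    n = len(channels)
    if weights is None:
        weights = [1.0 / n] * n
    length = len(channels[0]) - max(delays)
    out = []
    for i in range(length):
        out.append(sum(w * ch[i + d] for ch, d, w in zip(channels, delays, weights)))
    return out

# The same pulse arriving with 0-, 1- and 2-sample delays:
sig = [0, 1, 2, 3, 2, 1, 0, 0, 0]
chans = [sig, [0] + sig[:-1], [0, 0] + sig[:-2]]
aligned = delay_and_sum(chans, delays=[0, 1, 2])
print([round(x, 6) for x in aligned[:4]])  # -> [0.0, 1.0, 2.0, 3.0]
```

With correct delays the averaged output reproduces the source pulse exactly; with wrong delays the copies smear and the peak drops, which is also what the amplitude-search variant described later exploits.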
When the sound source is in the near field, the amplitudes of the voice signals received by the sound pick-ups differ clearly, and the signal-to-noise ratios of the received signals also differ to some extent.
In order to better improve the signal-to-noise ratio and enhance the voice signal, a greater weight is assigned to designated sound pick-ups and a smaller weight to the rest. For example, a greater weight is assigned to the first sound pick-up 10 and the second sound pick-up 20, and a smaller, or even negative, weight is assigned to the third sound pick-up 30 and the fourth sound pick-up 40; the most reasonable weight configuration scheme is obtained through a large number of simulation experiments. In this way the speech of the user close to the first sound pick-up 10 and the second sound pick-up 20 is largely retained, so that the translation process can be enabled to translate that user's words, while the noise and the speech of the user close to the third sound pick-up 30 and the fourth sound pick-up 40 are largely suppressed and will not trigger a further translation process.
Optionally, since there may be a certain error in the time delay calculated by the above formulas, the delay can also be found directly by traversing the values in a preset time delay range: the waveforms of the sound picked up by the microphones are compensated with each candidate value, and according to the calculation result, the superposed time delay value at which the audio amplitude is greatest is determined as the time delay of the sound source. This improves the accuracy and convenience of delay calculation.
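The traversal described above can be sketched as an exhaustive search: try every candidate delay in the preset range and keep the one that maximizes the amplitude of the superposed signal, since the channels only add coherently when correctly aligned. Two channels and integer-sample delays are simplifying assumptions for the illustration.

```python
# Exhaustive delay search: pick the delay that maximizes the peak
# amplitude of the superposition of a reference channel and a shifted
# second channel.
def best_delay(ref, ch, max_delay):
    def peak(d):  # peak amplitude of ref + ch advanced by d samples
        n = len(ref) - max_delay
        return max(abs(ref[i] + ch[i + d]) for i in range(n))
    return max(range(max_delay + 1), key=peak)

ref = [0, 0, 1, 3, 1, 0, 0, 0]
ch = [0, 0, 0, 0, 1, 3, 1, 0]  # same pulse, two samples later
print(best_delay(ref, ch, max_delay=3))  # -> 2
```

This is essentially a coarse cross-correlation maximization; a production system would typically use correlation over sampled audio rather than a peak-amplitude scan.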
S104, enabling the translation process corresponding to the user to translate the target voice, and displaying the translation content.
The target voice is the signal-reinforced voice, whose sound quality and volume have been further enhanced. The unique translation process corresponding to the user is enabled to translate the target voice; the translation process interacts with the cloud to obtain the text result, which is presented to the other users, or displayed on an external large screen.
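The per-user routing in S104 can be sketched as below: each direction owns an independent translation channel with a preset language pair, and an enhanced utterance is dispatched only to that channel. The channel names, language pairs and the placeholder `translate` are invented for illustration; a real device would call a cloud ASR/translation backend at that point.

```python
# One independent translation channel per direction, as in S104.
class TranslationChannel:
    def __init__(self, src_lang, dst_lang):
        self.src_lang, self.dst_lang = src_lang, dst_lang
        self.transcript = []  # what this user's sub-interface would show

    def translate(self, text):
        # Placeholder "translation": a real system calls a cloud backend.
        result = f"[{self.src_lang}->{self.dst_lang}] {text}"
        self.transcript.append(result)
        return result

channels = {  # hypothetical two-sided setup with preset language pairs
    "side_A": TranslationChannel("zh", "en"),
    "side_B": TranslationChannel("en", "zh"),
}

def on_utterance(direction, target_voice_text):
    """Dispatch an enhanced utterance to its direction's own channel."""
    return channels[direction].translate(target_voice_text)

print(on_utterance("side_A", "你好"))  # -> [zh->en] 你好
```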
Optionally, a translation content presentation interface can be generated on the screen of the translation machine equipment. The display interface includes multiple sub-interfaces; the number of sub-interfaces corresponds to the number of users and to the positions of the sound pick-ups, and the sub-interfaces are respectively used to display the recognition results of the voices from different directions. For example, in a two-party talk scenario, assuming the two users are located on the upper and lower sides of the translation machine, a display interface comprising two sub-interfaces 1 and 2 is generated, occupying the upper and lower halves of the translation machine screen. Sub-interface 1 in the upper half of the screen is used to display the translation result of the voice from the upper side of the screen (that is, the voice picked up by the sound pick-ups configured above the translation machine), and sub-interface 2 in the lower half of the screen is used to display the translation result of the voice from the lower side of the screen (that is, the voice picked up by the sound pick-ups configured below the translation machine), or vice versa. In practical applications, the correspondence between the translation content shown on each sub-interface and the positions of the sound pick-ups can be customized by users.
It should be understood that scenarios of three-party or larger multi-party conferences can refer to the above two-party talk scenario, which is not repeated here. By using multiple sub-interfaces to display the translation results of voices from different directions, the correspondence between a translation result and the object being translated is made definite; especially in three-party or larger multi-party conference scenarios, this helps users recognize who said what and avoids misunderstandings caused by attributing one person's words to another.
Optionally, each sub-interface is an interactive interface, that is, in response to a preset operation on a sub-interface, the displayed translation content is adjusted, for example: enlarging or reducing the font size, changing the color of the font, or modifying a character, word or sentence in the translation content.
Further, voices from multiple directions are picked up, and after the dialogue ends, the picked-up voices are grouped by direction into different speech audio files. Each speech audio file is imported into a voiceprint recognition interface for voiceprint recognition, so as to identify the user corresponding to the speech audio file. According to the voiceprint recognition result and the timestamps at which the voices were picked up, the user corresponding to each voice is marked on the time axis of the speech audio file. A corresponding text file is generated for each marked speech audio file, and the text files are merged into one summary text file, which contains the text content of every utterance, the correspondence between the timestamp of each utterance and the user who made it, and the correspondence between the user's speech content and the user's position.
In one example, in the indoor meeting scenario shown in Fig. 8, the translation machine equipment is placed at the center of the conference table, with one end facing conference users 1, 2 and 3 on one side of the table and the other end facing conference users A, B and C on the other side. The translation machine equipment has 4 sound pick-ups, specifically microphone 1, microphone 2, microphone 3 and microphone 4, so that after the meeting ends it can distinguish the users on the two sides and distinguish who said what.
Specifically, microphone 1 and microphone 2 collect the sound of user 1, user 2 and user 3, while microphone 3 and microphone 4 collect the sound of user A, user B and user C. After the meeting ends, the sound from the direction of microphone 1 and microphone 2 generates speech audio file m1, and the sound from the direction of microphone 3 and microphone 4 generates speech audio file m2. Speech audio files m1 and m2 are imported into the voiceprint recognition interface; from m1 it is identified which of user 1, user 2 and user 3 is speaking at each point on the time axis, and from m2 which of user A, user B and user C is speaking at each point on the time axis. Speech recognition of m1 and m2 yields text files text1 and text2, and using the timestamps collected while speaking, the two text files are merged into one complete text file text, in which the dialogue content of all users is described. By inserting the speakers' voiceprint-identified timestamps into the text, the content said by each user is identified, thereby obtaining a detailed meeting summary that identifies the respective speech content of users 1, 2 and 3 on one side of the conference table and of users A, B and C on the other side. A meeting summary is thus generated easily and automatically, the efficiency of organizing conference content is improved, and the correspondence between each user's speech content and the user's position can be shown.
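The merge step of the summary flow can be sketched as follows. Speaker labels are given directly here (the text obtains them via voiceprint recognition); timestamps, names and utterances are invented for illustration.

```python
# Merge per-direction transcripts (m1: microphones 1/2 side, m2:
# microphones 3/4 side) into one chronological meeting summary.
m1 = [(0.0, "user 1", "Hello everyone"), (7.5, "user 2", "Agreed")]
m2 = [(3.2, "user A", "Thanks for coming")]

def merge_summary(*transcripts):
    """Interleave (timestamp, speaker, text) entries by timestamp."""
    entries = sorted((e for t in transcripts for e in t), key=lambda e: e[0])
    return [f"{ts:.1f}s {who}: {text}" for ts, who, text in entries]

for line in merge_summary(m1, m2):
    print(line)
# -> 0.0s user 1: Hello everyone
#    3.2s user A: Thanks for coming
#    7.5s user 2: Agreed
```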
Further, besides the scenario in which users sit at the two ends of the translation machine equipment, the simultaneous interpretation method in this embodiment can also be applied in a scenario with users at four ends of the translation machine equipment, as shown in Fig. 9.
In the embodiment of the present invention, voice is picked up through multiple sound pick-ups, the direction of the picked-up voice source is determined, and the user corresponding to that direction is determined. Combined with hardware-based sound source localization, the direction of the speaking user is distinguished; people speaking from different directions do not interfere with each other. The voice undergoes signal reinforcement processing to obtain a target voice, the translation process corresponding to the user is enabled to translate the target voice, and the translation content is displayed. The speaker in each direction has an independent translation channel, and each channel independently performs speech recognition and translation for that user and presents the text, so translation is smooth and efficient.
Referring to Figure 10, Figure 10 is a structural schematic diagram of the simultaneous interpretation apparatus provided in an embodiment of the present invention; for ease of illustration, only the parts related to the embodiments of the present invention are shown. The simultaneous interpretation apparatus illustrated in Figure 10 can be the executing subject of the simultaneous interpretation method provided by the embodiment of Fig. 1 above. The simultaneous interpretation apparatus can be the translation machine equipment, or can be built into a terminal, where the terminal includes a PC, a mobile phone and other electronic equipment. The simultaneous interpretation apparatus includes:
a pickup module 401, configured to pick up voice through multiple sound pick-ups;
a determining module 402, configured to determine the direction of the picked-up voice source, and to determine the user corresponding to the direction of the voice source;
a voice reinforcing module 403, configured to perform signal reinforcement processing on the voice to obtain a target voice;
a translation module 404, configured to enable the translation process corresponding to the user to translate the target voice, and to display the translation content.
Further, the determining module 402 is also configured to detect the time differences between the picked-up voice's arrival at the first of the four sound pick-ups and its arrival at the second, third and fourth sound pick-ups respectively, and to calculate, from each time difference and the speed of sound, the distances from the sound source of the voice to the four sound pick-ups, with the following formulas:
r1 − r2 = τ12·v
r1 − r3 = τ13·v
r1 − r4 = τ14·v
r3 − r2 = τ23·v
r4 − r2 = τ42·v
m² + n² = r2² − (r1·cosθ1)² = (r1 − τ12·v)² − (r1·cosθ1)²
m − n = [(r1 − τ13·v)² − (r1 − τ12·v)² − L² − D²] / [2(L − D)]
n = [(r1 − τ14·v)² − (r1 − τ12·v)²] / (−2D)
m = [(r1 − τ13·v)² − (r1 − τ12·v)² − L² − D²] / [2(L − D)] + [(r1 − τ14·v)² − (r1 − τ12·v)²] / (−2D)
Wherein, a three-dimensional coordinate system is established with the direction of the line from the first sound pick-up to the third sound pick-up as the positive direction of the x-axis, the direction of the line from the first sound pick-up to the second sound pick-up as the positive direction of the y-axis, and the direction from the XY plane toward the sound source of the voice as the positive direction of the z-axis;
τ₁₂ is the time difference between the sound reaching the first sound pick-up and reaching the second sound pick-up; τ₁₃ is the time difference between the sound reaching the first sound pick-up and reaching the third sound pick-up; τ₁₄ is the time difference between the sound reaching the first sound pick-up and reaching the fourth sound pick-up;
L is the distance between the first sound pick-up and the second sound pick-up, which is also the distance between the third sound pick-up and the fourth sound pick-up; D is the distance between the first sound pick-up and the third sound pick-up, which is also the distance between the second sound pick-up and the fourth sound pick-up; r₁ is the distance from the sound source to the first sound pick-up; r₂ is the distance from the sound source to the second sound pick-up; r₃ is the distance from the sound source to the third sound pick-up; r₄ is the distance from the sound source to the fourth sound pick-up; m is the distance from the projection of the sound source on the XY plane to the line connecting the second sound pick-up and the fourth sound pick-up; n is the distance from the projection of the sound source on the XY plane to the line connecting the first sound pick-up and the second sound pick-up; θ₁ is the angle between the z-axis and the line connecting the sound source and the first sound pick-up; θ₂ is the angle between the z-axis and the line connecting the sound source and the second sound pick-up; θ₃ is the angle between the z-axis and the line connecting the sound source and the third sound pick-up; θ₄ is the angle between the z-axis and the line connecting the sound source and the fourth sound pick-up;
The distances from the sound source to the four sound pick-ups are each divided by the velocity of sound to obtain the times at which the sound reaches each of the four sound pick-ups, and the direction of the sound pick-up with the shortest arrival time is determined as the direction of the picked-up voice source.
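The final step above, selecting the pick-up the sound reaches first, can be sketched directly from the measured time differences: the channel with the most negative delay relative to a reference channel is the one the wavefront reached earliest. A minimal cross-correlation sketch, assuming synchronized, equally long channel buffers (this is a simplified illustration, not the patent's full formula chain):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 deg C

def tdoa(reference, signal, fs):
    """Delay (seconds) of `signal` relative to `reference`, estimated
    from the peak of their cross-correlation; negative means earlier.
    Multiplying a delay by SPEED_OF_SOUND gives the range difference
    r_i - r_j used in the formulas above."""
    corr = np.correlate(signal, reference, mode="full")
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / fs

def nearest_pickup(frames, fs):
    """Index of the sound pick-up the wavefront reaches first, i.e. the
    channel with the smallest delay relative to channel 0."""
    delays = [tdoa(frames[0], f, fs) for f in frames]
    return int(np.argmin(delays))
```

In practice a generalized cross-correlation with phase weighting is often preferred for reverberant rooms; the plain correlation here keeps the sketch short.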
Further, the voice reinforcing module 403 is also configured to obtain the target voice after performing signal enhancement processing on the voice through a weighted delay-and-sum beamforming scheme.
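A weighted delay-and-sum beamformer in its simplest form advances each channel by a steering delay toward the source direction, so that speech from that direction adds coherently while sounds from other directions partially cancel. An illustrative integer-sample sketch; the uniform weights and the delay values in the usage example are assumptions, not values from the patent:

```python
import numpy as np

def delay_and_sum(frames, steering_delays_s, fs, weights=None):
    """Weighted delay-and-sum beamformer: advance each channel by its
    steering delay so the target direction adds coherently, then form
    a weighted average across channels."""
    n_ch = len(frames)
    if weights is None:
        weights = np.full(n_ch, 1.0 / n_ch)   # uniform weights by default
    out = np.zeros(len(frames[0]))
    for frame, delay, w in zip(frames, steering_delays_s, weights):
        shift = int(round(delay * fs))        # steering delay in samples
        out += w * np.roll(np.asarray(frame, dtype=float), -shift)
    return out
```

Fractional-sample delays would normally be applied by interpolation or in the frequency domain; integer shifts keep the illustration minimal.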
Further, the device also includes:
A meeting summary generation module (not shown), configured to pick up voices from multiple directions and generate different speech audio files from the picked-up voices according to direction; to import the speech audio files into a voiceprint recognition interface for voiceprint recognition and, according to the timestamps at which the voices were picked up, mark the user corresponding to each voice on the time axis of the speech audio file; and to recognize each speech audio file, generate a corresponding text file for each, and merge the text files into one summary text file, the summary text file containing the word content corresponding to each voice, the correspondence between the timestamp at which each voice was uttered and the user, and the correspondence between the content of the user's speech and the position of the user.
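The summary text file described above ties each utterance's word content to its timestamp, its voiceprint-identified user, and the user's direction. A minimal data-structure sketch of that merge step; the field and function names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    timestamp: float   # time at which the voice was picked up (seconds)
    direction: int     # index of the sound pick-up / speaker position
    speaker: str       # user identified through voiceprint recognition
    text: str          # recognized word content of the voice

def merge_summary(utterances):
    """Merge per-direction transcripts into one summary text ordered by
    timestamp, preserving the voice->user and user->position mappings."""
    ordered = sorted(utterances, key=lambda u: u.timestamp)
    return "\n".join(
        f"[{u.timestamp:.1f}s] {u.speaker} (direction {u.direction}): {u.text}"
        for u in ordered
    )
```

Sorting by timestamp is what turns the separate per-direction files into one chronological meeting record.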
Further, the translation module 404 is also configured to generate a translation content presentation interface on the screen of the translation machine device. The translation content interface includes multiple sub-interfaces; the number of sub-interfaces corresponds to the number of users and to the positions of the sound pick-ups. The multiple sub-interfaces are respectively used to display the recognition results of voices from different directions. In practical applications, the correspondence between the translation content displayed by each sub-interface and the position of the sound pick-up can be customized by users.
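The user-customizable correspondence between sound pick-up positions and sub-interfaces can be modelled as a simple re-assignable mapping. A sketch; the pane names and the four-pane default are illustrative assumptions:

```python
# Default mapping from sound pick-up index to an on-screen sub-interface;
# pane names are illustrative, and the user may re-assign them at runtime.
DEFAULT_PANES = {0: "top-left", 1: "top-right", 2: "bottom-left", 3: "bottom-right"}

def customize(mapping, pickup_index, new_pane):
    """Return a new mapping with one pick-up re-assigned to another pane,
    leaving the default untouched."""
    updated = dict(mapping)
    updated[pickup_index] = new_pane
    return updated
```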
Optionally, each sub-interface is an interactive interface, and the device also includes:
An adjusting module (not shown), configured to respond to a user's preset operation on each sub-interface and adjust the displayed translation content, for example: enlarging or reducing the font size, changing the font color, or modifying a character, word or sentence in the translation content.
For other details of the present embodiment, refer to the description of the embodiments shown in the preceding Figures 1 to 9.
In the present embodiment, voice is picked up through multiple sound pick-ups, the direction of the picked-up voice source is determined, and the user corresponding to the direction of the voice source is determined; sound-source direction finding is combined with the hardware to distinguish the direction of the speaking user, so that when people in different directions speak, their sounds do not interfere with each other. The target voice is obtained after the voice is subjected to signal reinforcement processing, the translation process corresponding to the user is enabled to translate the target voice, and the translation content is displayed. Each speaking direction has an independent translation channel, and each channel separately performs speech recognition, translation and text presentation for its user, so the translation is smooth and efficient.
An embodiment of the present invention also provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the computer program, the simultaneous interpretation method described in the preceding Figures 1 to 9 is realized.
Further, the electronic device also includes:
At least one input device, at least one output device, multiple sound pick-ups and at least one microphone.
The above memory, processor, input device, output device, sound pick-ups and microphone are connected through a bus.
Alternatively, the sound pick-ups and the microphone may be peripherals, in which case the electronic device also includes a wireless radio frequency module. The sound pick-ups and the microphone establish a data connection with the processor through a data line or through the wireless network provided by the wireless radio frequency module.
The input device may specifically be a camera, a touch panel, physical buttons, etc. The output device may specifically be a flexible capacitive touch screen.
The memory may be a high-speed random access memory (RAM, Random Access Memory), or may be a non-volatile memory (non-volatile memory), such as a magnetic disk storage. The memory is used to store a set of executable program code, and the processor is coupled with the memory.
Further, an embodiment of the present invention also provides a computer-readable storage medium. The computer-readable storage medium may be provided in the above electronic device, and may specifically be the memory of the aforementioned electronic device. A computer program is stored on the computer-readable storage medium; when the program is executed by the processor, the simultaneous interpretation method described in the embodiments shown in the preceding Figures 1 to 9 is realized. Further, the computer-readable storage medium may also be a USB flash disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a RAM, a magnetic disk, an optical disk, or various other media capable of storing program code.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is stated as a series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the order of the actions described, because according to the present invention certain steps may be performed in another order or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for a part not described in detail in one embodiment, refer to the relevant description of the other embodiments.
The above is a description of the simultaneous interpretation method, synchronous translation apparatus, electronic device and computer-readable storage medium provided by the present invention. For those skilled in the art, there will be changes in the specific implementation and scope of application according to the ideas of the embodiments of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A simultaneous interpretation method, characterized by comprising:
picking up voice through multiple sound pick-ups;
determining the direction of the picked-up voice source, and determining the user corresponding to the direction of the voice source;
obtaining a target voice after subjecting the voice to signal reinforcement processing;
enabling the translation process corresponding to the user to translate the target voice, and displaying the translation content.
2. The method according to claim 1, characterized in that the multiple sound pick-ups are four sound pick-ups arranged in a rectangular array, and the shortest side of the rectangle is greater than 5 centimetres in length.
3. The method according to claim 2, characterized in that determining the direction of the picked-up voice source comprises:
detecting the time differences between the picked-up voice reaching the first sound pick-up of the four sound pick-ups and reaching the second sound pick-up, the third sound pick-up and the fourth sound pick-up respectively, the time difference between the voice reaching the second sound pick-up and reaching the third sound pick-up, and the time difference between the voice reaching the second sound pick-up and reaching the fourth sound pick-up;
calculating separately, according to each time difference and the velocity of sound, the distances from the sound source of the voice to the four sound pick-ups, with the following formulas:
r₁ - r₂ = τ₁₂v;
r₁ - r₃ = τ₁₃v;
r₁ - r₄ = τ₁₄v;
r₃ - r₂ = τ₂₃v;
r₄ - r₂ = τ₂₄v;
m² + n² = r₂² - (r₁cosθ₁)² = (r₁ - τ₁₂v)² - (r₁cosθ₁)²;
m - n = [(r₁ - τ₁₃v)² - (r₁ - τ₁₂v)² - L² - D²] / [2(L - D)];
n = [(r₁ - τ₁₄v)² - (r₁ - τ₁₂v)²] / (-2D);
m = [(r₁ - τ₁₃v)² - (r₁ - τ₁₂v)² - L² - D²] / [2(L - D)] + [(r₁ - τ₁₄v)² - (r₁ - τ₁₂v)²] / (-2D)
wherein a three-dimensional coordinate system is established with the direction of the line from the first sound pick-up to the third sound pick-up as the positive direction of the x-axis, the direction of the line from the first sound pick-up to the second sound pick-up as the positive direction of the y-axis, and the direction from the XY plane toward the sound source of the voice as the positive direction of the z-axis;
τ₁₂ is the time difference between the sound reaching the first sound pick-up and reaching the second sound pick-up; τ₁₃ is the time difference between the sound reaching the first sound pick-up and reaching the third sound pick-up; τ₁₄ is the time difference between the sound reaching the first sound pick-up and reaching the fourth sound pick-up; τ₂₃ is the time difference between the sound reaching the second sound pick-up and reaching the third sound pick-up; τ₂₄ is the time difference between the sound reaching the second sound pick-up and reaching the fourth sound pick-up;
L is the distance between the first sound pick-up and the second sound pick-up, which is also the distance between the third sound pick-up and the fourth sound pick-up; D is the distance between the first sound pick-up and the third sound pick-up, which is also the distance between the second sound pick-up and the fourth sound pick-up; r₁ is the distance from the sound source to the first sound pick-up; r₂ is the distance from the sound source to the second sound pick-up; r₃ is the distance from the sound source to the third sound pick-up; r₄ is the distance from the sound source to the fourth sound pick-up; m is the distance from the projection of the sound source on the XY plane to the line connecting the second sound pick-up and the fourth sound pick-up; n is the distance from the projection of the sound source on the XY plane to the line connecting the first sound pick-up and the second sound pick-up; θ₁ is the angle between the z-axis and the line connecting the sound source and the first sound pick-up; θ₂ is the angle between the z-axis and the line connecting the sound source and the second sound pick-up; θ₃ is the angle between the z-axis and the line connecting the sound source and the third sound pick-up; θ₄ is the angle between the z-axis and the line connecting the sound source and the fourth sound pick-up;
dividing each of the distances from the sound source to the four sound pick-ups by the velocity of sound to obtain the times at which the sound reaches each of the four sound pick-ups, and determining the direction of the sound pick-up with the shortest arrival time as the direction of the picked-up voice source.
4. The method according to claim 1, characterized in that, where the sound pick-up is a single-directivity microphone, determining the direction of the picked-up voice source comprises:
detecting the pressures to which the two sides of the vibrating diaphragm at the rear opening of the cavity of the single-directivity microphone are subjected;
determining the direction on the side of the diaphragm subjected to the greater pressure as the direction of the picked-up voice source.
5. The method according to claim 3 or 4, characterized in that obtaining the target voice after subjecting the voice to signal reinforcement processing comprises:
obtaining the target voice after performing signal enhancement processing on the voice through a weighted delay-and-sum beamforming scheme.
6. The method according to claim 5, characterized in that the method further comprises:
picking up voices from multiple directions, and generating different speech audio files from the picked-up voices according to direction;
importing the speech audio files into a voiceprint recognition interface for voiceprint recognition;
marking, according to the voiceprint recognition result and the timestamps at which the voices were picked up, the user corresponding to each voice on the time axis of the speech audio file;
recognizing each speech audio file, generating a corresponding text file for each, and merging the text files into one summary text file, the summary text file containing the word content corresponding to each voice, the correspondence between the timestamp at which each voice was uttered and the user, and the correspondence between the content of the user's speech and the position of the user.
7. A synchronous translation apparatus, characterized by comprising:
a pickup module, configured to pick up voice through multiple sound pick-ups;
a determining module, configured to determine the direction of the picked-up voice source, and to determine the user corresponding to the direction of the voice source;
a voice reinforcing module, configured to obtain a target voice after the voice is subjected to signal reinforcement processing;
a translation module, configured to enable the translation process corresponding to the user to translate the target voice, and to display the translation content.
8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
an identification module, configured to pick up voices from multiple directions and generate different speech audio files from the picked-up voices according to direction;
to import the speech audio files into a voiceprint recognition interface for voiceprint recognition and, according to the timestamps at which the voices were picked up, mark the user corresponding to each voice on the time axis of the speech audio file;
and to recognize each speech audio file, generate a corresponding text file for each, and merge the text files into one summary text file, the summary text file containing the word content corresponding to each voice, the correspondence between the timestamp at which each voice was uttered and the user, and the correspondence between the content of the user's speech and the position of the user.
9. An electronic device, the electronic device comprising: a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that, when the processor executes the computer program, the simultaneous interpretation method according to any one of claims 1 to 6 is realized.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the simultaneous interpretation method according to any one of claims 1 to 6 is realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910669971.8A CN110491385A (en) | 2019-07-24 | 2019-07-24 | Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110491385A true CN110491385A (en) | 2019-11-22 |
Family
ID=68548054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910669971.8A Pending CN110491385A (en) | 2019-07-24 | 2019-07-24 | Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110491385A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6125347A (en) * | 1993-09-29 | 2000-09-26 | L&H Applications Usa, Inc. | System for controlling multiple user application programs by spoken input |
CN102968991A (en) * | 2012-11-29 | 2013-03-13 | 华为技术有限公司 | Method, device and system for sorting voice conference minutes |
CN203301691U (en) * | 2013-05-31 | 2013-11-20 | 中山市天键电声有限公司 | Windproof, denoising and unidirectional microphone |
CN107978317A (en) * | 2017-12-18 | 2018-05-01 | 北京百度网讯科技有限公司 | Meeting summary synthetic method, system and terminal device |
CN108962263A (en) * | 2018-06-04 | 2018-12-07 | 百度在线网络技术(北京)有限公司 | A kind of smart machine control method and system |
CN109686363A (en) * | 2019-02-26 | 2019-04-26 | 深圳市合言信息科技有限公司 | A kind of on-the-spot meeting artificial intelligence simultaneous interpretation equipment |
Non-Patent Citations (2)
Title |
---|
周小东 (Zhou Xiaodong): "Recording Engineer's Handbook" (《录音工程师手册》), 31 January 2006, China Radio and Television Press (中国广播电视出版社) *
李学华 (Li Xuehua): "Advancing with the Information Age, Progressing with the Spirit of Innovation: Collected Papers on Undergraduate Technological Innovation and Engineering Practice, School of Information and Communication Engineering, Beijing Information Science and Technology University" (《与信息时代同行 与创新精神共进》), 31 July 2017, Beijing University of Posts and Telecommunications Press (北京邮电大学出版社) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021134284A1 (en) * | 2019-12-30 | 2021-07-08 | 深圳市欢太科技有限公司 | Voice information processing method, hub device, control terminal and storage medium |
CN111245868A (en) * | 2020-03-10 | 2020-06-05 | 诺领科技(南京)有限公司 | Narrowband Internet of things voice message communication method and system |
CN111245868B (en) * | 2020-03-10 | 2021-04-13 | 诺领科技(南京)有限公司 | Narrowband Internet of things voice message communication method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20191122 |