CN110491385A - Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium - Google Patents
- Publication number
- CN110491385A (application number CN201910669971.8A)
- Authority
- CN
- China
- Prior art keywords
- sound pick-up
- voice
- sound source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/326—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R29/00—Monitoring arrangements; Testing arrangements
- H04R29/004—Monitoring arrangements; Testing arrangements for microphones
Abstract
The embodiments of the invention disclose a simultaneous interpretation method in the field of voice processing technology. The method comprises: picking up voice through multiple sound pick-ups; determining the direction of the picked-up voice source, and determining the user corresponding to that direction; performing signal reinforcement processing on the voice to obtain a target voice; and enabling the translation process corresponding to the user to translate the target voice and display the translation content. The embodiments of the invention also disclose a simultaneous interpretation apparatus and a computer readable storage medium, which can improve the fluency of translation.
Description
Technical field
The invention belongs to the field of voice processing technology, and more particularly relates to a simultaneous interpretation method, apparatus, electronic device and computer readable storage medium.
Background technique
Existing simultaneous interpretation equipment cannot accurately locate the person who is speaking, so most of it still works in a "turn by turn" mode: when user A starts speaking, user B cannot interrupt or respond, otherwise the translation of the simultaneous interpretation equipment becomes inaccurate; only after user A finishes speaking may user B speak, so that the translation is accurate and smooth. This mode of translation is inconvenient, makes interpersonal exchange very mechanical, and does not conform to the natural pattern of person-to-person conversation.
Summary of the invention
The present invention provides a simultaneous interpretation method, intended to solve the problem that, when several people speak, the person speaking cannot be accurately located, so that the translation is halting and the exchange is mechanical.
The embodiments of the invention provide a simultaneous interpretation method, comprising:
picking up voice through multiple sound pick-ups;
determining the direction of the picked-up voice source, and determining the user corresponding to the direction of the voice source;
performing signal reinforcement processing on the voice to obtain a target voice;
enabling the translation process corresponding to the user to translate the target voice, and displaying the translation content.
The embodiments of the invention also provide a simultaneous interpretation apparatus, comprising:
a pickup module, configured to pick up voice through multiple sound pick-ups;
a determining module, configured to determine the direction of the picked-up voice source, and to determine the user corresponding to the direction of the voice source;
a voice reinforcing module, configured to perform signal reinforcement processing on the voice to obtain a target voice;
a translation module, configured to enable the translation process corresponding to the user to translate the target voice, and to display the translation content.
The embodiments of the invention also provide an electronic device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the computer program, the simultaneous interpretation method shown in the above embodiments is realized.
The embodiments of the invention also provide a computer readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the simultaneous interpretation method shown in the above embodiments is realized.
From the embodiments of the present invention it can be seen that voice is picked up through multiple sound pick-ups, the direction of the picked-up voice source is determined, and the user corresponding to that direction is determined. Combined with hardware-based sound source localization, the direction of the speaking user is distinguished; people speaking from different directions do not interfere with each other. The voice undergoes signal reinforcement processing to obtain a target voice, the translation process corresponding to the user is enabled to translate the target voice, and the translation content is displayed. The speaker in each direction has an independent translation channel, and each channel independently performs speech recognition and translation for that user and presents the text, so translation is smooth and efficient.
Detailed description of the invention
In order to explain the embodiments of the invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description are only some embodiments of the invention.
Fig. 1 is a schematic diagram of an application scenario of the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 2 is a flow diagram of the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 3 is a diagram of the positional relationship between the sound source and the sound pick-ups in the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the time delay from the sound source to each sound pick-up in the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 5 is a diagram of the appearance of a unidirectional microphone in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the pickup range of a unidirectional microphone in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the delay compensation process from the sound source to each sound pick-up in the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 8 is a schematic diagram of a meeting room scenario in the simultaneous interpretation method provided in an embodiment of the present invention;
Fig. 9 is a schematic diagram of a scenario in which users are divided into multiple directions in the simultaneous interpretation method provided in an embodiment of the present invention;
Figure 10 is a structural schematic diagram of the simultaneous interpretation apparatus provided in an embodiment of the present invention.
Specific embodiment
In order to make the purpose, features and advantages of the invention more obvious and easy to understand, the technical solutions in the embodiments of the invention are described clearly and completely below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an application scenario of the simultaneous interpretation method provided in an embodiment of the present invention. Two users speak face to face, and the translation machine equipment picks up the voices of the two users from two opposite sides. Relative to the translation machine equipment, the sound source opening angle of one user is 0~180° and that of the other user is 180°~360°. The translation machine equipment determines the direction of the picked-up voice, and thus which user is speaking, and then performs signal reinforcement on the picked-up voice, so that the voice is clearer, the volume is larger, and the accuracy of the translation is higher. Each user corresponds to an independent translation process. The translation machine equipment starts the translation process corresponding to the speaking user; the translation process translates the picked-up voice from one language into a second language according to the preset translation languages, and the translation result is displayed on the screen of the translation machine equipment for the other user to read. Alternatively, the translation machine equipment can cast to an external screen, by wireless screen casting or by connecting a video line such as HDMI, for more users to view.
Referring to Fig. 2, Fig. 2 is a flow diagram of the simultaneous interpretation method provided in an embodiment of the present invention. The method can be applied in the scenario shown in Fig. 1, and realized by translation machine equipment, such as a handheld translation machine or a desktop translation machine, or by other computer equipment with speech recognition and translation functions, such as a smart phone or a tablet computer. The translation machine equipment has a built-in microphone and multiple sound pick-ups, or can be connected to an external microphone and multiple external sound pick-ups through a data line or a wireless network. The method mainly includes the following steps:
S101, picking up voice through multiple sound pick-ups;
The quantity and arrangement of the sound pick-ups can be matched to the number and arrangement of the on-site users.
Specifically, there may be 4 sound pick-ups, picking up the voices of 4, 6 or more users, arranged in a rectangular array, for example located at the four vertices of the translation machine equipment, or on the sides where the opposite users sit; or there may be 2 sound pick-ups, picking up the voices of 2 users, each located at the center of the side where the opposite user sits.
A sound pick-up may specifically be a microphone.
S102, determining the direction of the picked-up voice source, and determining the user corresponding to the direction of the voice source;
Optionally, there are the following two ways of determining the direction of the picked-up voice source.
The first way takes four sound pick-ups arranged in a rectangular array as an example, where the shorter side of the rectangle is longer than 5 centimetres.
Referring to Fig. 3, the sound pick-ups include a first sound pick-up 10, a second sound pick-up 20, a third sound pick-up 30 and a fourth sound pick-up 40. The conversation between the user on the side of the first sound pick-up 10 and the second sound pick-up 20 and the user on the side of the third sound pick-up 30 and the fourth sound pick-up 40 is a near-field dialogue, so the direction of the sound source can be judged using a near-field sound model, treating the sound source 11 as a spherical wave.
A three-dimensional coordinate system is established: the positive direction of the x-axis is the direction of the line from the first sound pick-up 10 to the third sound pick-up 30; the positive direction of the y-axis is the direction of the line from the first sound pick-up 10 to the second sound pick-up 20; and the positive direction of the z-axis is the direction from the xy-plane to the sound source 11 of the picked-up voice, where the xy-plane is the plane in which the x-axis and y-axis lie.
The time differences between the picked-up voice's arrival at the first sound pick-up 10 and its arrival at the other three of the four sound pick-ups are detected; the detected time differences are τ12, τ13 and τ14 respectively. From these time differences and the speed of sound v (the speed at which sound propagates in air, v = 340 metres per second), the distances from the sound source 11 to the four sound pick-ups are calculated:
Specifically, the difference between the distance from the sound source 11 to the first sound pick-up 10 and the distance from the sound source 11 to the second sound pick-up 20 equals the time difference τ12 between the sound source's arrival at the first sound pick-up 10 and its arrival at the second sound pick-up 20, multiplied by the speed of sound v:
r1 − r2 = τ12·v
where r1 is the distance from the sound source 11 of the picked-up voice to the first sound pick-up 10; r2 is the distance from the sound source 11 to the second sound pick-up 20; m is the distance from the projection of the sound source 11 on the xy-plane to the line between the second sound pick-up 20 and the fourth sound pick-up 40; n is the distance from the projection of the sound source 11 on the xy-plane to the line between the first sound pick-up 10 and the second sound pick-up 20; θ1 is the angle between the z-axis and the line connecting the sound source 11 and the first sound pick-up 10; θ2 is the angle between the z-axis and the line connecting the sound source 11 and the second sound pick-up 20.
Further, the difference between the distance from the sound source 11 to the first sound pick-up 10 and the distance from the sound source 11 to the third sound pick-up 30 equals the time difference τ13 between the sound source's arrival at the first sound pick-up 10 and its arrival at the third sound pick-up 30, multiplied by the speed of sound v:
r1 − r3 = τ13·v
where r3 is the distance from the sound source 11 to the third sound pick-up 30; L is the distance between the first sound pick-up 10 and the second sound pick-up 20, which is also the distance between the third sound pick-up 30 and the fourth sound pick-up 40; D is the distance between the first sound pick-up and the third sound pick-up, which is also the distance between the second sound pick-up and the fourth sound pick-up; θ3 is the angle between the z-axis and the line connecting the sound source 11 and the third sound pick-up 30 (not shown); and m, n and θ1 are as defined above.
Further, the difference between the distance from the sound source 11 to the first sound pick-up 10 and the distance to the fourth sound pick-up 40 equals the time difference τ14 between the sound source's arrival at the first sound pick-up 10 and its arrival at the fourth sound pick-up 40, multiplied by the speed of sound v:
r1 − r4 = τ14·v
Further, the difference between the distance from the sound source 11 to the third sound pick-up 30 and the distance to the second sound pick-up 20 equals the time difference τ23 between the sound source's arrival at the second sound pick-up 20 and its arrival at the third sound pick-up 30, multiplied by the speed of sound v:
r3 − r2 = τ23·v
Further, the difference between the distance from the sound source 11 to the fourth sound pick-up 40 and the distance to the second sound pick-up 20 equals the time difference τ42 between the sound source's arrival at the second sound pick-up 20 and its arrival at the fourth sound pick-up 40, multiplied by the speed of sound v:
r4 − r2 = τ42·v
In addition:
m² + n² = r2² − (r1·cosθ1)² = (r1 − τ12·v)² − (r1·cosθ1)²
m − n = [(r1 − τ13·v)² − (r1 − τ12·v)² − L² − D²] / [2(L − D)]
Solving the above equations yields the following formulas for m and n:
n = [(r1 − τ14·v)² − (r1 − τ12·v)²] / (−2D)
m = [(r1 − τ13·v)² − (r1 − τ12·v)² − L² − D²] / [2(L − D)] + [(r1 − τ14·v)² − (r1 − τ12·v)²] / (−2D)
Here r4 is the distance from the sound source 11 to the fourth sound pick-up 40, and θ4 is the angle between the z-axis and the line connecting the sound source 11 and the fourth sound pick-up 40 (not shown); the remaining symbols are as defined above.
By jointly solving the above formulas, m, n, θ1, θ2, θ3, θ4, r1, r2, r3 and r4 can be obtained. Only once the angles θ1, θ2, θ3 and θ4 have been calculated can the exact values of r1, r2, r3 and r4 be obtained.
Further, dividing r1, r2, r3 and r4 by the speed of sound v gives the times at which the voice emitted by the sound source 11 reaches the four sound pick-ups: the time of arrival at the first sound pick-up 10 is t1, at the second sound pick-up 20 is t2, at the third sound pick-up 30 is t3, and at the fourth sound pick-up 40 is t4. That is, the sound source reaches the four sound pick-ups with different time delays; see the time delay schematic diagram in Fig. 4.
Finally, the direction of the voice source can be determined from the calculated time delays; for example, the direction of the sound pick-up with the shortest arrival time is determined as the direction of the picked-up voice source.
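The geometric relationship used above — that each pairwise arrival-time difference times the speed of sound gives a range difference, and that the source lies toward the first-reached sound pick-up — can be sketched as a minimal, idealized illustration. The array geometry, microphone names and source position below are invented for the example and are not taken from the patent.

```python
# Near-field TDOA sketch: arrival times at a 2x2 rectangular array
# (assumed side lengths L = 0.10 m, D = 0.06 m), then the patent's
# final rule: the source direction is that of the earliest-reached mic.
import math

V_SOUND = 340.0  # speed of sound in air, m/s (value used in the text)

def arrival_times(source, mics):
    """Time of flight from a source position to each microphone."""
    return {name: math.dist(source, pos) / V_SOUND for name, pos in mics.items()}

def source_direction(times):
    """The source lies toward the microphone the wavefront reaches first."""
    return min(times, key=times.get)

mics = {  # rectangle in the xy-plane; positions are illustrative
    "mic1": (0.00, 0.00, 0.0),
    "mic2": (0.00, 0.10, 0.0),
    "mic3": (0.06, 0.00, 0.0),
    "mic4": (0.06, 0.10, 0.0),
}
source = (-0.30, 0.02, 0.20)  # a talker on the mic1/mic2 side

t = arrival_times(source, mics)
tau12 = t["mic1"] - t["mic2"]  # pairwise delay, as in tau_12
# r1 - r2 = tau12 * v, the range-difference relation from the text:
assert abs((math.dist(source, mics["mic1"]) - math.dist(source, mics["mic2"]))
           - tau12 * V_SOUND) < 1e-9
print(source_direction(t))  # -> mic1
```

In a real device the τ values would be measured from the microphone signals rather than derived from known geometry; the closed-form m and n formulas above then recover the source position from those measurements.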
The second way uses unidirectional microphones as the sound pick-ups.
Referring to Fig. 5 and Fig. 6, Fig. 5 is a diagram of the appearance of a unidirectional microphone, and Fig. 6 is a schematic diagram of the pickup range of a unidirectional microphone. Owing to the acoustic cavity of the microphone, each microphone only picks up sound sources within the directional range on its own side. It should be understood that Fig. 5 is only illustrative; practical applications are not limited thereto.
Specifically, a unidirectional microphone is mainly designed on the pressure gradient principle: the diaphragm in the opening behind the cavity of the unidirectional microphone detects the pressure on its two faces, and the direction of the face subject to the greater pressure is determined as the direction of the picked-up voice source.
In this way, the volume and information of the sound of user 1 received by the first sound pick-up 10 and the second sound pick-up 20 on the same side as user 1 is far greater than the sound and information from user 2 on the other side; similarly, the volume and information of the sound of user 2 received by the third sound pick-up 30 and the fourth sound pick-up 40 on the same side as user 2 is far greater than the sound and information from user 1.
S103, performing signal reinforcement processing on the voice to obtain a target voice;
The voice is enhanced by weighted delay-and-sum beamforming (DSBF).
Specifically, the voice emitted by the sound source reaches the four sound pick-ups at different times, including the time t1 of arrival at the first sound pick-up 10, the time t2 of arrival at the second sound pick-up 20, the time t3 of arrival at the third sound pick-up 30 and the time t4 of arrival at the fourth sound pick-up 40; that is, the time delay with which the voice reaches each sound pick-up differs. Delay compensation is therefore applied to the signal received by each sound pick-up, so that the received copies of the voice of the speaking user become synchronous, while the noise received by each sound pick-up, and the voice emitted by the user (i.e. sound source) on the opposite side, remain asynchronous. The delay-compensated voice signals of the sound pick-ups are then weighted and averaged: because the voices picked up by the microphones are synchronous, the voice signal is fully retained after the weighted accumulation and averaging, while the asynchronous interfering noise is weakened by the accumulation and averaging, thereby improving the signal-to-noise ratio, as shown in Fig. 7.
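The delay-compensate-then-average step can be sketched as follows. This is a minimal integer-sample DSBF illustration with synthetic signals; real systems work with fractional delays and sampled audio, which this sketch does not attempt.

```python
# Delay-and-sum beamforming sketch: advance each channel by its delay so
# the target talker's copies line up, then take a weighted average.
# Aligned speech is preserved; anything unaligned averages down.
def delay_and_sum(channels, delays, weights=None):
    """channels: equal-length sample lists; delays: samples to advance
    each channel by; weights: per-channel gains (default 1/n each)."""
    n = len(channels)
    if weights is None:
        weights = [1.0 / n] * n
    length = len(channels[0]) - max(delays)
    out = []
    for i in range(length):
        out.append(sum(w * ch[i + d] for ch, d, w in zip(channels, delays, weights)))
    return out

# The same pulse arriving with 0-, 1- and 2-sample delays:
sig = [0, 1, 2, 3, 2, 1, 0, 0, 0]
chans = [sig, [0] + sig[:-1], [0, 0] + sig[:-2]]
aligned = delay_and_sum(chans, delays=[0, 1, 2])
print([round(x, 6) for x in aligned[:4]])  # -> [0.0, 1.0, 2.0, 3.0]
```

With correct delays the averaged output reproduces the source pulse exactly; with wrong delays the copies smear and the peak drops, which is also what the amplitude-search variant described later exploits.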
When the sound source is in the near field, the amplitudes of the voice signals received by the sound pick-ups differ clearly, and the signal-to-noise ratios of the received signals also differ to some extent.
In order to better improve the signal-to-noise ratio and enhance the voice signal, a greater weight is assigned to designated sound pick-ups and a smaller weight to the rest. For example, a greater weight is assigned to the first sound pick-up 10 and the second sound pick-up 20, and a smaller, or even negative, weight is assigned to the third sound pick-up 30 and the fourth sound pick-up 40; the most reasonable weight configuration scheme is obtained through a large number of simulation experiments. In this way the speech of the user close to the first sound pick-up 10 and the second sound pick-up 20 is largely retained, so that the translation process can be enabled to translate that user's words, while the noise and the speech of the user close to the third sound pick-up 30 and the fourth sound pick-up 40 are largely suppressed and will not trigger a further translation process.
Optionally, since there may be a certain error in the time delay calculated by the above formulas, the delay can also be found directly by traversing the values in a preset time delay range: the waveforms of the sound picked up by the microphones are compensated with each candidate value, and according to the calculation result, the superposed time delay value at which the audio amplitude is greatest is determined as the time delay of the sound source. This improves the accuracy and convenience of delay calculation.
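The traversal described above can be sketched as an exhaustive search: try every candidate delay in the preset range and keep the one that maximizes the amplitude of the superposed signal, since the channels only add coherently when correctly aligned. Two channels and integer-sample delays are simplifying assumptions for the illustration.

```python
# Exhaustive delay search: pick the delay that maximizes the peak
# amplitude of the superposition of a reference channel and a shifted
# second channel.
def best_delay(ref, ch, max_delay):
    def peak(d):  # peak amplitude of ref + ch advanced by d samples
        n = len(ref) - max_delay
        return max(abs(ref[i] + ch[i + d]) for i in range(n))
    return max(range(max_delay + 1), key=peak)

ref = [0, 0, 1, 3, 1, 0, 0, 0]
ch = [0, 0, 0, 0, 1, 3, 1, 0]  # same pulse, two samples later
print(best_delay(ref, ch, max_delay=3))  # -> 2
```

This is essentially a coarse cross-correlation maximization; a production system would typically use correlation over sampled audio rather than a peak-amplitude scan.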
S104, enabling the translation process corresponding to the user to translate the target voice, and displaying the translation content.
The target voice is the signal-reinforced voice, whose sound quality and volume have been further enhanced. The unique translation process corresponding to the user is enabled to translate the target voice; the translation process interacts with the cloud to obtain the text result, which is presented to the other users, or displayed on an external large screen.
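The per-user routing in S104 can be sketched as below: each direction owns an independent translation channel with a preset language pair, and an enhanced utterance is dispatched only to that channel. The channel names, language pairs and the placeholder `translate` are invented for illustration; a real device would call a cloud ASR/translation backend at that point.

```python
# One independent translation channel per direction, as in S104.
class TranslationChannel:
    def __init__(self, src_lang, dst_lang):
        self.src_lang, self.dst_lang = src_lang, dst_lang
        self.transcript = []  # what this user's sub-interface would show

    def translate(self, text):
        # Placeholder "translation": a real system calls a cloud backend.
        result = f"[{self.src_lang}->{self.dst_lang}] {text}"
        self.transcript.append(result)
        return result

channels = {  # hypothetical two-sided setup with preset language pairs
    "side_A": TranslationChannel("zh", "en"),
    "side_B": TranslationChannel("en", "zh"),
}

def on_utterance(direction, target_voice_text):
    """Dispatch an enhanced utterance to its direction's own channel."""
    return channels[direction].translate(target_voice_text)

print(on_utterance("side_A", "你好"))  # -> [zh->en] 你好
```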
Optionally, a translation content presentation interface can be generated on the screen of the translation machine equipment. The display interface includes multiple sub-interfaces; the number of sub-interfaces corresponds to the number of users and to the positions of the sound pick-ups, and the sub-interfaces are respectively used to display the recognition results of the voices from different directions. For example, in a two-party talk scenario, assuming the two users are located on the upper and lower sides of the translation machine, a display interface comprising two sub-interfaces 1 and 2 is generated, occupying the upper and lower halves of the translation machine screen. Sub-interface 1 in the upper half of the screen is used to display the translation result of the voice from the upper side of the screen (that is, the voice picked up by the sound pick-ups configured above the translation machine), and sub-interface 2 in the lower half of the screen is used to display the translation result of the voice from the lower side of the screen (that is, the voice picked up by the sound pick-ups configured below the translation machine), or vice versa. In practical applications, the correspondence between the translation content shown on each sub-interface and the positions of the sound pick-ups can be customized by users.
It should be understood that scenarios of three-party or larger multi-party conferences can refer to the above two-party talk scenario, which is not repeated here. By using multiple sub-interfaces to display the translation results of voices from different directions, the correspondence between a translation result and the object being translated is made definite; especially in three-party or larger multi-party conference scenarios, this helps users recognize who said what and avoids misunderstandings caused by attributing one person's words to another.
Optionally, each sub-interface is an interactive interface, that is, in response to a preset operation on a sub-interface, the displayed translation content is adjusted, for example: enlarging or reducing the font size, changing the color of the font, or modifying a character, word or sentence in the translation content.
Further, voices from multiple directions are picked up, and after the dialogue ends, the picked-up voices are grouped by direction into different speech audio files. Each speech audio file is imported into a voiceprint recognition interface for voiceprint recognition, so as to identify the user corresponding to the speech audio file. According to the voiceprint recognition result and the timestamps at which the voices were picked up, the user corresponding to each voice is marked on the time axis of the speech audio file. A corresponding text file is generated for each marked speech audio file, and the text files are merged into one summary text file, which contains the text content of every utterance, the correspondence between the timestamp of each utterance and the user who made it, and the correspondence between the user's speech content and the user's position.
In one example, in the indoor meeting scenario shown in Fig. 8, the translation machine equipment is placed at the center of the conference table, with one end facing conference users 1, 2 and 3 on one side of the table and the other end facing conference users A, B and C on the other side. The translation machine equipment has 4 sound pick-ups, specifically microphone 1, microphone 2, microphone 3 and microphone 4, so that after the meeting ends it can distinguish the users on the two sides and distinguish who said what.
Specifically, microphone 1 and microphone 2 collect the sound of user 1, user 2 and user 3, while microphone 3 and microphone 4 collect the sound of user A, user B and user C. After the meeting ends, the sound from the direction of microphone 1 and microphone 2 generates speech audio file m1, and the sound from the direction of microphone 3 and microphone 4 generates speech audio file m2. Speech audio files m1 and m2 are imported into the voiceprint recognition interface; from m1 it is identified which of user 1, user 2 and user 3 is speaking at each point on the time axis, and from m2 which of user A, user B and user C is speaking at each point on the time axis. Speech recognition of m1 and m2 yields text files text1 and text2, and using the timestamps collected while speaking, the two text files are merged into one complete text file text, in which the dialogue content of all users is described. By inserting the speakers' voiceprint-identified timestamps into the text, the content said by each user is identified, thereby obtaining a detailed meeting summary that identifies the respective speech content of users 1, 2 and 3 on one side of the conference table and of users A, B and C on the other side. A meeting summary is thus generated easily and automatically, the efficiency of organizing conference content is improved, and the correspondence between each user's speech content and the user's position can be shown.
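The merge step of the summary flow can be sketched as follows. Speaker labels are given directly here (the text obtains them via voiceprint recognition); timestamps, names and utterances are invented for illustration.

```python
# Merge per-direction transcripts (m1: microphones 1/2 side, m2:
# microphones 3/4 side) into one chronological meeting summary.
m1 = [(0.0, "user 1", "Hello everyone"), (7.5, "user 2", "Agreed")]
m2 = [(3.2, "user A", "Thanks for coming")]

def merge_summary(*transcripts):
    """Interleave (timestamp, speaker, text) entries by timestamp."""
    entries = sorted((e for t in transcripts for e in t), key=lambda e: e[0])
    return [f"{ts:.1f}s {who}: {text}" for ts, who, text in entries]

for line in merge_summary(m1, m2):
    print(line)
# -> 0.0s user 1: Hello everyone
#    3.2s user A: Thanks for coming
#    7.5s user 2: Agreed
```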
Further, besides the scenario in which users sit at the two ends of the translation machine equipment, the simultaneous interpretation method in this embodiment can also be applied in a scenario with users at four ends of the translation machine equipment, as shown in Fig. 9.
In the embodiment of the present invention, voice is picked up through multiple sound pick-ups, the direction of the picked-up voice source is determined, and the user corresponding to that direction is determined. Combined with hardware-based sound source localization, the direction of the speaking user is distinguished; people speaking from different directions do not interfere with each other. The voice undergoes signal reinforcement processing to obtain a target voice, the translation process corresponding to the user is enabled to translate the target voice, and the translation content is displayed. The speaker in each direction has an independent translation channel, and each channel independently performs speech recognition and translation for that user and presents the text, so translation is smooth and efficient.
Referring to Figure 10, Figure 10 is a structural schematic diagram of the simultaneous interpretation apparatus provided in an embodiment of the present invention; for ease of illustration, only the parts related to the embodiments of the present invention are shown. The simultaneous interpretation apparatus illustrated in Figure 10 can be the executing subject of the simultaneous interpretation method provided by the embodiment of Fig. 1 above. The simultaneous interpretation apparatus can be the translation machine equipment, or can be built into a terminal, where the terminal includes a PC, a mobile phone and other electronic equipment. The simultaneous interpretation apparatus includes:
a pickup module 401, configured to pick up voice through multiple sound pick-ups;
a determining module 402, configured to determine the direction of the picked-up voice source, and to determine the user corresponding to the direction of the voice source;
a voice reinforcing module 403, configured to perform signal reinforcement processing on the voice to obtain a target voice;
a translation module 404, configured to enable the translation process corresponding to the user to translate the target voice, and to display the translation content.
Further, the determining module 402 is also configured to detect the time differences between the picked-up voice's arrival at the first of the four sound pick-ups and its arrival at the second, third and fourth sound pick-ups respectively, and to calculate, from each time difference and the speed of sound, the distances from the sound source of the voice to the four sound pick-ups, with the following formulas:
r1 − r2 = τ12·v
r1 − r3 = τ13·v
r1 − r4 = τ14·v
r3 − r2 = τ23·v
r4 − r2 = τ42·v
m² + n² = r2² − (r1·cosθ1)² = (r1 − τ12·v)² − (r1·cosθ1)²
m − n = [(r1 − τ13·v)² − (r1 − τ12·v)² − L² − D²] / [2(L − D)]
n = [(r1 − τ14·v)² − (r1 − τ12·v)²] / (−2D)
m = [(r1 − τ13·v)² − (r1 − τ12·v)² − L² − D²] / [2(L − D)] + [(r1 − τ14·v)² − (r1 − τ12·v)²] / (−2D)
Wherein, a three-dimensional coordinate system is established with the direction of the line from the first sound pick-up to the third sound pick-up as the positive direction of the x-axis, the direction of the line from the first sound pick-up to the second sound pick-up as the positive direction of the y-axis, and the direction from the XY plane toward the sound source of the voice as the positive direction of the z-axis;
τ₁₂ is the time difference between the sound reaching the first sound pick-up and reaching the second sound pick-up; τ₁₃ is the time difference between the sound reaching the first sound pick-up and reaching the third sound pick-up; τ₁₄ is the time difference between the sound reaching the first sound pick-up and reaching the fourth sound pick-up;
L is the distance between the first sound pick-up and the second sound pick-up, which is also the distance between the third sound pick-up and the fourth sound pick-up; D is the distance between the first sound pick-up and the third sound pick-up, which is also the distance between the second sound pick-up and the fourth sound pick-up; r₁ is the distance from the sound source to the first sound pick-up; r₂ is the distance from the sound source to the second sound pick-up; r₃ is the distance from the sound source to the third sound pick-up; r₄ is the distance from the sound source to the fourth sound pick-up; m is the distance from the projection of the sound source on the XY plane to the line connecting the second sound pick-up and the fourth sound pick-up; n is the distance from the projection of the sound source on the XY plane to the line connecting the first sound pick-up and the second sound pick-up; θ₁ is the angle between the z-axis and the line connecting the sound source and the first sound pick-up; θ₂ is the angle between the z-axis and the line connecting the sound source and the second sound pick-up; θ₃ is the angle between the z-axis and the line connecting the sound source and the third sound pick-up; θ₄ is the angle between the z-axis and the line connecting the sound source and the fourth sound pick-up;
The distances from the sound source to the four sound pick-ups are each divided by the velocity of sound to obtain the times at which the sound reaches each of the four sound pick-ups, and the direction of the sound pick-up with the shortest arrival time is determined as the direction of the picked-up voice source.
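The final step above, selecting the pick-up the sound reaches first, can be sketched directly from the measured time differences: the channel with the most negative delay relative to a reference channel is the one the wavefront reached earliest. A minimal cross-correlation sketch, assuming synchronized, equally long channel buffers (this is a simplified illustration, not the patent's full formula chain):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 deg C

def tdoa(reference, signal, fs):
    """Delay (seconds) of `signal` relative to `reference`, estimated
    from the peak of their cross-correlation; negative means earlier.
    Multiplying a delay by SPEED_OF_SOUND gives the range difference
    r_i - r_j used in the formulas above."""
    corr = np.correlate(signal, reference, mode="full")
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / fs

def nearest_pickup(frames, fs):
    """Index of the sound pick-up the wavefront reaches first, i.e. the
    channel with the smallest delay relative to channel 0."""
    delays = [tdoa(frames[0], f, fs) for f in frames]
    return int(np.argmin(delays))
```

In practice a generalized cross-correlation with phase weighting is often preferred for reverberant rooms; the plain correlation here keeps the sketch short.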
Further, the voice reinforcing module 403 is also configured to obtain the target voice after performing signal enhancement processing on the voice through a weighted delay-and-sum beamforming scheme.
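A weighted delay-and-sum beamformer in its simplest form advances each channel by a steering delay toward the source direction, so that speech from that direction adds coherently while sounds from other directions partially cancel. An illustrative integer-sample sketch; the uniform weights and the delay values in the usage example are assumptions, not values from the patent:

```python
import numpy as np

def delay_and_sum(frames, steering_delays_s, fs, weights=None):
    """Weighted delay-and-sum beamformer: advance each channel by its
    steering delay so the target direction adds coherently, then form
    a weighted average across channels."""
    n_ch = len(frames)
    if weights is None:
        weights = np.full(n_ch, 1.0 / n_ch)   # uniform weights by default
    out = np.zeros(len(frames[0]))
    for frame, delay, w in zip(frames, steering_delays_s, weights):
        shift = int(round(delay * fs))        # steering delay in samples
        out += w * np.roll(np.asarray(frame, dtype=float), -shift)
    return out
```

Fractional-sample delays would normally be applied by interpolation or in the frequency domain; integer shifts keep the illustration minimal.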
Further, the device also includes:
A meeting summary generation module (not shown), configured to pick up voices from multiple directions and generate different speech audio files from the picked-up voices according to direction; to import the speech audio files into a voiceprint recognition interface for voiceprint recognition and, according to the timestamps at which the voices were picked up, mark the user corresponding to each voice on the time axis of the speech audio file; and to recognize each speech audio file, generate a corresponding text file for each, and merge the text files into one summary text file, the summary text file containing the word content corresponding to each voice, the correspondence between the timestamp at which each voice was uttered and the user, and the correspondence between the content of the user's speech and the position of the user.
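The summary text file described above ties each utterance's word content to its timestamp, its voiceprint-identified user, and the user's direction. A minimal data-structure sketch of that merge step; the field and function names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    timestamp: float   # time at which the voice was picked up (seconds)
    direction: int     # index of the sound pick-up / speaker position
    speaker: str       # user identified through voiceprint recognition
    text: str          # recognized word content of the voice

def merge_summary(utterances):
    """Merge per-direction transcripts into one summary text ordered by
    timestamp, preserving the voice->user and user->position mappings."""
    ordered = sorted(utterances, key=lambda u: u.timestamp)
    return "\n".join(
        f"[{u.timestamp:.1f}s] {u.speaker} (direction {u.direction}): {u.text}"
        for u in ordered
    )
```

Sorting by timestamp is what turns the separate per-direction files into one chronological meeting record.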
Further, the translation module 404 is also configured to generate a translation content presentation interface on the screen of the translation machine device. The translation content interface includes multiple sub-interfaces; the number of sub-interfaces corresponds to the number of users and to the positions of the sound pick-ups. The multiple sub-interfaces are respectively used to display the recognition results of voices from different directions. In practical applications, the correspondence between the translation content displayed by each sub-interface and the position of the sound pick-up can be customized by users.
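The user-customizable correspondence between sound pick-up positions and sub-interfaces can be modelled as a simple re-assignable mapping. A sketch; the pane names and the four-pane default are illustrative assumptions:

```python
# Default mapping from sound pick-up index to an on-screen sub-interface;
# pane names are illustrative, and the user may re-assign them at runtime.
DEFAULT_PANES = {0: "top-left", 1: "top-right", 2: "bottom-left", 3: "bottom-right"}

def customize(mapping, pickup_index, new_pane):
    """Return a new mapping with one pick-up re-assigned to another pane,
    leaving the default untouched."""
    updated = dict(mapping)
    updated[pickup_index] = new_pane
    return updated
```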
Optionally, each sub-interface is an interactive interface, and the device also includes:
An adjusting module (not shown), configured to respond to a user's preset operation on each sub-interface and adjust the displayed translation content, for example: enlarging or reducing the font size, changing the font color, or modifying a character, word or sentence in the translation content.
For other details of the present embodiment, refer to the description of the embodiments shown in the preceding Figures 1 to 9.
In the present embodiment, voice is picked up through multiple sound pick-ups, the direction of the picked-up voice source is determined, and the user corresponding to the direction of the voice source is determined; sound-source direction finding is combined with the hardware to distinguish the direction of the speaking user, so that when people in different directions speak, their sounds do not interfere with each other. The target voice is obtained after the voice is subjected to signal reinforcement processing, the translation process corresponding to the user is enabled to translate the target voice, and the translation content is displayed. Each speaking direction has an independent translation channel, and each channel separately performs speech recognition, translation and text presentation for its user, so the translation is smooth and efficient.
An embodiment of the present invention also provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the computer program, the simultaneous interpretation method described in the preceding Figures 1 to 9 is realized.
Further, the electronic device also includes:
At least one input device, at least one output device, multiple sound pick-ups and at least one microphone.
The above memory, processor, input device, output device, sound pick-ups and microphone are connected through a bus.
Alternatively, the sound pick-ups and the microphone may be peripherals, in which case the electronic device also includes a wireless radio frequency module. The sound pick-ups and the microphone establish a data connection with the processor through a data line or through the wireless network provided by the wireless radio frequency module.
The input device may specifically be a camera, a touch panel, physical buttons, etc. The output device may specifically be a flexible capacitive touch screen.
The memory may be a high-speed random access memory (RAM, Random Access Memory), or may be a non-volatile memory (non-volatile memory), such as a magnetic disk storage. The memory is used to store a set of executable program code, and the processor is coupled with the memory.
Further, an embodiment of the present invention also provides a computer-readable storage medium. The computer-readable storage medium may be provided in the above electronic device, and may specifically be the memory of the aforementioned electronic device. A computer program is stored on the computer-readable storage medium; when the program is executed by the processor, the simultaneous interpretation method described in the embodiments shown in the preceding Figures 1 to 9 is realized. Further, the computer-readable storage medium may also be a USB flash disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a RAM, a magnetic disk, an optical disk, or various other media capable of storing program code.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is stated as a series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the order of the actions described, because according to the present invention certain steps may be performed in another order or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for a part not described in detail in one embodiment, refer to the relevant description of the other embodiments.
The above is a description of the simultaneous interpretation method, synchronous translation apparatus, electronic device and computer-readable storage medium provided by the present invention. For those skilled in the art, there will be changes in the specific implementation and scope of application according to the ideas of the embodiments of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A simultaneous interpretation method, characterized by comprising:
picking up voice through multiple sound pick-ups;
determining the direction of the picked-up voice source, and determining the user corresponding to the direction of the voice source;
obtaining a target voice after subjecting the voice to signal reinforcement processing;
enabling the translation process corresponding to the user to translate the target voice, and displaying the translation content.
2. The method according to claim 1, characterized in that the multiple sound pick-ups are four sound pick-ups arranged in a rectangular array, and the shortest side of the rectangle is greater than 5 centimetres in length.
3. The method according to claim 2, characterized in that determining the direction of the picked-up voice source comprises:
detecting the time differences between the picked-up voice reaching the first sound pick-up of the four sound pick-ups and reaching the second sound pick-up, the third sound pick-up and the fourth sound pick-up respectively, the time difference between the voice reaching the second sound pick-up and reaching the third sound pick-up, and the time difference between the voice reaching the second sound pick-up and reaching the fourth sound pick-up;
calculating separately, according to each time difference and the velocity of sound, the distances from the sound source of the voice to the four sound pick-ups, with the following formulas:
r₁ - r₂ = τ₁₂v;
r₁ - r₃ = τ₁₃v;
r₁ - r₄ = τ₁₄v;
r₃ - r₂ = τ₂₃v;
r₄ - r₂ = τ₂₄v;
m² + n² = r₂² - (r₁cosθ₁)² = (r₁ - τ₁₂v)² - (r₁cosθ₁)²;
m - n = [(r₁ - τ₁₃v)² - (r₁ - τ₁₂v)² - L² - D²] / [2(L - D)];
n = [(r₁ - τ₁₄v)² - (r₁ - τ₁₂v)²] / (-2D);
m = [(r₁ - τ₁₃v)² - (r₁ - τ₁₂v)² - L² - D²] / [2(L - D)] + [(r₁ - τ₁₄v)² - (r₁ - τ₁₂v)²] / (-2D)
wherein a three-dimensional coordinate system is established with the direction of the line from the first sound pick-up to the third sound pick-up as the positive direction of the x-axis, the direction of the line from the first sound pick-up to the second sound pick-up as the positive direction of the y-axis, and the direction from the XY plane toward the sound source of the voice as the positive direction of the z-axis;
τ₁₂ is the time difference between the sound reaching the first sound pick-up and reaching the second sound pick-up; τ₁₃ is the time difference between the sound reaching the first sound pick-up and reaching the third sound pick-up; τ₁₄ is the time difference between the sound reaching the first sound pick-up and reaching the fourth sound pick-up; τ₂₃ is the time difference between the sound reaching the second sound pick-up and reaching the third sound pick-up; τ₂₄ is the time difference between the sound reaching the second sound pick-up and reaching the fourth sound pick-up;
L is the distance between the first sound pick-up and the second sound pick-up, which is also the distance between the third sound pick-up and the fourth sound pick-up; D is the distance between the first sound pick-up and the third sound pick-up, which is also the distance between the second sound pick-up and the fourth sound pick-up; r₁ is the distance from the sound source to the first sound pick-up; r₂ is the distance from the sound source to the second sound pick-up; r₃ is the distance from the sound source to the third sound pick-up; r₄ is the distance from the sound source to the fourth sound pick-up; m is the distance from the projection of the sound source on the XY plane to the line connecting the second sound pick-up and the fourth sound pick-up; n is the distance from the projection of the sound source on the XY plane to the line connecting the first sound pick-up and the second sound pick-up; θ₁ is the angle between the z-axis and the line connecting the sound source and the first sound pick-up; θ₂ is the angle between the z-axis and the line connecting the sound source and the second sound pick-up; θ₃ is the angle between the z-axis and the line connecting the sound source and the third sound pick-up; θ₄ is the angle between the z-axis and the line connecting the sound source and the fourth sound pick-up;
dividing each of the distances from the sound source to the four sound pick-ups by the velocity of sound to obtain the times at which the sound reaches each of the four sound pick-ups, and determining the direction of the sound pick-up with the shortest arrival time as the direction of the picked-up voice source.
4. The method according to claim 1, characterized in that, where the sound pick-up is a single-directivity microphone, determining the direction of the picked-up voice source comprises:
detecting the pressures to which the two sides of the vibrating diaphragm at the rear opening of the cavity of the single-directivity microphone are subjected;
determining the direction on the side of the diaphragm subjected to the greater pressure as the direction of the picked-up voice source.
5. The method according to claim 3 or 4, characterized in that obtaining the target voice after subjecting the voice to signal reinforcement processing comprises:
obtaining the target voice after performing signal enhancement processing on the voice through a weighted delay-and-sum beamforming scheme.
6. The method according to claim 5, characterized in that the method further comprises:
picking up voices from multiple directions, and generating different speech audio files from the picked-up voices according to direction;
importing the speech audio files into a voiceprint recognition interface for voiceprint recognition;
marking, according to the voiceprint recognition result and the timestamps at which the voices were picked up, the user corresponding to each voice on the time axis of the speech audio file;
recognizing each speech audio file, generating a corresponding text file for each, and merging the text files into one summary text file, the summary text file containing the word content corresponding to each voice, the correspondence between the timestamp at which each voice was uttered and the user, and the correspondence between the content of the user's speech and the position of the user.
7. A synchronous translation apparatus, characterized by comprising:
a pickup module, configured to pick up voice through multiple sound pick-ups;
a determining module, configured to determine the direction of the picked-up voice source, and to determine the user corresponding to the direction of the voice source;
a voice reinforcing module, configured to obtain a target voice after the voice is subjected to signal reinforcement processing;
a translation module, configured to enable the translation process corresponding to the user to translate the target voice, and to display the translation content.
8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
an identification module, configured to pick up voices from multiple directions and generate different speech audio files from the picked-up voices according to direction;
to import the speech audio files into a voiceprint recognition interface for voiceprint recognition and, according to the timestamps at which the voices were picked up, mark the user corresponding to each voice on the time axis of the speech audio file;
and to recognize each speech audio file, generate a corresponding text file for each, and merge the text files into one summary text file, the summary text file containing the word content corresponding to each voice, the correspondence between the timestamp at which each voice was uttered and the user, and the correspondence between the content of the user's speech and the position of the user.
9. An electronic device, the electronic device comprising: a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that, when the processor executes the computer program, the simultaneous interpretation method according to any one of claims 1 to 6 is realized.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the simultaneous interpretation method according to any one of claims 1 to 6 is realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910669971.8A CN110491385A (en) | 2019-07-24 | 2019-07-24 | Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110491385A true CN110491385A (en) | 2019-11-22 |
Family
ID=68548054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910669971.8A Pending CN110491385A (en) | 2019-07-24 | 2019-07-24 | Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110491385A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6125347A (en) * | 1993-09-29 | 2000-09-26 | L&H Applications Usa, Inc. | System for controlling multiple user application programs by spoken input |
CN102968991A (en) * | 2012-11-29 | 2013-03-13 | 华为技术有限公司 | Method, device and system for sorting voice conference minutes |
CN203301691U (en) * | 2013-05-31 | 2013-11-20 | 中山市天键电声有限公司 | Windproof, denoising and unidirectional microphone |
CN107978317A (en) * | 2017-12-18 | 2018-05-01 | 北京百度网讯科技有限公司 | Meeting summary synthetic method, system and terminal device |
CN108962263A (en) * | 2018-06-04 | 2018-12-07 | 百度在线网络技术(北京)有限公司 | A kind of smart machine control method and system |
CN109686363A (en) * | 2019-02-26 | 2019-04-26 | 深圳市合言信息科技有限公司 | A kind of on-the-spot meeting artificial intelligence simultaneous interpretation equipment |
Non-Patent Citations (2)
Title |
---|
周小东 (Zhou Xiaodong): "Recording Engineer's Handbook" (《录音工程师手册》), 31 January 2006, China Radio and Television Press (中国广播电视出版社) *
李学华 (Li Xuehua): "Advancing with the Information Age, Progressing with the Spirit of Innovation: Collected Papers on Undergraduate Technological Innovation and Engineering Practice, School of Information and Communication Engineering, Beijing Information Science and Technology University" (《与信息时代同行 与创新精神共进》), 31 July 2017, Beijing University of Posts and Telecommunications Press (北京邮电大学出版社) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021134284A1 (en) * | 2019-12-30 | 2021-07-08 | 深圳市欢太科技有限公司 | Voice information processing method, hub device, control terminal and storage medium |
CN111245868A (en) * | 2020-03-10 | 2020-06-05 | 诺领科技(南京)有限公司 | Narrowband Internet of things voice message communication method and system |
CN111245868B (en) * | 2020-03-10 | 2021-04-13 | 诺领科技(南京)有限公司 | Narrowband Internet of things voice message communication method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20191122 |