CN110491385A - Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium - Google Patents

Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium

Info

Publication number
CN110491385A
CN110491385A (application CN201910669971.8A)
Authority
CN
China
Prior art keywords
sound pick-up
voice
sound source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910669971.8A
Other languages
Chinese (zh)
Inventor
张岩
熊涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Heyan Mdt Infotech Ltd
Original Assignee
Shenzhen Heyan Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Heyan Mdt Infotech Ltd filed Critical Shenzhen Heyan Mdt Infotech Ltd
Priority to CN201910669971.8A priority Critical patent/CN110491385A/en
Publication of CN110491385A publication Critical patent/CN110491385A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 1/00 Details of transducers, loudspeakers or microphones
    • H04R 1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R 1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R 1/326 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 29/00 Monitoring arrangements; Testing arrangements
    • H04R 29/004 Monitoring arrangements; Testing arrangements for microphones

Abstract

Embodiments of the invention disclose a simultaneous interpretation method applied to the field of speech processing technology. The method comprises: picking up speech through multiple sound pick-ups; determining the direction of the picked-up speech source and the user corresponding to that direction; performing signal reinforcement on the speech to obtain a target speech; and enabling the translation process corresponding to the user to translate the target speech and display the translated content. Embodiments of the invention also disclose a simultaneous interpretation apparatus and a computer-readable storage medium. The fluency of translation can thereby be improved.

Description

Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium
Technical field
The invention belongs to the field of speech processing technology, and more particularly relates to a simultaneous interpretation method, apparatus, electronic device and computer-readable storage medium.
Background art
Because existing simultaneous interpretation equipment cannot accurately locate the person who is speaking, most of it still works in a turn-by-turn manner: when user A starts to speak, user B cannot interrupt or respond, otherwise the translation becomes inaccurate. Only after user A has finished speaking can user B speak, so that the translation is accurate and fluent. This way of translating is inconvenient, makes conversation between people very mechanical, and does not follow the natural rhythm of person-to-person talk.
Summary of the invention
The present invention provides a simultaneous interpretation method, intended to solve the problem that, when several people speak, the person speaking cannot be accurately located, so that the resulting translation is not fluent and the conversation feels mechanical.
An embodiment of the invention provides a simultaneous interpretation method, comprising:
picking up speech through multiple sound pick-ups;
determining the direction of the picked-up speech source, and determining the user corresponding to the direction of the speech source;
performing signal reinforcement processing on the speech to obtain a target speech;
enabling the translation process corresponding to the user to translate the target speech, and displaying the translated content.
An embodiment of the invention also provides a simultaneous interpretation apparatus, comprising:
a pickup module, configured to pick up speech through multiple sound pick-ups;
a determining module, configured to determine the direction of the picked-up speech source and to determine the user corresponding to the direction of the speech source;
a speech reinforcing module, configured to perform signal reinforcement processing on the speech to obtain a target speech;
a translation module, configured to enable the translation process corresponding to the user to translate the target speech and to display the translated content.
An embodiment of the invention also provides an electronic device, which includes a memory, a processor and a computer program stored on the memory and executable on the processor. When the processor executes the computer program, the simultaneous interpretation method shown in the above embodiments is realized.
An embodiment of the invention also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the simultaneous interpretation method shown in the above embodiments is realized.
It can be seen from the embodiments of the present invention that speech is picked up by multiple sound pick-ups, the direction of the picked-up speech source is determined, and the user corresponding to that direction is determined. By combining the hardware with sound source localization, the direction of the speaking user can be distinguished, so that people speaking from different directions do not interfere with each other. The speech is signal-reinforced to obtain a target speech, the translation process corresponding to the user is enabled to translate the target speech, and the translated content is displayed. The speaker in each direction has an independent translation channel, and each channel separately performs speech recognition and translation for that user and presents the text, so translation is fluent and efficient.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention.
Fig. 1 is a schematic diagram of an application scenario of the simultaneous interpretation method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the simultaneous interpretation method provided by an embodiment of the present invention;
Fig. 3 is a diagram of the positional relationship between the sound source and the sound pick-ups in the simultaneous interpretation method provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the time delays from the sound source to each sound pick-up in the simultaneous interpretation method provided by an embodiment of the present invention;
Fig. 5 is a schematic appearance diagram of a unidirectional microphone in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the pickup range of a unidirectional microphone in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the delay compensation process from the sound source to each sound pick-up in the simultaneous interpretation method provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of a meeting room scenario in the simultaneous interpretation method provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of a scenario in which users are distributed in multiple directions in the simultaneous interpretation method provided by an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of the simultaneous interpretation apparatus provided by an embodiment of the present invention.
Specific embodiment
In order to make the purpose, features and advantages of the invention more obvious and easier to understand, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the invention. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. Based on the embodiments of the invention, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the invention.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an application scenario of the simultaneous interpretation method provided by an embodiment of the present invention. Two users talk face to face, and the translator device picks up the speech of the two users from its two opposite sides. Relative to the translator device, the sound source of one user lies in the opening angle range of 0 to 180 degrees and the sound source of the other user lies in the range of 180 to 360 degrees. The translator device determines the direction of the picked-up speech and thereby determines which user is speaking, then performs signal reinforcement on the picked-up speech so that the speech is clearer and louder and the translation accuracy is higher. Each user corresponds to an independent translation process. The translator device starts the translation process corresponding to the speaking user, and that process translates the picked-up speech from one language into the other according to the preconfigured translation languages and displays the result on the screen of the translator device for the other user to read. Alternatively, the translator device can cast the display onto a larger screen, for example by screen mirroring or through a video cable such as HDMI, so that more users can view it.
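As a minimal illustration of this two-sided scenario, the sketch below maps an estimated sound-source azimuth to one of the two users. The function name and the exact angle convention are assumptions made for illustration; the patent only specifies that one user occupies the 0 to 180 degree range and the other the 180 to 360 degree range.

```python
def user_for_azimuth(azimuth_deg: float) -> str:
    """Map a sound-source azimuth (degrees, assumed measured around the
    translator device) to the user sitting on that side.

    Assumption: user "A" occupies 0-180 degrees, user "B" 180-360 degrees,
    as in the two-user scenario of Fig. 1.
    """
    azimuth_deg = azimuth_deg % 360.0
    return "A" if azimuth_deg < 180.0 else "B"

# Example: a source estimated at 250 degrees is attributed to user B.
print(user_for_azimuth(250.0))  # -> "B"
```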
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the simultaneous interpretation method provided by an embodiment of the present invention. The method can be applied in the scenario shown in Fig. 1 and can be implemented by a translator device, such as a handheld translator, a desktop translator, or another computer device with speech recognition and translation functions such as a smartphone or a tablet computer. The translator device has a built-in microphone and multiple built-in sound pick-ups, or it can be connected to an external microphone and multiple external sound pick-ups through a data cable or a wireless network. The method mainly includes the following steps:
S101: picking up speech through multiple sound pick-ups.
The number and arrangement of the sound pick-ups can be matched to the number and seating arrangement of the on-site users.
Specifically, there may be four sound pick-ups that pick up the speech of four, six or more users. The arrangement may be a rectangular array, for example at the four corners of the translator device, or along the sides facing the users sitting on opposite sides. There may also be two sound pick-ups that pick up the speech of two users, arranged at the centre of the side facing each of the two opposite users.
A sound pick-up may specifically be a microphone.
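For reference in the localization step below, the rectangular arrangement can be captured as a simple configuration. This is only a sketch: the concrete spacings are placeholder values, not dimensions prescribed by the patent, which only requires the shortest side of the rectangle to exceed 5 centimetres.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PickupArray:
    """Rectangular array of four sound pick-ups (10, 20, 30, 40).

    L: spacing between pick-ups 10-20 and 30-40 (metres).
    D: spacing between pick-ups 10-30 and 20-40 (metres).
    """
    L: float = 0.12   # placeholder spacing
    D: float = 0.06   # placeholder spacing, longer than the required 5 cm

    def positions(self):
        """Pick-up coordinates in the XY plane of the coordinate system
        described below (x towards pick-up 30, y towards pick-up 20)."""
        return {10: (0.0, 0.0), 30: (self.D, 0.0),
                20: (0.0, self.L), 40: (self.D, self.L)}
```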
S102: determining the direction of the picked-up speech source, and determining the user corresponding to the direction of the speech source.
Optionally, the direction of the picked-up speech source can be determined in the following two ways.
First way: take four sound pick-ups arranged in a rectangular array as an example, where the shortest side of the rectangle is longer than 5 centimetres.
Referring to Fig. 3, the sound pick-ups include a first sound pick-up 10, a second sound pick-up 20, a third sound pick-up 30 and a fourth sound pick-up 40. The conversation between the user located on the side of the first sound pick-up 10 and the second sound pick-up 20 and the user located on the side of the third sound pick-up 30 and the fourth sound pick-up 40 is a near-field dialogue, so a near-field acoustic model can be used to judge the direction of the sound source. The sound source 11 is regarded as a spherical wave.
A three-dimensional coordinate system is established: the positive direction of the x-axis is the direction of the line from the first sound pick-up 10 to the third sound pick-up 30, the positive direction of the y-axis is the direction of the line from the first sound pick-up 10 to the second sound pick-up 20, and the positive direction of the z-axis is the direction from the XY plane towards the sound source 11 of the picked-up speech. The XY plane is the plane containing the x-axis and the y-axis.
The time differences with which the sound source 11 of the picked-up speech reaches the first sound pick-up 10 and each of the other three sound pick-ups are detected; the detected time differences are τ12, τ13 and τ14 respectively. From these time differences and the speed of sound v (the speed at which sound propagates in air, v = 340 m/s), the distances from the sound source 11 to the four sound pick-ups are calculated:
Specifically, the difference between the distance from the sound source 11 to the first sound pick-up 10 and the distance to the second sound pick-up 20 equals the time difference τ12 with which the sound source 11 reaches the first sound pick-up 10 and the second sound pick-up 20, multiplied by the speed of sound v:
r1 - r2 = τ12 · v
where r1 is the distance from the sound source 11 of the picked-up speech to the first sound pick-up 10; r2 is the distance from the sound source 11 to the second sound pick-up 20; m is the distance from the projection of the sound source 11 onto the XY plane to the line between the second sound pick-up 20 and the fourth sound pick-up 40; n is the distance from the projection of the sound source 11 onto the XY plane to the line between the first sound pick-up 10 and the second sound pick-up 20; θ1 is the angle between the z-axis and the line joining the sound source 11 and the first sound pick-up 10; θ2 is the angle between the z-axis and the line joining the sound source 11 and the second sound pick-up 20.
Further, the difference between the distance from the sound source 11 to the first sound pick-up 10 and the distance to the third sound pick-up 30 equals the time difference τ13 with which the sound source 11 reaches the first sound pick-up 10 and the third sound pick-up 30, multiplied by the speed of sound v:
r1 - r3 = τ13 · v
where r1 is the distance from the sound source of the picked-up speech to the first sound pick-up 10; r3 is the distance from the sound source 11 to the third sound pick-up 30; L is the distance between the first sound pick-up 10 and the second sound pick-up 20, and also the distance between the third sound pick-up 30 and the fourth sound pick-up 40; D is the distance between the first sound pick-up and the third sound pick-up, and also the distance between the second sound pick-up and the fourth sound pick-up; m is the distance from the projection of the sound source 11 onto the XY plane to the line between the second sound pick-up 20 and the fourth sound pick-up 40; n is the distance from the projection of the sound source 11 onto the XY plane to the line between the first sound pick-up 10 and the second sound pick-up 20; θ1 is the angle between the z-axis and the line joining the sound source 11 and the first sound pick-up 10; θ3 is the angle between the z-axis and the line joining the sound source 11 and the third sound pick-up 30 (not shown).
Further, the difference between the distance from the sound source 11 to the first sound pick-up 10 and the distance to the fourth sound pick-up 40 equals the time difference τ14 with which the sound source 11 reaches the first sound pick-up 10 and the fourth sound pick-up 40, multiplied by the speed of sound v:
r1 - r4 = τ14 · v
Further, the difference between the distance from the sound source 11 to the third sound pick-up 30 and the distance to the second sound pick-up 20 equals the time difference τ23 with which the sound source 11 reaches the second sound pick-up 20 and the third sound pick-up 30, multiplied by the speed of sound v:
r3 - r2 = τ23 · v
Further, the difference between the distance from the sound source 11 to the fourth sound pick-up 40 and the distance to the second sound pick-up 20 equals the time difference τ24 with which the sound source 11 reaches the second sound pick-up 20 and the fourth sound pick-up 40, multiplied by the speed of sound v:
r4 - r2 = τ24 · v
From the above:
m² + n² = r2² - (r1·cos θ1)² = (r1 - τ12·v)² - (r1·cos θ1)²
m - n = [(r1 - τ13·v)² - (r1 - τ12·v)² - L² - D²] / [2(L - D)]
By solving the above equations, the following solution formulas for m and n are obtained:
n = [(r1 - τ14·v)² - (r1 - τ12·v)²] / (-2D)
m = [(r1 - τ13·v)² - (r1 - τ12·v)² - L² - D²] / [2(L - D)] + [(r1 - τ14·v)² - (r1 - τ12·v)²] / (-2D)
Here r1 is the distance from the sound source 11 of the picked-up speech to the first sound pick-up 10; r4 is the distance from the sound source 11 to the fourth sound pick-up 40; D is the distance between the first sound pick-up and the third sound pick-up, and also the distance between the second sound pick-up and the fourth sound pick-up; m is the distance from the projection of the sound source 11 onto the XY plane to the line between the second sound pick-up 20 and the fourth sound pick-up 40; n is the distance from the projection of the sound source 11 onto the XY plane to the line between the first sound pick-up 10 and the second sound pick-up 20; θ1 is the angle between the z-axis and the line joining the sound source 11 and the first sound pick-up 10; θ4 is the angle between the z-axis and the line joining the sound source 11 and the fourth sound pick-up 40 (not shown).
By jointly solving the above formulas, m, n, θ1, θ2, θ3, θ4, r1, r2, r3 and r4 can be obtained. Only after the angles θ1, θ2, θ3 and θ4 have been calculated can the exact values of r1, r2, r3 and r4 be obtained.
Further, r1, r2, r3 and r4 are each divided by the speed of sound v to calculate the time at which the speech emitted by the sound source 11 reaches each of the four sound pick-ups: the time of arrival at the first sound pick-up 10 is t1, at the second sound pick-up 20 is t2, at the third sound pick-up 30 is t3, and at the fourth sound pick-up 40 is t4. In other words, the sound source reaches the four sound pick-ups with different time delays, as shown schematically in Fig. 4.
Finally, the direction of the speech source can be determined from the calculated time delays; for example, the direction of the sound pick-up that is reached in the shortest time is determined as the direction of the picked-up speech source.
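The following is a minimal sketch of this first way, not the patent's exact implementation: it estimates the arrival-time differences of each channel relative to the first sound pick-up by cross-correlation and attributes the source to the direction of the pick-up reached earliest. The sampling rate, channel layout and function names are assumptions made for illustration.

```python
import numpy as np

def estimate_delay(ref: np.ndarray, sig: np.ndarray, fs: int) -> float:
    """Delay of `sig` relative to `ref` in seconds, via cross-correlation.
    Positive means `sig` receives the sound later than `ref`."""
    corr = np.correlate(sig, ref, mode="full")
    lag = np.argmax(corr) - (len(ref) - 1)
    return lag / fs

def source_direction(channels: list, fs: int = 16000) -> int:
    """channels[0..3] are the signals of sound pick-ups 10, 20, 30, 40.
    Returns the index of the pick-up the speech reaches first, which is
    taken as the direction of the picked-up speech source."""
    ref = channels[0]
    # tau[i] = arrival time at pick-up i relative to pick-up 0 (tau[0] = 0).
    tau = [0.0] + [estimate_delay(ref, ch, fs) for ch in channels[1:]]
    return int(np.argmin(tau))  # earliest arrival -> nearest pick-up

# Toy example: the same waveform arrives 2 ms earlier at pick-up 30 (index 2).
fs = 16000
t = np.arange(fs) / fs
wave = np.sin(2 * np.pi * 440 * t) * np.exp(-5 * t)
delays_ms = [2.0, 3.0, 0.0, 3.5]
chans = [np.roll(wave, int(d * fs / 1000)) for d in delays_ms]
print(source_direction(chans, fs))  # -> 2
```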
Second way: the sound pick-ups are unidirectional microphones.
Referring to Fig. 5 and Fig. 6, Fig. 5 is a schematic appearance diagram of a unidirectional microphone and Fig. 6 is a schematic diagram of its pickup range. Because of the acoustic cavity of such a microphone, each microphone only recognizes sound sources within the directional range on its own side. It should be understood that Fig. 5 is only an illustration and practical applications are not limited to it.
Specifically, a unidirectional microphone is mainly designed on the pressure gradient principle. The pressures acting on the two faces of the diaphragm behind the opening of the unidirectional microphone's cavity are detected, and the direction of the face subject to the greater pressure is determined as the direction of the picked-up speech source.
In this way, the volume and information of the speech of user 1 received by the first sound pick-up 10 and the second sound pick-up 20, which are on the same side as user 1, are far greater than the volume and information received from user 2 on the other side; similarly, the volume and information of the speech of user 2 received by the third sound pick-up 30 and the fourth sound pick-up 40, which are on the same side as user 2, are far greater than those received from user 1.
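Under the assumption that each side's unidirectional pick-ups receive their own user far more strongly, the side of the speaker can be decided simply by comparing the energy of the two pick-up groups. The sketch below illustrates that idea; the channel grouping and the bare RMS comparison are assumptions for illustration rather than the patent's specified procedure.

```python
import numpy as np

def speaking_side(ch10, ch20, ch30, ch40) -> str:
    """Compare the RMS energy of the pick-up group on each side of the
    device and return which side the current speech comes from.

    ch10/ch20 are assumed to face user 1, ch30/ch40 to face user 2,
    mirroring the unidirectional-microphone layout described above."""
    def rms(x):
        x = np.asarray(x, dtype=float)
        return float(np.sqrt(np.mean(x ** 2)))

    side1 = rms(ch10) + rms(ch20)   # energy seen by user 1's side
    side2 = rms(ch30) + rms(ch40)   # energy seen by user 2's side
    return "user 1 side" if side1 >= side2 else "user 2 side"

# Toy example: the signal is much louder on pick-ups 30 and 40.
quiet = 0.05 * np.random.randn(1600)
loud = np.sin(2 * np.pi * 300 * np.arange(1600) / 16000)
print(speaking_side(quiet, quiet, loud, loud))  # -> "user 2 side"
```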
S103: performing signal reinforcement processing on the speech to obtain a target speech.
The enhancement of the speech is achieved by weighted delay-and-sum beamforming (DSBF).
Specifically, the speech emitted by the sound source reaches the four sound pick-ups at different times, namely time t1 at the first sound pick-up 10, t2 at the second sound pick-up 20, t3 at the third sound pick-up 30 and t4 at the fourth sound pick-up 40; in other words, the speech reaches each sound pick-up with a different time delay. Delay compensation is therefore applied to the speech received by each sound pick-up, so that the copies of the user's speech received by the sound pick-ups become synchronized, while the noise received by each sound pick-up and the speech emitted by the user (i.e. the sound source) on the opposite side remain unsynchronized. The delay-compensated signals of the sound pick-ups are then combined by a weighted cumulative average. Because the speech picked up by each sound pick-up is synchronized, the speech signal is fully preserved after the weighted accumulation and averaging, whereas the unsynchronized interference noise is weakened by the accumulation and averaging, thereby improving the signal-to-noise ratio, as shown in Fig. 7.
When the sound source is in the near field, the amplitudes of the speech signals received by the individual sound pick-ups differ noticeably, and the signal-to-noise ratios of the received signals also differ to some extent.
In order to better improve the signal-to-noise ratio and enhance the speech signal, larger weights are assigned to designated sound pick-ups and smaller weights to the remaining sound pick-ups. For example, larger weights are assigned to the first sound pick-up 10 and the second sound pick-up 20, while the third sound pick-up 30 and the fourth sound pick-up 40 are assigned smaller weights or even negative weights; the most reasonable weight configuration is obtained through a large number of simulation experiments. In this way, the speech of the user close to the first sound pick-up 10 and the second sound pick-up 20 is largely retained, so that the translation process can translate that user's words, while the noise and the speech of the user close to the third sound pick-up 30 and the fourth sound pick-up 40 are largely suppressed and do not trigger a further translation process.
Optionally, since calculating the delays with the above formulas may introduce a certain error, the delays can also be determined directly by traversing the values in a preset delay range, performing compensation calculations on the waveforms picked up by the sound pick-ups, and, according to the results, taking the delay values whose superposition yields the maximum audio amplitude as the delays of the sound source. This improves the accuracy and convenience of the delay calculation.
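A minimal sketch of the weighted delay-and-sum idea described above is given below. The per-channel delays and weights are assumed to be known, for example from the localization step or from the delay search just mentioned; the concrete weight values are placeholders, not the weight configuration the patent says is obtained through simulation experiments.

```python
import numpy as np

def weighted_dsbf(channels, delays_s, weights, fs=16000):
    """Weighted delay-and-sum beamforming.

    channels : list of equally long 1-D arrays, one per sound pick-up.
    delays_s : arrival delay of each channel in seconds; each channel is
               advanced by its delay so the target speech lines up.
    weights  : per-channel weights (larger for pick-ups facing the speaker,
               smaller or negative for the opposite side).
    """
    aligned = []
    for ch, d in zip(channels, delays_s):
        shift = int(round(d * fs))
        # Advance the channel by `shift` samples (delay compensation).
        aligned.append(np.roll(np.asarray(ch, dtype=float), -shift))
    aligned = np.stack(aligned)                 # shape: (n_channels, n_samples)
    w = np.asarray(weights, dtype=float)[:, None]
    # Weighted cumulative average: synchronized speech adds up,
    # unsynchronized noise is averaged down.
    return np.sum(w * aligned, axis=0) / np.sum(np.abs(w))

# Example call with placeholder delays/weights for pick-ups 10, 20, 30, 40:
# target = weighted_dsbf([ch10, ch20, ch30, ch40],
#                        delays_s=[0.0, 0.0005, 0.0021, 0.0024],
#                        weights=[1.0, 1.0, 0.2, 0.2])
```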
S104: enabling the translation process corresponding to the user to translate the target speech, and displaying the translated content.
The target speech is the signal-reinforced speech, whose quality and volume have both been further enhanced. The unique translation process corresponding to the user is enabled to translate the target speech; the translation process interacts with the cloud to obtain the translated text result, which is presented on the other user's side, or the text result is displayed on an external large screen.
Optionally, a translation content presentation interface can be generated on the screen of the translator device. The interface includes multiple sub-interfaces; the number of sub-interfaces corresponds to the number of users and to the positions of the sound pick-ups. The sub-interfaces are used to display the recognition results of the speech from different directions. For example, in a two-party talk scenario, assuming that the two users are located at the upper and lower sides of the translator, a display interface containing two sub-interfaces 1 and 2 is generated, occupying the upper and lower halves of the translator screen respectively. The sub-interface 1 in the upper half of the screen is used to display the translation result of the speech coming from the upper side of the screen (that is, the speech picked up by the sound pick-up arranged at the top of the translator), and the sub-interface 2 in the lower half of the screen is used to display the translation result of the speech coming from the lower side of the screen (that is, the speech picked up by the sound pick-up arranged at the bottom of the translator), and vice versa. In practical applications, the correspondence between the translation content displayed by each sub-interface and the positions of the sound pick-ups can be customized by the user.
It should be understood that a three-party or larger multi-party conference scenario can refer to the two-party talk scenario described above, and is not repeated here. By using multiple sub-interfaces to display the translation results of speech from different directions, the correspondence between a translation result and the person being translated is made explicit, which helps users, especially in three-party or larger conferences, to recognize who said what and avoids misunderstandings caused by attributing one person's words to another.
Optionally, each sub-interface is an interactive interface; that is, in response to a preset operation of the user on a sub-interface, the displayed translation content is adjusted, for example by enlarging or reducing the font size, changing the font colour, or modifying a character, word or sentence in the translation content.
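As an illustration of how a detected direction could drive both the per-user translation process and the corresponding sub-interface, consider the sketch below. The translation call (`translate_via_cloud`) is a hypothetical placeholder, and the direction-to-language and direction-to-sub-interface mappings are assumptions for illustration; the patent only requires one independent translation process and one display area per direction.

```python
# Hypothetical per-direction configuration: each direction of the device has
# its own translation process (source/target languages) and sub-interface.
DIRECTION_CONFIG = {
    0: {"source": "zh", "target": "en", "sub_interface": "top"},
    1: {"source": "en", "target": "zh", "sub_interface": "bottom"},
}

def translate_via_cloud(text: str, source: str, target: str) -> str:
    """Placeholder for the cloud translation interaction described above."""
    return f"[{source}->{target}] {text}"

def handle_utterance(direction: int, recognized_text: str, ui: dict) -> None:
    """Route one recognized utterance to its translation process and display
    the result on the sub-interface associated with that direction."""
    cfg = DIRECTION_CONFIG[direction]
    translated = translate_via_cloud(recognized_text, cfg["source"], cfg["target"])
    ui[cfg["sub_interface"]].append(translated)

# Toy usage: a dict of lists stands in for the two sub-interfaces.
ui = {"top": [], "bottom": []}
handle_utterance(0, "你好", ui)
handle_utterance(1, "Hello", ui)
print(ui)
```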
Further, speech from multiple directions is picked up, and after the dialogue ends, the picked-up speech is used to generate a different speech audio file for each direction. Each speech audio file is imported into a voiceprint recognition interface for voiceprint recognition, so as to identify the users corresponding to the audio file, and, according to the voiceprint recognition result and the timestamps at which the speech was picked up, the user corresponding to each utterance is marked on the time axis of the speech audio file. Each identified speech audio file is used to generate a corresponding text file, and the text files are merged into one summary text file, which contains the text content of every utterance, the correspondence between the timestamp of every utterance and its user, and the correspondence between the content of each user's speech and the user's position.
In one example, in the indoor meeting or interview scenario shown in Fig. 8, the translator device is placed at the centre of the conference table. One side of the conference table seats delegates user 1, user 2 and user 3, and the other side seats delegates user A, user B and user C. The translator device has four sound pick-ups, namely microphone 1, microphone 2, microphone 3 and microphone 4, so that after the meeting it can distinguish the users on the two sides and distinguish who said what.
Specifically, microphone 1 and microphone 2 collect the voices of user 1, user 2 and user 3, while microphone 3 and microphone 4 collect the voices of user A, user B and user C. After the meeting ends, the sound from the direction of microphone 1 and microphone 2 generates a speech audio file m1, and the sound from the direction of microphone 3 and microphone 4 generates a speech audio file m2. The speech audio files m1 and m2 are imported into the voiceprint recognition structure, which identifies, on the time axis of m1, which of user 1, user 2 and user 3 is speaking, and, on the time axis of m2, which of user A, user B and user C is speaking. Speech recognition converts m1 and m2 into text files text1 and text2; using the timestamps recorded when each person spoke, the two text files are merged into one complete text file, text, which describes the dialogue content of all users. The speaker timestamps obtained by voiceprint recognition are inserted into text, identifying what each user said, and a detailed meeting summary text is thus obtained. It identifies the respective speech content of user 1, user 2 and user 3 on one side of the conference table and the respective speech content of user A, user B and user C on the other side. In this way a meeting summary can be generated automatically and conveniently, which improves the efficiency of organizing meeting content, and the correspondence between each user's speech content, the user's position and the speech content can be shown.
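The sketch below illustrates merging the two per-direction transcripts into one time-ordered summary as described above. The `recognize_segments` helper stands in for the voiceprint-plus-speech-recognition interface and is a hypothetical placeholder; its output format (start time, speaker label, side, text) is an assumption made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float   # timestamp at which the utterance was picked up
    speaker: str     # speaker label from voiceprint recognition
    side: str        # which side of the table the audio file covers
    text: str        # recognized text of the utterance

def recognize_segments(audio_path: str, side: str) -> list:
    """Placeholder for the voiceprint recognition + speech recognition
    interface: it would return labelled, timestamped segments for one
    per-direction audio file (e.g. m1 or m2)."""
    raise NotImplementedError("backed by an external recognition service")

def build_meeting_summary(segments_m1: list, segments_m2: list) -> str:
    """Merge the two per-direction transcripts by timestamp into one
    summary text, keeping speaker and side (position) for each line."""
    merged = sorted(segments_m1 + segments_m2, key=lambda s: s.start_s)
    lines = [f"[{s.start_s:8.2f}s] ({s.side}) {s.speaker}: {s.text}" for s in merged]
    return "\n".join(lines)

# Example with hand-made segments in place of recognize_segments() output.
m1 = [Segment(3.0, "user 1", "side 1", "Let's begin."),
      Segment(20.5, "user 3", "side 1", "I agree.")]
m2 = [Segment(11.2, "user A", "side 2", "One question first.")]
print(build_meeting_summary(m1, m2))
```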
Further, the simultaneous interpretation method in this embodiment is applicable to the scenario in which the users sit at the two ends of the translator device, and it can also be applied to the scenario in which users sit at all four sides of the translator device, as shown in Fig. 9.
In the embodiment of the present invention, speech is picked up by multiple sound pick-ups, the direction of the picked-up speech source is determined, and the user corresponding to that direction is determined. By combining the hardware with sound source localization, the direction of the speaking user can be distinguished, so that people speaking from different directions do not interfere with each other. The speech is signal-reinforced to obtain a target speech, the translation process corresponding to the user is enabled to translate the target speech, and the translated content is displayed. The speaker in each direction has an independent translation channel, and each channel separately performs speech recognition and translation for that user and presents the text, so translation is fluent and efficient.
Referring to Fig. 10, Fig. 10 is a structural schematic diagram of the simultaneous interpretation apparatus provided by an embodiment of the present invention. For ease of description, only the parts related to the embodiments of the present invention are shown. The simultaneous interpretation apparatus illustrated in Fig. 10 can be the execution subject of the simultaneous interpretation method provided by the embodiment of Fig. 1 above; the apparatus can be the translator device itself, or it can be built into a terminal, where the terminal includes a PC, a mobile phone and other electronic equipment. The simultaneous interpretation apparatus includes:
a pickup module 401, configured to pick up speech through multiple sound pick-ups;
a determining module 402, configured to determine the direction of the picked-up speech source and to determine the user corresponding to the direction of the speech source;
a speech reinforcing module 403, configured to perform signal reinforcement processing on the speech to obtain a target speech;
a translation module 404, configured to enable the translation process corresponding to the user to translate the target speech and to display the translated content.
Further, the determining module 402 is also configured to detect the time differences with which the picked-up speech reaches the first sound pick-up of the four sound pick-ups relative to the second sound pick-up, the third sound pick-up and the fourth sound pick-up;
and, according to each time difference and the speed of sound, to calculate the distances from the sound source of the speech to the four sound pick-ups, with the following calculation formulas:
r1 - r2 = τ12 · v
r1 - r3 = τ13 · v
r1 - r4 = τ14 · v
r3 - r2 = τ23 · v
r4 - r2 = τ24 · v
m² + n² = r2² - (r1·cos θ1)² = (r1 - τ12·v)² - (r1·cos θ1)²
m - n = [(r1 - τ13·v)² - (r1 - τ12·v)² - L² - D²] / [2(L - D)]
n = [(r1 - τ14·v)² - (r1 - τ12·v)²] / (-2D)
m = [(r1 - τ13·v)² - (r1 - τ12·v)² - L² - D²] / [2(L - D)] + [(r1 - τ14·v)² - (r1 - τ12·v)²] / (-2D)
wherein a three-dimensional coordinate system is established with the direction of the line from the first sound pick-up to the third sound pick-up as the positive direction of the x-axis, the direction of the line from the first sound pick-up to the second sound pick-up as the positive direction of the y-axis, and the direction from the XY plane towards the sound source of the speech as the positive direction of the z-axis;
τ12 is the time difference with which the sound source reaches the first sound pick-up and the second sound pick-up; τ13 is the time difference with which the sound source reaches the first sound pick-up and the third sound pick-up; τ14 is the time difference with which the sound source reaches the first sound pick-up and the fourth sound pick-up;
L is the distance between the first sound pick-up and the second sound pick-up, and also the distance between the third sound pick-up and the fourth sound pick-up; D is the distance between the first sound pick-up and the third sound pick-up, and also the distance between the second sound pick-up and the fourth sound pick-up; r1 is the distance from the sound source to the first sound pick-up; r2 is the distance from the sound source to the second sound pick-up; r3 is the distance from the sound source to the third sound pick-up; r4 is the distance from the sound source to the fourth sound pick-up; m is the distance from the projection of the sound source onto the XY plane to the line between the second sound pick-up and the fourth sound pick-up; n is the distance from the projection of the sound source onto the XY plane to the line between the first sound pick-up and the second sound pick-up; θ1 is the angle between the z-axis and the line joining the sound source and the first sound pick-up; θ2 is the angle between the z-axis and the line joining the sound source and the second sound pick-up; θ3 is the angle between the z-axis and the line joining the sound source and the third sound pick-up; θ4 is the angle between the z-axis and the line joining the sound source and the fourth sound pick-up;
the distances from the sound source to the four sound pick-ups are divided by the speed of sound to obtain the times at which the sound source reaches the four sound pick-ups, and the direction of the sound pick-up that is reached in the shortest time is determined as the direction of the picked-up speech source.
Further, the speech reinforcing module 403 is also configured to perform signal enhancement processing on the speech by weighted delay-and-sum beamforming to obtain the target speech.
Further, the apparatus also includes:
a meeting summary generation module (not shown), configured to pick up speech from multiple directions and to generate a different speech audio file for each direction from the picked-up speech; to import the speech audio files into a voiceprint recognition interface for voiceprint recognition and, according to the timestamps at which the speech was picked up, to mark the user corresponding to each utterance on the time axis of the speech audio file; and to recognize each speech audio file, generate a corresponding text file for each, and merge the text files into one summary text file, which contains the text content of every utterance, the correspondence between the timestamp of every utterance and its user, and the correspondence between the content of each user's speech and the user's position.
Further, the translation module 404 is also configured to generate a translation content presentation interface on the screen of the translator device. The translation content interface includes multiple sub-interfaces; the number of sub-interfaces corresponds to the number of users and to the positions of the sound pick-ups, and the sub-interfaces are used to display the recognition results of the speech from different directions. In practical applications, the correspondence between the translation content displayed by each sub-interface and the positions of the sound pick-ups can be customized by the user.
Optionally, each sub-interface is an interactive interface, and the apparatus also includes:
an adjusting module (not shown), configured to adjust the displayed translation content in response to a preset operation of the user on a sub-interface, for example by enlarging or reducing the font size, changing the font colour, or modifying a character, word or sentence in the translation content.
For other details of this embodiment, refer to the description of the embodiments shown in Fig. 1 to Fig. 9 above.
In this embodiment, speech is picked up by multiple sound pick-ups, the direction of the picked-up speech source is determined, and the user corresponding to that direction is determined. By combining the hardware with sound source localization, the direction of the speaking user can be distinguished, so that people speaking from different directions do not interfere with each other. The speech is signal-reinforced to obtain a target speech, the translation process corresponding to the user is enabled to translate the target speech, and the translated content is displayed. The speaker in each direction has an independent translation channel, and each channel separately performs speech recognition and translation for that user and presents the text, so translation is fluent and efficient.
An embodiment of the invention also provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor. When the processor executes the computer program, the simultaneous interpretation method described above with reference to Fig. 1 to Fig. 9 is realized.
Further, the electronic device also includes:
at least one input device, at least one output device, multiple sound pick-ups and at least one microphone.
The memory, processor, input device, output device, sound pick-ups and microphone described above are connected by a bus.
Alternatively, the sound pick-ups and the microphone may be peripherals, in which case the electronic device also includes a wireless radio-frequency module, and the sound pick-ups and the microphone establish a data connection with the processor through a data cable or through the wireless network provided by the wireless radio-frequency module.
The input device may specifically be a camera, a touch panel, physical buttons and the like. The output device may specifically be a flexible capacitive touch screen.
The memory may be a high-speed random access memory (RAM), or a non-volatile memory such as a magnetic disk storage. The memory is used to store a set of executable program code, and the processor is coupled with the memory.
Further, an embodiment of the invention also provides a computer-readable storage medium, which may be provided in the electronic device of the above embodiments and may be the memory of the aforementioned electronic device. A computer program is stored on the computer-readable storage medium, and when the program is executed by a processor, the simultaneous interpretation method described in the embodiments of Fig. 1 to Fig. 9 is realized. Further, the computer-readable storage medium may also be a USB flash disk, a removable hard disk, a read-only memory (ROM), a RAM, a magnetic disk, an optical disc or any other medium that can store program code.
It should be noted that, for simplicity of description, the above method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For the parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The above is a description of the simultaneous interpretation method, simultaneous interpretation apparatus, electronic device and computer-readable storage medium provided by the present invention. For those skilled in the art, there will be changes in the specific implementation and application scope according to the idea of the embodiments of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A simultaneous interpretation method, characterized by comprising:
picking up speech through multiple sound pick-ups;
determining the direction of the picked-up speech source, and determining the user corresponding to the direction of the speech source;
performing signal reinforcement processing on the speech to obtain a target speech;
enabling the translation process corresponding to the user to translate the target speech, and displaying the translated content.
2. The method according to claim 1, characterized in that the multiple sound pick-ups are four sound pick-ups arranged in a rectangular array, and the shortest side of the rectangle is longer than 5 centimetres.
3. The method according to claim 2, characterized in that determining the direction of the picked-up speech source comprises:
detecting the time differences with which the picked-up speech reaches the first sound pick-up of the four sound pick-ups relative to the second sound pick-up, the third sound pick-up and the fourth sound pick-up, the time difference with which it reaches the second sound pick-up relative to the third sound pick-up, and the time difference with which it reaches the second sound pick-up relative to the fourth sound pick-up;
according to each time difference and the speed of sound, calculating the distances from the sound source of the speech to the four sound pick-ups, with the following calculation formulas:
r1 - r2 = τ12 · v
r1 - r3 = τ13 · v
r1 - r4 = τ14 · v
r3 - r2 = τ23 · v
r4 - r2 = τ24 · v
m² + n² = r2² - (r1·cos θ1)² = (r1 - τ12·v)² - (r1·cos θ1)²
m - n = [(r1 - τ13·v)² - (r1 - τ12·v)² - L² - D²] / [2(L - D)]
n = [(r1 - τ14·v)² - (r1 - τ12·v)²] / (-2D)
m = [(r1 - τ13·v)² - (r1 - τ12·v)² - L² - D²] / [2(L - D)] + [(r1 - τ14·v)² - (r1 - τ12·v)²] / (-2D)
wherein a three-dimensional coordinate system is established with the direction of the line from the first sound pick-up to the third sound pick-up as the positive direction of the x-axis, the direction of the line from the first sound pick-up to the second sound pick-up as the positive direction of the y-axis, and the direction from the XY plane towards the sound source of the speech as the positive direction of the z-axis;
τ12 is the time difference with which the sound source reaches the first sound pick-up and the second sound pick-up; τ13 is the time difference with which the sound source reaches the first sound pick-up and the third sound pick-up; τ14 is the time difference with which the sound source reaches the first sound pick-up and the fourth sound pick-up; τ23 is the time difference with which the sound source reaches the second sound pick-up and the third sound pick-up; τ24 is the time difference with which the sound source reaches the second sound pick-up and the fourth sound pick-up;
L is the distance between the first sound pick-up and the second sound pick-up, and also the distance between the third sound pick-up and the fourth sound pick-up; D is the distance between the first sound pick-up and the third sound pick-up, and also the distance between the second sound pick-up and the fourth sound pick-up; r1 is the distance from the sound source to the first sound pick-up; r2 is the distance from the sound source to the second sound pick-up; r3 is the distance from the sound source to the third sound pick-up; r4 is the distance from the sound source to the fourth sound pick-up; m is the distance from the projection of the sound source onto the XY plane to the line between the second sound pick-up and the fourth sound pick-up; n is the distance from the projection of the sound source onto the XY plane to the line between the first sound pick-up and the second sound pick-up; θ1 is the angle between the z-axis and the line joining the sound source and the first sound pick-up; θ2 is the angle between the z-axis and the line joining the sound source and the second sound pick-up; θ3 is the angle between the z-axis and the line joining the sound source and the third sound pick-up; θ4 is the angle between the z-axis and the line joining the sound source and the fourth sound pick-up;
dividing the distances from the sound source to the four sound pick-ups by the speed of sound to obtain the times at which the sound source reaches the four sound pick-ups, and determining the direction of the sound pick-up that is reached in the shortest time as the direction of the picked-up speech source.
4. The method according to claim 1, characterized in that the sound pick-ups are unidirectional microphones, and determining the direction of the picked-up speech source comprises:
detecting the pressures acting on the two faces of the diaphragm behind the opening of the cavity of the unidirectional microphone;
determining the direction of the face subject to the greater of the two pressures as the direction of the picked-up speech source.
5. The method according to claim 3 or 4, characterized in that performing signal reinforcement processing on the speech to obtain a target speech comprises:
performing signal enhancement processing on the speech by weighted delay-and-sum beamforming to obtain the target speech.
6. The method according to claim 5, characterized in that the method further comprises:
picking up speech from multiple directions, and generating a different speech audio file for each direction from the picked-up speech;
importing the speech audio files into a voiceprint recognition interface for voiceprint recognition;
according to the voiceprint recognition result and the timestamps at which the speech was picked up, marking the user corresponding to each utterance on the time axis of the speech audio file;
recognizing each speech audio file, generating a corresponding text file for each, and merging the text files into one summary text file, the summary text file containing the text content of every utterance, the correspondence between the timestamp of every utterance and its user, and the correspondence between the content of each user's speech and the user's position.
7. A simultaneous interpretation apparatus, characterized by comprising:
a pickup module, configured to pick up speech through multiple sound pick-ups;
a determining module, configured to determine the direction of the picked-up speech source and to determine the user corresponding to the direction of the speech source;
a speech reinforcing module, configured to perform signal reinforcement processing on the speech to obtain a target speech;
a translation module, configured to enable the translation process corresponding to the user to translate the target speech and to display the translated content.
8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
an identification module, configured to pick up speech from multiple directions and to generate a different speech audio file for each direction from the picked-up speech;
to import the speech audio files into a voiceprint recognition interface for voiceprint recognition and, according to the timestamps at which the speech was picked up, to mark the user corresponding to each utterance on the time axis of the speech audio file;
and to recognize each speech audio file, generate a corresponding text file for each, and merge the text files into one summary text file, the summary text file containing the text content of every utterance, the correspondence between the timestamp of every utterance and its user, and the correspondence between the content of each user's speech and the user's position.
9. An electronic device, the electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that, when the processor executes the computer program, the simultaneous interpretation method according to any one of claims 1 to 6 is realized.
10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the simultaneous interpretation method according to any one of claims 1 to 6 is realized.
CN201910669971.8A 2019-07-24 2019-07-24 Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium Pending CN110491385A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910669971.8A CN110491385A (en) 2019-07-24 2019-07-24 Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910669971.8A CN110491385A (en) 2019-07-24 2019-07-24 Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110491385A true CN110491385A (en) 2019-11-22

Family

ID=68548054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910669971.8A Pending CN110491385A (en) 2019-07-24 2019-07-24 Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110491385A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6125347A (en) * 1993-09-29 2000-09-26 L&H Applications Usa, Inc. System for controlling multiple user application programs by spoken input
CN102968991A (en) * 2012-11-29 2013-03-13 华为技术有限公司 Method, device and system for sorting voice conference minutes
CN203301691U (en) * 2013-05-31 2013-11-20 中山市天键电声有限公司 Windproof, denoising and unidirectional microphone
CN107978317A (en) * 2017-12-18 2018-05-01 北京百度网讯科技有限公司 Meeting summary synthetic method, system and terminal device
CN108962263A (en) * 2018-06-04 2018-12-07 百度在线网络技术(北京)有限公司 A kind of smart machine control method and system
CN109686363A (en) * 2019-02-26 2019-04-26 深圳市合言信息科技有限公司 A kind of on-the-spot meeting artificial intelligence simultaneous interpretation equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhou Xiaodong (周小东), "Handbook for Recording Engineers" (《录音工程师手册》), China Radio and Television Publishing House, 31 January 2006 *
Li Xuehua (李学华), "Advancing with the Information Age, Progressing in the Spirit of Innovation: Collected Papers on Undergraduate Scientific and Technological Innovation and Engineering Practice, School of Information and Communication Engineering, Beijing Information Science and Technology University" (《与信息时代同行 与创新精神共进 北京信息科技大学信息与通信工程学院大学生科技创新与工程实践论文集》), Beijing University of Posts and Telecommunications Press, 31 July 2017 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021134284A1 (en) * 2019-12-30 2021-07-08 深圳市欢太科技有限公司 Voice information processing method, hub device, control terminal and storage medium
CN111245868A (en) * 2020-03-10 2020-06-05 诺领科技(南京)有限公司 Narrowband Internet of things voice message communication method and system
CN111245868B (en) * 2020-03-10 2021-04-13 诺领科技(南京)有限公司 Narrowband Internet of things voice message communication method and system

Similar Documents

Publication Publication Date Title
US11620983B2 (en) Speech recognition method, device, and computer-readable storage medium
US10304452B2 (en) Voice interactive device and utterance control method
US20200335128A1 (en) Identifying input for speech recognition engine
JP2021009701A (en) Interface intelligent interaction control method, apparatus, system, and program
US10241990B2 (en) Gesture based annotations
CN105301594B (en) Range measurement
WO2020024708A1 (en) Payment processing method and device
CN103886861B (en) A kind of method of control electronics and electronic equipment
CN105453174A (en) Speech enhancement method and apparatus for same
CN102903362A (en) Integrated local and cloud based speech recognition
JP2012220959A (en) Apparatus and method for determining relevance of input speech
CN112513983A (en) Wearable system speech processing
CN110491385A (en) Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium
KR20180127136A (en) Double-sided display simultaneous translation device, method and apparatus and electronic device
KR20200090355A (en) Multi-Channel-Network broadcasting System with translating speech on moving picture and Method thererof
CN106339081A (en) Commercial equipment-based equipment carrying-free palm-positioning human-computer interaction method
JP2000207170A (en) Device and method for processing information
JP2016076007A (en) Interactive apparatus and interactive method
US10269349B2 (en) Voice interactive device and voice interaction method
US20170221481A1 (en) Data structure, interactive voice response device, and electronic device
JP7400364B2 (en) Speech recognition system and information processing method
JP7091745B2 (en) Display terminals, programs, information processing systems and methods
Panek et al. Challenges in adopting speech control for assistive robots
CN111492668B (en) Method and system for locating the origin of an audio signal within a defined space
Soda et al. Handsfree voice interface for home network service using a microphone array network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191122