CN112764549A - Translation method, translation device, translation medium and near-to-eye display equipment - Google Patents

Translation method, translation device, translation medium and near-to-eye display equipment

Info

Publication number
CN112764549A
CN112764549A
Authority
CN
China
Prior art keywords
information
translated
translation
source
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110380356.2A
Other languages
Chinese (zh)
Other versions
CN112764549B (en)
Inventor
梁祥龙
黄海
刘彦婷
孙树成
谢天豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing LLvision Technology Co ltd
Original Assignee
Beijing LLvision Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing LLvision Technology Co ltd filed Critical Beijing LLvision Technology Co ltd
Priority to CN202110380356.2A priority Critical patent/CN112764549B/en
Publication of CN112764549A publication Critical patent/CN112764549A/en
Application granted granted Critical
Publication of CN112764549B publication Critical patent/CN112764549B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01 Head-up displays
    • G02B27/017 Head mounted
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/015 Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01 Head-up displays
    • G02B27/017 Head mounted
    • G02B2027/0178 Eyeglass type

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Neurology (AREA)
  • Dermatology (AREA)
  • Neurosurgery (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Optics & Photonics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure provides a translation method, apparatus, medium, and near-eye display device. The translation method comprises the following steps: when information to be translated is obtained, calling a translation model of a target language to translate the information to be translated to obtain a translated text; acquiring source features of the information to be translated, and determining a rendering mode of the translated text that matches the source features; and displaying the translated text on the near-eye display device based on the rendering mode. With the technical solution provided by the embodiments of the present disclosure, the rendering mode of the translated text corresponds to the acquired source features, which ensures that the display of the translated text is adapted to the sound source, further optimizes the display effect of the translated text on the near-eye display device, and correspondingly helps improve the user's reading experience of the translated text.

Description

Translation method, translation device, translation medium and near-to-eye display equipment
Technical Field
The present disclosure relates to the field of near-eye display technologies, and in particular, to a translation method, a translation device, a computer-readable storage medium, and a near-eye display device.
Background
A translation device translates the collected utterances of a speaker, converts the translated text into an audio or video file, and plays it for the user. In the related art, as one display mode, the translated text can be shown on a large screen at the venue for the audience to watch; however, this approach requires the audience to keep watching the screen, so the applicable scenes are limited and the audience experience is poor.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide a translation method, a translation apparatus, a medium, and a near-eye display device that overcome, at least to some extent, the poor viewing experience of translated text caused by the limitations and disadvantages of the related art.
According to an aspect of an embodiment of the present disclosure, there is provided a translation method including: when information to be translated is obtained, calling a translation model of a target language to translate the information to be translated to obtain a translation text; acquiring source characteristics of the information to be translated, and determining a rendering mode of the translated text matched with the source characteristics; displaying the translated text on the near-eye display device based on the rendering manner.
In an exemplary embodiment of the present disclosure, the obtaining a source feature of the information to be translated, and determining a rendering manner of the translated text matching the source feature includes: acquiring a window range image of the near-eye display device; carrying out face recognition on the window range image, and determining an information source of the information to be translated based on the face recognition result; and determining the acquired visual features of the information source as the source features so as to determine the rendering mode matched with the visual features.
In an exemplary embodiment of the present disclosure, the determining the acquired visual feature of the information source as the source feature to determine the rendering manner matched with the visual feature includes: detecting a distance to the information source based on the visual feature; determining a display size of the translated text based on the distance; and performing display rendering on the translated text based on the display size.
In an exemplary embodiment of the present disclosure, the determining the acquired visual feature of the information source as the source feature to determine the rendering manner matched with the visual feature includes: when the visual features of the information source are determined, acquiring action information and expression information of the information source based on the visual features; determining a display time and a blanking time of the translated text based on the action information; determining a display style matched with the expression information; and performing display rendering on the translation text based on the display time, the blanking time and the display style.
In an exemplary embodiment of the present disclosure, the displaying the translated text on the near-eye display device based on the rendering manner further comprises: determining position information of an information source of the information to be translated in a spatial coordinate system based on the sound phase and/or source image characteristics of the information to be translated; and mapping the translated text displayed on the near-eye display device to the position of the information source according to the position information, based on a mapping relationship between a display coordinate system of the near-eye display device and the spatial coordinate system.
In an exemplary embodiment of the present disclosure, the mapping the translated text displayed on the near-eye display device to the position of the information source based on the mapping relationship between the display coordinate system and the spatial coordinate system of the near-eye display device and the position information further includes: the position information comprises a distance from the information source, and a display focal plane of the near-eye display device is adjusted based on the distance; causing the translated text to be displayed on the adjusted focal plane.
In an exemplary embodiment of the present disclosure, the mapping the translated text displayed on the near-eye display device to the position of the information source based on a mapping relationship between a display coordinate system and a spatial coordinate system of the near-eye display device includes: when the information source is detected to move, performing face tracking operation on the information source so as to continuously capture the position information of the information source; rendering the translated text to move according to the position information based on the mapping relation so that the translated text moves along with the information source.
In an exemplary embodiment of the present disclosure, the information to be translated includes video information, and when the information to be translated is obtained, invoking a translation model of a target language to translate the information to be translated to obtain a translation text includes: performing sign language recognition on the video information to recognize sign language information; and calling a sign language translation model of a target language to convert the sign language information into the translation text.
In an exemplary embodiment of the present disclosure, the information to be translated includes voice information, and the invoking a translation model of a target language to translate the information to be translated to obtain a translation text includes: recognizing the language of the voice information based on a language model of a convolutional neural network confidence coefficient; calling a voice recognition engine corresponding to the language to convert the voice information into text information; and calling a translation engine of the target language, and translating the text information into the translation text.
In an exemplary embodiment of the disclosure, invoking the translation engine of the target language includes: calling a translation engine of the target language based on an acquired selection instruction of the user; or calling a translation engine of the target language based on a preset correspondence between the language of the voice information and the target language; or acquiring attribute information of the user to call a translation engine of the corresponding target language based on the attribute information of the user.
In an exemplary embodiment of the present disclosure, the information to be translated includes voice information, the obtaining a source feature of the information to be translated, and determining a rendering manner of the translated text matching the source feature includes: extracting voiceprint features in the voice information, and determining the voiceprint features as the source features; configuring a display mode of characters in the translation text based on the voiceprint features; and displaying and rendering the translation text based on the display mode of the characters.
In an exemplary embodiment of the present disclosure, further comprising: receiving the voice information based on a wired transmission link and/or a wireless transmission link; and/or collecting the voice information based on a sound collection module of the near-eye display device.
In an exemplary embodiment of the present disclosure, the information to be translated includes voice information, and before the translation model of the target language is called to translate the information to be translated to obtain the translated text, the method further includes: when the voice information is collected, detecting whether voiceprint information of the voice information is matched with prestored voiceprint information or not so as to determine whether the voice information is sent out by a wearing user of the near-to-eye display equipment or not; and/or when the voice information is collected, carrying out sound source positioning on the voice information to determine whether the voice information is sent by a wearing user of the near-eye display equipment, wherein when the voice information is determined not to be sent by the wearing user, the obtained translation text is displayed based on the rendering mode, and when the voice information is determined to be sent by the wearing user, the obtained translation text is sent to a receiving target in a text and/or voice mode.
In an exemplary embodiment of the present disclosure, further comprising: when a picture to be translated is received, executing character recognition operation on the picture; when the picture is recognized to comprise the characters to be translated, translating the characters to be translated into target language characters; and superposing the target language characters on the picture, and displaying the picture on the near-to-eye display equipment.
In an exemplary embodiment of the present disclosure, further comprising: converting the translated text into audio information; playing the audio information in a headset of the near-eye display device.
According to another aspect of the embodiments of the present disclosure, there is provided a translation apparatus including: the translation module is used for calling a translation model of a target language to translate the information to be translated when the information to be translated is obtained, so as to obtain a translation text; the rendering module is used for acquiring the source characteristics of the information to be translated and determining the rendering mode of the translated text matched with the source characteristics; a display module to display the translated text on the near-eye display device based on the rendering.
According to another aspect of the present disclosure, there is provided a near-eye display device including: a processor; and a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the translation method of any of the above aspects via execution of the executable instructions.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements a translation method as recited in any of the above.
According to the technical solution of the present disclosure, the received information to be translated is translated by calling the translation model of the target language to obtain the translated text; the source features of the information to be translated are analyzed, a matching rendering mode is determined based on the different source features obtained by the analysis, and the translated text is rendered in that rendering mode and then displayed on the near-eye display device. Because the rendering mode corresponds to the acquired source features, the display of the translated text can be adapted to the sound source, which further optimizes the display effect of the translated text on the near-eye display device and accordingly helps improve the user's reading experience of the translated text.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 shows a schematic structural diagram of a near-eye display device in an exemplary embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of a method of translation in an exemplary embodiment of the present disclosure;
FIG. 3 illustrates a flow diagram of another translation method in an exemplary embodiment of the present disclosure;
FIG. 4 illustrates a flow chart of yet another translation method in an exemplary embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of yet another translation method in an exemplary embodiment of the present disclosure;
FIG. 6 illustrates a flow chart of yet another translation method in an exemplary embodiment of the present disclosure;
FIG. 7 illustrates a flow chart of yet another translation method in an exemplary embodiment of the present disclosure;
FIG. 8 illustrates a flow chart of yet another translation method in an exemplary embodiment of the present disclosure;
FIG. 9 illustrates a flow chart of yet another translation method in an exemplary embodiment of the present disclosure;
FIG. 10 illustrates a flow chart of yet another translation method in an exemplary embodiment of the present disclosure;
FIG. 11 is a block diagram of a translation device in an exemplary embodiment of the present disclosure;
fig. 12 illustrates a block diagram of a near-eye display device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Further, the drawings are merely schematic illustrations of the present disclosure, in which the same reference numerals denote the same or similar parts, and thus, a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic structural diagram of a near-eye display device in an exemplary embodiment of the present disclosure.
A near-eye display device is a type of display device that is generally made in a wearable form (for example, as glasses or a head-mounted device). Through the near-eye display device, images can be presented within a distance of 1-5 cm from the eyes and superimposed on the real scene.
Referring to fig. 1, the near-eye display device includes a display lens 102, a frame assembly 104, a system processing module 106, a battery and communication module 112, a sound collection module 114, a sound playing module 116, and an image acquisition module 118. An external smart device 108, connected through a connection cable 110, can perform auxiliary operations.
The system processing module 106 may perform the translation and recognition operations, or the external smart device 108 may perform the translation and recognition operations.
The connection cable 110 is used to connect the near-eye display device and the smart device 108.
A battery in the battery and communication module 112 powers the near-eye display device, and the communication module in the battery and communication module 112 is configured to receive and transmit related information.
The sound collection module 114 is used for collecting voice information.
The sound playing module 116 is used to play the translated content to the user.
The image acquisition module 118 is used to acquire image information of the information source.
In addition, accessories such as corrective lenses, a myopia frame, and a nose pad can be provided to accommodate factors such as differences in users' eyesight and environmental conditions, so as to ensure a comfortable user experience. The glasses carry a miniature battery and can work autonomously; they can also be connected to a server through communication means carried by the glasses such as WIFI or Bluetooth, or connected to an intelligent terminal device through USB or other wired means, and the translation process can be recorded and saved for later review and tracing.
The following detailed description of exemplary embodiments of the disclosure refers to the accompanying drawings.
FIG. 2 is a flow chart of a translation method in an exemplary embodiment of the present disclosure.
Referring to fig. 2, the translation method may include:
step S202, when the information to be translated is obtained, a translation model of the target language is called to translate the information to be translated to obtain a translation text.
The information to be translated can be text information, voice information and video information.
In an exemplary embodiment of the present disclosure, when the information to be translated is voice information, the method further includes: receiving the voice information over a wired transmission link and/or a wireless transmission link; and/or collecting the voice information through a sound collection module of the near-eye display device.
In this embodiment, the sound source of the voice information may be speech uttered directly by a speaker, recorded or broadcast voice imported from a third-party sound source, voice collected by the sound collection module, voice received over the wireless and/or wired transmission link, or voice obtained by text conversion. Providing different ways of acquiring the voice information meets the usage requirements of different translation scenes.
Specifically, the translation scheme applied to the near-eye display device in this application can be used in a variety of scenes. For example, in a conference scene, the voice of participants can be collected through a wired link, a wireless link, the sound collection module, and the like, or voice transmitted by a third party can be received through the wired or wireless link; in a cinema scene, the movie sound source can be translated into the native language of the audience through the wired or wireless link and displayed on the lens of the near-eye display device.
Further, the voice information is translated into the information of the text format of the target language by calling the translation model so as to facilitate the display of the translated text.
In addition, as can be understood by those skilled in the art, the translation of the acquired information to be translated may be implemented by loading the translation model on the near-eye display device; alternatively, the translation model may be deployed on a server or terminal communicatively connected to the near-eye display device, the server or terminal sends the translated text to the near-eye display device, and the near-eye display device performs the rendering and display operations.
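As an illustration only, the following Python sketch shows the dispatch just described: translate on-device when a model has been loaded on the near-eye display device, and otherwise forward the request to a connected server or terminal. The class and attribute names are assumptions made for this sketch, not an interface defined by the patent.

```python
# Minimal sketch: prefer an on-device translation model, fall back to a remote one.
# All names here are illustrative placeholders.
from dataclasses import dataclass
from typing import Optional, Protocol


class TranslationModel(Protocol):
    def translate(self, text: str, target_language: str) -> str: ...


@dataclass
class TranslationService:
    local_model: Optional[TranslationModel] = None   # model loaded on the glasses
    remote_model: Optional[TranslationModel] = None  # model deployed on a server/terminal

    def translate(self, info_to_translate: str, target_language: str) -> str:
        # Prefer the on-device model; otherwise use the communication link.
        if self.local_model is not None:
            return self.local_model.translate(info_to_translate, target_language)
        if self.remote_model is not None:
            return self.remote_model.translate(info_to_translate, target_language)
        raise RuntimeError("no translation model available")
```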
Step S204, acquiring the source characteristics of the information to be translated, and determining the rendering mode of the translated text matched with the source characteristics.
The source characteristics of the information to be translated may be directly analyzed from the voice information and/or the video information, and for the voice information, the source characteristics may also be obtained based on the analysis of the sound source, where the source characteristics include, but are not limited to, the position of the sound source, the voiceprint characteristics of the sound source, the state of the sound source, and the like.
Rendering modes of the translated text include, but are not limited to, a display position, a display area size, a display font type and color, a display duration and the like.
And step S206, displaying the translation text on the near-eye display device based on the rendering mode.
The translated text is displayed on the near-eye display device. Near-eye displays include an immersive display mode and a see-through display mode: the immersive mode blocks the user's view of the real world, while the see-through mode keeps the user's view of the real world open by creating a transparent image or a very small opaque image that blocks only a small part of the user's peripheral vision.
In addition, see-through near-eye display devices include free-form surface mirrors, free-form surface waveguides, slab array waveguides, holographic waveguides based on the diffraction principle, and the like; the present disclosure takes a waveguide with multiple focal planes as an example.
In this embodiment, a translation model of the target language is called to translate the received information to be translated to obtain a translated text; the source features of the information to be translated are analyzed, a matching rendering mode is determined based on the different source features obtained by the analysis, and the translated text is rendered in that mode and then displayed on the near-eye display device. Because the rendering mode corresponds to the acquired source features, the display of the translated text can be adapted to the sound source, which further optimizes the display effect of the translated text on the near-eye display device and accordingly helps improve the user's reading experience of the translated text.
In addition, based on near-eye display technology, translated content can be projected in front of the user's eyes while the user still sees the real world clearly, thereby improving everyday communication between wearers who speak different languages.
The steps of the translation method will be described in further detail with reference to fig. 3 to 10. As shown in fig. 3, in an exemplary embodiment of the present disclosure, in step S204, obtaining a source feature of information to be translated, and determining a rendering manner of a translation text matching the source feature includes:
step S302, acquiring a window range image of the near-eye display device.
When the sound source is within the user's field of view, an image acquisition module provided on the near-eye display device is used to acquire a window range image of the near-eye display device, so that a face recognition operation can then be performed on the window range image.
And step S304, carrying out face recognition on the image in the window range, and determining an information source of the information to be translated based on the face recognition result.
In the face recognition operation performed on the window range image, the mouth movement of an object can be detected for recognition; when mouth movement is detected, the object can be regarded as the object that is speaking. The acquired face image can also be compared with a pre-stored face set to determine the identity of the object, and the voiceprint feature in the voice information can further be extracted and compared with a voiceprint set to determine an identity; the object whose voiceprint identity is consistent with the face identity is the object that is speaking.
Step S306, the visual characteristics of the collected information source are determined as source characteristics, and a rendering mode matched with the visual characteristics is determined.
The visual features of the information source are the source features that can be obtained by image acquisition, such as the motion of the information source, the position of the information source, and the identity features of the information source.
In this embodiment, when the image of the information source is acquired based on the face recognition operation, characteristic rendering is achieved by detecting the visual features of the information source and performing an adaptive rendering operation based on those visual features, so that the user is more easily immersed in the context of the translation, further improving the user experience.
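The following sketch illustrates one way the information source could be chosen from the faces detected in the window range image: a face with mouth movement is treated as the speaker, and a voiceprint identity, when available, is used to disambiguate. The detector output format and helper names are assumptions, not components specified by the patent.

```python
# Illustrative selection of the speaking object from detected faces.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedFace:
    identity: Optional[str]   # from comparison against a pre-stored face set
    mouth_moving: bool
    bbox: tuple               # (x, y, w, h) in window coordinates


def find_information_source(faces: List[DetectedFace],
                            voiceprint_identity: Optional[str] = None) -> Optional[DetectedFace]:
    speaking = [f for f in faces if f.mouth_moving]
    if voiceprint_identity is not None:
        # Prefer the face whose identity agrees with the voiceprint match.
        for f in speaking:
            if f.identity == voiceprint_identity:
                return f
    return speaking[0] if speaking else None
```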
As shown in fig. 4, in an exemplary embodiment of the present disclosure, step S306, determining a visual feature of the acquired information source as a source feature, so as to determine a specific implementation manner of a rendering manner matched with the visual feature, includes:
step S402, detecting the distance between the information source and the visual characteristic.
In step S404, the display size of the translated text is determined based on the distance.
Step S406, rendering the translation text based on the display size.
In this embodiment, the distance between the near-eye display device and the information source is determined by a depth camera or a distance mapping algorithm, and the size of the text box in which the translated text is displayed is then determined from that distance. For example, when the information source is farther away, it occupies a smaller area in the window, so the text box can be expanded; when the information source is closer, it occupies a larger area in the window, so the text box can be appropriately reduced to avoid blocking the information source. This helps improve the sense of interaction between the user and the information source while reading the translated text.
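A minimal sketch of the distance-to-size rule just described, assuming a simple linear scaling clamped to a readable range; the constants are illustrative and not specified by the patent.

```python
# Farther source -> smaller on-screen footprint -> text box may grow;
# closer source -> larger footprint -> shrink the box to avoid occlusion.
def text_box_size(distance_m: float,
                  base_width_px: int = 320,
                  base_height_px: int = 96,
                  reference_distance_m: float = 1.5,
                  min_scale: float = 0.6,
                  max_scale: float = 1.8) -> tuple:
    scale = max(min_scale, min(max_scale, distance_m / reference_distance_m))
    return int(base_width_px * scale), int(base_height_px * scale)
```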
As shown in fig. 5, in an exemplary embodiment of the present disclosure, step S306, determining a visual feature of the acquired information source as a source feature, so as to determine a specific implementation manner of a rendering manner matched with the visual feature, includes:
step S502, when the visual characteristics of the information source are determined, the action information and the expression information of the information source are collected based on the visual characteristics.
In step S504, the display time and the blanking time of the translated text are determined based on the motion information.
In step S506, a display style matching the expression information is determined.
And step S508, performing display rendering on the translation text based on the display time, the blanking time and the display style.
In this embodiment, the actions of the sound-producing object include, but are not limited to, its head posture and lip movements. For example, when a lip-start action of the information source is detected and voice information is received, the object can be considered to have started speaking; at this time, the corresponding translated text is obtained from the translation result of the voice information and begins to be displayed on the near-eye display device. When a lip-stop action of the information source is detected, the object can be considered to have finished speaking; after the last received voice information has been translated and displayed, a blanking operation is performed. This helps keep the display and blanking of the translated text synchronized with the speech.
Further, an emotion analysis result of the information source, such as pleasure, seriousness, or sadness, is obtained by analyzing the facial expression information of the information source, and a display style matching the emotion is determined based on that result. This helps convey the emotion of the sound-producing object to the user, that is, it improves the sense of immersion when the translated text is displayed on the near-eye display device.
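The following sketch illustrates this timing and style logic under the assumption that lip-start/lip-stop timestamps and an emotion label are already available from the visual pipeline; the emotion labels, style table, and blanking delay are illustrative assumptions.

```python
# Assumed mapping from an emotion label to a display style; not defined by the patent.
STYLE_BY_EMOTION = {
    "pleasure": {"color": "#FFD54F", "font": "rounded"},
    "serious":  {"color": "#FFFFFF", "font": "regular"},
    "sadness":  {"color": "#90CAF9", "font": "light"},
}


def display_schedule(lip_start_t: float, lip_stop_t: float,
                     blank_delay_s: float = 2.0) -> tuple:
    # Show the translated text from the lip-start action and blank it shortly
    # after the lip-stop action, once the last translated segment has been shown.
    return lip_start_t, lip_stop_t + blank_delay_s


def display_style(emotion: str) -> dict:
    # Fall back to a neutral style for unrecognised emotion labels.
    return STYLE_BY_EMOTION.get(emotion, STYLE_BY_EMOTION["serious"])
```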
As shown in fig. 6, in an exemplary embodiment of the present disclosure, in step S204, the information to be translated includes voice information, the source feature of the information to be translated is obtained, and determining a rendering manner of the translated text matching the source feature includes:
step S602, extracting the voiceprint feature in the voice information, and determining the voiceprint feature as the source feature.
Step S604, configuring a display mode of characters in the translation text based on the voiceprint characteristics.
Step S606, the translated text is displayed and rendered based on the display mode of the characters.
In this embodiment, by extracting the voiceprint feature from the voice information, a sound source identifier of the information source can be obtained from the voiceprint. The sound source identifier can characterize the information source, for example as a male, a female, an elderly person, or a child, and the display mode of the translated text is then configured based on the sound source identifier, for example displaying the translated text in different colors and fonts for different sound source identifiers, thereby achieving personalized display of the translated text.
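As a sketch only, the mapping below shows how a sound source identifier derived from the voiceprint could select the colour and font size of the displayed characters; the identifier values and the style table are assumptions for illustration.

```python
# Assumed sound-source-identifier -> character display mode table.
STYLE_BY_SOURCE = {
    "male":    {"color": "#80D8FF", "font_size": 28},
    "female":  {"color": "#FF8A80", "font_size": 28},
    "elderly": {"color": "#FFFFFF", "font_size": 34},   # larger text
    "child":   {"color": "#CCFF90", "font_size": 26},
}


def character_display_mode(source_id: str) -> dict:
    # Default style for unknown identifiers.
    return STYLE_BY_SOURCE.get(source_id, {"color": "#FFFFFF", "font_size": 28})
```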
As shown in fig. 7, in an exemplary embodiment of the present disclosure, displaying the translated text on the near-eye display device based on the rendering manner further includes:
step S702, determining the position information of the information source of the information to be translated in the space coordinate system based on the sound phase and/or the source image characteristics of the information to be translated.
Step S704, based on the mapping relationship between the display coordinate system and the spatial coordinate system of the near-eye display device, maps the translated text displayed on the near-eye display device to the position of the information source based on the position information.
In this embodiment, the image acquisition module is used for portrait detection, and when the portrait is detected, the position information of the information source in the spatial coordinate system is determined based on the image characteristics of the portrait, such as the position information of the portrait in the image.
And/or the position is determined based on the sound phase of the acquired voice information. For example, if the left and right ears were completely identical from their structure to the nervous system, sound pressure of the same amplitude acting on the eardrums would cause vibrations of the same amplitude and be converted into neural signals of the same amplitude; but if the vibrations are not synchronous, that is, there is a phase difference, the brain recognizes this phase difference when processing the two signals and tells the user that the sound source deviates from the plane of symmetry between the ears. Combined with other information, this helps determine the direction of the information source, and the position information of the information source in the spatial coordinate system is determined based on that direction.
The position of the information source is determined and tracked, and the position mapping relationship between the display coordinate system and the actual spatial coordinate system is obtained according to the perspective principle; that is, the translated content displayed on the lens can be mapped to a position near the actual information source, so that the wearer can see the content of the information source more intuitively. Superimposing the translated text on the information source in the user's field of view realizes the display of the translated text matched with the information source of the original voice information, which further helps improve the immersive experience of the near-eye display device.
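A minimal sketch of the coordinate mapping described above, assuming a pre-calibrated 3x4 perspective projection from the spatial coordinate system to the display coordinate system; the calibration itself is outside the scope of this sketch.

```python
def project_to_display(point_xyz, projection_3x4):
    # Homogeneous perspective mapping: spatial (x, y, z) -> display (u, v).
    x, y, z = point_xyz
    h = (x, y, z, 1.0)
    u = sum(a * b for a, b in zip(projection_3x4[0], h))
    v = sum(a * b for a, b in zip(projection_3x4[1], h))
    w = sum(a * b for a, b in zip(projection_3x4[2], h))
    return u / w, v / w   # pixel position at which to anchor the translated text
```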
In an exemplary embodiment of the present disclosure, mapping the translated text displayed on the near-eye display device to a position of the information source based on the mapping relationship between the display coordinate system and the spatial coordinate system of the near-eye display device and the position information, further includes:
the position information includes a distance from the information source, and a display focal plane of the near-eye display device is adjusted based on the distance such that the translated text is displayed on the adjusted focal plane.
In this embodiment, the focal plane can be understood as the plane on which objects in the window are focused onto the user's eyes through the near-eye display device. By adjusting the position of the lens in the near-eye display device, the translated text and the information source appear at the same distance from the user, achieving a better fusion of the virtual and the real.
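The following sketch illustrates focal plane selection under the assumption of a waveguide with a small set of discrete focal planes; the listed plane distances are illustrative values only.

```python
FOCAL_PLANES_M = [0.5, 1.0, 2.0, 4.0]   # assumed multi-focal-plane waveguide


def select_focal_plane(source_distance_m: float) -> float:
    # Pick the focal plane closest to the source so text and source appear
    # at the same depth to the wearer.
    return min(FOCAL_PLANES_M, key=lambda d: abs(d - source_distance_m))
```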
As shown in fig. 8, in an exemplary embodiment of the present disclosure, mapping the translated text displayed on the near-eye display device to the position of the information source based on the mapping relationship between the display coordinate system and the spatial coordinate system of the near-eye display device includes:
in step S802, when the information source is detected to move, a face tracking operation is performed on the information source to continuously capture the position information of the information source.
And step S804, rendering the translation text to move according to the position information based on the mapping relation so that the translation text moves along with the information source.
In this embodiment, while face recognition is performed, a face position tracking algorithm is also executed to continuously acquire the position information of the information source; the system maps that position through the coordinate systems and keeps re-rendering, so that the text display box follows the sound source object.
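A sketch of the tracking loop just described, reusing the project_to_display helper from the earlier coordinate-mapping sketch; the tracker and renderer objects are assumed device-side components, not APIs defined by the patent.

```python
def follow_information_source(tracker, renderer, projection_3x4, translated_text):
    # `tracker.track_face()` is assumed to yield spatial positions while the
    # source remains tracked; each position is re-projected and re-rendered.
    for position_xyz in tracker.track_face():
        anchor = project_to_display(position_xyz, projection_3x4)
        renderer.render_text_at(translated_text, anchor)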
As shown in fig. 9, in an exemplary embodiment of the present disclosure, the information to be translated includes video information, and when the information to be translated is acquired, invoking a translation model of a target language to translate the information to be translated to obtain a translation text includes:
step S902, performing sign language recognition on the video information to recognize the sign language information.
Step S904, a sign language translation model of the target language is called to convert the sign language information into a translation text.
In this embodiment, the information to be translated may also be video information containing sign language. The sign language information in the video is extracted by a sign language recognition model and translated into text information by a conversion model between sign language and text, that is, a sign language translation model; if the resulting text is not in the target language, a secondary translation operation is performed to obtain the translated text that is finally displayed.
Further, the above process may be performed in reverse, that is, voice information is translated into video information containing the corresponding sign language and the video is played on the near-eye display device; in this case the wearer of the near-eye display device may be a hearing- and speech-impaired person.
In addition, sign language can be recognized by wearing an external device such as a dedicated sign language recognition glove; the recognized sign language is transmitted to the near-eye display device through a wireless or wired data connection and can be converted directly into translated text in the target language.
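The following sketch outlines the video branch under the assumption that a sign language recogniser, a sign-to-text model, and a text translator are available as pluggable components; the secondary translation is applied only when the intermediate text is not already in the target language.

```python
def translate_sign_language(frames, sign_recognizer, sign_to_text_model,
                            text_translator, target_language: str) -> str:
    sign_tokens = sign_recognizer.recognize(frames)
    text, language = sign_to_text_model.convert(sign_tokens)  # text plus its language
    if language != target_language:
        text = text_translator.translate(text, target_language)  # secondary translation
    return text
```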
In an exemplary embodiment of the present disclosure, the information to be translated includes voice information, and the invoking of the translation model of the target language to translate the information to be translated to obtain the translation text includes:
and recognizing the language of the voice information based on the language model of the confidence coefficient of the convolutional neural network.
And calling a voice recognition engine corresponding to the language to convert the voice information into text information.
And calling a translation engine of the target language, and translating the text information into a translation text.
In this embodiment, a language model based on convolutional neural network confidence is used to determine the language of the information source from its voice, and the voice is quickly translated into the target language to obtain the translated text, which is displayed in front of the wearer's eyes in real time through the transparent near-eye display technology, so that daily life and communication can be carried out more easily in a multilingual environment.
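A minimal sketch of the speech branch: language identification, then the ASR engine for that language, then the target-language translation engine. All engines are assumed pluggable components; no specific vendor API is implied.

```python
def translate_speech(audio, language_id_model, asr_engines: dict,
                     translation_engines: dict, target_language: str) -> str:
    language = language_id_model.identify(audio)            # e.g. "en", "zh", "de"
    source_text = asr_engines[language].transcribe(audio)   # speech -> source-language text
    return translation_engines[target_language].translate(source_text)
```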
In an exemplary embodiment of the present disclosure, invoking the translation engine of the target language includes: calling a translation engine of the target language based on an acquired selection instruction of the user; or calling a translation engine of the target language based on a preset correspondence between the language of the voice information and the target language; or acquiring attribute information of the user to call a translation engine of the corresponding target language based on the attribute information of the user.
In this embodiment, when receiving the voice information, determining the target language corresponding to the language of the voice information includes, but is not limited to, the following ways:
first, the selection operation is determined by receiving a user, wherein the user's selection operation includes but is not limited to: the touch operation on the designated area on the near-eye display device, the touch operation on the touch device in communication connection with the near-eye display device, the voice selection instruction of the user and the like are beneficial to ensuring the reliability of the target language selection.
Secondly, the corresponding relation between different languages is preset, such as the corresponding relation between English and Chinese, the corresponding relation between German and Chinese, and the like, so that the operation of a user is facilitated to be simplified.
Third, the target language is determined by detecting attribute information of the current user of the near-eye display device, where the attribute information includes but is not limited to the user's nationality, appearance characteristics, language, and the like. For example, the language spoken by the user is determined by collecting voice information uttered by the user, and that language is taken as the target language. This improves the reliability of target language determination while also simplifying the user's operation steps, thereby improving the user experience.
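The three selection paths listed above can be summarised in a small resolution function, sketched below with illustrative parameter names and an assumed default.

```python
def resolve_target_language(user_selection=None,
                            source_language=None,
                            language_pair_presets=None,
                            user_attributes=None,
                            default="zh"):
    if user_selection:                                   # 1) explicit selection instruction
        return user_selection
    if language_pair_presets and source_language in language_pair_presets:
        return language_pair_presets[source_language]    # 2) preset correspondence
    if user_attributes and user_attributes.get("native_language"):
        return user_attributes["native_language"]        # 3) inferred from user attributes
    return default
```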
In an exemplary embodiment of the present disclosure, the information to be translated includes voice information, and before the translation model of the target language is called to translate the information to be translated to obtain the translated text, the method further includes: when the voice information is collected, detecting whether the voiceprint information of the voice information matches pre-stored voiceprint information, so as to determine whether the voice information was uttered by the wearing user of the near-eye display device; and/or, when the voice information is collected, performing sound source localization on the voice information to determine whether it was uttered by the wearing user of the near-eye display device. When it is determined that the voice information was not uttered by the wearing user, the obtained translated text is displayed based on the rendering mode; when it is determined that the voice information was uttered by the wearing user, the obtained translated text is sent to a receiving target in text and/or speech form.
In this embodiment, the near-eye display device displays the translated text obtained by translating the voice information uttered by the receiving target, and at the same time the voice information of the wearing user can be translated into text in the target language and sent to the receiving target as text and/or audio, so that in a remote collaboration scene the wearing user can interact and cooperate with a receiving target at the remote end of the network across different languages.
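The routing described above is sketched below under the assumption of a voiceprint matcher (and/or sound source localiser) that can decide whether the speech came from the wearer; the display and sender objects are placeholders, not components named by the patent.

```python
def route_translated_text(audio, translated_text, voiceprint_matcher,
                          display, sender, send_as_speech: bool = False):
    if voiceprint_matcher.is_wearer(audio):
        sender.send(translated_text, as_speech=send_as_speech)  # outgoing to the peer
    else:
        display.show(translated_text)                            # incoming, render on lens
```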
In an exemplary embodiment of the present disclosure, further comprising: when a picture to be translated is received, executing character recognition operation on the picture; when the picture is recognized to comprise the characters to be translated, translating the characters to be translated into the target language characters; and superposing the target language characters on the picture, and displaying the picture on a near-eye display device.
In this embodiment, the translation system may also support recognition of characters or identifiers in an image. Using OCR (Optical Character Recognition) technology, identifiers with specific meanings captured by the image acquisition module, such as characters, icons, and guideboards, can be translated in real time and displayed on the near-eye display device for the wearing user, which helps further improve the user experience.
For example, in a conference scene, after the near-eye display device is networked, the pictures or slides sent by the background network can be translated into the native language of the user, and personalized translation content is provided for wearers of different native languages, so that the user experience in participating in the conference is better.
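A hedged sketch of the picture branch: OCR on the received picture, translation of any recognised text, and overlay of the target-language characters before display. The OCR engine, translator, and overlay helper are assumed components, not a specific library's API.

```python
def translate_picture(picture, ocr_engine, translator, target_language: str):
    regions = ocr_engine.recognize(picture)     # assumed to return [(text, bounding_box), ...]
    for text, box in regions:
        translated = translator.translate(text, target_language)
        picture = overlay_text(picture, translated, box)   # superimpose near the original text
    return picture


def overlay_text(picture, text, box):
    # Placeholder: draw `text` over `box` on `picture`; the actual drawing
    # depends on the device's rendering stack.
    return picture
```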
In an exemplary embodiment of the present disclosure, further comprising: converting the translated text into audio information; audio information is played in the headphones of the near-eye display device.
In this embodiment, the near-eye display device calls a TTS (Text-To-Speech) engine of the target language to convert the translated text into speech in the target language and plays it through the voice playing module, which helps further improve the synchronism of the translation.
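The audio path can be sketched as follows, assuming a TTS engine per target language and a headset object that accepts the synthesized audio; both are illustrative stand-ins.

```python
def speak_translation(translated_text: str, tts_engines: dict,
                      target_language: str, headset):
    audio = tts_engines[target_language].synthesize(translated_text)
    headset.play(audio)
```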
As shown in fig. 10, a translation method applied to a near-eye display device according to one embodiment of the present disclosure includes:
step S1002, performs a sound collection operation to obtain voice information sent by a sound source.
Step S1004, voiceprint recognition is carried out based on a Transformer neural network algorithm to obtain voiceprint information, and a sound source identifier is generated based on the voiceprint information.
Wherein the sound source identification is determined as a source feature.
Step S1006, language identification is carried out based on the language model of the confidence coefficient of the convolutional neural network to obtain the language of the information source.
Step S1008, the ASR engine of the corresponding language is called to convert the speech into a source language text.
Step S1010, a translation engine corresponding to the target language is called, and the source language text is translated into a translation text of the target language.
Step S1012, associate and mark the translated text with the sound source identifier.
And step S1014, using an image acquisition module to perform portrait recognition, determining identity information and spatial position information of the sound source, and associating the identity information and the spatial position information with the sound source identifier.
Step S1016, the marked translated text is rendered in different display modes according to the information associated with the sound source identifier, displayed on the transparent near-eye display lens, and mapped to the vicinity of the actual spatial position of the information source.
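For illustration, steps S1002 to S1016 can be compressed into a single orchestration sketch; every component here is an assumed stand-in for the modules named in the steps, and project_to_display refers to the coordinate-mapping helper sketched earlier.

```python
def translate_and_render(audio, frames, components, target_language: str):
    c = components
    source_id = c["voiceprint"].identify(audio)                      # S1004
    language = c["language_id"].identify(audio)                      # S1006
    source_text = c["asr"][language].transcribe(audio)               # S1008
    text = c["translator"][target_language].translate(source_text)   # S1010
    identity, position_xyz = c["portrait"].locate(frames)            # S1014
    style = c["styles"].for_source(source_id, identity)              # S1012
    anchor = project_to_display(position_xyz, c["projection"])       # map near the source
    c["display"].render(text, anchor, style)                         # S1016
```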
Further, the translated text can be transmitted to a second wearer through a sharing function and displayed on the near-eye display lens of the second wearer, and when the translated text is displayed on the near-eye display lens of the second wearer, the display position of the translated text can be adaptively adjusted according to the relative position relation between the second wearer and the sound source.
When the image acquisition module is used for portrait detection, the position information of the sound source is determined and tracked, and the position mapping relationship between the display coordinate system and the actual spatial coordinate system is obtained according to the perspective principle, so that the translated content displayed on the lens can be mapped to a position near the actual information source, and the wearer can see the content spoken by the information source more intuitively.
Or when the image acquisition module is used for identifying the portrait, the identity information of the information source is determined, the identity information is associated with the sound source identification obtained by performing voiceprint identification on the voice information, and according to the perspective principle, when a plurality of information sources are provided, the translation contents corresponding to the plurality of information sources can be displayed on the lens and mapped to the positions near the plurality of information source positions, so that a wearer can more intuitively see the contents spoken by the plurality of information sources.
In addition, the translation system based on the near-eye display device also supports importing a third-party sound source and interfacing with the audio of other systems. When it is used in a large conference, the sound source can be configured and selected: a participant wearer can choose to translate speakers of different native languages at the conference rostrum and have the translated text displayed on the lens, providing a personalized conference experience. Similarly, when watching a movie, the translation system can translate the source language of the movie into the viewer's native language and display it on the lens, providing the wearer with a personalized viewing experience.
In a remote collaboration scene, a wearer of the device converses and cooperates with collaborators of different native languages at the remote end of the network. The device can translate the wearer's speech into the collaborator's native language and send the translated result to the collaborator as text or speech; the way it is presented to the collaborator is not limited. At the same time, the device can take the collaborator's speech as a third-party sound source, translate it into the wearer's native language, and display it on the display lens, enabling a better cross-border collaboration experience.
Corresponding to the above embodiments, the present disclosure also provides a translation apparatus, which may be used to execute the above embodiments.
Fig. 11 is a block diagram of a translation apparatus in an exemplary embodiment of the present disclosure.
Referring to fig. 11, the translation apparatus 1100 may include:
the translation module 1102 is configured to, when the information to be translated is acquired, invoke a translation model of a target language to translate the information to be translated to obtain a translation text.
And the rendering module 1104 is configured to acquire a source feature of the information to be translated, and determine a rendering mode of the translated text matched with the source feature.
A display module 1106 configured to display the translated text on a near-eye display device based on the rendering mode.
Since the functions of the apparatus 1100 have been described in detail in their corresponding embodiments, the disclosure is not repeated herein.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the functionality and features of two or more of the modules or units described above may be embodied in one module or unit, in accordance with embodiments of the present disclosure. Conversely, the functions and functionalities of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
In an exemplary embodiment of the present disclosure, there is also provided a near-eye display device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, a method, or a program product. Thus, various aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit", a "module", or a "system".
A near-eye display device 1200 according to this embodiment of the invention is described below with reference to fig. 12. The near-eye display device 1200 shown in fig. 12 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in fig. 12, the near-eye display device 1200 takes the form of a general-purpose computing device. Components of the near-eye display device 1200 may include, but are not limited to: at least one processing unit 1210, at least one storage unit 1220, and a bus 1230 connecting the various system components (including the storage unit 1220 and the processing unit 1210).
The storage unit stores program code that can be executed by the processing unit 1210, so that the processing unit 1210 executes the steps according to the various exemplary embodiments of the present invention described above in this specification. For example, the processing unit 1210 may perform the steps shown in the embodiments of the present disclosure.
The storage unit 1220 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 12201 and/or a cache memory unit 12202, and may further include a read-only memory unit (ROM) 12203.
Storage unit 1220 may also include a program/utility 12204 having a set (at least one) of program modules 12205, such program modules 12205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1230 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The near-eye display device 1200 may also communicate with one or more peripheral devices 1300 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the near-eye display device 1200, and/or with any device (e.g., a router, a modem, etc.) that enables the near-eye display device 1200 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 1250. Moreover, the near-eye display device 1200 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 12120. As shown, the network adapter 12120 communicates with the other modules of the near-eye display device 1200 via the bus 1230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the near-eye display device 1200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, or the like) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of this specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code which, when the program product runs on a terminal device, causes the terminal device to carry out the steps according to the various exemplary embodiments of the invention described above in this specification.
The program product for implementing the above-described method may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present invention and are not intended to be limiting. It will be readily understood that the processes shown in the above figures do not indicate or limit the chronological order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, in a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (18)

1. A translation method applied to a near-eye display device is characterized by comprising the following steps:
when information to be translated is obtained, calling a translation model of a target language to translate the information to be translated to obtain a translation text;
acquiring source characteristics of the information to be translated, and determining a rendering mode of the translated text matched with the source characteristics;
displaying the translated text on the near-eye display device based on the rendering manner.
2. The translation method according to claim 1, wherein the obtaining of the source feature of the information to be translated and the determining of the rendering manner of the translated text matching the source feature comprise:
acquiring a window range image of the near-eye display device;
carrying out face recognition on the window range image, and determining an information source of the information to be translated based on the face recognition result;
and determining the acquired visual features of the information source as the source features so as to determine the rendering mode matched with the visual features.
3. The translation method according to claim 2, wherein the determining the collected visual features of the information source as the source features to determine the rendering mode matching the visual features comprises:
detecting a distance to the information source based on a visual characteristic of the information source;
determining a display size of the translated text based on the distance;
and performing display rendering on the translated text based on the display size.
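A minimal sketch of the distance-based sizing recited in claim 3; the scaling rule and the constants are assumptions made for illustration and are not part of the claim:

```python
def display_size_for_distance(distance_m: float,
                              base_size_px: float = 48.0,
                              reference_m: float = 1.5,
                              min_px: float = 18.0,
                              max_px: float = 72.0) -> float:
    """Scale the caption font inversely with speaker distance.

    A nearby speaker gets larger text, a distant one smaller text,
    clamped so the caption always stays legible.
    """
    distance_m = max(distance_m, 0.1)  # guard against degenerate values
    size = base_size_px * reference_m / distance_m
    return max(min_px, min(max_px, size))

# Example: a speaker at 3 m gets half the reference size.
print(display_size_for_distance(3.0))  # -> 24.0
```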
4. The translation method according to claim 2, wherein the determining the collected visual features of the information source as the source features to determine the rendering mode matching the visual features comprises:
when the visual features of the information source are determined, acquiring action information and expression information of the information source based on the visual features;
determining a display time and a blanking time of the translated text based on the action information; and
determining a display style matched with the expression information;
and performing display rendering on the translation text based on the display time, the blanking time and the display style.
5. The translation method of claim 1, wherein said displaying the translated text on the near-eye display device based on the rendering further comprises:
determining position information of an information source of the information to be translated in a space coordinate system based on sound phase and/or source image characteristics of the information to be translated;
mapping the translated text displayed on the near-eye display device to a position of the information source based on the position information based on a mapping relationship between a display coordinate system of the near-eye display device and the spatial coordinate system.
6. The translation method according to claim 5, wherein the mapping the translated text displayed on the near-eye display device to the position of the information source based on the mapping relationship between the display coordinate system and the spatial coordinate system of the near-eye display device and the position information further comprises:
the position information comprises a distance from the information source, and a display focal plane of the near-eye display device is adjusted based on the distance;
causing the translated text to be displayed on the adjusted focal plane.
7. The translation method according to claim 6, wherein said mapping the translated text displayed on the near-eye display device to the position of the information source based on the mapping relationship between the display coordinate system and the spatial coordinate system of the near-eye display device comprises:
when the information source is detected to move, performing face tracking operation on the information source so as to continuously capture the position information of the information source;
rendering the translated text to move according to the position information based on the mapping relation so that the translated text moves along with the information source.
8. The translation method according to claim 1, wherein the information to be translated comprises video information, and when the information to be translated is obtained, calling a translation model of a target language to translate the information to be translated to obtain a translation text comprises:
performing sign language recognition on the video information to recognize sign language information;
and calling a sign language translation model of a target language to convert the sign language information into the translation text.
9. The translation method according to claim 1, wherein the information to be translated includes voice information, and the invoking of the translation model in the target language to translate the information to be translated to obtain a translated text includes:
recognizing the language of the voice information based on the confidence of a convolutional neural network language model;
calling a voice recognition engine corresponding to the language to convert the voice information into text information;
and calling a translation engine of the target language, and translating the text information into the translation text.
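A minimal sketch of the pipeline recited in claim 9, with the language model, speech recognition engines, and translation engines represented by hypothetical callables; their names and interfaces are assumptions, not part of the claim:

```python
def translate_speech(audio: bytes,
                     language_model,             # .score(audio) -> {language: confidence}
                     asr_engines: dict,          # language -> recognizer with .transcribe()
                     translation_engines: dict,  # (src, dst) -> translator with .translate()
                     target_language: str) -> str:
    """Claim 9 as a pipeline sketch: identify the language by confidence,
    pick the matching speech recognition engine, then translate the text."""
    # Step 1: language identification via model confidence scores.
    scores = language_model.score(audio)
    source_language = max(scores, key=scores.get)

    # Step 2: speech-to-text with the engine for that language.
    text = asr_engines[source_language].transcribe(audio)

    # Step 3: translate the recognized text into the target language.
    return translation_engines[(source_language, target_language)].translate(text)
```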
10. The translation method according to claim 9, wherein said invoking the translation engine of the target language comprises:
calling a translation engine of the target language based on the acquired selection instruction of the user; or
Calling a translation engine of the target language based on a preset corresponding relation between the language of the voice information and the target language; or
And acquiring attribute information of a user to call a corresponding translation engine of the target language based on the attribute information of the user.
11. The translation method according to claim 1, wherein the information to be translated includes voice information, the obtaining of the source feature of the information to be translated, and the determining of the rendering manner of the translation text matching the source feature includes:
extracting voiceprint features in the voice information, and determining the voiceprint features as the source features;
configuring a display mode of characters in the translation text based on the voiceprint features;
and displaying and rendering the translation text based on the display mode of the characters.
12. The translation method according to claim 9 or 11, further comprising:
receiving the voice information based on a wired transmission link and/or a wireless transmission link; and/or
And acquiring the voice information based on a sound acquisition module of the near-eye display device.
13. The translation method according to claim 1, wherein the information to be translated includes voice information, and before the translation model of the target language is called to translate the information to be translated to obtain the translated text, the method further includes:
when the voice information is collected, detecting whether voiceprint information of the voice information matches pre-stored voiceprint information, so as to determine whether the voice information is sent out by a wearing user of the near-eye display device; and/or
When the voice information is collected, performing sound source localization on the voice information to determine whether the voice information is sent out by a wearing user of the near-eye display device,
and when the voice information is determined to be sent by the wearing user, sending the obtained translation text to a receiving target in a character and/or voice mode.
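A minimal sketch of the wearer check recited in claim 13, assuming voiceprints are compared as embedding vectors against a cosine-similarity threshold; the embedding extractor, the threshold value, and the function names are assumptions:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.75  # assumed decision threshold

def is_wearer(voice_embedding: np.ndarray,
              enrolled_embedding: np.ndarray) -> bool:
    """Compare the incoming voiceprint with the pre-stored wearer voiceprint."""
    cos = float(np.dot(voice_embedding, enrolled_embedding) /
                (np.linalg.norm(voice_embedding) *
                 np.linalg.norm(enrolled_embedding)))
    return cos >= SIMILARITY_THRESHOLD

def maybe_forward(translated_text: str,
                  voice_embedding: np.ndarray,
                  enrolled_embedding: np.ndarray,
                  send) -> None:
    """Only forward the translation when the speech came from the wearer."""
    if is_wearer(voice_embedding, enrolled_embedding):
        send(translated_text)
```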
14. The translation method according to any one of claims 1 to 11, further comprising:
when a picture to be translated is received, executing character recognition operation on the picture;
when the picture is recognized to comprise the characters to be translated, translating the characters to be translated into target language characters;
and superimposing the target language characters on the picture, and displaying the picture on the near-eye display device.
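A minimal sketch of the picture translation recited in claim 14, assuming the Tesseract OCR engine is available via pytesseract and Pillow is used for the overlay; the translate callable is a placeholder for whichever translation engine is actually used:

```python
from PIL import Image, ImageDraw
import pytesseract

def translate_picture(path: str, translate, output_path: str) -> None:
    """Recognize text in a picture, translate it, and overlay the result."""
    image = Image.open(path).convert("RGB")
    recognized = pytesseract.image_to_string(image).strip()
    if not recognized:
        return  # nothing to translate

    translated = translate(recognized)

    # Overlay the target-language text near the top of the picture;
    # a real device would place it next to the recognized text region.
    draw = ImageDraw.Draw(image)
    draw.rectangle((0, 0, image.width, 40), fill=(0, 0, 0))
    draw.text((8, 10), translated, fill=(255, 255, 255))
    image.save(output_path)
```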
15. The translation method according to any one of claims 1 to 11, further comprising:
converting the translated text into audio information;
playing the audio information in a headset of the near-eye display device.
16. A translation apparatus applied to a near-eye display device, comprising:
the translation module is used for calling a translation model of a target language to translate the information to be translated when the information to be translated is obtained, so as to obtain a translation text;
the rendering module is used for acquiring the source characteristics of the information to be translated and determining the rendering mode of the translated text matched with the source characteristics;
a display module to display the translated text on the near-eye display device based on the rendering.
17. A near-eye display device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the translation method of any of claims 1-15 via execution of the executable instructions.
18. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a translation method according to any one of claims 1 to 15.
CN202110380356.2A 2021-04-09 2021-04-09 Translation method, translation device, translation medium and near-to-eye display equipment Active CN112764549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110380356.2A CN112764549B (en) 2021-04-09 2021-04-09 Translation method, translation device, translation medium and near-to-eye display equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110380356.2A CN112764549B (en) 2021-04-09 2021-04-09 Translation method, translation device, translation medium and near-to-eye display equipment

Publications (2)

Publication Number Publication Date
CN112764549A true CN112764549A (en) 2021-05-07
CN112764549B CN112764549B (en) 2021-08-06

Family

ID=75691404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110380356.2A Active CN112764549B (en) 2021-04-09 2021-04-09 Translation method, translation device, translation medium and near-to-eye display equipment

Country Status (1)

Country Link
CN (1) CN112764549B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113919374A (en) * 2021-09-08 2022-01-11 荣耀终端有限公司 Method for translating speech, electronic device, storage medium and program product

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160117864A1 (en) * 2011-11-11 2016-04-28 Microsoft Technology Licensing, Llc Recalibration of a flexible mixed reality device
US20160321246A1 (en) * 2013-11-28 2016-11-03 Sharp Kabushiki Kaisha Translation device
CN105934730A (en) * 2014-01-23 2016-09-07 微软技术许可有限责任公司 Automated content scrolling
WO2015127146A1 (en) * 2014-02-19 2015-08-27 Evergaze, Inc. Apparatus and method for improving, augmenting or enhancing vision
CN103941870A (en) * 2014-04-21 2014-07-23 百度在线网络技术(北京)有限公司 Head-mounted display device
US20190068529A1 (en) * 2017-08-31 2019-02-28 Daqri, Llc Directional augmented reality system
CN109033423A (en) * 2018-08-10 2018-12-18 北京搜狗科技发展有限公司 Simultaneous interpretation caption presentation method and device, intelligent meeting method, apparatus and system
CN109582981A (en) * 2018-11-30 2019-04-05 努比亚技术有限公司 Interpretation method, mobile terminal and readable storage medium storing program for executing based on mobile terminal
CN110276349A (en) * 2019-06-24 2019-09-24 腾讯科技(深圳)有限公司 Method for processing video frequency, device, electronic equipment and storage medium
CN111597828A (en) * 2020-05-06 2020-08-28 Oppo广东移动通信有限公司 Translation display method and device, head-mounted display equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SI JITAO et al.: "C-E Translation of Publicity Materials under the Translation-Oriented Text Analysis Model", Journal of Changchun University (Social Science Edition) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113919374A (en) * 2021-09-08 2022-01-11 荣耀终端有限公司 Method for translating speech, electronic device, storage medium and program product
CN113919374B (en) * 2021-09-08 2022-06-24 荣耀终端有限公司 Method for translating voice, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112764549B (en) 2021-08-06

Similar Documents

Publication Publication Date Title
US11423909B2 (en) Word flow annotation
US8515728B2 (en) Language translation of visual and audio input
KR102002979B1 (en) Leveraging head mounted displays to enable person-to-person interactions
CN111597828B (en) Translation display method, device, head-mounted display equipment and storage medium
US20200294525A1 (en) Generating visual closed caption for sign language
US20140129207A1 (en) Augmented Reality Language Translation
EP1083769A1 (en) Speech converting device and method
CN111654715A (en) Live video processing method and device, electronic equipment and storage medium
US20230047858A1 (en) Method, apparatus, electronic device, computer-readable storage medium, and computer program product for video communication
CN111273775A (en) Augmented reality glasses, KTV implementation method based on augmented reality glasses and medium
WO2019237427A1 (en) Method, apparatus and system for assisting hearing-impaired people, and augmented reality glasses
CN110717344A (en) Auxiliary communication system based on intelligent wearable equipment
CN112764549B (en) Translation method, translation device, translation medium and near-to-eye display equipment
CN114422935A (en) Audio processing method, terminal and computer readable storage medium
WO2019237429A1 (en) Method, apparatus and system for assisting communication, and augmented reality glasses
JP6980150B1 (en) 3D virtual real space providing server, 3D virtual real space providing method, 3D virtual real space providing program, 3D virtual real space display control device, 3D virtual real space display control method, 3D virtual real space display control program And 3D virtual reality space provision system
KR102258991B1 (en) Sign-language service providing system
CN115359796A (en) Digital human voice broadcasting method, device, equipment and storage medium
CN113780013A (en) Translation method, translation equipment and readable medium
CN111160051A (en) Data processing method and device, electronic equipment and storage medium
CN111524518A (en) Augmented reality processing method and device, storage medium and electronic equipment
CN111310530A (en) Sign language and voice conversion method and device, storage medium and terminal equipment
CN112562687B (en) Audio and video processing method and device, recording pen and storage medium
KR102546532B1 (en) Method for providing speech video and computing device for executing the method
CN116582664A (en) Intelligent interaction virtual display system based on naked eye 3D

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant