CN113255377A - Translation method, translation device, electronic equipment and storage medium - Google Patents

Translation method, translation device, electronic equipment and storage medium

Info

Publication number
CN113255377A
Authority
CN
China
Prior art keywords
target
target object
translation
translation mode
translated
Legal status
Pending
Application number
CN202110639370.XA
Other languages
Chinese (zh)
Inventor
刘坚
李秋平
李磊
王明轩
Current Assignee
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110639370.XA
Publication of CN113255377A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words

Abstract

The embodiment of the application discloses a translation method, a translation device, electronic equipment and a storage medium. In the translation method provided by the embodiment of the application, a target image can be acquired through an AR device, the target translation mode best suited to the current scene is determined from a plurality of translation modes, a target object is determined from the target image according to the target translation mode, and the target object is then translated and displayed. In this way, when different translation modes are adopted as the target translation mode, the AR device can extract different target objects from the target image. That is, for the same target image, different target objects can be translated by adopting different translation modes. Accordingly, when the application scenes differ, the user can select the translation mode best suited to the current scene from the multiple translation modes, so that the user's requirements are met.

Description

Translation method, translation device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a translation method and apparatus, an electronic device, and a storage medium.
Background
Today, with globalization developing rapidly, people often need to communicate with others who speak different languages or to read materials written in other languages, and language barriers make this difficult; translation devices have emerged in response. A translation device can translate a piece of text or speech from one language into another. People who speak different languages can communicate with each other through a translation device, or use it to read material written in other languages.
With the development of computer technology, translation devices have become increasingly common. However, most current translation devices are still inconvenient for users.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present application provide a translation method, an apparatus, an electronic device, and a storage medium, which are intended to meet user requirements in different environments.
In a first aspect, an embodiment of the present application provides a translation method, where the method is applied to an augmented reality AR device, and includes:
acquiring a target image through the AR device;
selecting a target translation mode from a plurality of translation modes;
determining a target object from the target image according to the target translation mode;
translating the target object to obtain a translated target object;
and displaying the translated target object through a display unit of the AR device.
In a second aspect, an embodiment of the present application provides a translation apparatus, where the apparatus is applied to an AR device, and includes:
a receiving unit configured to acquire a target image through the AR device;
a processing unit for selecting a target translation mode from a plurality of translation modes; determining a target object from the target image according to the target translation mode; translating the target object to obtain a translated target object;
and a display unit, configured to display the translated target object through the display unit of the AR device.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the translation method described in the foregoing first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the translation method according to the first aspect.
In the translation method provided by the embodiment of the application, a target image can be acquired through the AR device, the target translation mode best suited to the current scene is determined from a plurality of translation modes, a target object is determined from the target image according to the target translation mode, and the target object is then translated and displayed. In this way, when different translation modes are adopted as the target translation mode, the AR device can extract different target objects from the target image. That is, for the same target image, different target objects can be translated by adopting different translation modes. Accordingly, when the application scenes differ, the user can select the translation mode best suited to the current scene from the multiple translation modes, so that the user's requirements are met. In addition, the translated target object is displayed through the AR device, which meets different display requirements under different translation modes, improves the real-time performance and immersion of translation, and improves the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art without creative effort.
Fig. 1 is a schematic diagram of a framework of an exemplary application scenario provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a translation method provided herein;
fig. 3 is a schematic view of a usage scenario of the AR glasses provided in the present application;
fig. 4 is a schematic view of a usage scenario of another AR glasses provided in the present application;
FIG. 5 is a schematic structural diagram of a translation apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present application. It should be understood that the drawings and embodiments of the present application are for illustration purposes only and are not intended to limit the scope of the present application.
It should be understood that the various steps recited in the method embodiments of the present application may be performed in a different order and/or in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present application is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present application are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this application are intended to be illustrative rather than limiting, and should be understood by those skilled in the art as meaning "one or more" unless the context clearly indicates otherwise.
In order to solve the problems of the prior art, the embodiments of the present application provide a translation method, which is described in detail below with reference to the accompanying drawings of the specification.
In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, first, a description is made with reference to a scene example shown in fig. 1.
Referring to fig. 1, the figure is a schematic diagram of a framework of an exemplary application scenario provided in an embodiment of the present application. The AR device 10 includes an image acquisition unit 11, a processing unit 12, and a display unit 13. The image acquisition unit 11 may be configured to capture the external environment 20 to obtain a target image. The processing unit 12 may determine a target object from the target image according to the target translation mode, translate it, and display the translated target object through the display unit 13, so that the user 30 can see the translated target object on the basis of the external environment 20.
It should be noted that, in the embodiment shown in fig. 1, the image acquisition unit 11 belongs to the AR device 10. In some other possible implementations, the image acquisition unit 11 may instead be a device separate from the AR device 10, which sends the acquired target image to the AR device 10 through a wired or wireless connection.
Fig. 2 is a flowchart of a translation method provided in an embodiment of the present application. This embodiment is applicable to a scenario in which a user translates a target object through an AR device. The method may be executed by translation software installed in the AR device; the software may be implemented as program code stored in a memory of the AR device and executed by a processing unit of the AR device. Of course, the method provided in the embodiment of the present application may also be executed by a server or a computer that exchanges data with the AR device, so as to translate the target object in the target image captured by the AR device. The method is described below taking execution by the processing unit of the AR device as an example. As shown in fig. 2, the method specifically includes the following steps:
s201: and acquiring a target image through the AR equipment.
Before translating the target object, the target image may be acquired through the AR device. Specifically, the image acquisition unit of the AR device may be controlled to acquire the target image, or an image acquisition instruction may be sent to the external image acquisition unit through the AR device, so as to control the image acquisition unit to acquire the image of the external environment.
In the embodiment of the application, the target image can be acquired through the AR device according to an operation triggered by the user. Specifically, the user may click a control or issue a voice instruction to control the processing unit of the AR device to obtain the target image through the AR device. For example, assuming the AR device has a microphone and a camera, when the user issues a voice instruction such as "take a picture", a processor of the AR device may acquire the voice signal through the microphone and analyze it to determine the user's intent, and then control the camera to photograph the external environment to obtain the target image.
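As an illustrative sketch only, the voice-triggered capture flow described above might look roughly as follows; the microphone, recognizer, and camera objects and the command phrases are hypothetical placeholders rather than APIs defined by this application.
```python
# Hypothetical sketch of voice-triggered image capture on an AR device.
# The microphone, recognizer, and camera parameters stand in for whatever
# concrete interfaces the device actually provides.

CAPTURE_COMMANDS = {"take a picture", "photograph"}

def acquire_target_image(microphone, recognizer, camera):
    """Listen for a capture command and, if heard, photograph the scene."""
    audio = microphone.record()                  # collect the user's voice signal
    text = recognizer.transcribe(audio).lower()  # analyze the voice signal
    if any(cmd in text for cmd in CAPTURE_COMMANDS):
        return camera.capture()                  # photograph the external environment
    return None                                  # no capture intent detected
```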
In some possible implementations, the target image may not be obtained by the AR device through the image acquisition unit. Specifically, when the target translation mode described later is the video stream translation mode, the target image may be derived from a video data stream received by the AR device. In that case, the AR device may receive the video data stream via a network or a wired connection and parse it to obtain the target image; for example, a key frame of the video may be determined as the target image. For the video stream translation mode, reference may be made to the description of S203, which is not repeated here.
S202: a target translation mode is selected from a plurality of translation modes.
In an embodiment of the present application, the AR device may have multiple translation modes, and different translation modes may correspond to different translation scenarios. Before the target object is translated, a target translation mode may be selected from the plurality of translation modes; the target translation mode is the translation mode best suited for translating the target object in the target image. The translation mode may be actively selected by the user, or may be selected by the processing unit of the AR device according to the target image. These two implementations are described separately below.
In a first possible implementation, the target translation mode is actively selected by the user. Specifically, the user may control the client to select a target translation mode from a plurality of translation modes by triggering a control or inputting a voice instruction.
When a user selects a target translation mode from a plurality of translation modes by triggering the control, the AR device may display the translation control on the display unit, and the user may trigger a translation display operation for the translation control. The processor of the AR device may determine the target translation mode from the plurality of translation modes according to the translation display operation after receiving the translation display operation triggered by the user.
For example, the AR device can display a plurality of translation controls on the display unit, each translation control corresponding to one translation mode. The user can select, according to actual requirements, the translation control corresponding to the target translation mode from these translation controls and click it. In this way, the target translation mode that the user wants to select can be determined according to the user's click operation. Of course, the AR device may also present the plurality of translation modes to the user in the form of a wheel, a list, or the like, so that the user can select the target translation mode from among them.
When a user selects a target translation mode from a plurality of translation modes through voice instruction control, the AR equipment can receive voice information sent by the user through the sound signal acquisition unit, analyze a voice instruction from the voice information of the user, and determine the target translation mode from the plurality of translation modes according to the voice instruction sent by the user.
For example, assume that the target translation mode is the "keyword translation mode" and the corresponding voice instruction includes "adopt the keyword translation mode". When the user wants to translate the target object using the "keyword translation mode" as the target translation mode, the user can issue a voice signal including "adopt the keyword translation mode". The AR device may collect the voice signal emitted by the user through a sound signal collection unit such as a microphone, recognize the voice instruction "adopt the keyword translation mode" from it, and thereby select the keyword translation mode from the plurality of translation modes as the target translation mode. The keyword translation mode is introduced below and is not described in detail here.
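A minimal sketch of mapping a recognized voice instruction to a translation mode is given below; the mode names, the phrase-to-mode table, and the function name are illustrative assumptions, not part of this application.
```python
# Sketch: map a recognized voice transcript to one of the translation modes.

from enum import Enum, auto
from typing import Optional

class TranslationMode(Enum):
    FULL_TEXT = auto()
    REGIONAL = auto()
    KEYWORD = auto()
    PRESET_SCENE = auto()
    VIDEO_STREAM = auto()

VOICE_COMMANDS = {
    "adopt the keyword translation mode": TranslationMode.KEYWORD,
    "adopt the regional translation mode": TranslationMode.REGIONAL,
    "adopt the full-text translation mode": TranslationMode.FULL_TEXT,
}

def select_mode_from_voice(transcript: str) -> Optional[TranslationMode]:
    """Return the target translation mode named in the transcript, if any."""
    text = transcript.lower()
    for phrase, mode in VOICE_COMMANDS.items():
        if phrase in text:
            return mode
    return None

# select_mode_from_voice("please adopt the keyword translation mode")
# -> TranslationMode.KEYWORD
```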
As can be seen from the foregoing description, in S201, the target image may be captured and acquired according to the operation of the user-triggered control or the voice instruction, and in S202, the target translation mode may also be determined according to the operation of the user-triggered control or the voice instruction. It should be noted that the control used for controlling the AR device to acquire the target image in S201 and the control used for selecting the target translation mode from the multiple translation modes in S202 may be the same control or different controls.
Similarly, the voice instruction for controlling the AR device to acquire the target image in S201 and the voice instruction for selecting the target translation mode from the plurality of translation modes in S202 may be the same voice instruction or different voice instructions. That is to say, the user may respectively control the AR device to acquire the target image and determine the target translation mode through two voice instructions, or may control the AR device to acquire the target image and determine the target translation mode through one instruction. For example, the user may control the AR device to acquire a target image through a voice instruction of "take a picture", and control the AR device to determine a target translation mode through a voice instruction of "adopt a keyword translation mode". Alternatively, the user may control the AR device to acquire the target image and determine the target translation mode by a voice instruction "adopt the keyword translation mode".
In a second possible implementation, the target translation mode may be the translation mode that the processor determines, from the target image, to best fit the current application scenario. Specifically, after the target image is obtained through the AR device, the target image may be recognized to obtain a recognition result. The recognition result can indicate the scene corresponding to the target image, that is, the type of scene captured by the image acquisition unit. If the scene matches a preset scene, it can be determined that the target translation mode is the preset scene translation mode. The preset scenes include one or more special application scenarios in which the target objects to be translated, and possibly the translation method, differ from those in other application scenarios. Then, if it is determined that the scene corresponding to the target image is a preset scene, the target object may be determined and translated using the method corresponding to that preset scene.
For example, assume the target image includes a commodity price tag. Since the format of a commodity price tag is special, the translation mode applicable to it may differ from the translation modes applicable to other scenes (e.g., news or papers). Therefore, if the target image includes a commodity price tag, a preset scene translation mode for translating commodity price tags may be selected from the plurality of translation modes as the target translation mode.
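The sketch below illustrates this scene-based selection under the assumption that some image-recognition model labels the scene; the classifier, the scene labels, and the mode names are placeholders rather than anything specified by this application.
```python
# Illustrative sketch: pick the preset scene translation mode when the
# recognized scene label matches one of the preset scenes.

PRESET_SCENES = {"price_tag", "menu", "road_sign"}   # assumed preset scene labels

def select_mode_from_image(target_image, classify_scene, default_mode="full_text"):
    """classify_scene is any callable that returns a scene label for an image."""
    scene_label = classify_scene(target_image)       # e.g. "price_tag"
    if scene_label in PRESET_SCENES:
        return "preset_scene"
    return default_mode
```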
It should be noted that, when the target translation mode is the video stream translation mode, S202 may be performed before S201.
S203: determining a target object from the target image according to the target translation mode.
When the target image includes a plurality of objects that can be translated, the target objects to be translated may differ between translation modes; therefore, before translation, the target object to be translated may be selected from the target image according to the target translation mode.
In this embodiment of the application, the plurality of translation modes may include any one or more of a full-text translation mode, a regional translation mode, a keyword translation mode, a preset scene translation mode, a full-text translation mode, and a video stream translation mode. The cases in which the target translation mode is each of these translation modes are described below.
In a first possible implementation, the target translation mode is the full-text translation mode. If the user selects the full-text translation mode as the target translation mode, this indicates that the user needs to translate a relatively large amount of text; for example, the user may wish to translate all text appearing in the field of view. In that case, character recognition may be performed on the target image, and all translatable objects in the target image may be extracted as target objects.
In some possible implementations, the target image may include multiple objects in different languages, for example, a piece of English text and a piece of French text. Then, when extracting the target object, objects in the target image that do not match the target language may be determined as target objects, where the target language is the language used by the translated target object. For example, assuming the target language selected by the user is Chinese, objects in the target image other than Chinese objects may be determined as target objects.
In a second possible implementation, the target translation mode is the regional translation mode. When the target image contains many translatable objects, translating all of them may produce redundant translations. Therefore, if the user only wishes to translate some of the objects in the target image, the user can select the regional translation mode as the target translation mode. The AR device may then determine a preset region from the target image according to the user's operation, and extract the target object from the preset region.
In some possible implementations, the target object may be determined based on the user's gaze point, where the gaze point is the landing point of the user's line of sight. That is, regarding the target image as a plane, the gaze point is the intersection of the user's line of sight with that plane. A target object may then be determined from the target image according to the position of the gaze point. Specifically, an image located in a first region around the gaze point may first be found from the target image according to the gaze-point position and taken as a first image; a target object may then be extracted from the first image.
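A rough sketch of cropping the first image around the gaze point is shown below; the fixed region size and the assumption that the target image is an H x W x C pixel array are simplifications for illustration only.
```python
# Sketch: take a square region around the gaze point as the "first image",
# from which the target object would then be extracted (e.g. by OCR).

def crop_gaze_region(target_image, gaze_xy, half_size=200):
    """target_image: HxWxC array-like image; gaze_xy: (x, y) pixel position."""
    h, w = target_image.shape[:2]
    x, y = gaze_xy
    left, top = max(0, x - half_size), max(0, y - half_size)
    right, bottom = min(w, x + half_size), min(h, y + half_size)
    return target_image[top:bottom, left:right]   # the first image
```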
In a third possible implementation, the target translation mode is the keyword translation mode. If the user selects the keyword translation mode as the target translation mode, the user may wish to translate only certain keywords in the target image. Then, when determining the target object from the target image, one or more candidate words matching words in a preset lexicon may be selected from the target image and determined as target objects.
Specifically, all translatable objects may be extracted from the entire target image (or from a preset region of the target image). These translatable objects may then be matched against the words in the preset lexicon; a translatable object that matches may be determined to be a target object.
In the embodiment of the present application, the preset lexicon may also be determined by a user through a selection operation. That is, the user may trigger a selection operation on the AR device through the operation control or the voice instruction, and after receiving the selection operation triggered by the user, the processing unit of the AR device may determine the preset lexicon according to the selection operation, and further determine the target object according to the preset lexicon.
For example, suppose the target image includes an English article and the preset lexicon selected by the user is a CET-4/CET-6 (four-six level) word bank. When determining the target object, the English article may be extracted from the target image and split into multiple English words. Each of these English words can then be checked in turn against the CET-4/CET-6 word bank; if a word belongs to the word bank, it can be determined as a target object.
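The following sketch shows the split-and-match step described above; the sample lexicon is made up for illustration and merely stands in for a real word bank.
```python
# Sketch of the keyword translation mode: split recognized text into words
# and keep only those that appear in the preset lexicon.

import re

PRESET_LEXICON = {"augment", "immersive", "render"}   # stand-in for a real word bank

def find_keyword_targets(recognized_text: str, lexicon=PRESET_LEXICON):
    """Return the words in the recognized text that belong to the lexicon."""
    words = re.findall(r"[A-Za-z']+", recognized_text)
    return [w for w in words if w.lower() in lexicon]

# find_keyword_targets("AR glasses render an immersive view")
# -> ["render", "immersive"]
```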
In a fourth possible implementation, the target translation mode is the preset scene translation mode. As can be seen from the foregoing description, when the application scene corresponding to the target image is a preset scene, it may be determined that the target translation mode is the preset scene translation mode. Then, in order to translate the objects that need to be translated in that specific scene, a preset rule may be obtained, and the target object may be obtained from the target image according to the preset rule. The preset rule can be set in advance by a technician and indicates which parts of the target image need to be translated in the preset scene.
For example, assume that the preset scene translation mode for translating commodity price tags is selected as the target translation mode, and that the commodity price tag in the target image carries a commodity name, a commodity introduction, a price, and a price unit. The commodity introduction, the price, and the price unit in the target image may be determined as target objects. In this way, when translating the target object, not only can the commodity introduction be translated, but the price marked on the commodity can also be converted into a price that the user can readily understand. For example, assuming that the user's nationality is Chinese and the commodity price tag is labeled in US dollars, the price can automatically be converted into Chinese yuan, so that the user has a clearer sense of the commodity's price.
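As a sketch only, a preset rule for the price-tag scene might look like the following; the field layout, the price regex, and the fixed exchange rate are assumptions made for illustration (a real implementation would need an up-to-date rate and a more robust price parser).
```python
# Sketch of a preset rule for the price-tag scene: pick out the introduction
# text and the price, and convert the price into the user's currency.

import re

USD_TO_CNY = 7.0   # placeholder exchange rate, not a real-time value

def apply_price_tag_rule(price_tag_text: str):
    """Extract the translatable fields of a price tag and convert the price."""
    result = {"introduction": price_tag_text, "converted_price": None}
    match = re.search(r"\$\s*(\d+(?:\.\d+)?)", price_tag_text)
    if match:
        usd = float(match.group(1))
        result["converted_price"] = f"{usd * USD_TO_CNY:.2f} CNY (from ${usd})"
    return result

# apply_price_tag_rule("Fresh apples, $2.50 per kg")["converted_price"]
# -> "17.50 CNY (from $2.5)"
```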
In a fifth possible implementation, the target translation mode is the full-text translation mode. When the user selects the full-text translation mode as the target translation mode, any translatable object in the target image may be a target object. For example, when reading a document, all the text presented in the user's field of view may be what the user wants to translate. For this case, the user can select the full-text translation mode as the target translation mode, and accordingly the AR device may determine any translatable object in the target image as a target object.
In a sixth possible implementation, the target translation mode is the video stream translation mode. When the user selects the video stream translation mode as the target translation mode, this indicates that the user needs to translate target objects contained in a video stream. For example, in scenarios such as live streaming or a video conference, the user may receive a video data stream through the AR device and wish to translate words or sentences contained in it in real time. Of course, in a non-live scenario, the user may also translate video data through the AR device.
As can be seen from the foregoing description, the target image in the video stream translation mode is obtained from the video data stream received by the AR device. When determining the target object from the target image, objects in the target image that are in the first language may be determined as target objects, where the first language may be the language, selected by the user, of the target objects to be translated.
In some possible implementation manners, when the target translation mode is the video stream translation mode, the AR device may further obtain audio data in the video data stream, translate the audio data, and display the translated audio data through a display unit of the AR device.
S204: translating the target object to obtain a translated target object.
After the target object is determined, the target object may be translated to obtain a translated target object.
Specifically, the first language and the second language may be determined first. The first language is the language used by the target object to be translated, and the second language is the language used by the translated target object. Optionally, the first language and/or the second language may be determined according to a language selection operation of the user, or according to the system language used by the AR device. For example, assuming the user sets the system language of the AR device to Chinese, Chinese may be determined as the second language and languages other than Chinese may be determined as the first language.
After determining the first language and the second language, the target object may be translated from the first language to the second language.
In practical application scenarios, a single target object may be laid out across two positions in the target image due to display constraints; for example, a long sentence may be displayed across several lines because the number of characters that can be shown per line is limited. In this case, translation ambiguity may occur if a conventional translation method is applied directly. To improve translation accuracy, the target object may be divided semantically into at least one relatively independent target short sentence, and each of these target short sentences may then be translated. By dividing the target object according to its semantics before translating it, the integrity of the target object is preserved, a sentence to be translated is not split into two sentences for translation, and translation accuracy is ensured.
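A minimal sketch of the idea of merging line-wrapped text and then splitting it into independent short sentences is shown below; real semantic segmentation would typically use an NLP model, and the naive sentence-boundary regex here is a simplifying assumption.
```python
# Sketch: undo line wrapping in recognized text, then split by sentence
# boundaries so each short sentence is translated as a whole.

import re

def split_into_short_sentences(ocr_lines):
    """Rejoin line-wrapped OCR text and split it into independent sentences."""
    merged = " ".join(line.strip() for line in ocr_lines)   # undo line wrapping
    sentences = re.split(r"(?<=[.!?])\s+", merged)          # naive boundaries
    return [s for s in sentences if s]

# split_into_short_sentences(["The quick brown fox jumps over", "the lazy dog. It runs."])
# -> ["The quick brown fox jumps over the lazy dog.", "It runs."]
```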
S205: displaying the translated target object through a display unit of the AR device.
After the translated target object is obtained, the translated target object may be displayed through a display unit of the AR device. Specifically, the target image and the translated target object may be superimposed, and the target image on which the target object is superimposed may be displayed. In this way, the user can see both the external environment (i.e., the target image) that actually exists and the virtual translated target object through the display unit of the AR device.
In the translation method provided by the embodiment of the application, a target image can be acquired through the AR device, the target translation mode best suited to the current scene is determined from a plurality of translation modes, a target object is determined from the target image according to the target translation mode, and the target object is then translated and displayed. In this way, when different translation modes are adopted as the target translation mode, the AR device can extract different target objects from the target image. That is, for the same target image, different target objects can be translated by adopting different translation modes. Accordingly, when the application scenes differ, the user can select the translation mode best suited to the current scene from the multiple translation modes, so that the user's requirements are met. In addition, the translated target object is displayed through the AR device, which meets different display requirements under different translation modes, improves the real-time performance and immersion of translation, and improves the user experience.
The specific display method will be described in detail below.
The background color and/or font size of the translated target object may also be determined before displaying the translated target object. Here, the background color of the translated target object may be referred to as the target background color, and the font size of the translated target object may be referred to as the target font size. These two cases are described separately below.
First, the method of determining the target background color is described.
In the embodiment of the present application, the target background color used for displaying the translated target object may be determined before the translated target object is displayed. Specifically, the processing unit of the AR device may determine the background color of the target object before translation from the target image, and then determine the target background color from that background color, so as to display the translated target object with the target background color as its background. Optionally, depending on the purpose of display, the target background color may be close to the background color of the target object before translation or contrast strongly with it. These two cases are described separately below.
In a first possible implementation, for example, in a full-text translation mode or a regional translation mode, the user may prefer to fuse the translated target object in the target image for display. Then, a color close to the background color of the target object before translation may be determined as the target background color, for example, a color whose difference from the background color of the target object before translation is smaller than a first threshold may be determined as the target background color.
In some possible implementations, the background color of the target object before translation may be obtained, its average red (R), green (G), and blue (B) values may be calculated, and the calculated average R, G, and B values may be used as the R, G, and B values of the target background color, respectively.
In a second possible implementation, for example in a keyword translation mode or a regional translation mode, the user may tend to highlight the translated target object in the target image. Then, a color that differs more from the background color of the target object before translation may be determined as the target background color, and for example, a color that differs more than a first threshold value from the background color of the target object before translation may be determined as the target background color.
In some possible implementations, the background color of the target object before translation may be obtained, its average red (R), green (G), and blue (B) values may be calculated, and each average may then be subtracted from 255; the results are used as the R, G, and B values of the target background color, respectively.
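The two background-color strategies above can be sketched as follows; sampling the background as a list of 8-bit RGB tuples is an assumption made for illustration.
```python
# Sketch: average the source background pixels for a blending color, or
# subtract the averages from 255 for a contrasting color.

def target_background_color(background_pixels, contrast=False):
    """background_pixels: iterable of (r, g, b) tuples sampled behind the text."""
    pixels = list(background_pixels)
    n = len(pixels)
    avg = tuple(sum(p[i] for p in pixels) // n for i in range(3))
    if contrast:
        return tuple(255 - c for c in avg)   # stand out (e.g. keyword mode)
    return avg                               # blend in (e.g. full-text mode)

# target_background_color([(240, 240, 240), (250, 250, 250)]) -> (245, 245, 245)
# target_background_color([(240, 240, 240)], contrast=True)   -> (15, 15, 15)
```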
It should be noted that, when a color having a large difference from the background color of the target object before translation is used as the target background color, in order to ensure that the user can accurately see the target object after translation, the AR device may use a color having a large difference from the target background color as the display color of the target object. For example, assuming that the display color of the target object in the target image is black and the background color is white, the AR device may display the translated target object with the target background color being black and the display color of the translated target object being white.
In an actual application scenario, the target image may include a plurality of target objects with different background colors; for example, it may include a target object A whose background color is black and a target object B whose background color is white. If the target background color were determined directly from the background colors of target object A and target object B together, it might come out gray, and displaying the translated target objects against gray may give a poor result.
To solve this problem, a target background color may be determined separately for each of the target objects with different background colors. Specifically, the target image may be divided into at least one region according to the background colors of the target objects, such that the background colors of the target objects within the same region are similar; for example, the difference between the background colors of target objects in the same region is smaller than a second threshold. Optionally, the second threshold may be equal to the first threshold. Then, the target background color of each region may be determined from the background colors of the target objects in that region, so that the translated target objects belonging to a region are displayed with the target background color corresponding to that region.
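A rough sketch of grouping target objects into regions by background-color similarity is given below; the distance metric and the threshold value are illustrative assumptions.
```python
# Sketch: group target objects whose background colors are similar, so that
# each group (region) gets its own target background color.

def color_distance(c1, c2):
    """Largest per-channel absolute difference between two RGB colors."""
    return max(abs(a - b) for a, b in zip(c1, c2))

def group_by_background(objects, threshold=40):
    """objects: list of (object_id, background_rgb). Returns lists of ids."""
    groups = []   # each entry: (representative_color, [object_ids])
    for obj_id, color in objects:
        for rep, ids in groups:
            if color_distance(color, rep) < threshold:
                ids.append(obj_id)
                break
        else:
            groups.append((color, [obj_id]))
    return [ids for _, ids in groups]

# group_by_background([("A", (0, 0, 0)), ("B", (255, 255, 255)), ("C", (10, 10, 10))])
# -> [["A", "C"], ["B"]]
```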
The method of determining the target font size is described below.
In this embodiment, the target font size used for displaying the translated target object may be determined before the translated target object is displayed. Specifically, the processing unit of the AR device may determine the font size of the target object before translation from the target image, and then determine the target font size from that font size, so as to display the translated target object using the target font size. Optionally, depending on the purpose of display, the target font size may be close to the font size of the target object before translation or differ from it noticeably. These two cases are described separately below.
In a first possible implementation, for example, in a full-text translation mode or a regional translation mode, the user may prefer to fuse the translated target object in the target image for display. Then, a size close to the font size of the target object before translation may be determined as the target font size, and for example, a size having a difference smaller than a third threshold from the font size of the target object before translation may be determined as the target font size.
In a second possible implementation, for example in a keyword translation mode, the user may tend to highlight the translated target object in the target image. Then, a size that differs from the font size of the target object before translation by a large amount may be determined as the target font size, and for example, a size that differs from the font size of the target object before translation by more than a second threshold may be determined as the target font size.
In an actual application scenario, the target image may include a plurality of target objects with different font sizes, and then the target font size corresponding to each of the target objects with different font sizes may be determined, for example, the target image may be divided into at least one region according to a fourth threshold, and the target font size in each region may be determined. The process is similar to the process of determining the target background colors of the plurality of regions, and is not described herein again.
In some possible implementations, before displaying the translated target object, the AR device may further determine information such as a display color and a display font of the translated target object, and the used method may refer to the foregoing description about determining the target background color and the target font size, which is not described herein again.
In some possible implementations, the AR device may also display annotation information based on the user's operation. For example, the AR device may capture a gesture operation of the user through an image capture device such as a camera. If the user's gesture operation matches a pre-stored gesture operation, it can be determined that the user wants to add an annotation to the target object. Then, when displaying the translated target object, the AR device may also display the user-added annotation; for example, it may display the annotation at a position other than that of the target object and indicate, through an association identifier, that the annotation corresponds to the translated target object.
In the embodiment of the present application, the translated target object may be displayed in an overlaying manner, or may be displayed in a moving manner. These two cases will be described separately below.
In a first possible implementation, the AR device may display the translated target object in an overlaid manner, that is, display the translated target object at the position of the target object on the target image. Specifically, the background of the translated target object may be set to be opaque, and the translated target object may be superimposed on the target object before translation (i.e., the aforementioned target object to be translated) for display. In this way, because the background of the translated target object is opaque, the user sees the translated target object on the display unit of the AR device rather than the target object before translation.
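For illustration, the compositing step behind overlay display could be sketched as follows using Pillow; the box coordinates, colors, and default font are assumptions, and an actual AR device would render the overlay on the glasses rather than on the captured image.
```python
# Sketch of overlay display: paint an opaque box over the source text region
# and draw the translated text on top of it.

from PIL import Image, ImageDraw

def overlay_translation(target_image: Image.Image, box, translated_text,
                        background=(255, 255, 255), text_color=(0, 0, 0)):
    """box: (left, top, right, bottom) of the target object before translation."""
    composed = target_image.copy()
    draw = ImageDraw.Draw(composed)
    draw.rectangle(box, fill=background)   # opaque background hides the source text
    draw.text((box[0] + 2, box[1] + 2), translated_text, fill=text_color)
    return composed
```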
Further, as can be seen from the foregoing description, the target object may be a translatable object in the target image, may be a translatable object in a partial region of the target image, and may also be a keyword in the target image that meets the user's requirements. Then, when the target translation mode is the regional translation mode or the keyword translation mode, the translated target object may be displayed in a position overlapping the target object before translation, and other untranslated objects may be kept unchanged.
For example, assume that the AR device is AR glasses, the display unit of the AR device is a lens of the AR glasses, and the target translation mode is a regional translation mode. The processing unit may display the translated target object at a position on the lens corresponding to the target object and control other positions of the lens to remain transparent.
Specifically, a scene in which the user uses the AR glasses for translation may be as shown in fig. 3. Fig. 3 includes a target image 310, a display image 320, an eyeball 330, and an imaged image 340. The target image 310 is the scene the user sees when not wearing the AR glasses; the display image 320 is the image actually displayed by the AR glasses; the eyeball 330 is a schematic representation of the user's eyeball; and the imaged image 340 is the result of imaging through the eyeball 330 after the target image 310 and the display image 320 are superimposed, that is, the image the user actually sees when wearing the AR glasses.
Assume the target image 310 includes a target object 311 before translation, a pattern A 312, a pattern B 313, and a pattern C 314. After translating the target object, the AR glasses may display the translated target object 321 in the display image 320, where the relative position of the translated target object 321 within the display image 320 coincides with the relative position of the target object 311 before translation within the target image 310.
In this way, when the user views the external environment while wearing the AR glasses, the user sees the imaged image 340, which includes the translated target object 341, the pattern A 342, the pattern B 343, and the pattern C 344. Thus, the translated target object can be displayed in an overlaid manner at the position of the target object on the target image, and the user can see the translated target object while still observing the external environment normally.
In a second possible implementation, the AR device may move to display the translated target object. The moving display means displaying the translated target object at a position different from the target object before translation, that is, displaying the translated target object at a position other than the target object in the target image.
Specifically, the display position of the translated target object may be determined from the target image. For example, a position adjacent to the position on the target image where the target object before translation is located may be determined as the display position of the target object after translation; it is also possible to recognize key information in the target image and to take a position not including the key information as a display position of the translated target object. For example, assuming that the target object is a piece of text written in a paper book, a blank page of the paper book may be determined as a display position of the translated target object.
After the display position of the translated target object is determined, the translated target object may be displayed at the corresponding position on the display unit of the AR device. To show the association between the target object before translation and the translated target object, an association identifier can be displayed between them, prompting the user that the translated target object was obtained by translating the target object before translation. For example, the AR device may add an underline on the display unit below the target object before translation, with the other end of the underline connected to the display position of the translated target object.
For example, assume that the AR device is AR glasses, the display unit of the AR device is a lens of the AR glasses, and the target translation mode is a keyword translation mode. The processing unit may display the translated target object and associated identification on the lens and control other positions of the lens to remain transparent.
Specifically, a scene in which the user uses the AR glasses for translation may further be as shown in fig. 4, which includes a target image 410, a display image 420, an eyeball 430, and an imaged image 440. The target image 410 is the scene the user sees when not wearing the AR glasses; the display image 420 is the image actually displayed by the AR glasses; the eyeball 430 is a schematic representation of the user's eyeball; and the imaged image 440 is the result of imaging through the eyeball 430 after the target image 410 and the display image 420 are superimposed, that is, the image the user actually sees when wearing the AR glasses.
Assume the target image 410 visible in the user's field of view includes a word A 411 before translation, a word B 412 before translation, a pattern A 413, and a pattern B 414. After translating the target objects, the AR glasses may display the translated word A 421, the translated word B 422, identification information A 423, and identification information B 424 in the display image 420, where neither the translated word A 421 nor the translated word B 422 overlaps or intersects the display position of any element in the target image 410.
In this way, the user sees the imaged image 440 when viewing the external environment while wearing the AR glasses. The imaged image 440 includes the word A 441 before translation, the word B 442 before translation, the pattern C 443, the pattern D 444, the translated word A 445, the translated word B 446, the identification information A 447, and the identification information B 448. Thus, without affecting the normal display of other parts of the target image, the user can determine from the identification information A 447 that the translated word A 445 is the translation of the word A 441 before translation, and from the identification information B 448 that the translated word B 446 is the translation of the word B 442 before translation.
As can be seen from the foregoing description, the target translation mode may be a video stream translation mode. Then, when the target translation mode is the video stream translation mode, the AR device may play the video stream in the first area of the display unit and present the translated target object in the second area of the display unit. Assuming that a video stream received by the AR device is a live data stream, where the live data stream includes a target object to be translated, the AR device may display the live data stream in a display unit, and display the translated target object in a position corresponding to the target object before translation in an overlaying manner.
Fig. 5 is a schematic structural diagram of a translation apparatus provided in an embodiment of the present application, where the embodiment may be applied to a case of translating a target object through AR glasses, and the apparatus specifically includes: a receiving unit 510, a processing unit 520 and a display unit 530.
A receiving unit 510, configured to acquire a target image through the AR device.
A processing unit 520 for selecting a target translation mode from a plurality of translation modes; determining a target object from the target image according to the target translation mode; and translating the target object to obtain a translated target object.
A display unit 530, configured to display the translated target object through a display unit of the AR device.
The translation device provided by the embodiment of the disclosure can execute the translation method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects for executing the translation method.
It should be noted that, in the foregoing embodiment, each included unit and each included module are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present disclosure.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., AR glasses with translation) 600 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in fig. 2. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
The electronic device provided by the embodiment of the present disclosure and the translation method provided by the embodiments of the present disclosure belong to the same inventive concept. Technical details that are not described in detail in this embodiment may be found in the foregoing embodiments, and this embodiment has the same beneficial effects as the foregoing embodiments.
The disclosed embodiments provide a computer storage medium having stored thereon a computer program that, when executed by a processor, implements the translation method provided by the above-described embodiments.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a target image through the AR device; select a target translation mode from a plurality of translation modes; determine a target object from the target image according to the target translation mode; translate the target object to obtain a translated target object; and display the translated target object through a display unit of the AR device.
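Read together, the five operations above form a simple capture–select–extract–translate–display pipeline. The following is a minimal Python sketch of that flow; the ArDevice class, its methods, and the stubbed mode and translation helpers are hypothetical stand-ins introduced only for illustration and are not interfaces defined by this disclosure.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ArDevice:
    """Hypothetical AR device wrapper used only to illustrate the five operations."""
    captured_text: str = "Hello world"

    def capture_image(self) -> str:
        # Operation 1: acquire a target image through the AR device (stubbed as text).
        return self.captured_text

    def display(self, content: str) -> None:
        # Operation 5: display the translated target object on the display unit.
        print(f"[AR display] {content}")

def select_target_mode(available: List[str], requested: str) -> str:
    # Operation 2: select a target translation mode from the plurality of modes.
    return requested if requested in available else available[0]

def determine_target_object(image: str, mode: str) -> str:
    # Operation 3: determine the target object from the target image according to the mode.
    return image if mode == "full_text" else image.split()[0]

def translate(target_object: str) -> str:
    # Operation 4: translate the target object (a real system would call an MT engine).
    return f"<translated:{target_object}>"

if __name__ == "__main__":
    device = ArDevice()
    modes = ["regional", "keyword", "preset_scene", "full_text", "video_stream"]
    target_mode = select_target_mode(modes, "full_text")
    target_image = device.capture_image()
    target_object = determine_target_object(target_image, target_mode)
    device.display(translate(target_object))
```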
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, an editable content display unit may also be described as an "editing unit".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [ example one ] there is provided a translation method applied to an Augmented Reality (AR) device, the method including:
acquiring a target image through the AR device;
selecting a target translation mode from a plurality of translation modes;
determining a target object from the target image according to the target translation mode;
translating the target object to obtain a translated target object;
displaying the translated target object through a display unit of the AR device.
According to one or more embodiments of the present disclosure, [ example two ] there is provided a translation method in which, optionally, the multiple translation modes include a regional translation mode, and the determining a target object from the target image according to the target translation mode includes:
in response to the target translation mode being the regional translation mode, determining a gazing point position of the user on the target image according to the regional translation mode;
and determining the target object from the target image according to the gazing point position.
According to one or more embodiments of the present disclosure, [ example three ] there is provided a translation method in which, optionally, the determining the target object from the target image according to the gazing point position comprises:
determining a first image according to the target image and the gazing point position, wherein the first image is an image which is positioned in a first area around the gazing point position on the target image;
identifying a target object included in the first image.
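As a rough illustration of the regional translation mode in [ example two ] and [ example three ], the first image can be thought of as a crop of the target image centered on the gazing point, followed by text recognition limited to that crop. The sketch below assumes Pillow for image handling and stubs the recognition step; the helper names and the fixed crop size are assumptions, not requirements of this disclosure.

```python
from PIL import Image

def crop_first_area(image: Image.Image, gaze_xy: tuple, half_size: int = 100) -> Image.Image:
    """Return the first image: the region of the target image around the gazing point."""
    x, y = gaze_xy
    left = max(0, x - half_size)
    top = max(0, y - half_size)
    right = min(image.width, x + half_size)
    bottom = min(image.height, y + half_size)
    return image.crop((left, top, right, bottom))

def recognize_text(region: Image.Image) -> str:
    # Hypothetical OCR step; a real system would run a text recognizer on the crop.
    return "text inside the gazed region"

if __name__ == "__main__":
    target_image = Image.new("RGB", (640, 480), "white")  # stand-in for a captured frame
    gaze_point = (320, 240)                               # stand-in for eye-tracker output
    first_image = crop_first_area(target_image, gaze_point)
    target_object = recognize_text(first_image)
    print(first_image.size, target_object)
```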
According to one or more embodiments of the present disclosure, [ example four ] there is provided a translation method in which, optionally, the multiple translation modes include a keyword translation mode, and the determining a target object from the target image according to the target translation mode includes:
responding to the target translation mode as the keyword translation mode, and selecting one or more candidate words from the target image according to the keyword translation mode;
determining the one or more candidate words as the target object.
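One way to picture the keyword translation mode in [ example four ] is that recognized text is filtered down to a handful of candidate words, which then become the target object. The heuristic below (skip common words, prefer longer words) is only an assumed stand-in for whatever candidate selection a real system would use.

```python
COMMON_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on", "please"}

def select_candidate_words(ocr_text: str, max_candidates: int = 5) -> list:
    words = [w.strip(".,!?").lower() for w in ocr_text.split()]
    # Assumed heuristic: keep reasonably long, non-common words as candidates.
    candidates = [w for w in words if w and w not in COMMON_WORDS and len(w) >= 6]
    seen, result = set(), []
    for w in candidates:          # deduplicate while preserving order
        if w not in seen:
            seen.add(w)
            result.append(w)
    return result[:max_candidates]

if __name__ == "__main__":
    print(select_candidate_words("Please keep the refrigerated compartment closed"))
```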
According to one or more embodiments of the present disclosure, [ example five ] there is provided a translation method in which, optionally, the multiple translation modes include a preset scene translation mode, and the determining a target object from the target image according to the target translation mode includes:
in response to the target translation mode being the preset scene translation mode, identifying the target image according to the preset scene translation mode, and determining a target item included in the target image;
and determining the target object according to the target item.
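A way to picture the preset scene translation mode in [ example five ] is that a scene label narrows which items in the image become the target object. In the sketch below, the scene classifier and the scene-to-items mapping are hypothetical placeholders used only to illustrate that narrowing.

```python
from typing import List

SCENE_TO_ITEMS = {
    "menu": ["dish names", "prices"],
    "road_sign": ["sign text"],
    "product_label": ["ingredients", "usage instructions"],
}

def classify_scene(image_description: str) -> str:
    # Hypothetical scene recognition; a real system would run an image classifier.
    return "menu" if "menu" in image_description else "road_sign"

def determine_target_object(image_description: str) -> List[str]:
    scene = classify_scene(image_description)
    # The preset scene decides which recognized items become the target object.
    return SCENE_TO_ITEMS.get(scene, [])

if __name__ == "__main__":
    print(determine_target_object("photo of a restaurant menu"))
```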
According to one or more embodiments of the present disclosure, [ example six ] there is provided a translation method in which, optionally, the multiple translation modes include a full-text translation mode, and the determining a target object from the target image according to the target translation mode includes:
and determining all translatable objects in the target image as the target object.
According to one or more embodiments of the present disclosure, [ example seven ] there is provided a translation method in which, optionally, the multiple translation modes further include a video stream translation mode, and the determining a target object from the target image according to the target translation mode includes:
and in response to the target translation mode being the video stream translation mode, receiving a video stream, and taking each frame of the video stream as the target object to be translated.
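For the video stream translation mode in [ example seven ], each received frame simply becomes a target object to be translated in turn. The sketch below models the stream as an iterable of decoded frames and stubs the per-frame translation call; both are assumptions made for illustration.

```python
from typing import Iterable, Iterator

def frames_from_stream(stream: Iterable[str]) -> Iterator[str]:
    # Each element stands in for one decoded frame of the received video stream.
    for frame in stream:
        yield frame

def translate_frame(frame: str) -> str:
    # Hypothetical per-frame step; a real system would run recognition plus translation.
    return f"<translated:{frame}>"

if __name__ == "__main__":
    incoming_stream = ["frame-0 text", "frame-1 text", "frame-2 text"]
    for frame in frames_from_stream(incoming_stream):
        print(translate_frame(frame))
```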
According to one or more embodiments of the present disclosure, [ example eight ] there is provided a translation method in which, optionally, the determining a target translation mode from the plurality of translation modes comprises:
acquiring a translation mode selection operation triggered by a user on a translation control, and determining the target translation mode from the multiple translation modes according to the translation mode selection operation;
and/or receiving a voice instruction input by a user, and determining a target translation mode from a plurality of translation modes according to the voice instruction.
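The two selection paths in [ example eight ] (a tap on the translation control, or a voice instruction) can be illustrated as two small functions that both resolve to one of the available modes. The keyword matching against the utterance and the assumed default mode in the sketch below are illustrative choices, not behavior fixed by this disclosure.

```python
MODES = ["regional", "keyword", "preset_scene", "full_text", "video_stream"]

def mode_from_control(selected_index: int) -> str:
    # Selection via the translation control: the user picks one of the listed modes.
    return MODES[selected_index]

def mode_from_voice(utterance: str) -> str:
    # Selection via a voice instruction: pick the first mode named in the utterance.
    text = utterance.lower().replace(" ", "_")
    for mode in MODES:
        if mode in text:
            return mode
    return "full_text"  # assumed default when no mode is named

if __name__ == "__main__":
    print(mode_from_control(0))
    print(mode_from_voice("switch to keyword translation"))
```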
According to one or more embodiments of the present disclosure, [ example nine ] there is provided a translation method in which, optionally, the translating the target object includes:
performing semantic division on the target object to obtain at least one target short sentence;
and translating each target short sentence in the at least one target short sentence to obtain the translated target object.
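A minimal way to illustrate [ example nine ] is to approximate semantic division by splitting the recognized text on sentence-level punctuation and translating each resulting short sentence; a real system would presumably use a proper segmenter and a machine translation engine rather than the stubs below.

```python
import re

def split_into_short_sentences(text: str) -> list:
    # Assumed approximation of semantic division: split on sentence-level punctuation.
    parts = re.split(r"[.!?;,]\s*", text)
    return [p.strip() for p in parts if p.strip()]

def translate_short_sentence(phrase: str) -> str:
    # Hypothetical translation call for one target short sentence.
    return f"<translated:{phrase}>"

def translate_target_object(text: str) -> str:
    return " ".join(translate_short_sentence(p) for p in split_into_short_sentences(text))

if __name__ == "__main__":
    print(translate_target_object("Keep the door closed. Staff only, please knock."))
```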
According to one or more embodiments of the present disclosure, [ example ten ] there is provided a translation method in which, optionally, the displaying, by the display unit of the AR device, the translated target object includes:
displaying the translated target object on a display unit of the AR device at a location other than the location where the target object is located;
and displaying an association identifier between the translated target object and the target object to be translated.
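The display behavior in [ example ten ] can be sketched as: compute a placement for the translated text that does not overlap the original target object, then draw an association identifier (here, a leader line) linking the two. The fixed side-by-side offset and the line geometry below are purely illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int

def placement_beside(original: Box, gap: int = 12) -> Box:
    # Place the translated text to the right of the target object, not over it.
    return Box(original.x + original.w + gap, original.y, original.w, original.h)

def association_line(original: Box, translated: Box) -> tuple:
    # Endpoints of a leader line linking the translated text back to the original.
    start = (original.x + original.w, original.y + original.h // 2)
    end = (translated.x, translated.y + translated.h // 2)
    return start, end

if __name__ == "__main__":
    source_box = Box(100, 200, 180, 40)
    target_box = placement_beside(source_box)
    print(target_box, association_line(source_box, target_box))
```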
According to one or more embodiments of the present disclosure, [ example eleven ] there is provided a translation method that optionally further includes:
acquiring a label adding operation triggered by a user;
and displaying the label information on a display unit of the AR device according to the label adding operation.
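As a rough sketch of [ example eleven ], the label adding operation can be modeled as an event carrying the label text and an anchor position, which an overlay layer records and re-renders on the display unit. The event fields and the overlay class below are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TagEvent:
    text: str
    anchor_xy: Tuple[int, int]

class TagLayer:
    """Hypothetical overlay that records user labels and re-renders them on the display unit."""

    def __init__(self) -> None:
        self.tags: List[TagEvent] = []

    def on_tag_added(self, event: TagEvent) -> None:
        # The label adding operation triggered by the user arrives as an event.
        self.tags.append(event)
        self.render()

    def render(self) -> None:
        for tag in self.tags:
            print(f"[AR display] label '{tag.text}' at {tag.anchor_xy}")

if __name__ == "__main__":
    layer = TagLayer()
    layer.on_tag_added(TagEvent("remember this dish", (320, 180)))
```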
According to one or more embodiments of the present disclosure, [ example twelve ] there is provided a translation method in which, optionally, before displaying the translated target object through a display unit of the AR device, the method further includes:
determining a target background color according to the background color of the target object;
the displaying the translated target object comprises:
and displaying the target background color as the background color of the translated target object.
According to one or more embodiments of the present disclosure, [ example thirteen ] there is provided a translation method in which, optionally, the determining a target background color according to the background color of the target object includes:
dividing the target image into at least one region according to the background color of the target object, wherein the difference between the background colors of the target objects in each region is smaller than a preset color threshold value;
and determining the target background color of each region according to the background color of the target object in the region.
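Examples twelve and thirteen can be pictured as: sample the background color behind each recognized text block, group blocks whose colors differ by less than a preset threshold into one region, and reuse the region's representative color as the background of the translated text. In the sketch below, the sampled colors, the Euclidean color distance, and the mean-color choice are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TextBlock:
    text: str
    bg_rgb: Tuple[int, int, int]  # sampled background color behind the target object

def color_distance(a: Tuple[int, int, int], b: Tuple[int, int, int]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def group_by_background(blocks: List[TextBlock], threshold: float = 30.0) -> List[List[TextBlock]]:
    regions: List[List[TextBlock]] = []
    for block in blocks:
        for region in regions:
            # Blocks whose background colors differ by less than the preset threshold share a region.
            if color_distance(block.bg_rgb, region[0].bg_rgb) < threshold:
                region.append(block)
                break
        else:
            regions.append([block])
    return regions

def target_background(region: List[TextBlock]) -> Tuple[int, int, int]:
    n = len(region)
    return tuple(sum(b.bg_rgb[i] for b in region) // n for i in range(3))

if __name__ == "__main__":
    blocks = [TextBlock("EXIT", (250, 250, 250)), TextBlock("Push", (245, 248, 240)),
              TextBlock("Menu", (30, 30, 30))]
    for region in group_by_background(blocks):
        print([b.text for b in region], "->", target_background(region))
```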
According to one or more embodiments of the present disclosure, [ example fourteen ] there is provided a translation method in which, optionally, before displaying the translated target object through a display unit of the AR device, the method further includes:
determining a target font size according to the font size of the target object;
the displaying the translated target object comprises:
and displaying the target font size as the font size of the translated target object.
According to one or more embodiments of the present disclosure, [ example fifteen ] there is provided a translation method in which, optionally, the determining a target font size according to the font size of the target object includes:
dividing the target image into at least one area according to the font size of the target object, wherein the difference between the font sizes of the target object in each area is smaller than a preset size threshold value;
and determining the target font size of the region according to the font size of the target object in each region in the at least one region.
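Examples fourteen and fifteen follow the same pattern for font size: group text blocks whose estimated sizes differ by less than a preset threshold into one region, and render the translated text in that region at the region's representative size. The pixel-height estimate and the averaging step in the sketch below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TextBlock:
    text: str
    font_px: float  # estimated font size of the target object, e.g., glyph height in pixels

def group_by_font_size(blocks: List[TextBlock], threshold: float = 4.0) -> List[List[TextBlock]]:
    regions: List[List[TextBlock]] = []
    for block in blocks:
        for region in regions:
            # Blocks whose font sizes differ by less than the preset threshold share a region.
            if abs(block.font_px - region[0].font_px) < threshold:
                region.append(block)
                break
        else:
            regions.append([block])
    return regions

def target_font_size(region: List[TextBlock]) -> float:
    return sum(b.font_px for b in region) / len(region)

if __name__ == "__main__":
    blocks = [TextBlock("Title", 42.0), TextBlock("Subtitle", 40.0), TextBlock("body", 16.0)]
    for region in group_by_font_size(blocks):
        print([b.text for b in region], "->", target_font_size(region))
```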
According to one or more embodiments of the present disclosure, [ example sixteen ] there is provided a translation apparatus applied to an AR device, including:
a receiving unit configured to acquire a target image through the AR device;
a processing unit for selecting a target translation mode from a plurality of translation modes; determining a target object from the target image according to the target translation mode; translating the target object to obtain a translated target object;
and a display unit, configured to display the translated target object.
According to one or more embodiments of the present disclosure, [ example seventeen ] there is provided an electronic device comprising: one or more processors; and a memory for storing one or more programs; where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the translation method as described in any of the embodiments of the present application.
According to one or more embodiments of the present disclosure, [ example eighteen ] there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a translation method as described in any of the embodiments of the present application.
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features with similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (18)

1. A translation method is applied to an AR (augmented reality) device and comprises the following steps:
acquiring a target image through the AR device;
selecting a target translation mode from a plurality of translation modes;
determining a target object from the target image according to the target translation mode;
translating the target object to obtain a translated target object;
and displaying the translated target object through a display unit of the AR device.
2. The method of claim 1, wherein the plurality of translation modes includes a regional translation mode, and wherein determining a target object from the target image according to the target translation mode comprises:
in response to the target translation mode being the regional translation mode, determining a gazing point position of the user on the target image according to the regional translation mode;
and determining the target object from the target image according to the gazing point position.
3. The method of claim 2, wherein the determining the target object from the target image according to the gazing point position comprises:
determining a first image according to the target image and the gazing point position, wherein the first image is an image which is positioned in a first area around the gazing point position on the target image;
identifying a target object included in the first image.
4. The method of claim 1, wherein the plurality of translation modes includes a keyword translation mode, and wherein determining a target object from the target image according to the target translation mode comprises:
responding to the target translation mode as the keyword translation mode, and selecting one or more candidate words from the target image according to the keyword translation mode;
determining the one or more candidate words as the target object.
5. The method of claim 1, wherein the plurality of translation modes includes a preset scene translation mode, and wherein determining a target object from the target image according to the target translation mode comprises:
in response to the target translation mode being the preset scene translation mode, identifying the target image according to the preset scene translation mode, and determining a target item included in the target image;
and determining the target object according to the target item.
6. The method of claim 1, wherein the plurality of translation modes includes a full-text translation mode, and wherein determining a target object from the target image according to the target translation mode comprises:
and determining all translatable objects in the target image as the target object.
7. The method of claim 1, wherein the plurality of translation modes further comprises a video stream translation mode, and wherein determining a target object from the target image according to the target translation mode comprises:
and in response to the target translation mode being the video stream translation mode, receiving a video stream, and taking each frame of the video stream as the target object to be translated.
8. The method of claim 1, wherein determining the target translation mode from the plurality of translation modes comprises:
acquiring a translation mode selection operation triggered by a user on a translation control, and determining the target translation mode from the multiple translation modes according to the translation mode selection operation;
and/or receiving a voice instruction input by a user, and determining a target translation mode from a plurality of translation modes according to the voice instruction.
9. The method of claim 1, wherein translating the target object comprises:
performing semantic division on the target object to obtain at least one target short sentence;
and translating each target short sentence in the at least one target short sentence to obtain the translated target object.
10. The method of claim 1, wherein the displaying, by a display unit of the AR device, the translated target object comprises:
displaying the translated target object on a display unit of the AR device at a location other than the location where the target object is located;
and displaying an association identifier between the translated target object and the target object to be translated.
11. The method of claim 1, further comprising:
acquiring a label adding operation triggered by a user;
and displaying the label information on a display unit of the AR device according to the label adding operation.
12. The method of claim 1, wherein prior to displaying the translated target object via a display unit of the AR device, the method further comprises:
determining a target background color according to the background color of the target object;
the displaying the translated target object comprises:
and displaying the target background color as the background color of the translated target object.
13. The method of claim 12, wherein determining a target background color from the background color of the target object comprises:
dividing the target image into at least one region according to the background color of the target object, wherein the difference between the background colors of the target objects in each region is smaller than a preset color threshold value;
and determining the target background color of each region according to the background color of the target object in the region.
14. The method of claim 1, wherein prior to displaying the translated target object via a display unit of the AR device, the method further comprises:
determining a target font size according to the font size of the target object;
the displaying the translated target object comprises:
and displaying the target font size as the font size of the translated target object.
15. The method of claim 14, wherein determining a target font size based on the font size of the target object comprises:
dividing the target image into at least one area according to the font size of the target object, wherein the difference between the font sizes of the target object in each area is smaller than a preset size threshold value;
and determining the target font size of the region according to the font size of the target object in each region in the at least one region.
16. A translation apparatus, applied to an AR device, includes:
a receiving unit configured to acquire a target image through the AR device;
a processing unit for selecting a target translation mode from a plurality of translation modes; determining a target object from the target image according to the target translation mode; translating the target object to obtain a translated target object;
and a display unit, configured to display the translated target object.
17. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the translation method as claimed in any one of claims 1-15.
18. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the translation method according to any one of claims 1 to 15.
CN202110639370.XA 2021-06-08 2021-06-08 Translation method, translation device, electronic equipment and storage medium Pending CN113255377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110639370.XA CN113255377A (en) 2021-06-08 2021-06-08 Translation method, translation device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110639370.XA CN113255377A (en) 2021-06-08 2021-06-08 Translation method, translation device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113255377A true CN113255377A (en) 2021-08-13

Family

ID=77187117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110639370.XA Pending CN113255377A (en) 2021-06-08 2021-06-08 Translation method, translation device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113255377A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140081619A1 (en) * 2012-09-18 2014-03-20 Abbyy Software Ltd. Photography Recognition Translation
CN107656922A (en) * 2017-09-25 2018-02-02 广东小天才科技有限公司 A kind of interpretation method, device, terminal and storage medium
CN109683701A (en) * 2017-10-18 2019-04-26 深圳市掌网科技股份有限公司 Augmented reality exchange method and device based on eye tracking
CN108681393A (en) * 2018-04-16 2018-10-19 优视科技有限公司 Translation display methods, device, computing device and medium based on augmented reality
CN109460557A (en) * 2018-08-30 2019-03-12 山东讯飞淘云贸易有限公司 A kind of control method of translator
CN111160333A (en) * 2019-12-29 2020-05-15 歌尔科技有限公司 AR glasses, text translation method and device thereof, and computer-readable storage medium
CN111597828A (en) * 2020-05-06 2020-08-28 Oppo广东移动通信有限公司 Translation display method and device, head-mounted display equipment and storage medium
CN112269467A (en) * 2020-08-04 2021-01-26 深圳市弘祥光电科技有限公司 Translation method based on AR and AR equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115797815A (en) * 2021-09-08 2023-03-14 荣耀终端有限公司 AR translation processing method and electronic device
CN115797815B (en) * 2021-09-08 2023-12-15 荣耀终端有限公司 AR translation processing method and electronic equipment
CN114501042A (en) * 2021-12-20 2022-05-13 阿里巴巴(中国)有限公司 Cross-border live broadcast processing method and electronic equipment

Similar Documents

Publication Publication Date Title
US10970334B2 (en) Navigating video scenes using cognitive insights
CN106303723B (en) Video processing method and device
CN111445902B (en) Data collection method, device, storage medium and electronic equipment
US11475588B2 (en) Image processing method and device for processing image, server and storage medium
AU2017225018A1 (en) A system for creating virtual reality experiences from pdf
JP6771259B2 (en) Computer-implemented methods for processing images and related text, computer program products, and computer systems
CN111666776B (en) Document translation method and device, storage medium and electronic equipment
CN108829686B (en) Translation information display method, device, equipment and storage medium
CN111399729A (en) Image drawing method and device, readable medium and electronic equipment
CN112995749B (en) Video subtitle processing method, device, equipment and storage medium
CN113255377A (en) Translation method, translation device, electronic equipment and storage medium
CN111783508A (en) Method and apparatus for processing image
US11556605B2 (en) Search method, device and storage medium
CN115311178A (en) Image splicing method, device, equipment and medium
CN113886612A (en) Multimedia browsing method, device, equipment and medium
US20190227634A1 (en) Contextual gesture-based image searching
CN112613447A (en) Key point detection method and device, electronic equipment and storage medium
US10915778B2 (en) User interface framework for multi-selection and operation of non-consecutive segmented information
JP7338328B2 (en) Data structure, computer program and image processing device
KR20140134844A (en) Method and device for photographing based on objects
CN109460511B (en) Method and device for acquiring user portrait, electronic equipment and storage medium
CN113220202A (en) Control method and device for Internet of things equipment
CN113655933A (en) Text labeling method and device, storage medium and electronic equipment
CN113343720A (en) Subtitle translation method and device for subtitle translation
JP7098897B2 (en) Image processing equipment, programs and image data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination