CN112269467A - Translation method based on AR and AR equipment - Google Patents

Translation method based on AR and AR equipment

Info

Publication number
CN112269467A
CN112269467A
Authority
CN
China
Prior art keywords
region
translated
gesture operation
user
determining
Prior art date
Legal status
Pending
Application number
CN202010775242.3A
Other languages
Chinese (zh)
Inventor
罗亚军
刘德生
高自磊
Current Assignee
Shenzhen Hongxiang Optical Electronics Co ltd
Original Assignee
Shenzhen Hongxiang Optical Electronics Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Hongxiang Optical Electronics Co., Ltd.
Priority to CN202010775242.3A
Publication of CN112269467A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/40 - Processing or translation of natural language
    • G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2203/00 - Indexing scheme relating to G06F 3/00 - G06F 3/048
    • G06F 2203/01 - Indexing scheme relating to G06F 3/01
    • G06F 2203/012 - Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Abstract

The invention provides an AR-based translation method and an AR device, which can realize local translation of a whole-page document, better meet the translation requirements of a user, and improve the user experience. The translation method comprises the following steps: the AR device determines a target gesture operation input by a user, determines a first region to be translated according to the target gesture operation, translates the characters included in the first region to be translated, and outputs a translation result; the first region to be translated is a partial region or the whole region of a first region, and the first region is the whole region of a first document.

Description

Translation method based on AR and AR equipment
Technical Field
The invention belongs to the technical field of Augmented Reality (AR), and particularly relates to a method and a device for assisting translation.
Background
Currently, there are many translation software applications, and they mainly translate whole documents or whole pages of documents. However, sometimes the user merely does not know the meaning of a few words in a whole-page document, in which case the whole page does not need to be translated. This gives rise to a need for local translation. How to realize local translation is a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention provides an AR-based translation method and an AR device, which can realize local translation of a whole-page document, better meet the translation requirements of a user, and improve the user experience.
In a first aspect, an embodiment of the present invention provides an AR-based translation method, which may be executed by an AR device or by a chip disposed in the AR device. The method comprises the following steps: the AR device determines a target gesture operation input by a user, determines a first region to be translated according to the target gesture operation, translates the characters included in the first region to be translated, and outputs a translation result; the first region to be translated is a partial region or the whole region of a first region, and the first region is the whole region of a first document.
Optionally, before the AR device determines the target gesture operation input by the user, the method further includes:
periodically acquiring a plurality of first images, wherein some or all of the plurality of first images comprise a target gesture operation performed by the user;
and respectively matching the plurality of first images with a plurality of preset gesture operations, and selecting at least one target image from the plurality of first images, wherein the matching degree of the target gesture operation included in the target image and the preset gesture operation is greater than or equal to a first preset threshold value.
Optionally, the determining, by the AR device, the first region to be translated according to the target gesture operation includes:
extracting a first region where the target gesture operation is located in the target image;
determining whether the working mode of the AR device is the association mode;
if the working mode of the AR device is the association mode, determining that the first region to be translated is larger than the first region; otherwise, determining that the first region to be translated is the first region.
Optionally, after the AR device determines the first region to be translated according to the target gesture operation, the method further includes:
outputting and displaying the first region to be translated;
receiving a scaling operation of the user for the first region to be translated;
and determining a second region to be translated according to the scaling operation, wherein the second region to be translated is obtained by scaling the first region to be translated.
Optionally, before determining whether the operating mode of the AR device is the association mode, the method further includes: and receiving a selection instruction input by a user, wherein the selection instruction is used for selecting to enter the association mode.
In a second aspect, an AR device is provided, the AR device including a camera, the AR device further including:
the first determination module is used for determining target gesture operation input by a user;
the second determining module is used for determining a first region to be translated according to the target gesture operation, wherein the first region to be translated is a partial region or a whole region of the first region, and the first region is a whole region of the first document;
and the translation module is used for translating the characters included in the first region to be translated and outputting a translation result.
Optionally, the camera is configured to periodically acquire a plurality of first images, where a partial image or all images in the plurality of first images include a target gesture operation performed by a user;
the first determining module is specifically configured to match the multiple first images with multiple preset gesture operations respectively, and select at least one target image from the multiple first images, where a matching degree between a target gesture operation included in the target image and a preset gesture operation is greater than or equal to a first preset threshold.
Optionally, the second determining module is specifically configured to:
extracting a first region where the target gesture operation is located in the target image;
determining whether the working mode of the AR device is the association mode;
if the working mode of the AR device is the association mode, determining that the first region to be translated is larger than the first region; otherwise, determining that the first region to be translated is the first region.
Optionally, the second determining module is further configured to:
outputting and displaying the first region to be translated;
receiving a scaling operation of the user for the first region to be translated;
and determining a second region to be translated according to the scaling operation, wherein the second region to be translated is obtained by scaling the first region to be translated.
Optionally, the first determining module is further configured to: and receiving a selection instruction input by a user, wherein the selection instruction is used for selecting to enter the association mode.
For technical effects of the AR device provided by the second aspect, reference may be made to technical effects of the foregoing implementation manners of the first aspect, and details are not described here.
In a third aspect, an AR device is provided, comprising:
at least one processor, and
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method of any one of the first aspect by executing the instructions stored by the memory.
The technical effects of the AR device provided in the embodiment of the present invention may refer to the technical effects of the implementation manners of the first aspect, and are not described herein again.
In a fourth aspect, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the first aspects.
In the embodiment of the invention, translation can be performed based on the content that the AR device locates as needing translation, that is, local translation is performed, which can meet the actual translation requirements of users.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present invention;
fig. 2 is a schematic flowchart of a translation method based on an AR device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a gesture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a gesture according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a gesture according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an AR device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an AR device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
The advent of more and more translation software has made it convenient for users to read documents. However, most translation software translates the whole document or the whole page, and cannot perform local translation according to the user's needs. For example, when reading a document, a user may be blocked by only a few words or sentences, and translating the full text obviously wastes time and is unnecessary. Alternatively, after full-text translation, the user has to search the translated document for the part actually needed, that is, further screening is required, which is tedious.
Further, when current translation software performs local translation, the user is required to input the content to be translated at a specific location (also referred to as a translation box) on the interface provided by the software. However, if the content to be translated is on paper, it must be typed into the translation box manually, or the paper document must be scanned and converted into an electronic file by other electronic equipment, which is also tedious.
In view of this, the embodiment of the present invention provides a translation method based on AR technology, which can translate according to the content to be translated located by the user, thereby meeting the actual translation requirement of the user.
The technical scheme provided by the embodiment of the invention is described below with reference to the accompanying drawings.
Please refer to fig. 1, which shows an application scenario to which the translation method provided by the embodiment of the present invention is applied. The scene includes an AR device and a carrier of the document to be translated. The AR device may scan the document to be translated, either its entire area or a local area of it. It should be understood that whichever region of the document needs to be translated, the AR device may scan the corresponding region; in this regard, the region of the document scanned by the AR device may be referred to as the region to be translated, which may be the entire region of the document to be translated or a local region of it.
In some embodiments, the AR device may be AR glasses, an AR helmet, or the like; in the following description, AR glasses are taken as an example. It should be noted that the AR device in the embodiment of the present invention is not limited to AR glasses, as long as the user can view the document to be translated through the AR device. It should be understood that the AR device may be provided with an image capture device, such as a camera, and may also be provided with a display screen to display the document to be translated.
Referring to fig. 2, a schematic flow chart of a translation method provided in an embodiment of the present invention is shown. The method may be executed by AR glasses, and its specific flow is described as follows.
S201, the AR device receives a gesture operation input by the user, where the gesture operation may be used to indicate a translation area in a first document.
In the embodiment of the present invention, the translation region may be the entire region of the first document or may be a partial region of the first document. The AR glasses may scan the first document such that the user may view the entire first document through the AR glasses. If the user needs to translate a part of the content in the first document, a gesture operation may be performed on the first document to specify the content (translation area) to be translated from the first document.
Specifically, the user may turn on the AR device to view the document to be translated, and while viewing the first document, if the user wants to select the translation region from the document, the user may perform a gesture operation on the first document through the AR device, and the gesture operation may be used to indicate the translation region. The AR device generates a recognition instruction according to the gesture operation performed by the user, and executes the recognition instruction, so that the translation area corresponding to the gesture operation can be recognized.
There are various gesture operations by which the user can select the translation area. For example, referring to fig. 3, the user can directly enclose a certain area by pinching the thumb together with another finger (illustrated in fig. 3 by a bold cloud-shaped outline). The enclosed area can be regarded as the translation area, and the enclosing operation performed by the user is the gesture operation for selecting the translation area. Alternatively, referring to fig. 4, the user may slide two fingers apart from each other, and the area covered by the sliding track formed by the two fingers (illustrated in fig. 4 by the bold cloud-shaped outline) is the translation area; the sliding operation performed by the user is then the gesture operation for selecting the translation area. Or, the user may perform a sliding operation with a single finger, so that the sliding track encloses a certain area (illustrated in fig. 5 by the bold cloud-shaped outline); that area is the translation area, and the sliding operation performed by the user is the gesture operation for selecting the translation area.
For another example, the user may also click a certain position of the first document; if the duration of the click is longer than or equal to a preset duration, it may be considered that the user needs to translate the whole page of the first document. In this case the translation area is the entire area of the whole page, and the click operation performed by the user is the gesture operation for selecting the translation area. The preset duration may be a preset fixed value, such as 5 s, or may be another possible value.
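As a minimal illustration of this duration rule, the check reduces to comparing how long the click was held against the preset duration. The sketch below (Python) assumes a simple event record and uses the 5 s example value; both are illustrative assumptions, not details fixed by the embodiment.

from dataclasses import dataclass

PRESET_DURATION_S = 5.0  # preset duration; 5 s is the example value given above (assumed)

@dataclass
class ClickEvent:
    position: tuple  # (x, y) position clicked on the first document
    duration_s: float  # how long the click was held, in seconds

def is_whole_page_translation(event: ClickEvent) -> bool:
    """True if the click should select the whole page as the translation area."""
    return event.duration_s >= PRESET_DURATION_S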
S202, the AR device determines the target gesture operation input by the user.
It should be understood that during the process of the user viewing the first document through the AR device, the user may perform various gesture operations, that is, the user may perform other gesture operations besides the translation gesture operation, for example, a gesture operation of turning pages, and the like.
For this reason, in the embodiment of the present invention, the AR device may periodically acquire a plurality of first images; some of the plurality of first images include the gesture operation for translation, and some may include gesture operations other than for translation. For convenience of description, the gesture operation for translation is herein referred to as the target gesture operation. The AR device may select the images containing the target gesture operation from the plurality of first images.
For example, a plurality of gesture operations may be preset and regarded as gesture operations for translation, for example the gesture operations illustrated by the bold cloud-shaped outlines in fig. 3 to fig. 5. The AR device can extract the gesture operation included in each first image and match it with the preset gesture operations; if the matching degree is greater than or equal to a preset threshold, the gesture operation included in the first image can be regarded as the target gesture operation. Conversely, if the matching degree is less than the preset threshold, the gesture operation included in the first image is not the target gesture operation. The AR device may then screen out, from the plurality of first images, those that include the target gesture operation.
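This matching step can be sketched as follows: extract a descriptor for the gesture in each first image, compare it against descriptors of the preset translation gestures, and keep the images whose best matching degree reaches the first preset threshold. The descriptor, the similarity measure, and the 0.8 threshold below are illustrative assumptions standing in for whatever recognizer the device actually uses.

import numpy as np

FIRST_PRESET_THRESHOLD = 0.8  # first preset threshold on the matching degree (assumed value)

def extract_gesture_features(image: np.ndarray) -> np.ndarray:
    """Hypothetical gesture descriptor: a normalized, downsampled copy of the image.

    Assumes grayscale images of equal resolution; a real system would use a
    dedicated hand-shape or contour descriptor instead.
    """
    small = image[::8, ::8].astype(np.float64).ravel()
    norm = np.linalg.norm(small)
    return small / norm if norm > 0 else small

def matching_degree(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two descriptors, clipped to [0, 1]."""
    return float(np.clip(np.dot(a, b), 0.0, 1.0))

def select_target_images(first_images: list, preset_gestures: list) -> list:
    """Keep the first images whose gesture matches some preset translation gesture."""
    presets = [extract_gesture_features(g) for g in preset_gestures]
    targets = []
    for img in first_images:
        feats = extract_gesture_features(img)
        if max(matching_degree(feats, p) for p in presets) >= FIRST_PRESET_THRESHOLD:
            targets.append(img)
    return targets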
S203, positioning a region to be translated in the first document according to the target gesture operation.
It should be understood that the region to be translated is typically a regular region, such as a rectangular region or the like. But the gesture operations performed by the user typically delineate irregular regions. To this end, in the embodiment of the present invention, the AR device may locate the region to be translated in the first document according to the recognized target gesture operation.
For example, with continued reference to fig. 3 to fig. 5, the area enclosed by the target gesture operation performed by the user is the area indicated by the bold cloud-shaped outline, and the AR device may determine a region to be translated that is greater than or equal to that area, so as to avoid losing information to be translated.
As one example, a horizontal increment and a vertical increment may be preset, where the horizontal increment is an increment in a first direction, the vertical increment is an increment in a second direction, and the first and second directions are perpendicular. The AR device may extract the motion trajectory of the target gesture operation and delineate the first region according to the trajectory. On the basis of the first region, the AR device expands the first region by the horizontal increment along the first direction and by the vertical increment along the second direction, and determines the expanded first region as the region to be translated.
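This delineate-and-expand step can be written as simple bounding-box arithmetic. In the sketch below, the pixel coordinates, the increment values, and the clamping to the document bounds are all assumptions; the embodiment only fixes the idea of preset horizontal and vertical increments applied along two perpendicular directions.

from dataclasses import dataclass

H_INCREMENT = 20  # preset horizontal increment (first direction), in pixels (assumed)
V_INCREMENT = 10  # preset vertical increment (second direction), in pixels (assumed)

@dataclass
class Region:
    left: int
    top: int
    right: int
    bottom: int

def delineate_first_region(trajectory: list) -> Region:
    """Bounding box of the target gesture's motion trajectory (non-empty list of (x, y))."""
    xs = [x for x, _ in trajectory]
    ys = [y for _, y in trajectory]
    return Region(min(xs), min(ys), max(xs), max(ys))

def region_to_translate(first: Region, page_w: int, page_h: int) -> Region:
    """Expand the first region by the preset increments, clamped to the document."""
    return Region(
        left=max(0, first.left - H_INCREMENT),
        top=max(0, first.top - V_INCREMENT),
        right=min(page_w, first.right + H_INCREMENT),
        bottom=min(page_h, first.bottom + V_INCREMENT),
    )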
Further, the AR device may output and display the region to be translated. Specifically, the AR device may output the region to be translated in the region where the target gesture operation is located, and the user may determine whether the region to be translated is appropriate. If it is not, the user can perform a scaling operation on it. The zoom operation may be, for example, a sliding operation in a certain direction; the AR device receives the zoom operation, zooms the region to be translated, and determines the final region to be translated. For convenience of description, the region to be translated determined by the AR device from the target gesture operation is herein referred to as the first region to be translated, and the finally determined region to be translated is referred to as the second region to be translated.
Or, generally speaking, the first region to be translated may be small, and some content to be translated may be missed. For this reason, the embodiment of the present invention may provide an association mode, in which, in addition to the content of the first region to be translated, content located around the first region to be translated is also translated. For example, a plurality of ranges may be preset, each range being determined by its distance from the center of the first region to be translated. After the AR device outputs the first region to be translated determined according to the target gesture operation, the user may input a selection instruction to the AR device to enter the association mode. Upon receiving the selection instruction, the AR device may output the plurality of ranges, from which the user selects one. It can also be understood that, when the AR device determines the region to be translated according to the target gesture operation, if the working mode of the AR device is detected to be the association mode, the AR device may determine the final second region to be translated according to the range selected in the association mode and the initially determined first region to be translated.
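Both refinement paths just described, the user's zoom operation on the first region to be translated and the association-mode range measured from the region's center, reduce to rectangle arithmetic. In this sketch the zoom factor and the preset range values are illustrative assumptions:

PRESET_RANGES = (40.0, 80.0, 160.0)  # association-mode distances from the center (assumed)

def _center(r):  # r = (left, top, right, bottom)
    return (r[0] + r[2]) / 2.0, (r[1] + r[3]) / 2.0

def scale_about_center(r, factor):
    """Second region from a zoom operation: scale the first region by `factor`."""
    cx, cy = _center(r)
    half_w = (r[2] - r[0]) / 2.0 * factor
    half_h = (r[3] - r[1]) / 2.0 * factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def expand_to_range(r, selected_range):
    """Second region in association mode: extend at least `selected_range` from the center."""
    cx, cy = _center(r)
    half_w = max((r[2] - r[0]) / 2.0, selected_range)
    half_h = max((r[3] - r[1]) / 2.0, selected_range)
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# Example: enlarge the first region by 25% via zoom, or jump to the second preset range.
first_region = (100.0, 200.0, 300.0, 260.0)
second_by_zoom = scale_about_center(first_region, 1.25)
second_by_association = expand_to_range(first_region, PRESET_RANGES[1])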
And S204, translating the content of the second region to be translated and outputting a translation result.
After the AR device determines the second region to be translated, the content of the second region to be translated may be translated. Specifically, the AR device may crop out the image corresponding to the second region to be translated and transmit the cropped image to an Optical Character Recognition (OCR) module provided in the AR device. It should be understood that the OCR module may process the image to be translated to obtain the text content included in it. The OCR module sends the obtained text content to a translation module, and the translation module translates the received text content.
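As a sketch of this pipeline: crop the camera frame to the second region to be translated, recognize its text, then hand the text to a translation step. Here pytesseract stands in for the device's OCR module, and translate_text() is a hypothetical placeholder for whatever backend the translation module uses; neither is named by the embodiment.

from PIL import Image
import pytesseract

def translate_text(text: str, target_lang: str) -> str:
    """Hypothetical translation backend; replace with a real engine or service."""
    raise NotImplementedError

def translate_region(frame: Image.Image, region: tuple, source_lang: str = "eng",
                     target_lang: str = "zh") -> str:
    """OCR the second region to be translated and return its translation."""
    # Crop the frame to the region (left, top, right, bottom).
    cropped = frame.crop(region)
    # OCR module: recognize the text contained in the cropped image.
    text = pytesseract.image_to_string(cropped, lang=source_lang)
    # Translation module: translate the recognized text into the target language.
    return translate_text(text, target_lang)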
For example, before translating the received text content, the translation module may output prompt information for prompting the user to select a target language for translation, such as English-Chinese, Chinese-English, English-French, or English-German. After the user selects the target language, the translation module translates the received text content according to the selection and outputs the translation result.
The AR device may display the translation results. Specifically, the AR device may display the translation result in a preset target area. It should be understood that the preset target area may avoid blocking the viewing area as much as possible, for example, the preset target area may be located in a corner area of the display screen of the AR device, which may reduce blocking of the first document as much as possible, and improve user experience.
In the embodiment of the invention, translation can be performed based on the content that the AR device locates as needing translation, that is, local translation is performed, which can meet the actual translation requirements of users.
The following describes the apparatus provided by the embodiment of the present invention with reference to the drawings.
Referring to fig. 6, based on the same inventive concept, an embodiment of the present invention provides an AR device including a sensor such as a camera and an OCR module. The AR device may include a first determining module 601, a second determining module 602, and a translating module 603, which may be implemented by a hardware processor.
a first determining module 601, configured to determine a target gesture operation input by a user;
a second determining module 602, configured to determine a first region to be translated according to the target gesture operation, where the first region to be translated is a partial region or a whole region of a first region, and the first region is a whole region of a first document;
the translation module 603 is configured to translate the text included in the first region to be translated, and output a translation result.
Optionally, the camera is configured to periodically acquire a plurality of first images, where a partial image or all images in the plurality of first images include a target gesture operation performed by a user;
the first determining module 601 is specifically configured to match the multiple first images with multiple preset gesture operations respectively, and select at least one target image from the multiple first images, where a matching degree between a target gesture operation included in the target image and a preset gesture operation is greater than or equal to a first preset threshold.
Optionally, the second determining module 602 is specifically configured to:
extracting a first region where the target gesture operation is located in the target image;
determining whether the working mode of the AR device is the association mode;
if the working mode of the AR device is the association mode, determining that the first region to be translated is larger than the first region; otherwise, determining that the first region to be translated is the first region.
Optionally, the second determining module 602 is further configured to:
outputting and displaying the first region to be translated;
receiving a scaling operation of the user for the first region to be translated;
and determining a second region to be translated according to the scaling operation, wherein the second region to be translated is obtained by scaling the first region to be translated.
Optionally, the first determining module 601 is further configured to: and receiving a selection instruction input by a user, wherein the selection instruction is used for selecting to enter the association mode.
The AR device may be configured to execute the method provided in the embodiment shown in fig. 2, and therefore, for functions and the like that can be implemented by each functional unit in the AR device, reference may be made to relevant descriptions in the embodiment shown in fig. 2, which is not described in detail herein.
Referring to fig. 7, based on the same inventive concept, an embodiment of the present invention provides an AR device, which may include at least one processor 701, where the processor 701, when executing the computer program stored in the memory, implements the steps of the method shown in fig. 2 provided by the embodiment of the present invention.
The processor 701 may specifically be a central processing unit, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling program execution, and is configured to control and manage actions of the AR device and to support the AR device in executing the method steps shown in fig. 2.
Optionally, the AR device further includes a memory 702 connected to the at least one processor. The memory 702 may include a read-only memory (ROM), a random access memory (RAM), and disk memory. The memory 702 is used for storing data required by the processor 701 in operation, that is, for storing instructions executable by the at least one processor 701, and the at least one processor 701 performs the method shown in fig. 2 by executing the instructions stored in the memory 702. There may be one or more memories 702. The memory 702 is shown in fig. 7; it should be noted that the memory 702 is an optional functional block and is therefore shown by a dotted line in fig. 7.
The entity device corresponding to the first determining module 601, the second determining module 602, and the translating module 603 may be the processor 701. The AR device may be used to perform the method provided by the embodiment shown in fig. 2. Therefore, for the functions that can be realized by each functional module of the device, reference may be made to the corresponding description in the embodiment shown in fig. 2, which is not repeated here.
Embodiments of the present invention also provide a computer storage medium storing computer instructions which, when run on a computer, cause the computer to execute the method described in fig. 2.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a read-only memory (ROM)/random access memory (RAM), a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in each embodiment or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An AR-based translation method, comprising:
the AR equipment determines a target gesture operation input by a user;
the AR equipment determines a first region to be translated according to the target gesture operation, wherein the first region to be translated is a partial region or a whole region of the first region, and the first region is a whole region of the first document;
and the AR equipment translates the characters included in the first area to be translated and outputs a translation result.
2. The method of claim 1, wherein prior to the AR device determining the target gesture operation of the user input, the method further comprises:
periodically acquiring a plurality of first images, wherein some or all of the plurality of first images comprise a target gesture operation performed by the user;
and respectively matching the plurality of first images with a plurality of preset gesture operations, and selecting at least one target image from the plurality of first images, wherein the matching degree of the target gesture operation included in the target image and the preset gesture operation is greater than or equal to a first preset threshold value.
3. The method of claim 2, wherein the AR device determining a first region to be translated in accordance with the target gesture operation comprises:
extracting a first region where the target gesture operation is located in the target image;
determining whether the working mode of the AR device is the association mode;
if the working mode of the AR device is the association mode, determining that the first region to be translated is larger than the first region; otherwise, determining that the first region to be translated is the first region.
4. The method of any of claims 1-3, wherein after the AR device determines the first region to be translated in accordance with the target gesture operation, the method further comprises:
outputting and displaying the first region to be translated;
receiving a scaling operation of the user for the first region to be translated;
and determining a second region to be translated according to the scaling operation, wherein the second region to be translated is obtained by scaling the first region to be translated.
5. The method of claim 1, wherein prior to determining whether the operating mode of the AR device is an association mode, the method further comprises:
and receiving a selection instruction input by a user, wherein the selection instruction is used for selecting to enter the association mode.
6. An AR device, the AR device comprising a camera, the AR device further comprising:
the first determination module is used for determining target gesture operation input by a user;
the second determining module is used for determining a first region to be translated according to the target gesture operation, wherein the first region to be translated is a partial region or a whole region of the first region, and the first region is a whole region of the first document;
and the translation module is used for translating the characters included in the first region to be translated and outputting a translation result.
7. The AR device of claim 6, wherein the camera is to periodically capture a plurality of first images, wherein a portion or all of the plurality of first images comprise a target gesture operation by a user;
the first determining module is specifically configured to match the multiple first images with multiple preset gesture operations respectively, and select at least one target image from the multiple first images, where a matching degree between a target gesture operation included in the target image and a preset gesture operation is greater than or equal to a first preset threshold.
8. The AR device of claim 6, wherein said second determining module is specifically configured to:
extracting a first region where the target gesture operation is located in the target image;
determining whether the working mode of the AR device is the association mode;
if the working mode of the AR device is the association mode, determining that the first region to be translated is larger than the first region; otherwise, determining that the first region to be translated is the first region.
9. The AR device of any of claims 6-8, wherein the second determination module is further to:
outputting and displaying the first region to be translated;
receiving a scaling operation of the user for the first region to be translated;
and determining a second region to be translated according to the scaling operation, wherein the second region to be translated is obtained by scaling the first region to be translated.
10. The AR device of claim 6, wherein the first determination module is further to: and receiving a selection instruction input by a user, wherein the selection instruction is used for selecting to enter the association mode.
CN202010775242.3A 2020-08-04 2020-08-04 Translation method based on AR and AR equipment Pending CN112269467A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010775242.3A CN112269467A (en) 2020-08-04 2020-08-04 Translation method based on AR and AR equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010775242.3A CN112269467A (en) 2020-08-04 2020-08-04 Translation method based on AR and AR equipment

Publications (1)

Publication Number Publication Date
CN112269467A (en) 2021-01-26

Family

ID=74349499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010775242.3A Pending CN112269467A (en) 2020-08-04 2020-08-04 Translation method based on AR and AR equipment

Country Status (1)

Country Link
CN (1) CN112269467A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140081619A1 (en) * 2012-09-18 2014-03-20 Abbyy Software Ltd. Photography Recognition Translation
US20180113859A1 (en) * 2016-10-20 2018-04-26 Kabushiki Kaisha Toshiba System and method for real time translation
CN108182184A (en) * 2017-12-27 2018-06-19 北京百度网讯科技有限公司 Picture character interpretation method, application and computer equipment
CN109992753A (en) * 2019-03-22 2019-07-09 维沃移动通信有限公司 A kind of translation processing method and terminal device
CN111160333A (en) * 2019-12-29 2020-05-15 歌尔科技有限公司 AR glasses, text translation method and device thereof, and computer-readable storage medium
CN111310482A (en) * 2020-01-20 2020-06-19 北京无限光场科技有限公司 Real-time translation method, device, terminal and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255377A (en) * 2021-06-08 2021-08-13 北京有竹居网络技术有限公司 Translation method, translation device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination