CN114065783A

CN114065783A - Text translation method, device, electronic equipment and medium

Info

Publication number: CN114065783A
Application number: CN202111358100.8A
Authority: CN
Inventors: 蒋中博
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-11-16
Filing date: 2021-11-16
Publication date: 2022-02-18

Abstract

The disclosure provides a text translation method, a text translation apparatus, an electronic device, and a medium, and relates to the field of computer technologies, in particular to cloud phones, cloud computing, and cloud services. The specific implementation is as follows: video stream data acquired from a cloud device is rendered to generate a current display interface; and, according to a translation instruction for the current display interface, text information to be translated that is included in the current display interface is acquired and translated. The rendered display interface of the cloud device can thus be translated, so that a user can still read smoothly when facing multilingual content on the cloud device, which improves the user experience and the customer retention rate of the cloud device.

Description

Text translation method, device, electronic equipment and medium

Technical Field

The present disclosure relates to the field of computer technologies, in particular to the fields of cloud phones, cloud computing, and cloud services, and specifically to a text translation method, an apparatus, an electronic device, and a medium.

Background

Cloud devices, as an emerging cloud service technology, are favored by more and more consumers because they are not constrained by local hardware conditions.

With the development of cloud device technology, cloud devices are beginning to adapt to application software in multiple languages.

Disclosure of Invention

The present disclosure provides a method, an apparatus, an electronic device, and a medium for translating a rendered cloud device display interface.

According to an aspect of the present disclosure, there is provided a text translation method including:

rendering video stream data acquired from cloud equipment to generate a current display interface;

and according to the translation instruction of the current display interface, acquiring text information to be translated included in the current display interface, and translating the text information to be translated.

According to another aspect of the present disclosure, there is provided a text translation apparatus including:

the display interface generation module is used for rendering video stream data acquired from the cloud equipment to generate a current display interface;

and the translation module is used for acquiring the text information to be translated included in the current display interface according to the translation instruction of the current display interface and translating the text information to be translated.

According to another aspect of the present disclosure, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any embodiment of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any embodiment of the present disclosure.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the method of any embodiment of the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a flowchart of a text translation method according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of another text translation method according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of a text translation apparatus according to an embodiment of the present disclosure;

FIG. 4 is a block diagram of an electronic device for implementing the text translation method according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

During development, the applicant found that, as cloud device technology has developed, cloud devices have been adapted to application software in multiple languages, such as game software, film and television software, and office software. Foreign game software in particular is popular with the large number of young cloud device users, but most of it does not ship with a domestic language pack. Domestic players therefore cannot get a good game experience because of the language barrier, which indirectly harms the experience of using the cloud device and lowers its customer retention rate.

Fig. 1 is a flowchart of a text translation method according to an embodiment of the present disclosure. This embodiment is applicable to translating a cloud device display interface rendered by a client. The method of this embodiment may be executed by the text translation apparatus disclosed in the embodiments of the present disclosure; the apparatus may be implemented in software and/or hardware and may be integrated on any electronic device with computing capability.

As shown in fig. 1, the text translation method disclosed in this embodiment may include:

S101, rendering video stream data acquired from cloud equipment, and generating a current display interface.

A cloud device is a device that applies cloud computing technology to a server and delivers cloud services through that server; relying on its own system and the servers set up by the manufacturer, it can provide numerous functions over the network. Types of cloud devices include, but are not limited to, cloud phones and cloud computers. The video stream data refers to the video frame data received by the client, which the cloud device generates from its current interface. For example, if the current interface of the cloud device is a game interface, the video stream data is the video frame data of each game picture; if the current interface is a movie interface, the video stream data is the video frame data of each movie picture.

In one embodiment, when a user performs a start operation on any application icon in the cloud device display interface of the client, for example by clicking the application icon, the client responds to the start operation by generating a remote start instruction that carries the application identifier of the target application. The client is an intelligent terminal device, which may be a smart phone, a smart watch, a tablet computer, a notebook computer, or any other electronic device with an intelligent operating system. The application identifier (AID, Application Identifier) is unique, that is, every application has its own unique application identifier.

The client sends the remote start instruction to the server. The server receives and parses the remote start instruction to obtain the application identifier, matches it against the application identifiers of the candidate applications in the cloud device, and determines and starts the target application from among the candidate applications according to the matching result.
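The matching step on the server side can be illustrated with a minimal sketch. The instruction layout (a dictionary with an `app_id` field), the identifiers, and the function names are all assumptions made for illustration; the disclosure does not specify a message format.

```python
from typing import Dict, Optional

# Hypothetical registry of candidate applications on the cloud device,
# keyed by application identifier (AID). The identifiers are made up.
CANDIDATE_APPS: Dict[str, str] = {
    "aid-game-001": "Example Game",
    "aid-video-002": "Example Video Player",
}


def parse_remote_start(instruction: dict) -> str:
    """Extract the application identifier carried by a remote start instruction."""
    return instruction["app_id"]


def select_target_app(instruction: dict, candidates: Dict[str, str]) -> Optional[str]:
    """Return the candidate whose identifier matches the instruction, or None."""
    app_id = parse_remote_start(instruction)
    return candidates.get(app_id)
```

Because each application identifier is unique, a dictionary lookup is enough here; a server would start the returned application, or report an error when the result is `None`.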

After the target application is started, the server obtains the interface video data of the target application and sends each frame of interface video data to the client as a video stream. After receiving the video stream data from the server, the client renders it using a graphics processor, generates the current display interface, and displays the current display interface on the client's display device. For example, the server sends the game interface video data of a game application to the client frame by frame as a video stream, and the client renders each frame in sequence to display each game interface in turn.

S102, obtaining the text information to be translated included in the current display interface according to the translation instruction of the current display interface, and translating the text information to be translated.

The translation instruction is an instruction to translate the text information in the current display interface.

In one embodiment, the user performs a control operation on the current display interface of the client to trigger the client to generate the translation instruction.

The control operations include, but are not limited to, the following:

1. The user performs a touch operation on a preset area of the current display interface, including but not limited to clicking or double clicking. When the client recognizes the touch operation, it triggers generation of the translation instruction.

2. The user issues a voice control instruction to the client, such as "translation", "I want to see Chinese", or "translate text". After receiving the voice control instruction, the client performs voice recognition on it and generates the translation instruction according to the voice recognition result.

3. The user performs a gesture operation toward the client, for example waving a hand, making a fist, or giving a thumbs-up. The client captures a gesture image of the user through an on-board image capture device, for example a front camera, and matches the gesture image against a preset standard image. If the gesture image matches the standard image, generation of the translation instruction is triggered.
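The three kinds of control operation can be sketched as a single dispatch function. This is only an illustration: the event dictionaries, the field names, and the gesture labels are hypothetical, and the voice recognition and gesture image matching are assumed to have already produced a transcript or a label.

```python
# Phrases and labels taken from the examples in the text; the exact
# spelling of the labels is an assumption for this sketch.
VOICE_PHRASES = {"translation", "i want to see chinese", "translate text"}
GESTURE_LABELS = {"wave", "fist", "thumbs_up"}
TOUCH_ACTIONS = {"click", "double_click"}


def triggers_translation(event: dict) -> bool:
    """Decide whether a control operation should generate a translation instruction."""
    kind = event.get("kind")
    if kind == "touch":
        # Touch operations count only inside the preset area.
        return bool(event.get("in_preset_area")) and event.get("action") in TOUCH_ACTIONS
    if kind == "voice":
        # The transcript is assumed to come from a voice recognition step.
        return event.get("transcript", "").strip().lower() in VOICE_PHRASES
    if kind == "gesture":
        # The label is assumed to come from matching against standard images.
        return event.get("label") in GESTURE_LABELS
    return False
```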

After the translation instruction is obtained, the client generates interactive information on the current display interface according to the translation instruction, for example, "please determine a translation area", so as to remind a user to determine the translation area needing to be translated in the current display interface.

If, within a preset time threshold after the interactive information is generated on the current display interface, the user selects a translation area from the current display interface through a preset area selection operation, the client determines the translation area from the current display interface according to that operation. The area selection operation includes, but is not limited to, dragging a selection box over the translation area or manually selecting the translation area. If the user does not select a translation area within the preset time threshold, the client by default uses the entire current display interface as the translation area.
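The fallback rule for the translation area can be written down in a few lines. The `Rect` type, the parameter names, and the use of seconds for the threshold are assumptions for this sketch; the disclosure only states that the whole interface is used when no area is selected in time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen pixels (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int


def resolve_translation_area(
    screen: Rect,
    selection: Optional[Rect],
    elapsed_s: float,
    threshold_s: float,
) -> Rect:
    """Use the user's selection if it arrived within the threshold, else the whole screen."""
    if selection is not None and elapsed_s <= threshold_s:
        return selection
    return screen
```

For example, with a 5-second threshold, a selection made after 2 seconds is used as the translation area, while a missing or late selection yields the full display interface.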

After the translation area is determined, the client again generates interactive information on the current display interface, for example, "please select the language you want to translate into", and presents the currently supported candidate languages to the user. The user selects at least one of the candidate languages as the target language; the candidate languages include, but are not limited to, Chinese, English, Korean, and Japanese. The client captures the translation area to generate a translation image corresponding to it, performs text recognition on the translation image using a text recognition technology, and determines the text information to be translated included in the translation image according to the text recognition result. The text information to be translated is then translated into the target language. Ways of doing so include, but are not limited to, the following: the client uploads the text information to be translated to a translation server over the network, the translation server uses its computing power to translate it into the target language, and the client receives the translation result sent by the translation server; or the client inputs the text information to be translated into a translation model pre-loaded locally on the client, the translation model translates it into the target language, and the client obtains the translation result output by the translation model.

After the translation result is obtained, the client displays it in the current display interface, so that the user can read the translation result corresponding to each piece of text information to be translated.

In the present disclosure, the video stream data acquired from the cloud device is rendered to generate the current display interface, and, according to the translation instruction for the current display interface, the text information to be translated included in the current display interface is acquired and translated. The rendered display interface of the cloud device can thus be translated, so that a user can still read smoothly when facing multilingual content on the cloud device, which improves the user experience and the customer retention rate of the cloud device.
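The recognize-then-translate flow can be sketched with pluggable functions, so that the translator may be backed either by a translation server or by a local model. Everything named here is hypothetical: `fake_recognize` is a stand-in that treats the image bytes as text rather than running real text recognition, and the dictionary translator merely imitates a translation back end.

```python
from typing import Callable, Dict, List, Tuple

Recognizer = Callable[[bytes], List[str]]
Translator = Callable[[str, str], str]


def translate_area(
    image: bytes,
    target_language: str,
    recognize: Recognizer,
    translate: Translator,
) -> List[str]:
    """Recognize the text in a translation image, then translate each piece."""
    texts = recognize(image)
    return [translate(text, target_language) for text in texts]


def fake_recognize(image: bytes) -> List[str]:
    """Stand-in for text recognition: reads the bytes as newline-separated UTF-8."""
    return [line for line in image.decode("utf-8").splitlines() if line.strip()]


def make_dictionary_translator(table: Dict[Tuple[str, str], str]) -> Translator:
    """Stand-in for a remote or local translator, backed by a lookup table."""
    def translate(text: str, target_language: str) -> str:
        # Unknown text is returned unchanged.
        return table.get((text, target_language), text)
    return translate
```

Passing the translator as a parameter keeps the client logic the same whichever of the two translation paths is used.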

Fig. 2 is a flowchart of another text translation method according to an embodiment of the present disclosure, which further optimizes and expands on the above technical solution and may be combined with the optional embodiments described above.

As shown in fig. 2, the text translation method disclosed in this embodiment may include:

S201, rendering video stream data acquired from cloud equipment, and generating a current display interface.

S202, triggering and generating the translation instruction according to a touch operation on a hover button in the current display interface.

The hover button is a pre-created function button for triggering the translation instruction. The user can enable or disable the hover button in the settings interface. Once enabled, the hover button is always displayed on the topmost layer of the interface, no matter which application the user is using on the client.

In one embodiment, the user enables the hover button in advance. When the user needs to translate the current display interface, the user performs a touch operation on the hover button, including but not limited to clicking, double clicking, or long pressing it. The client triggers generation of the translation instruction according to the user's touch operation on the hover button.

S203, acquiring an interface screenshot of the current display interface according to the translation instruction, and performing text recognition on the interface screenshot to acquire the text information to be translated.

In one embodiment, after the client obtains the translation instruction, an interface screenshot of the current display interface is generated according to the image data of the current display interface. And then, performing text recognition on the interface screenshot by adopting a preset text recognition algorithm, and determining text information to be translated contained in the interface screenshot.

Optionally, the step S203 of "obtaining the interface screenshot of the current display interface according to the translation instruction" includes:

calling a MediaProjection interface according to the translation instruction, and creating a virtual screen through the MediaProjection interface; and obtaining the image data of the current display interface according to the virtual screen, and rendering the image data to obtain an interface screenshot of the current display interface.

Here, MediaProjection denotes an API (Application Programming Interface) for recording or capturing the screen. The virtual screen, i.e. a VirtualDisplay, is used to encode the image data to generate a screenshot or a screen recording.

In one embodiment, the client first obtains a MediaProjectionManager object through getSystemService(). It then calls createScreenCaptureIntent() to bring up a dialog asking whether the user authorizes the application to capture the screen. If authorization succeeds, the MediaProjection interface is acquired.

After the MediaProjection interface is acquired, a virtual screen is created through it, and parameters such as the name, width, height, and pixel density of the virtual screen are initialized. After initialization, the image data of the current display interface is acquired through an ImageReader and passed to the Surface-type buffer of the virtual screen, and the image data is rendered to obtain an interface screenshot of the current display interface.

By calling the MediaProjection interface according to the translation instruction, creating a virtual screen through it, obtaining the image data of the current display interface from the virtual screen, and rendering that image data, an interface screenshot of the current display interface is obtained, which lays a data foundation for subsequently obtaining the text information to be translated from the interface screenshot.

Optionally, in S203, "performing text recognition on the interface screenshot to obtain the text information to be translated" includes:

carrying out bitmap format conversion on the interface screenshot, and inputting the interface screenshot after bitmap format conversion into a text recognition model; and acquiring the text information to be translated in the interface screenshot and the position information associated with the text information to be translated according to the output result of the text recognition model.

Here, the bitmap format refers to the Bitmap image format.

In one embodiment, the image format of the interface screenshot is converted into the bitmap format by image processing software preset on the client, where the image processing software includes, but is not limited to, Photoshop, Painter, and Adobe Illustrator. The converted interface screenshot is input into a text recognition model pre-built locally on the client. The text recognition model is an OCR (Optical Character Recognition) model and includes a convolutional neural network model. The text recognition model performs OCR on the interface screenshot, and the text information to be translated in the interface screenshot, together with the position information corresponding to it, is obtained from the model's output. The position information refers to the position of the text information to be translated within the image area of the interface screenshot.

By converting the interface screenshot to the bitmap format, inputting the converted screenshot into the text recognition model, and obtaining the text information to be translated and its associated position information from the model's output, the text information to be translated is recognized from an interface screenshot in a standard format. Because the text recognition model is built locally on the client, recognition does not rely on a server over the network, which avoids slow or failed recognition caused by high network latency and improves the reliability and timeliness of text recognition.
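One way to carry the recognized text together with its position information is a small record type. The output layout assumed below (a list of dictionaries with `text` and `box` fields, where `box` is x, y, width, height) is hypothetical; an actual text recognition model may report positions differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TextRegion:
    """A piece of text to be translated and where it sits in the screenshot."""
    text: str
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def regions_from_model_output(output: List[dict]) -> List[TextRegion]:
    """Convert raw recognition output into TextRegion records, skipping blank text."""
    regions: List[TextRegion] = []
    for item in output:
        text = item["text"].strip()
        if not text:
            continue  # nothing to translate in this box
        x, y, w, h = item["box"]
        regions.append(TextRegion(text=text, box=(int(x), int(y), int(w), int(h))))
    return regions
```

Keeping the box next to the text is what later allows a translated image to be placed at the same position in the screenshot.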

S204, inputting the text information to be translated into a translation model, and acquiring standard text information corresponding to the text information to be translated according to an output result of the translation model.

The translation model is a machine translation model. In this embodiment it includes, but is not limited to, a convolutional neural network translation model, and is optionally a seq2seq model.

In one embodiment, the acquired text information to be translated is input into a translation model pre-built locally on the client, the translation model translates the text information to be translated into the target language, and the standard text information in the target language corresponding to the text information to be translated is obtained from the translation model's output.

S205, generating a standard text image corresponding to the standard text information, and covering the standard text image in the interface screenshot according to the position information associated with the text information to be translated.

In one embodiment, the standard text information is rendered into a standard text image using a preset rendering technology, for example a canvas technology, and the standard text image is overlaid on the interface screenshot at the position given by the position information associated with the text information to be translated. For example, assuming that the text information to be translated is "applet", its associated position information in the interface screenshot is (X, Y), and the standard text information corresponding to "applet" is "apple", the standard text image corresponding to "apple" is overlaid at position (X, Y) in the interface screenshot.
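The overlay step can be illustrated on a toy image. In this sketch an image is a list of rows of integer pixel values, which is only a stand-in for a real bitmap, and (x, y) is assumed to be the top-left corner of the text; clipping at the screenshot border is also an assumption, since the disclosure does not say how out-of-bounds pixels are handled.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; a toy stand-in for a bitmap


def overlay(screenshot: Image, patch: Image, x: int, y: int) -> Image:
    """Return a copy of the screenshot with the patch pasted at top-left (x, y).

    Pixels of the patch that fall outside the screenshot are dropped.
    """
    result = [row[:] for row in screenshot]  # leave the input untouched
    height = len(result)
    width = len(result[0]) if result else 0
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            px, py = x + dx, y + dy
            if 0 <= px < width and 0 <= py < height:
                result[py][px] = value
    return result
```

In a real client the patch would be the rendered standard text image and the coordinates would come from the position information returned by text recognition.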

In the present disclosure, the translation instruction is triggered by a touch operation on the hover button in the current display interface, so the translation instruction is generated quickly and translation efficiency is improved. The interface screenshot of the current display interface is obtained according to the translation instruction, and text recognition is performed on it to obtain the text information to be translated, so the text to be translated is determined by text recognition on the interface screenshot, which lays a data foundation for subsequent translation. The standard text information corresponding to the text information to be translated is obtained from the output of the translation model; because the translation model is built locally on the client, text translation does not rely on a server over the network, which avoids slow or failed translation caused by high network latency and improves the reliability and timeliness of text translation. Finally, a standard text image corresponding to the standard text information is generated and overlaid on the interface screenshot according to the position information associated with the text information to be translated, so that the user can see the standard text information more intuitively.

Fig. 3 is a schematic structural diagram of a text translation apparatus according to an embodiment of the present disclosure, which is applicable to translating a cloud device display interface rendered by a client. The apparatus of this embodiment may be implemented in software and/or hardware and may be integrated on any electronic device with computing capability.

As shown in fig. 3, the text translation apparatus 30 disclosed in this embodiment may include a display interface generating module 31 and a translating module 32, where:

the display interface generating module 31 is configured to render video stream data acquired from the cloud device, and generate a current display interface;

and the translation module 32 is configured to obtain text information to be translated included in the current display interface according to the translation instruction for the current display interface, and translate the text information to be translated.

Optionally, the apparatus further includes a translation instruction generation module, specifically configured to:

and triggering and generating the translation instruction according to a touch operation on a hover button in the current display interface.

Optionally, the translation module 32 is specifically configured to:

and acquiring an interface screenshot of the current display interface according to the translation instruction, and performing text recognition on the interface screenshot to acquire the text information to be translated.

Optionally, the translation module 32 is specifically further configured to:

calling a MediaProjection interface according to the translation instruction, and creating a virtual screen through the MediaProjection interface;

and obtaining the image data of the current display interface according to the virtual screen, and rendering the image data to obtain an interface screenshot of the current display interface.

Optionally, the translation module 32 is specifically further configured to:

carrying out bitmap format conversion on the interface screenshot, and inputting the interface screenshot after bitmap format conversion into a text recognition model;

and acquiring the text information to be translated in the interface screenshot and the position information associated with the text information to be translated according to the output result of the text recognition model.

Optionally, the translation module 32 is specifically further configured to:

and inputting the text information to be translated into a translation model, and acquiring standard text information corresponding to the text information to be translated according to an output result of the translation model.

Optionally, the apparatus further includes an image overlay module, specifically configured to:

and generating a standard text image corresponding to the standard text information, and covering the standard text image into the interface screenshot according to the position information associated with the text information to be translated.

The text translation apparatus 30 disclosed in the embodiment of the present disclosure can execute the text translation method disclosed in the embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method. Reference may be made to the description in the method embodiments of the present disclosure for details that are not explicitly described in this embodiment.

In the technical solution of the present disclosure, the acquisition, storage, and use of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 4 shows a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 4, the device 400 includes a computing unit 401 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the device 400 can also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.

A number of components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, or the like; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408 such as a magnetic disk, optical disk, or the like; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

Computing unit 401 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 401 executes the respective methods and processes described above, such as a text translation method. For example, in some embodiments, the text translation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the text translation method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the text translation method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs that run on the respective computers and have a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.

It should be understood that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; the present disclosure imposes no limitation in this respect.
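To illustrate that sequential and parallel execution can reach the same result, the following minimal sketch translates independent text segments both ways and compares the outputs. It is an assumption-laden example rather than part of the disclosure: `translate_segment` is a hypothetical placeholder that merely upper-cases its input, and splitting the text into independent segments is assumed for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def translate_segment(segment: str) -> str:
    # Hypothetical placeholder for a per-segment translation call.
    return segment.upper()


def translate_sequentially(segments: List[str]) -> List[str]:
    # One segment after another, in input order.
    return [translate_segment(s) for s in segments]


def translate_in_parallel(segments: List[str], max_workers: int = 4) -> List[str]:
    # Executor.map yields results in input order, even if the
    # underlying calls finish in a different order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(translate_segment, segments))


segments = ["hello", "cloud", "phone"]
sequential = translate_sequentially(segments)
parallel = translate_in_parallel(segments)
print(sequential == parallel)
```

Because each segment is processed independently, the execution order does not affect the final list, which is the condition under which reordering or parallelizing steps is harmless.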

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of text translation, comprising:

2. The method according to claim 1, before the obtaining, according to the translation instruction for the current display interface, the text information to be translated included in the current display interface, further comprising:

3. The method according to claim 1, wherein the obtaining text information to be translated included in the current display interface according to the translation instruction for the current display interface comprises:

4. The method of claim 3, wherein the obtaining an interface screenshot of the current display interface according to the translation instruction comprises:

5. The method of claim 3, wherein performing text recognition on the interface screenshot to obtain the text information to be translated comprises:

6. The method of claim 5, wherein translating the textual information to be translated comprises:

7. The method according to claim 6, further comprising, after obtaining the standard text information corresponding to the text information to be translated:

8. A text translation apparatus comprising:

9. The apparatus according to claim 8, further comprising a translation instruction generation module, specifically configured to:

10. The apparatus of claim 8, wherein the translation module is specifically configured to:

11. The apparatus of claim 10, wherein the translation module is further specifically configured to:

12. The apparatus of claim 10, wherein the translation module is further specifically configured to:

13. The apparatus of claim 12, wherein the translation module is further specifically configured to:

14. The apparatus according to claim 13, further comprising an image overlay module, in particular for:

15. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.

17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.