CN111783508A - Method and apparatus for processing image - Google Patents

Method and apparatus for processing image

Info

Publication number
CN111783508A
CN111783508A (application CN201910800462.4A)
Authority
CN
China
Prior art keywords
image
character
characters
processed
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910800462.4A
Other languages
Chinese (zh)
Inventor
刘程雨
桂创华
刘海锋
王培轩
许亮
秦慧娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201910800462.4A priority Critical patent/CN111783508A/en
Publication of CN111783508A publication Critical patent/CN111783508A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H04N5/222 - Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 - Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 - Mixing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present disclosure disclose methods and apparatus for processing images. One embodiment of the method comprises: acquiring an image to be processed, wherein the image to be processed comprises a first character object belonging to a first language; identifying the first character object to generate a first character indicated by the first character object; translating the first characters to generate second characters belonging to a second language; and fusing the second characters and the image to be processed to generate a processed image. This embodiment enriches the way in which images are processed and helps to increase the speed at which text objects in an image are translated to generate a translated text image.

Description

Method and apparatus for processing image
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for processing an image.
Background
In the prior art, products on a website are often introduced through images containing text objects. The page used to present such images is commonly referred to as a detail page. However, users in different countries use different languages and scripts. For example, the images in the detail page of a product intended for users in China usually contain Chinese, and when the detail page is presented to overseas users, the Chinese text in the images usually needs to be translated.
At present, text in images is generally translated manually. For example, an image containing text is delivered to a translation company, a translator of the company processes the image manually, and the translated image is then written back to the website platform.
Disclosure of Invention
The present disclosure presents methods and apparatus for processing images.
In a first aspect, an embodiment of the present disclosure provides a method for processing an image, the method including: acquiring an image to be processed, wherein the image to be processed comprises a first character object belonging to a first language; identifying the first character object to generate a first character indicated by the first character object; translating the first characters to generate second characters belonging to a second language; and fusing the second characters and the image to be processed to generate a processed image.
In some embodiments, fusing the second text with the image to be processed includes: determining second style information of a second character based on first style information of a first character object in the image to be processed, wherein the first style information is used for indicating the style of the first character, and the second style information is used for indicating the style of the second character; and fusing the second characters at the positions of the first character objects in the image to be processed according to the style indicated by the second style information.
In some embodiments, the first style information and the second style information each include a font size; and determining second style information of a second character based on the first style information of the first character object in the image to be processed, wherein the second style information comprises: determining a target text box containing the first character object based on the font size and the number of characters of the first character object; and determining the character size of the second character based on the image area where the target text box is located and the number of the characters of the second character.
In some embodiments, determining the font size of the second word based on the image area where the target text box is located and the number of characters of the second word comprises: determining an image area for presenting the second characters based on the number of the characters of the second characters and the font size of the first character object; in response to the image area for presenting the second characters being larger than the image area where the target text box is located, reducing the font size of the first character object, and determining the reduced font size as the font size of the second characters; and in response to the image area for presenting the second characters being smaller than the image area where the target text box is located, increasing the font size of the first character object, and determining the increased font size as the font size of the second characters.
In some embodiments, fusing the second text with the image to be processed includes: determining a position of a target text box containing a first word object; based on the location, a second word is fused within the target text box.
In some embodiments, the method further comprises at least one of: changing the current presentation style of the target text box in response to detecting the click operation aiming at the target text box; in response to detecting the moving operation aiming at the target text box, moving the target text box and the second characters according to the indication of the moving operation; in response to detecting a zoom operation for the target text box, the target text box and the second word are zoomed as indicated by the zoom operation.
In some embodiments, fusing the second text with the image to be processed includes: determining the position of a first character object in an image to be processed; performing character erasing processing on a first character object in an image to be processed to obtain an erased image; and fusing the second characters at the position corresponding to the position of the first character object in the erased image.
In some embodiments, translating the first word to generate a second word in a second language comprises: inputting the first characters into a pre-trained translation model to generate second characters belonging to a second language, wherein the translation model is used for translating the input characters belonging to the first language into the characters belonging to the second language; and the translation model is obtained by training the following steps: acquiring a training sample set, wherein the training samples in the training sample set comprise characters belonging to a first language and characters belonging to a second language obtained by translating the characters belonging to the first language; and training to obtain a translation model by using a machine learning algorithm and using characters belonging to a first language in training samples in the training sample set as input data and characters belonging to a second language corresponding to the input data as expected output data.
In some embodiments, the method further comprises: in response to the detection of the character modification operation aiming at the second character, modifying the second character, and taking the modified character as a new second character; and taking the first characters corresponding to the new second characters as input data of the translation model, taking the new second characters as expected output data of the translation model, and training to obtain the new translation model.
In some embodiments, the first style information and the second style information respectively include at least one of a font, a color, a font size, an alignment, and a font thickness.
In a second aspect, an embodiment of the present disclosure provides an apparatus for processing an image, the apparatus including: an acquisition unit configured to acquire an image to be processed, wherein the image to be processed contains a first literal object belonging to a first language; the recognition unit is configured to recognize the first character object and generate a first character indicated by the first character object; a translation unit configured to translate the first character to generate a second character belonging to a second language; and the fusion unit is configured to fuse the second characters with the image to be processed to generate a processed image.
In some embodiments, the fusion unit comprises: the first determining subunit is configured to determine second style information of a second character based on first style information of a first character object in the image to be processed, wherein the first style information is used for indicating a style of the first character, and the second style information is used for indicating a style of the second character; and the fusion subunit is configured to fuse the second character to the position of the first character object in the image to be processed according to the style indicated by the second style information.
In some embodiments, the first style information and the second style information each include a font size; and the first determining subunit includes: a first determination module configured to determine a target text box containing a first literal object based on the font size and number of characters of the first literal object; and the second determining module is configured to determine the word size of the second word based on the image area where the target text box is located and the number of characters of the second word.
In some embodiments, the second determining module comprises: a first determination submodule configured to determine an image area for presenting the second letter based on the number of characters of the second letter and the font size of the first letter object; a second determining submodule configured to reduce the font size of the first character object in response to the image area for presenting the second character being larger than the image area where the target text box is located, and determine the reduced font size as the font size of the second character; and the third determining sub-module is configured to respond to the image area for presenting the second characters being smaller than the image area where the target text box is located, increase the word size of the first character object, and determine the increased word size as the word size of the second characters.
In some embodiments, the fusion unit comprises: a second determining subunit configured to determine a position of a target text box containing the first text object; a first fusing subunit configured to fuse the second word within the target text box based on the location.
In some embodiments, the apparatus further comprises at least one of: a changing unit configured to change a current presentation style of the target text box in response to detecting a click operation for the target text box; a moving unit configured to move the target text box and the second word in accordance with an instruction of a moving operation in response to detection of the moving operation for the target text box; and the zooming unit is configured to zoom the target text box and the second characters according to the indication of the zooming operation in response to the detection of the zooming operation aiming at the target text box.
In some embodiments, the fusion unit comprises: the third determining subunit is configured to determine the position of the first character object in the image to be processed; the erasing subunit is configured to perform character erasing processing on the first character object in the image to be processed to obtain an erased image; and the second fusion subunit is configured to fuse the second character at a position corresponding to the position of the first character object in the erased image.
In some embodiments, the translation unit comprises: a generation subunit, configured to input the first text to a pre-trained translation model, and generate a second text belonging to a second language, wherein the translation model is used for translating the input text belonging to the first language into the text belonging to the second language; and the translation model is obtained by training the following steps: acquiring a training sample set, wherein the training samples in the training sample set comprise characters belonging to a first language and characters belonging to a second language obtained by translating the characters belonging to the first language; and training to obtain a translation model by using a machine learning algorithm and using characters belonging to a first language in training samples in the training sample set as input data and characters belonging to a second language corresponding to the input data as expected output data.
In some embodiments, the apparatus further comprises: a modification unit configured to modify the second word in response to detecting a word modification operation for the second word, the modified word being taken as a new second word; and a training unit configured to train a new translation model by taking the first characters corresponding to the new second characters as input data of the translation model and the new second characters as expected output data of the translation model.
In some embodiments, the first style information and the second style information respectively include at least one of a font, a color, a font size, an alignment, and a font thickness.
In a third aspect, an embodiment of the present disclosure provides an electronic device for processing an image, including: one or more processors; a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments of the method for processing images as described above.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium for processing an image, on which a computer program is stored, which when executed by a processor, implements the method of any of the embodiments of the method for processing an image as described above.
The method and apparatus for processing an image provided by the embodiments of the present disclosure acquire an image to be processed, where the image to be processed includes a first text object belonging to a first language, recognize the first text object to generate the first text indicated by it, translate the first text to generate a second text belonging to a second language, and finally fuse the second text with the image to be processed to generate a processed image. The embodiments of the present disclosure may be executed entirely by an electronic device (e.g., a server or a terminal device), or may be executed with the participation of a user (e.g., a person in charge of image design). When executed entirely by an electronic device, the embodiments of the present disclosure can increase the speed at which text objects in an image are translated to generate a translated image (i.e., the processed image described above); when executed through interaction with a user, the embodiments of the present disclosure can to some extent avoid layout problems in the generated processed image.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for processing an image according to the present disclosure;
FIG. 3 is a schematic illustration of an application scenario of a method for processing an image according to the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a method for processing an image according to the present disclosure;
FIG. 5 is a schematic block diagram of one embodiment of an apparatus for processing images according to the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 of an embodiment of a method for processing an image or an apparatus for processing an image to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or transmit data (e.g. images to be processed), etc. The terminal devices 101, 102, 103 may have various client applications installed thereon, such as image processing software, video playing software, news information applications, image processing applications, web browser applications, shopping applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices with a display screen, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. They may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.
The server 105 may be a server that provides various services, such as an image processing server that processes images transmitted by the terminal apparatuses 101, 102, 103. The image processing server can perform processing such as character object recognition and translation on the acquired data such as the image to be processed, and fuse the translated characters with the received image to generate a processed image. Optionally, the server may also feed back the processing result (e.g., the processed image) to the terminal device. As an example, the server 105 may be a cloud server or a physical server.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be further noted that the method for processing the image provided by the embodiment of the present disclosure may be executed by the server, may also be executed by the terminal device, and may also be executed by the server and the terminal device in cooperation with each other. Accordingly, the various parts (e.g., the various units, sub-units, modules, and sub-modules) included in the apparatus for processing an image may be all disposed in the server, may be all disposed in the terminal device, and may be disposed in the server and the terminal device, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. The system architecture may only include electronic devices (e.g., servers or terminal devices) on which the method for processing images is run, when the electronic devices on which the method for processing images is run do not need to perform data transfer with other electronic devices.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for processing an image in accordance with the present disclosure is shown. The method for processing the image comprises the following steps:
step 201, acquiring an image to be processed.
In this embodiment, an executing entity of the method for processing an image (for example, the server or a terminal device shown in fig. 1) may acquire the image to be processed from a remote or local source through a wired or wireless connection. The image to be processed contains a first text object belonging to a first language, and may be an image whose contained first text object is to be translated. The first language may be any of a variety of languages, such as Chinese, Russian, German, Tamil, Hindi, and so forth. The first text object may be the visual presentation of the first text in the image to be processed. The first text may be text belonging to the first language.
Step 202, identifying the first character object, and generating a first character indicated by the first character object.
In this embodiment, the execution main body may recognize the first character object, and generate the first character indicated by the first character object.
Here, the executing entity may recognize the first text object by using an Optical Character Recognition (OCR) technology, so as to generate the first text indicated by the first text object.
Optionally, the executing main body may further execute the step 202 in the following manner:
and inputting the image to be processed into a pre-trained recognition model so as to recognize the first character object and generate a first character indicated by the first character object. The recognition model is used for representing the corresponding relation between a first character object contained in the input image and a first character indicated by the first character object.
The recognition model can be a convolutional neural network obtained by adopting a machine learning algorithm and training based on a training sample set. Wherein, the training samples in the training sample set include: an image containing a first textual object and a first text indicated by the first textual object.
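As an example only and not as part of the original disclosure, a minimal sketch of this recognition step using an off-the-shelf OCR engine might look as follows; the library (pytesseract with Pillow) and the "chi_sim" language code are assumptions made for the illustration.

    # Hypothetical sketch: recognize the first text object in the image to be
    # processed with an off-the-shelf OCR engine. The library choice and the
    # "chi_sim" language code are assumptions, not part of the embodiment.
    from PIL import Image
    import pytesseract

    def recognize_first_text(image_path: str, lang: str = "chi_sim") -> str:
        """Return the first text indicated by the first text object."""
        image = Image.open(image_path)
        # OCR the whole image; in practice the first text object could first be
        # located and cropped before recognition.
        return pytesseract.image_to_string(image, lang=lang).strip()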
Step 203, translating the first character to generate a second character belonging to a second language.
In this embodiment, the executing entity may translate the first text to generate a second text belonging to a second language, where the second language is a language different from the first language. As an example, the first language may be Chinese and the second language may be a language other than Chinese, e.g., English. It will be appreciated that which languages the first language and the second language are can generally be determined according to actual requirements. The second text may be text belonging to the second language. As an example, if the first text is the Chinese word for "dress", the second text may be "dress".
For example, the executing agent may execute the step 203 by using the following method:
and inputting the first characters into a pre-trained translation model to generate second characters belonging to a second language. The translation model is used for translating the input characters belonging to the first language into the characters belonging to the second language.
Here, the translation model may be obtained by training as follows:
step one, a training sample set is obtained. The training samples in the training sample set comprise characters belonging to a first language and characters belonging to a second language obtained by translating the characters belonging to the first language.
Here, the words belonging to the second language in the training sample may be words obtained by translating words belonging to the first language in a machine translation or a manual translation manner.
And step two, training to obtain a translation model by using a machine learning algorithm and using characters belonging to a first language in training samples in the training sample set as input data and characters belonging to a second language corresponding to the input data as expected output data.
Specifically, the execution agent or the electronic device communicatively connected to the execution agent may train the initial model using a character in a first language in a training sample set as input data and a character in a second language corresponding to the input data as expected output data, so as to determine the initial model satisfying a predetermined training end condition as a trained translation model.
The initial model may adopt various model structures, such as AlexNet, ZFNet, OverFeat, or a VGG (Visual Geometry Group) network. As an example, the initial model may be a convolutional neural network. The training end condition may include, but is not limited to, at least one of the following: the training duration exceeds a preset duration, the number of training iterations exceeds a preset number, or the value computed from a predetermined loss function is smaller than a preset threshold.
It can be understood that, when the initial model does not satisfy the training end condition, the model parameters of the initial model may be adjusted by using an algorithm such as a gradient descent method, a back propagation method, or the like, so that the initial model after being adjusted by the model parameters satisfies the training end condition.
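Purely as an illustrative skeleton of the two training steps above, and not a prescribed implementation, the procedure might be organized as follows; the placeholder model and its training_step method are assumptions made for the illustration.

    # Hypothetical skeleton of the training procedure described above. The model
    # is a placeholder; any trainable sequence-to-sequence structure could be
    # substituted.
    from dataclasses import dataclass
    from typing import List, Tuple

    # A training sample pairs first-language text with its second-language translation.
    TrainingSample = Tuple[str, str]

    @dataclass
    class TranslationModel:
        """Stand-in for a trainable translation model (assumption)."""
        steps_trained: int = 0

        def training_step(self, src: str, tgt: str) -> float:
            # A real model would compute a loss between its prediction for src
            # and the expected output tgt, then update its parameters.
            self.steps_trained += 1
            return 0.0  # placeholder loss value

    def train_translation_model(samples: List[TrainingSample],
                                max_steps: int = 10000,
                                loss_threshold: float = 0.01) -> TranslationModel:
        """Step two: train until a predetermined end condition is met."""
        model = TranslationModel()
        loss = float("inf")
        while model.steps_trained < max_steps and loss > loss_threshold:
            for src, tgt in samples:                   # first-language text as input data
                loss = model.training_step(src, tgt)   # second-language text as expected output
        return model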
Optionally, the executing main body may further execute the step 203 by using the following method:
first, a technician may establish a two-dimensional table or database characterizing the correspondence between the first word and the second word through a number of statistics.
Then, the executing entity may search the two-dimensional table or the database for a second word having a corresponding relationship with the first word generated in step 202, and determine the searched second word as a second word generated by translating the first word and belonging to a second language.
And 204, fusing the second characters with the image to be processed to generate a processed image.
In this embodiment, the executing entity may fuse the second text with the image to be processed to generate a processed image.
It is understood that the processed image obtained after the image fusion includes a second character object indicating a second character. The second character object is an image of a second character displayed in the processed image.
As an example, the executing entity may fuse the second text with the image to be processed by using an algorithm such as a weighted average method, a feathering algorithm, or Laplacian pyramid fusion, so as to generate the processed image. Optionally, the executing entity may further adjust the background color of the second text to the background color of the first text object, so that the second text with the adjusted background color can be directly superimposed at the position of the first text object in the image to be processed, thereby obtaining the processed image.
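As a minimal sketch of the direct-superimposition variant just mentioned, assuming the position of the first text object and its background color are already known (the library, box convention, colors, and font path below are placeholders, not part of the disclosure):

    # Illustrative sketch only: cover the region of the first text object with its
    # background color and draw the second text on top to obtain the processed image.
    from PIL import Image, ImageDraw, ImageFont

    def fuse_text(image_path: str, second_text: str,
                  box: tuple, background_color: tuple,
                  font_path: str = "arial.ttf", font_size: int = 24) -> Image.Image:
        # box is (left, top, right, bottom) in PIL's convention (assumption).
        image = Image.open(image_path).convert("RGB")
        draw = ImageDraw.Draw(image)
        draw.rectangle(box, fill=background_color)        # hide the first text object
        font = ImageFont.truetype(font_path, font_size)
        draw.text((box[0], box[1]), second_text, fill=(0, 0, 0), font=font)
        return image                                       # the processed image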
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for processing an image according to the present embodiment. In the application scenario of fig. 3, the server 301 first acquires the to-be-processed image 3011 from the terminal 302. The to-be-processed image 3011 includes a first text object (shown as "wedding") belonging to a first language (Chinese in the figure). Then, the server 301 recognizes the first text object and generates the first text 3012 indicated by it. Next, the server 301 translates the first text 3012 to generate a second text 3013 in a second language (English in the figure). Finally, the server 301 fuses the second text 3013 with the image to be processed 3011 to generate a processed image 3014. Optionally, the server 301 may also send the processed image 3014 to the terminal 302.
It should be understood that fig. 3 is only a schematic diagram of an application scenario of the method for processing an image according to the present embodiment and should not be construed as limiting the embodiments of the present disclosure in any way. In practice, the subject executing the embodiments of the present disclosure may be a server or a terminal, and the method for processing an image of the embodiments of the present disclosure may also be executed by the server and the terminal in cooperation with each other.
In the method provided by the above embodiment of the present disclosure, the image to be processed is acquired, where the image to be processed includes a first text object belonging to a first language; the first text object is then recognized to generate the first text indicated by it; the first text is translated to generate a second text belonging to a second language; and finally the second text is fused with the image to be processed to generate the processed image, which enriches the ways in which images can be processed. Furthermore, embodiments of the present disclosure may be performed entirely by an electronic device (e.g., a server or a terminal device), or may be performed with the participation of a user (e.g., a person in charge of image design). When performed entirely by an electronic device, the embodiments can increase the speed at which text objects in an image are translated to generate a translated image (i.e., the processed image described above); when performed through interaction with a user, the embodiments can to some extent avoid layout problems in the generated processed image.
In some optional implementation manners of this embodiment, fusing the second text with the image to be processed may include the following steps:
the method comprises a first step of determining second style information of a second character based on first style information of a first character object in an image to be processed. The first style information is used for indicating the style of the first character, and the second style information is used for indicating the style of the second character.
In some optional implementations of the embodiment, the first style information and the second style information respectively include at least one of a font, a color, a font size, an alignment manner, and a font thickness.
As an example, the second style information may be the same as the style of the text indicated by the first style information. For example, if the first style information indicates that the style of the first text object in the image to be processed is a red font, the second style information may also indicate that the style of the second text is a red font.
And a second step of fusing the second characters at the positions of the first character objects in the image to be processed according to the style indicated by the second style information.
It is understood that this alternative implementation determines the style information of the second text on the basis of the style information of the first text, which facilitates the subsequent generation of a processed image that is closer to the image to be processed in style, layout, color, overall effect, and the like. Compared with the approach of manually adjusting the style of the second text after manual translation, this alternative implementation can increase the speed at which the processed image is generated.
In some optional implementations of the present embodiment, the first style information and the second style information each include a font size. And, the first step may include the following sub-steps:
the first sub-step, based on the font size and number of characters of the first literal object, determines the target text box containing the first literal object. The target text box may be a minimum text box containing the first text object, or may be a text box whose distance from the first text indicated by the first text object is a preset distance threshold (e.g., 0.5 cm, 0.3 cm, etc.). The font size of the first textual object may be the font size of the first text indicated by the first textual object.
And a second substep of determining the font size of the second character based on the image area where the target text box is located and the number of characters of the second character.
In some optional implementations of this embodiment, the executing body may execute the second substep by using the following method:
first, an image area for presenting the second letter is determined based on the number of characters of the second letter and the font size of the first letter object. The image area for presenting the second text may be: the image area in which each character included in the second character is presented in accordance with the same font size as that of the first character object may be: and the image area takes the height of the character size of the first character indicated by the first character object as the width and the product of the number of the characters of the second character and the length of the character size of the first character indicated by the first character object as the length.
Then, in response to the fact that the image area for presenting the second characters is larger than the image area where the target text box is located, the word size of the first character object is reduced, and the reduced word size is determined to be the word size of the second characters; and in response to the image area for presenting the second characters being smaller than the image area where the target text box is located, increasing the font size of the first character object, and determining the increased font size as the font size of the second characters.
As an example, if the image area for presenting the second text is larger than the image area where the target text box is located and the font size of the current first text object is size four, the font size of the first text object may be reduced by one step, i.e., to size five (in the Chinese font-size convention, a larger number denotes a smaller size), and the reduced font size is determined as the font size of the second text; the font size of the second text may thus be determined as size five.
Optionally, if the image area for presenting the second text is larger than the image area where the target text box is located, and the font size of the current first text object is "four", the font size of the first text object may be decreased by 1, and then it is determined whether the image area for presenting the second text according to the decreased font size is larger than the image area where the target text box is located, if still larger, the font size of the first text object is continuously decreased by 1, and so on, until the image area for presenting the second text is smaller than or equal to the image area where the target text box is located.
In addition, the execution main body can also determine the font size of the reduced first character object according to the size relationship between the length and the width of the image area for presenting the second character and the image area where the target text box is located, and determine the font size after reduction as the font size of the second character.
Similarly, in the case where the image area for presenting the second word is smaller than the image area in which the target text box is located, the font size of the second word may be determined in a manner similar to the method described above.
It should be noted that the image area for the second text being smaller than the image area of the target text box may mean that the length and the width of the image area for the second text are respectively smaller than the length and the width of the image area where the target text box is located; the image area for the second text being larger than the image area of the target text box may mean that its length and width are respectively greater than those of the image area where the target text box is located.
It will be appreciated that decreasing the font size of the first text object means reducing the area of the image region used to present the first text indicated by the first text object, and increasing the font size of the first text object means enlarging that area.
Optionally, the executing body may also execute the second substep by using the following method:
looking up, in a predetermined correspondence table, the font size corresponding to the length and width of the image area where the target text box is located and to the number of characters of the second text; and determining the found font size as the font size of the second text.
It can be understood that, in the optional implementation manner, the font size of the second character can be determined based on the image area where the target text box is located and the number of characters of the second character, so that the font size suitable for the presentation of the second character is obtained, and the problem of typesetting of subsequently generated processed images caused by too large or too small font size of the second character is avoided to a certain extent.
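A minimal sketch of the iterative shrink/grow font-size fitting described above, under the simplifying assumption (not stated in the disclosure) that each character occupies roughly a square whose side equals the font size, might be:

    # Illustrative sketch: adjust the font size of the first text object until the
    # area needed to present the second text fits the area of the target text box.
    def fit_font_size(num_chars_second: int, first_font_size: int,
                      box_width: int, box_height: int,
                      min_size: int = 8, max_size: int = 96) -> int:
        def needed_area(size: int) -> int:
            # Simplifying assumption: each character occupies a size x size square.
            return num_chars_second * size * size

        box_area = box_width * box_height
        size = first_font_size
        while needed_area(size) > box_area and size > min_size:
            size -= 1            # area for the second text too large: shrink the font
        while needed_area(size + 1) <= box_area and size < max_size:
            size += 1            # area still smaller than the box: grow the font
        return size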
In some optional implementation manners of this embodiment, fusing the second text with the image to be processed may include the following steps:
first, the position of a target text box containing a first word object is determined. The target text box may be a minimum text box containing the first text object, or may be a text box whose distance from the first text indicated by the first text object is a preset distance threshold (e.g., 0.5 cm, 0.3 cm, etc.). The location of the target text box may be characterized "(x, y, w, h)" in the following manner. X can represent the number of pixel points on the left side of the upper left corner of the target text box in the image to be processed; y can represent the number of pixel points above the upper left corner of the target text box in the image to be processed; w may characterize the width of the target text box; h may characterize the height of the target text box. Optionally, the position of the target text box may also be characterized by the position of the center point of the target text box.
And then, fusing the second characters in the target text box based on the position of the target text box.
It is understood that in the alternative implementation manner, the second word may be fused in the target text box, so as to avoid the typesetting problem of the subsequently generated processed image caused by the position of the second word.
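For illustration only, one way of representing such an "(x, y, w, h)" text-box position in code is sketched below; the field names and the center-point property are assumptions made for the example.

    # Sketch of a data structure for the position of the target text box.
    from dataclasses import dataclass

    @dataclass
    class TextBox:
        x: int  # pixels from the left edge of the image to the box's top-left corner
        y: int  # pixels from the top edge of the image to the box's top-left corner
        w: int  # width of the target text box in pixels
        h: int  # height of the target text box in pixels

        @property
        def center(self) -> tuple:
            # Alternative characterization: the position of the box's center point.
            return (self.x + self.w / 2, self.y + self.h / 2)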
In some optional implementations of this embodiment, the method further comprises at least one of:
the first item, in response to detecting a click operation for the target text box, changes a current presentation style of the target text box.
It can be understood that when the user performs a click operation on the target text box, the user generally intends to edit the content in the target text box. Accordingly, the executing entity may notify the user that the target text box is currently in an editable state by changing its current presentation style. For example, the executing entity may change the current presentation style of the target text box by changing the color of the text box or by changing its border (for example, changing a solid border to a dashed border, or a dashed border to a solid one).
And a second item, responding to the detection of the movement operation aiming at the target text box, and moving the target text box and the second character according to the indication of the movement operation.
It can be understood that, in the case that the user performs the moving operation on the target text box, the user generally needs to move the target text box and the second word to adjust the position of the second word in the subsequently generated processed image.
And in response to detecting the zooming operation aiming at the target text box, zooming the target text box and the second characters according to the indication of the zooming operation.
It can be understood that when the user performs a scaling operation on the target text box, the user generally intends to scale the target text box and the second text so as to adjust the font size of the second text in the subsequently generated processed image. Here, when the user adjusts the width and height of the text box, the executing entity may automatically adjust the font size within the text box so that the text fills the text box, thereby reducing user operations.
In some optional implementation manners of this embodiment, fusing the second text with the image to be processed may include the following steps:
the first step is to determine the position of a first character object in the image to be processed.
And secondly, performing character erasing processing on the first character object in the image to be processed to obtain an erased image.
And a third step of fusing the second characters at the position corresponding to the position of the first character object in the erased image.
It can be understood that in this alternative implementation, the second text replaces the first text object in the image to be processed, which helps reduce visual clutter in the subsequently generated processed image.
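As a hedged sketch of this erase-then-fuse variant: OpenCV and the TELEA inpainting method below are assumptions made for the illustration; the embodiment only requires some form of character-erasing processing followed by fusing the second text at the corresponding position.

    # Hypothetical sketch: erase the first text object by inpainting its region,
    # then draw the second text at the same position in the erased image.
    import cv2
    import numpy as np

    def erase_and_fuse(image_path: str, second_text: str, box: tuple) -> np.ndarray:
        x, y, w, h = box                       # position of the first text object
        image = cv2.imread(image_path)
        mask = np.zeros(image.shape[:2], dtype=np.uint8)
        mask[y:y + h, x:x + w] = 255           # mark the text region to erase
        erased = cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)  # character erasing
        cv2.putText(erased, second_text, (x, y + h), cv2.FONT_HERSHEY_SIMPLEX,
                    1.0, (0, 0, 0), 2)         # fuse the second text at the same place
        return erased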
In some optional implementation manners of this embodiment, the executing main body may further perform the following steps:
firstly, under the condition that the character modification operation aiming at the second character is detected, the second character is modified, and the modified character is used as a new second character.
Then, a new translation model is obtained by training, using the first text corresponding to the new second text as input data of the translation model and the new second text as expected output data of the translation model. Here, the first text corresponding to the new second text may be the original, pre-translation first text from which the modified second text was translated.
It can be understood that when the user performs a text modification operation on the second text, the user generally intends to correct its content, which usually indicates that the second text output by the trained translation model contains an error. Therefore, in this alternative implementation, the first text corresponding to the new second text may be used as input data of the translation model and the new second text as its expected output data, so that training of the translation model continues; this adjusts the model's output and further improves the accuracy of the second text output by the new translation model.
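As an illustration of such continued training on a user-corrected pair, the following sketch assumes a particular library, model name, and simplified target tokenization, none of which are prescribed by the disclosure:

    # Hypothetical sketch: one fine-tuning step on a user-corrected translation pair.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    MODEL_NAME = "Helsinki-NLP/opus-mt-zh-en"  # assumed Chinese-to-English model

    def fine_tune_on_correction(first_text: str, new_second_text: str) -> MarianMTModel:
        tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
        model = MarianMTModel.from_pretrained(MODEL_NAME)
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

        inputs = tokenizer([first_text], return_tensors="pt", padding=True)
        # Simplified target tokenization for the sketch; a real pipeline would use
        # the tokenizer's dedicated target-text mode.
        labels = tokenizer([new_second_text], return_tensors="pt", padding=True).input_ids

        model.train()
        outputs = model(**inputs, labels=labels)   # corrected text as expected output
        outputs.loss.backward()
        optimizer.step()
        return model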
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for processing an image is shown. The flow 400 of the method for processing an image comprises the steps of:
step 401, acquiring an image to be processed.
In this embodiment, step 401 is substantially the same as step 201 in the corresponding embodiment of fig. 2, and is not described here again.
Step 402, identifying the first character object, and generating a first character indicated by the first character object.
In this embodiment, step 402 is substantially the same as step 202 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 403, translating the first character to generate a second character belonging to a second language.
In this embodiment, step 403 is substantially the same as step 203 in the corresponding embodiment of fig. 2, and is not described herein again.
And step 404, in response to detecting the clicking operation aiming at the target text box, changing the current presentation style of the target text box.
In the present embodiment, in the case where a click operation for a target text box is detected, an execution subject (e.g., a server or a terminal device shown in fig. 1) of the method for processing an image may change the current presentation style of the target text box.
It can be understood that when the user performs a click operation on the target text box, the user generally intends to edit the content in the target text box. Accordingly, the executing entity may notify the user that the target text box is currently in an editable state by changing its current presentation style. For example, the executing entity may change the current presentation style of the target text box by changing the color of the text box or by changing its border (for example, changing a solid border to a dashed border, or a dashed border to a solid one).
Step 405, in response to detecting the moving operation for the target text box, moving the target text box and the second character according to the indication of the moving operation.
In this embodiment, when the moving operation for the target text box is detected, the execution main body may move the target text box and the second character according to the instruction of the moving operation.
It can be understood that, in the case that the user performs the moving operation on the target text box, the user generally needs to move the target text box and the second word to adjust the position of the second word in the subsequently generated processed image.
And 406, in response to detecting the scaling operation for the target text box, scaling the target text box and the second character according to the indication of the scaling operation.
In this embodiment, when a zoom operation for the target text box is detected, the execution main body may zoom the target text box and the second character according to an instruction of the zoom operation.
It can be understood that, in the case that the user performs a scaling operation on the target text box, the user generally needs to scale the target text box and the second word to adjust the word size of the second word in the subsequently generated processed image.
And 407, fusing the second characters with the image to be processed to generate a processed image.
In this embodiment, step 407 is substantially the same as step 204 in the corresponding embodiment of fig. 2, and is not described herein again.
In practice, the target terminal may interact with the user through an editor. When the execution main body is a terminal device, the target terminal may be the execution main body; when the execution agent is a server, the target terminal may be a terminal device communicatively connected to the execution agent.
For example, the editor may consist of three parts: the first part may be used to present a thumbnail of the image to be processed, the second part may be the main editing area, and the third part may be a toolbar. The first part can scale the picture proportionally by setting the Cascading Style Sheets (CSS) property "max-width"; for example, "max-width" may be set to "400px". The second part may be provided with a "<div>" element whose width and height equal the original width and height of the image to be processed, used for rendering the processed image in real time. The third part may present the attribute values (e.g., font size, font, color, etc.) of the currently selected target text box or of the text within it. The user may also modify the style of the currently selected target text box via the toolbar.
Here, when the user's mouse moves into a first predetermined area of the target text box, the mouse cursor may change to a move cursor. When the user clicks inside the first predetermined area and finishes dragging, the coordinates of the top-left vertex of the first predetermined area are acquired, the coordinates of the target text box and the second text are modified accordingly, and the attribute values displayed on the toolbar are updated synchronously.
In addition, the user can adjust the width and height of the target text box by dragging its lower-right corner. When the width and height of the target text box are adjusted, the corresponding width and height values on the toolbar are modified synchronously. Also, when the width and height of the text box are adjusted, the executing entity can send a request to automatically calculate the font size and change the font size accordingly, so that the text fills the text box and user operations are reduced.
It should be understood that the executing entity may also record history information of each operation (for example, a move operation, a modification operation, a zoom operation, and the like) on the image to be processed, so that when the user undoes an operation, the image can be rolled back to its state before that operation.
It should be noted that, besides the above-mentioned contents, the embodiment of the present application may further include the same or similar features and effects as the embodiment corresponding to fig. 2, and details are not repeated herein.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for processing an image in the present embodiment highlights the steps of interacting with the user (i.e., when a click operation, a move operation, or a zoom operation is detected, the corresponding operation is performed). Therefore, the scheme described in this embodiment can reduce the probability of layout problems in the generated processed image.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of an apparatus for processing an image, the apparatus embodiment corresponding to the method embodiment shown in fig. 2, which may include the same or corresponding features as the method embodiment shown in fig. 2, in addition to the features described below, and produce the same or corresponding effects as the method embodiment shown in fig. 2. The device can be applied to various electronic equipment.
As shown in fig. 5, the apparatus 500 for processing an image of the present embodiment includes: an acquisition unit 501, a recognition unit 502, a translation unit 503, and a fusion unit 504. Wherein the obtaining unit 501 is configured to obtain an image to be processed, wherein the image to be processed contains a first literal object belonging to a first language; the recognition unit 502 is configured to recognize the first literal object, generating a first literal indicated by the first literal object; the translation unit 503 is configured to translate the first text to generate a second text belonging to a second language; the fusing unit 504 is configured to fuse the second text with the image to be processed, and generate a processed image.
In this embodiment, the acquiring unit 501 of the apparatus 500 for processing an image may acquire the image to be processed from a remote or local source through a wired or wireless connection. The image to be processed contains a first text object belonging to a first language, and may be an image whose contained first text object is to be translated. The first language may be any of a variety of languages, such as Chinese, Russian, German, Tamil, Hindi, and so forth. The first text object may be the visual presentation of the first text in the image to be processed. The first text may be text belonging to the first language.
In this embodiment, the identifying unit 502 may identify the first character object and generate the first character indicated by the first character object.
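One possible way to implement this recognition step, assuming the open-source pytesseract OCR engine and a simplified-Chinese source image (the file name and language code are illustrative assumptions, not part of the disclosed method), is:

import pytesseract
from PIL import Image

# Recognize the first character object in the image to be processed.
image = Image.open("to_be_processed.png")
first_characters = pytesseract.image_to_string(image, lang="chi_sim").strip()
print(first_characters)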
In this embodiment, the translation unit 503 may translate the first characters to generate second characters belonging to a second language. The second language is a language different from the first language. As an example, the first language may be Chinese and the second language may be a language other than Chinese, e.g., English. It will be appreciated that which languages the first language and the second language are may generally be determined according to actual requirements. The second characters may be characters belonging to the second language.
In this embodiment, the fusing unit 504 may fuse the second characters with the image to be processed, so as to generate a processed image.
In some optional implementations of this embodiment, the fusion unit 504 includes: a first determining subunit (not shown in the figure) configured to determine second style information of the second characters based on first style information of the first character object in the image to be processed, where the first style information is used for indicating the style of the first characters and the second style information is used for indicating the style of the second characters; and a fusing subunit (not shown in the figure) configured to fuse the second characters at the position of the first character object in the image to be processed according to the style indicated by the second style information.
In some optional implementations of this embodiment, the first style information and the second style information each include a font size; and the first determining subunit includes: a first determining module (not shown in the figure) configured to determine a target text box containing the first character object based on the font size and the number of characters of the first character object; and a second determining module (not shown in the figure) configured to determine the font size of the second characters based on the image area where the target text box is located and the number of characters of the second characters.
In some optional implementations of this embodiment, the second determining module includes: a first determining submodule (not shown in the figure) configured to determine an image area for presenting the second characters based on the number of characters of the second characters and the font size of the first character object; a second determining submodule (not shown in the figure) configured to, in response to the image area for presenting the second characters being larger than the image area where the target text box is located, reduce the font size of the first character object and determine the reduced font size as the font size of the second characters; and a third determining submodule (not shown in the figure) configured to, in response to the image area for presenting the second characters being smaller than the image area where the target text box is located, increase the font size of the first character object and determine the increased font size as the font size of the second characters.
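The increase/decrease logic above can be illustrated with the following sketch, which assumes, purely for illustration, that each character occupies roughly a square whose side equals the font size and uses a hypothetical step parameter; a real implementation would measure the rendered glyphs instead:

def adjust_font_size(first_font_size, target_box_area, second_char_count, step=1):
    """Grow or shrink the first object's font size until the second characters
    approximately fill the image area where the target text box is located."""
    size = first_font_size
    while size > 1:
        needed_area = second_char_count * size * size        # area needed to present the second characters
        if needed_area > target_box_area:
            size -= step                                      # presentation area too large: reduce the font size
        elif second_char_count * (size + step) ** 2 <= target_box_area:
            size += step                                      # still room in the box: increase the font size
        else:
            break
    return size

# Example: a 10-character box sized for 32 pt text, filled with an 18-character translation.
print(adjust_font_size(first_font_size=32, target_box_area=32 * 32 * 10, second_char_count=18))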
In some optional implementations of this embodiment, the fusion unit 504 includes: a second determining subunit (not shown in the figure) configured to determine a position of a target text box containing the first character object; and a first fusing subunit (not shown in the figure) configured to fuse the second characters within the target text box based on the position.
In some optional implementations of this embodiment, the apparatus 500 further comprises at least one of: a changing unit (not shown in the figure) configured to change a current presentation style of the target text box in response to detecting a click operation for the target text box; a moving unit (not shown in the figure) configured to move the target text box and the second characters according to the indication of a moving operation in response to detecting the moving operation for the target text box; and a scaling unit (not shown in the figure) configured to scale the target text box and the second characters according to the indication of a scaling operation in response to detecting the scaling operation for the target text box.
In some optional implementations of this embodiment, the fusion unit 504 includes: a third determining subunit (not shown in the figure) configured to determine the position of the first character object in the image to be processed; an erasing subunit (not shown in the figure) configured to perform character erasing processing on the first character object in the image to be processed to obtain an erased image; and a second fusing subunit (not shown in the figure) configured to fuse the second characters at a position in the erased image corresponding to the position of the first character object.
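A rough sketch of this erase-then-fuse variant, assuming OpenCV for inpainting and Pillow for drawing (the bounding box, file names, colour, and font are hypothetical placeholders, and the erasure mask here simply covers the whole text box rather than individual character strokes), might look like this:

import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def erase_and_fuse(image_path, box, second_text, font_path="arial.ttf", font_size=28):
    x, y, w, h = box                                     # position of the first character object
    img = cv2.imread(image_path)
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    mask[y:y + h, x:x + w] = 255                         # mark the region to erase
    erased = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)  # character erasing via inpainting

    # Fuse the second characters at the position where the first character object was located.
    canvas = Image.fromarray(cv2.cvtColor(erased, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(canvas)
    draw.text((x, y), second_text, fill=(0, 0, 0), font=ImageFont.truetype(font_path, font_size))
    return canvas

# Illustrative usage (paths and coordinates are placeholders):
# erase_and_fuse("to_be_processed.png", box=(40, 60, 320, 48), second_text="Free shipping").save("processed.png")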
In some optional implementations of this embodiment, the translation unit 503 includes: a generating subunit (not shown in the figure) configured to input the first characters into a pre-trained translation model to generate the second characters belonging to the second language, where the translation model is used for translating input characters belonging to the first language into characters belonging to the second language. The translation model is obtained by training through the following steps: acquiring a training sample set, where the training samples in the training sample set include characters belonging to the first language and characters belonging to the second language obtained by translating the characters belonging to the first language; and training, by using a machine learning algorithm, with the characters belonging to the first language in the training samples as input data and the characters belonging to the second language corresponding to the input data as expected output data, to obtain the translation model.
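For illustration only, the training procedure described above can be sketched with a toy character-level encoder-decoder in PyTorch; the two sample pairs, the tokenisation, and the model size are placeholders and do not represent the actual translation model of this disclosure:

import torch
import torch.nn as nn

# Hypothetical training sample set: pairs of first-language characters and the
# second-language characters obtained by translating them.
samples = [("你好", "hello"), ("世界", "world")]

# Toy character vocabularies; indices 0-2 are reserved for padding/start/end tokens.
PAD, BOS, EOS = 0, 1, 2
SRC = {c: i + 1 for i, c in enumerate(sorted({c for s, _ in samples for c in s}))}
TGT = {c: i + 3 for i, c in enumerate(sorted({c for _, t in samples for c in t}))}

class TinySeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, hidden=32):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden, padding_idx=PAD)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden, padding_idx=PAD)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        _, state = self.encoder(self.src_emb(src))        # encode the first-language characters
        dec, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec)

model = TinySeq2Seq(len(SRC) + 1, len(TGT) + 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)

# First-language characters serve as input data; the corresponding second-language
# characters serve as the expected output data.
for epoch in range(100):
    for src_text, tgt_text in samples:
        src = torch.tensor([[SRC[c] for c in src_text]])
        tgt = [BOS] + [TGT[c] for c in tgt_text] + [EOS]
        tgt_in = torch.tensor([tgt[:-1]])
        tgt_out = torch.tensor([tgt[1:]])
        logits = model(src, tgt_in)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()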
In some optional implementations of this embodiment, the apparatus 500 further includes: a modification unit (not shown in the figure) configured to modify the second characters in response to detecting a character modification operation for the second characters, and take the modified characters as new second characters; and a training unit (not shown in the figure) configured to train a new translation model by using the first characters corresponding to the new second characters as input data of the translation model and using the new second characters as expected output data of the translation model.
In some optional implementations of this embodiment, the first style information and the second style information each include at least one of a font, a color, a font size, an alignment manner, and a font weight.
In the apparatus provided by the foregoing embodiment of the present disclosure, the obtaining unit 501 obtains an image to be processed, where the image to be processed includes a first character object belonging to a first language; the identifying unit 502 then identifies the first character object to generate a first character indicated by the first character object; the translating unit 503 then translates the first characters to generate second characters belonging to a second language; and finally the fusing unit 504 fuses the second characters with the image to be processed to generate a processed image, thereby enriching the ways in which images are processed. Furthermore, embodiments of the present disclosure may be performed entirely by an electronic device (e.g., a server or a terminal device), or may be performed with user participation (e.g., by a person in charge of image design). When performed entirely by an electronic device, embodiments of the present disclosure can increase the speed of translating text objects in an image to generate a translated text image (i.e., the processed image described above); when performed with user participation, embodiments of the present disclosure can, to some extent, avoid typesetting problems in the generated processed image.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device/server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring an image to be processed, wherein the image to be processed comprises a first character object belonging to a first language; identifying the first character object to generate a first character indicated by the first character object; translating the first characters to generate second characters belonging to a second language; and fusing the second characters and the image to be processed to generate a processed image.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a recognition unit, a translation unit, and a fusion unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, the acquisition unit may also be described as a "unit that acquires an image to be processed".
The foregoing description presents only preferred embodiments of the present disclosure and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (13)

1. A method for processing an image, comprising:
acquiring an image to be processed, wherein the image to be processed comprises a first character object belonging to a first language;
identifying the first character object to generate a first character indicated by the first character object;
translating the first characters to generate second characters belonging to a second language;
and fusing the second characters and the image to be processed to generate a processed image.
2. The method of claim 1, wherein the fusing the second text with the image to be processed comprises:
determining second style information of the second character based on first style information of the first character object in the image to be processed, wherein the first style information is used for indicating the style of the first character, and the second style information is used for indicating the style of the second character;
and fusing the second characters at the position of the first character object in the image to be processed according to the style indicated by the second style information.
3. The method of claim 2, wherein the first style information and the second style information respectively include a font size; and
the determining second style information of the second text based on the first style information of the first text object in the image to be processed includes:
determining a target text box containing the first character object based on the font size and the number of characters of the first character object;
and determining the font size of the second characters based on the image area where the target text box is located and the number of characters of the second characters.
4. The method of claim 3, wherein the determining the font size of the second characters based on the image area where the target text box is located and the number of characters of the second characters comprises:
determining an image area for presenting the second characters based on the number of characters of the second characters and the font size of the first character object;
in response to the image area for presenting the second characters being larger than the image area where the target text box is located, reducing the font size of the first character object, and determining the reduced font size as the font size of the second characters;
and in response to the image area for presenting the second characters being smaller than the image area where the target text box is located, increasing the font size of the first character object, and determining the increased font size as the font size of the second characters.
5. The method of claim 1, wherein the fusing the second text with the image to be processed comprises:
determining a position of a target text box containing the first character object;
fusing the second characters in the target text box based on the position.
6. The method of claim 5, wherein the method further comprises at least one of:
in response to detecting a click operation on the target text box, changing a current presentation style of the target text box;
in response to detecting a moving operation for the target text box, moving the target text box and the second characters according to the indication of the moving operation;
in response to detecting a zoom operation for the target text box, zooming the target text box and the second characters as indicated by the zoom operation.
7. The method of claim 1, wherein the fusing the second text with the image to be processed comprises:
determining the position of the first character object in the image to be processed;
performing character erasing processing on the first character object in the image to be processed to obtain an erased image;
and fusing the second characters at the position corresponding to the position of the first character object in the erased image.
8. The method of any of claims 1-7, wherein said translating the first characters to generate second characters belonging to a second language comprises:
inputting the first characters into a pre-trained translation model to generate second characters belonging to a second language, wherein the translation model is used for translating the input characters belonging to the first language into the characters belonging to the second language; and
the translation model is obtained by training the following steps:
acquiring a training sample set, wherein training samples in the training sample set comprise characters belonging to a first language and characters belonging to a second language obtained by translating the characters belonging to the first language;
and training to obtain a translation model by using a machine learning algorithm and using the characters belonging to the first language in the training samples in the training sample set as input data and the characters belonging to the second language corresponding to the input data as expected output data.
9. The method of claim 8, wherein the method further comprises:
in response to the detection of the character modification operation aiming at the second character, modifying the second character, and taking the modified character as a new second character;
and taking the first characters corresponding to the new second characters as input data of the translation model, taking the new second characters as expected output data of the translation model, and training to obtain a new translation model.
10. The method according to any one of claims 2 to 7, wherein the first style information and the second style information each include at least one of a font, a color, a font size, an alignment manner, and a font weight.
11. An apparatus for processing an image, comprising:
an acquisition unit configured to acquire an image to be processed, wherein the image to be processed contains a first character object belonging to a first language;
the recognition unit is configured to recognize the first character object and generate a first character indicated by the first character object;
a translation unit configured to translate the first characters to generate second characters belonging to a second language;
and the fusion unit is configured to fuse the second characters and the image to be processed to generate a processed image.
12. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
13. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-10.
CN201910800462.4A 2019-08-28 2019-08-28 Method and apparatus for processing image Pending CN111783508A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910800462.4A CN111783508A (en) 2019-08-28 2019-08-28 Method and apparatus for processing image


Publications (1)

Publication Number Publication Date
CN111783508A true CN111783508A (en) 2020-10-16

Family

ID=72755351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910800462.4A Pending CN111783508A (en) 2019-08-28 2019-08-28 Method and apparatus for processing image

Country Status (1)

Country Link
CN (1) CN111783508A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467498A (en) * 2010-11-18 2012-05-23 阿里巴巴集团控股有限公司 Translation method and device
CN104881406A (en) * 2015-06-15 2015-09-02 携程计算机技术(上海)有限公司 Web page translation method and system
CN105159868A (en) * 2015-09-01 2015-12-16 广东欧珀移动通信有限公司 Text display method and system
CN107609553A (en) * 2017-09-12 2018-01-19 网易有道信息技术(北京)有限公司 image processing method, medium, device and computing device
US20190205708A1 (en) * 2017-12-28 2019-07-04 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for processing information

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536169A (en) * 2021-06-28 2021-10-22 上海硬通网络科技有限公司 Method, device, equipment and storage medium for typesetting characters of webpage
CN113536169B (en) * 2021-06-28 2022-08-05 上海硬通网络科技有限公司 Method, device, equipment and storage medium for typesetting characters of webpage
CN113553802A (en) * 2021-06-30 2021-10-26 上海硬通网络科技有限公司 Typesetting method, device, equipment and storage medium for characters in hidden picture of webpage
CN113689525A (en) * 2021-07-19 2021-11-23 维沃移动通信有限公司 Character beautifying method and device, readable storage medium and electronic equipment
CN113591437A (en) * 2021-08-09 2021-11-02 网易(杭州)网络有限公司 Game text translation method, electronic device and storage medium
CN113591437B (en) * 2021-08-09 2023-08-08 网易(杭州)网络有限公司 Game text translation method, electronic device and storage medium
CN114237468A (en) * 2021-12-08 2022-03-25 文思海辉智科科技有限公司 Translation method and device for text and picture, electronic equipment and readable storage medium
CN114237468B (en) * 2021-12-08 2024-01-16 文思海辉智科科技有限公司 Text and picture translation method and device, electronic equipment and readable storage medium
CN117274438A (en) * 2023-11-06 2023-12-22 杭州同花顺数据开发有限公司 Picture translation method and system
CN117274438B (en) * 2023-11-06 2024-02-20 杭州同花顺数据开发有限公司 Picture translation method and system

Similar Documents

Publication Publication Date Title
CN110458918B (en) Method and device for outputting information
CN111783508A (en) Method and apparatus for processing image
CN111381909B (en) Page display method and device, terminal equipment and storage medium
JP2020008853A (en) Method and apparatus for outputting voice
CN111368562B (en) Method and device for translating characters in picture, electronic equipment and storage medium
CN110177295B (en) Subtitle out-of-range processing method and device and electronic equipment
CN112995749A (en) Method, device and equipment for processing video subtitles and storage medium
CN111897950A (en) Method and apparatus for generating information
CN113255377A (en) Translation method, translation device, electronic equipment and storage medium
CN110728129B (en) Method, device, medium and equipment for typesetting text content in picture
CN111754414B (en) Image processing method and device for image processing
CN113313066A (en) Image recognition method, image recognition device, storage medium and terminal
CN112492399B (en) Information display method and device and electronic equipment
CN112487871A (en) Handwriting data processing method and device and electronic equipment
WO2023124793A1 (en) Image pushing method and device
CN109683726B (en) Character input method, character input device, electronic equipment and storage medium
US9894120B2 (en) Partial likes of social media content
CN115619904A (en) Image processing method, device and equipment
CN110263743B (en) Method and device for recognizing images
CN109857838B (en) Method and apparatus for generating information
CN113221514A (en) Text processing method and device, electronic equipment and storage medium
CN112699687A (en) Content cataloging method and device and electronic equipment
CN112308745A (en) Method and apparatus for generating information
CN115344718B (en) Cross-region document content recognition method, device, apparatus, medium, and program product
CN113535017B (en) Method and device for processing and synchronously displaying drawing files and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination