CN110544218B - Image processing method, device and storage medium - Google Patents

Info

Publication number: CN110544218B
Application number: CN201910829520.6A
Authority: CN (China)
Prior art keywords: training, image, transparency, model, text
Legal status: Active (granted)
Inventor: 陈锡显
Current assignee: Tencent Technology (Shenzhen) Co., Ltd.
Original assignee: Tencent Technology (Shenzhen) Co., Ltd.
Other languages: Chinese (zh)
Other versions: CN110544218A
Application filed by Tencent Technology (Shenzhen) Co., Ltd.; priority to CN201910829520.6A
Publication of CN110544218A, application granted, publication of CN110544218B

Classifications

    • G06N 3/084: Computing arrangements based on biological models; neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06T 3/04: Geometric image transformations in the plane of the image; context-preserving transformations, e.g. by using an importance map
    • G06T 5/90: Image enhancement or restoration; dynamic range modification of images or parts thereof
    • G06T 7/90: Image analysis; determination of colour characteristics
    • G06T 2207/10024: Indexing scheme for image analysis or image enhancement; image acquisition modality; color image
    • G06T 2207/20081: Indexing scheme for image analysis or image enhancement; special algorithmic details; training; learning
    • G06T 2207/20084: Indexing scheme for image analysis or image enhancement; special algorithmic details; artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, apparatus, device and storage medium. The method includes: acquiring the color features and transparency features of a text image to be processed and of a template text image; performing transfer learning on a trained neural network model based on the color features and transparency features of the template text image to obtain an adjusted neural network model; and processing the color features and transparency features of the text image to be processed through the adjusted neural network model to obtain a target text image whose display effect is the same as that of the template text image. With the method and apparatus, an RGBA artistic-word effect image carrying transparency information can be generated directly.

Description

Image processing method, device and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an image processing method, an image processing device, and a storage medium.
Background
At present, with the improvement of living standards, people's demand for aesthetics keeps growing. Words with artistic effects (also called artistic words) are one of the publicity tools necessary for social, political and economic development. For example, artistic words can be added to political banners, economic advertisements and the like to attract more attention. The artistic appeal of artistic words also beautifies people's lives and can bring mental enjoyment.
More complex artistic words are generally designed and manually drawn by professional designers, and their effects cannot be applied to other words by a program, so a great deal of manpower and material resources are required.
Disclosure of Invention
The embodiments of the present application provide an image processing method, an image processing apparatus and a storage medium, which can directly generate an RGBA artistic-word effect image with transparency information.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides an image processing method, which comprises the following steps:
acquiring color characteristics and transparency characteristics of a character image to be processed and color characteristics and transparency characteristics of a template character image;
performing migration learning on the trained neural network model based on the color features and the transparency features of the template text image to obtain an adjusted neural network model;
and processing the color characteristics and the transparency characteristics of the character image to be processed through the adjusted neural network model to obtain a target character image, wherein the display effect of the target character image is the same as that of the template character image.
An embodiment of the present application provides an image processing apparatus, including:
The first acquisition module is used for acquiring the color characteristics and the transparency characteristics of the character image to be processed and the color characteristics and the transparency characteristics of the template character image;
the adjustment module is used for performing migration learning on the trained neural network model based on the color characteristics and the transparency characteristics of the template text image to obtain an adjusted neural network model;
and the processing module is used for processing the color characteristics and the transparency characteristics of the character image to be processed through the adjusted neural network model to obtain a target character image, wherein the display effect of the target character image is the same as that of the template character image.
An embodiment of the present application provides an image processing apparatus, including at least:
a memory for storing executable instructions;
and the processor is used for realizing the method provided by the embodiment of the application when executing the executable instructions stored in the memory.
An embodiment of the present application provides a storage medium storing executable instructions that, when executed by a processor, implement the method provided by the embodiments of the present application.
The embodiment of the application has the following beneficial effects:
When converting a text image to be processed into a target text image, the text image to be processed and the template text image are first acquired in RGBA format, and their color features and transparency features are extracted. Transfer learning is then performed on the trained neural network model based on the color features and transparency features of the template text image to obtain an adjusted neural network model, so that the text image to be processed can be converted through the adjusted neural network model into a target text image having the same display effect as the template text image.
Drawings
FIG. 1A is a diagram showing the comparison of original text and artistic words;
FIG. 1B is a graph showing an example comparison of the effects of RGB and RGBA superposition;
FIG. 1C is a comparison of a manually made artistic word and an automatically generated artistic word in the related art;
FIG. 1D is another comparison of a manually made artistic word and an automatically generated artistic word in the related art;
FIG. 1E is a schematic diagram of a network architecture of an image processing method according to an embodiment of the present application;
FIG. 1F is a schematic diagram of another network architecture of an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic illustration of an alternative configuration of the apparatus provided by embodiments of the present application;
FIG. 3A is a schematic flow chart of an implementation of an image processing method according to an embodiment of the present application;
fig. 3B is a schematic flow chart of another implementation of the image processing method according to the embodiment of the present application;
FIG. 4 is a schematic flow chart of training a neural network model according to an embodiment of the present disclosure;
fig. 5 is a schematic flow chart of another implementation of the image processing method according to the embodiment of the present application;
FIG. 6 is a schematic diagram of an implementation flow of an art word generating method according to an embodiment of the present application;
fig. 7A is an artistic word effect diagram generated by using the artistic word manufacturing method provided by the embodiment of the present application;
FIG. 7B is a diagram of still another effect of an artistic word generated by the method for producing an artistic word according to the embodiments of the present application;
fig. 8 is an effect diagram obtained by adding the generated artistic word to the background diagram of Banner.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
If a similar description of "first/second" appears in the application document, the following description is added, in which the terms "first/second/third" are merely distinguishing between similar objects and not representing a particular ordering of the objects, it being understood that the "first/second/third" may be interchanged with a particular order or precedence, if allowed, so that the embodiments of the application described herein may be implemented in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
Before the embodiments of the present application are described in further detail, the terms referred to in the embodiments are explained; the following explanations apply to these terms.
1) Artistic word: the deformed fonts subjected to artistic processing by professional font designers have the characteristics of being accordant with the meaning of characters, attractive and interesting, easy to recognize, striking and the like, and are deformed with pattern meaning or decoration meaning. The artistic word can make reasonable deformation decoration for strokes and structure of Chinese character based on meaning, shape and structure characteristics of Chinese character so as to write beautiful and vivid variant word.
2) Banner: banner advertisements contained in mobile App software, web pages and the like are main advertisement forms for popularization of Internet products. Usually, more beautiful pictures designed by professional designers are used, and elements such as background, commodity, advertising document, propaganda document and the like are also used.
3) RGB: representing Red Green Blue, is a representation of a typical pixel, representing a picture format such as. Bmp, where a picture with RGB information alone is overlaid on top of another picture to create an occlusion.
4) RGBA: is a color space representing Red Green Blue and Alpha. It can also be considered that additional information is added to the RGB model, namely transparency information is added, a in RGBA represents Alpha, transparency characteristics of added pixels are characterized by transition from 0 to 255 from completely transparent (invisible) to completely opaque, representing that the picture format is. Png, and the transparency represents the fusion superposition of the two picture pixels of the picture and the other picture.
5) Transparency channel: an 8-bit grayscale channel that records transparency characteristics in an image in 256 grayscale, defining transparent, opaque, and translucent regions, where white indicates opaque, black indicates transparent, and gray indicates translucent.
6) One of the migration learning and machine learning methods is to reuse the model developed for task a as an initial point in the process of developing the model for task B. The transfer Learning may include Zero-order Learning (Zero-shot Learning), one-time Learning (One-shot Learning), and a small-amount Learning (Few-shot Learning).
7) Zero-order learning means that there is no sample of a certain class in the training set, but if a powerful mapping can be learned, this mapping can be seen even when training, but the features of this new class can also be obtained.
8) One learning means that each category in the training set has samples, but only a small number of samples (only one or a few). At this point, a generalized map can be learned over a larger dataset and then updated over a small dataset. It is also understood that a neural network model is trained with a large amount of training data, and that parameters of the neural network model can be updated with a small amount of samples.
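To make the fused superposition controlled by the Alpha channel in terms 4) and 5) concrete, the following is a minimal per-pixel "over" compositing sketch. The use of NumPy and the array names are illustrative assumptions and are not part of the patent.

```python
import numpy as np

def composite_over(foreground_rgba: np.ndarray, background_rgb: np.ndarray) -> np.ndarray:
    """Superimpose an RGBA picture onto an RGB background picture.

    foreground_rgba: H x W x 4 array with channel values in [0, 255], A = transparency.
    background_rgb:  H x W x 3 array with channel values in [0, 255].
    Returns the fused H x W x 3 RGB result.
    """
    fg = foreground_rgba[..., :3].astype(np.float32)               # RGB of the overlaid picture
    alpha = foreground_rgba[..., 3:4].astype(np.float32) / 255.0   # 0 = transparent, 1 = opaque
    bg = background_rgb.astype(np.float32)
    fused = alpha * fg + (1.0 - alpha) * bg                        # per-pixel fused superposition
    return fused.astype(np.uint8)
```

A picture carrying only RGB information has no alpha term in this blend, which is why superimposing it simply occludes the background, as noted in term 3).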
In order to better understand the image processing method provided in the embodiments of the present application, the batch artistic-word production schemes in the related art and their drawbacks are first described.
In the related art, an artistic word can be designed by a designer as a Photoshop layer style or a hollowed-out spliced base image, so that the effect can be quickly applied to other words in batches without the designer reproducing it. Fig. 1A compares original text with artistic words: the artistic word 113 has a rich texture effect relative to the original text 111, and the artistic word 112 has shading, edge and other effects relative to the original text and also changes (e.g. widens) the shape of the characters. However, about 60% of the more complicated artistic-word effects are drawn manually by designers, and these effects cannot be applied to other words by a program; for example, after the artistic effect of the word "limited" is obtained, the effect for "spring-loaded doll" still needs to be drawn again by the designer, which requires a great investment of manpower and material resources.
Fig. 1B compares the superposition effects of RGB and RGBA, in which the blue portion is the background, 121 is the effect of superimposing an RGB image on the background image, and 122 is the effect of superimposing an RGBA image. As shown in fig. 1B, directly superimposing the RGB image onto the background does not yield a usable artistic word. Existing artistic-word production methods intelligently generate RGB images, which therefore cannot be used directly for synthesis with a Banner background image.
In the related art, the schemes for generating RGB-format artistic words mainly fall into a rule-based class and a deep-learning class.
1) The best rule-based method is T-Effects, i.e. Awesome Typography: Statistics-Based Text Effects Transfer.
2) The best deep-learning method is TET-GAN, i.e. TET-GAN: Text Effects Transfer via Stylization and Destylization.
Fig. 1C compares a manually made artistic word with an automatically generated one in the related art. Part 131 in fig. 1C shows a given word "art" and its corresponding manually made artistic word; when a new word such as "P" is given, an artistic word 132 with the same effect as "art" is generated intelligently by the related art. Comparing 131 and 132 shows that a large amount of speckle noise exists in the automatically generated artistic word 132.
The disadvantages of the art word making method in the related art mainly include the following:
first, none of the currently disclosed methods can generate images with transparency, and thus they cannot meet product requirements;
secondly, in terms of the generation effect, the current technical scheme for automatically generating artistic words has more speckle noise;
third, automatic generation of artistic words using related art is prone to errors in generation.
Fig. 1D is another comparison of a manually made artistic word and an automatically generated one in the related art. Part 141 in fig. 1D shows a given word "hn" and its corresponding manually made artistic word; when a new word such as "commander" is given, an artistic word 142 with the same effect as "hn" is generated intelligently by the related art. Comparing 141 and 142 shows that the background color and the contour color of the automatically generated artistic word 142 are reversed; that is, the automatic generation method fails to generate the artistic word correctly and an error occurs.
In the related art, the generated RGB-format artistic word can be combined with matting and clipping to obtain an RGBA-format artistic word, but the drawbacks mainly include:
First, the value of the A channel cannot be estimated accurately; only the values 0 and 255 are obtained;
Second, many artistic-word effect images, such as a white-word effect on a white background, are difficult to separate;
Third, fine matting, for example of flame words, is difficult;
Fourth, a trained matting model is difficult to apply to matting new artistic words, because the features used to distinguish text from background in the new artistic words differ greatly from those of the artistic words in the previous training set.
In view of the above, an embodiment of the present application provides an image processing method that, on the basis of an existing RGB artistic-word generation model, uses an Attention mechanism to extract and generate the Alpha channel, and reshapes the artistic-word contour according to the extracted Alpha. In this way, RGBA artistic words can be generated intelligently and speckle noise outside the artistic word can be reduced, realizing batch generation of practically usable artistic words and reducing manpower and material costs.
An exemplary application of an apparatus implementing the embodiments of the present application is described below; the apparatus provided in the embodiments of the present application may be implemented as a server, and an exemplary application in which the apparatus is implemented as a server is described in the following.
Referring to fig. 1E, fig. 1E is a schematic diagram of a network architecture of an image processing method according to an embodiment of the present application. As shown in fig. 1E, the network architecture includes at least a terminal 100, a server 200 and a network 300. To support an exemplary application, the terminal 100 is connected to the server 200 via the network 300; the network 300 may be a wide area network, a local area network, or a combination of the two, and uses wireless links for data transmission.
When the image processing method provided in the embodiments of the present application is implemented with the network architecture shown in fig. 1E, a user may send, through the terminal 100, a text image with the target display effect as the reference image and a text image to be converted to the target display effect as the image to be processed to the server. After receiving the image to be processed and the reference image, the server performs one-shot learning on the trained neural network model using the reference image to adjust the parameters of the neural network model, and inputs the image to be processed into the adjusted neural network model to obtain the target image with the target display effect. After obtaining the target image, the server 200 may send it to the terminal 100, and the terminal 100 may then display the target image on its own graphical interface.
In the embodiment of the present application, the reference image and the image to be processed are RGBA images, that is, the target image with transparency information is obtained through final processing, so that the reference image and the image to be processed can be displayed as separate images and can also be directly superimposed into the background image.
Fig. 1F is a schematic diagram of still another network architecture of the image processing method according to the embodiment of the present application, as shown in fig. 1F, in which only the terminal 100 is included. After obtaining the user requirement, that is, obtaining the template image with the target display effect and the image to be processed to be converted into the target display effect, the terminal 100 performs one-shot learning on the trained neural network model through the template image to adjust the parameters of the neural network model, and inputs the image to be processed into the adjusted neural network model, so that the target image with the target display effect can be obtained. After obtaining the target image, the terminal 100 may display the target image on its own graphical interface.
The apparatus provided in the embodiments of the present application may be implemented in hardware or a combination of hardware and software, and various exemplary implementations of the apparatus provided in the embodiments of the present application are described below.
The server 200 may be a single server, a server cluster formed by multiple servers, a cloud computing center, or the like. Other exemplary structures of the server 200 can be envisaged from the exemplary structure shown in fig. 2, so the structure described here should not be considered limiting; for example, some of the components described below may be omitted, or components not described below may be added to meet the particular requirements of some applications.
The server 200 shown in fig. 2 includes: at least one processor 210, a memory 240, at least one network interface 220, and a user interface 230. The components in the server 200 are coupled together by a bus system 250. It is understood that the bus system 250 is used to implement connection and communication between these components. In addition to a data bus, the bus system 250 includes a power bus, a control bus and a status signal bus; for clarity of illustration, however, the various buses are all labeled as the bus system 250 in fig. 2.
The user interface 230 may include a display, keyboard, mouse, touch pad, touch screen, and the like.
The memory 240 may be volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM). The volatile memory may be a Random Access Memory (RAM). The memory 240 described in the embodiments of the present application is intended to include any suitable type of memory.
The memory 240 in the embodiments of the present application is capable of storing data to support the operation of the server 200. Examples of such data include: any computer program for operating on server 200, such as an operating system and application programs. The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application may comprise various applications.
As an example of the method provided by the embodiments of the present application being implemented in software, the method may be directly embodied as a combination of software modules executed by the processor 210. The software modules may be located in a storage medium, the storage medium is located in the memory 240, and the processor 210 reads the executable instructions included in the software modules in the memory 240 and completes the method provided by the embodiments of the present application in combination with necessary hardware (for example, the processor 210 and other components connected to the bus 250).
By way of example, the processor 210 may be an integrated circuit chip having signal processing capability, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
Methods of implementing embodiments of the present application will be described in conjunction with the foregoing exemplary applications and implementations of an apparatus implementing embodiments of the present application.
For a better understanding of the method provided by the embodiments of the present application, first, artificial intelligence, various branches of the artificial intelligence, and application fields related to the method provided by the embodiments of the present application will be described.
Artificial intelligence (AI, artificial Intelligence) is a theory, method, technique, and application system that simulates, extends, and extends human intelligence using a digital computer or a machine controlled by a digital computer, perceives the environment, obtains knowledge, and uses the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions. Each direction will be described below.
Computer Vision (CV) is a science of studying how to "look" at a machine, and more specifically, to replace human eyes with a camera and a Computer to perform machine Vision such as recognition, tracking and measurement on a target, and further perform graphic processing to make the Computer process an image more suitable for human eyes to observe or transmit to an instrument for detection. As a scientific discipline, computer vision research-related theory and technology has attempted to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, synchronous positioning, and map construction, among others, as well as common biometric recognition techniques such as face recognition, fingerprint recognition, and others.
The key technologies of Speech Technology are Automatic Speech Recognition (ASR), speech synthesis (Text To Speech, TTS) and voiceprint recognition. Enabling computers to listen, see, speak and feel is the development direction of future human-computer interaction, and speech has become one of the most promising human-computer interaction modes.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics; research in this field therefore involves natural language, i.e. the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graph techniques and the like.
Machine Learning (ML) is a multi-domain interdisciplinary discipline involving multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, etc. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, induction learning, and the like.
The automatic driving technology generally comprises high-precision map, environment perception, behavior decision, path planning, motion control and other technologies, and has wide application prospect.
With research and advancement of artificial intelligence technology, research and application of artificial intelligence technology is being developed in various fields, such as common smart home, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned, automatic driving, unmanned aerial vehicles, robots, smart medical treatment, smart customer service, etc., and it is believed that with the development of technology, artificial intelligence technology will be applied in more fields and with increasing importance value.
The scheme provided by the embodiment of the application relates to artificial intelligence natural language processing and other technologies, and is specifically described through the following embodiments.
Referring to fig. 3A, fig. 3A is a schematic flowchart of an implementation of an image processing method according to an embodiment of the present application, which may be applied to an image processing apparatus, where the image processing apparatus may be the server 200 in fig. 1E or the terminal 100 in fig. 1F, and will be described with reference to the steps shown in fig. 3A.
In step S101, the image processing apparatus acquires the color feature and the transparency feature of the text image to be processed and the color feature and the transparency feature of the template text image.
Here, the text image to be processed and the template text image are both RGBA images. The template text image and the text image to be processed both comprise text characters, wherein the text characters can be Chinese characters, english letters, japanese, korean and the like, and the embodiment of the application does not limit the languages of the text characters.
The template text image refers to a text image with a certain artistic effect, that is, text characters in the template text image are not characters in a conventional format, but have a certain artistic effect, for example, can have a flame effect, have a stone-like texture effect, and the like. The text characters in the text image to be processed are conventional characters without artistic effect.
Color characteristics, which may be referred to as color characteristic information or color information, refer to RGB information of an image, that is, each channel value of R, G, B channels of each pixel point in the image. Transparency characteristics, which may also be referred to as transparency information, refer to channel values of the a-channels in the image.
After acquiring the text image to be processed and the template text image, the image processing device may perform feature extraction on the images to extract their color features and transparency features.
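As an illustration of the feature extraction in step S101, the sketch below splits an RGBA text image into the color feature (the RGB channel values) and the transparency feature (the A channel values) described above; the use of Pillow and NumPy and the file names are assumptions made for illustration only.

```python
import numpy as np
from PIL import Image

def extract_features(path: str):
    """Split an RGBA text image into its color feature (RGB) and transparency feature (A)."""
    rgba = np.asarray(Image.open(path).convert("RGBA"))   # H x W x 4 array
    color_feature = rgba[..., :3]          # R, G, B channel values of every pixel
    transparency_feature = rgba[..., 3]    # A channel values: 0 (transparent) .. 255 (opaque)
    return color_feature, transparency_feature

# Hypothetical usage for the two images of step S101:
# color_p, alpha_p = extract_features("text_to_be_processed.png")
# color_t, alpha_t = extract_features("template_text.png")
```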
In step S102, the image processing apparatus acquires a trained neural network model.
Here, the trained neural network model is trained using at least a training image having transparency characteristics.
When step S102 is implemented by the server 200 in fig. 1E, it may be that the server acquires a neural network model trained by itself using a training image having transparency characteristics.
When step S102 is implemented by the terminal 100 in fig. 1F, the terminal 100 may acquire, from the server 200, a neural network model trained by the server using the training image having the transparency characteristic, or the terminal 100 may acquire a neural network model trained by itself using the training image having the transparency characteristic.
That is, the trained neural network model may be trained by a server or by a terminal, but is generally trained by a server because the calculation amount of training the neural network is large and the calculation capability of the device is required to be high.
And step S103, the image processing equipment carries out migration learning on the trained neural network model based on the color characteristics and the transparency characteristics of the template text image to obtain an adjusted neural network model.
Here, the artistic effect in the training text images used when training the neural network model is generally different from the artistic effect in the template text image. For example, if the text in the training text images has a flame effect, the trained neural network can only convert a conventional text image into a text image with the flame effect; if the user now wants to generate a text image with a stone-texture effect, the trained neural network obviously cannot generate it.
At this time, the trained neural network model needs to be subjected to transfer learning through the color features and the transparency features of the template text image so as to adjust the parameters of the trained neural network, thereby enabling the adjusted neural network model to process the text image to be processed into the target text image with the same display effect of the template text image.
And step S104, the image processing equipment processes the color characteristics and the transparency characteristics of the character image to be processed through the adjusted neural network model to obtain a target character image.
Here, the display effect of the target text image is the same as the display effect of the template text image.
When the step S104 is implemented, the color features and the transparency features of the text image to be processed may be input into the adjusted neural network model, and the adjusted neural network model processes the color features and the transparency features of the text image to be processed, so as to obtain the target text image with the same display effect as the template text image.
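The following PyTorch-style sketch illustrates steps S103 and S104 together: the trained model is adjusted by transfer learning on the single template text image and then applied to the text image to be processed. The model interface, the L1 loss, the optimizer, the learning rate and the number of update steps are all assumptions made for illustration; the patent does not prescribe them.

```python
import torch
import torch.nn.functional as F

def transfer_and_convert(model,
                         plain_color, plain_alpha,        # plain rendering of the template's characters
                         template_color, template_alpha,  # stylized template text image
                         source_color, source_alpha,      # text image to be processed
                         steps=200, lr=1e-4):
    """One-shot style adjustment of a trained model, then conversion (steps S103-S104).

    All tensors are assumed to be 1 x C x H x W and scaled to [0, 1]; `model` is assumed
    to map the (color, alpha) of a plain text image to the (color, alpha) of its
    stylized counterpart.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):                                  # S103: transfer learning on the template pair
        pred_color, pred_alpha = model(plain_color, plain_alpha)
        loss = F.l1_loss(pred_color, template_color) + F.l1_loss(pred_alpha, template_alpha)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():                                   # S104: convert the text image to be processed
        target_color, target_alpha = model(source_color, source_alpha)
    return target_color, target_alpha
```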
In the image processing method provided by the embodiments of the present application, the text image to be processed and the template text image are first acquired in RGBA format, and their color features and transparency features are extracted. Transfer learning is then performed on the trained neural network model based on the color features and transparency features of the template text image to obtain an adjusted neural network model, so that the text image to be processed can be converted by the adjusted neural network model. Because the image to be processed carries transparency information and is processed by the adjusted neural network, a target text image with transparency information is generated directly. This not only allows target text images to be generated in batches and improves the efficiency of artistic-word generation, but also allows the RGBA-format target text image to be synthesized directly with a background image, improving the efficiency of image synthesis.
In some embodiments, before step S101, the preset neural network model may be trained through steps S21 to S24 as shown in fig. 4, to obtain a trained neural network model:
step S21, obtaining the actual color characteristics and the actual transparency characteristics of a plurality of training text images.
Wherein, the display effect of each training text image is the same. In some embodiments, before step S21, a plurality of training text images may be acquired, where the training text images may be training text images that have the same display effect and are crawled from a network, may be manually designed, may be generated by a design tool or design software, and may only generate training text images with a single display effect. After a plurality of training text images are obtained, feature extraction is performed on each training text image so as to obtain actual color features and actual transparency features of each training text image.
It should be noted that, in the embodiment of the present application, the actual color feature of the text image refers to the color feature extracted from the text image, and correspondingly, the actual transparency feature of the text image refers to the transparency feature extracted from the text image.
Step S22, acquiring source character images corresponding to the training character images, and acquiring actual color features and actual transparency features of the source character images.
Here, since the training text image is a text image having a certain artistic display effect, for example, a flame effect, a flower effect, a stone texture effect, etc., and is not a conventional text image, in order to be able to obtain a mapping relationship from a conventional source text image to the training text image, it is also necessary to obtain an actual color feature and an actual transparency feature of the source text image corresponding to the training text image at this time as a part of the training data.
And S23, determining the actual color characteristics and the actual transparency characteristics of the training text images, and the actual color characteristics and the actual transparency characteristics of the source text images as training data.
And step S24, training a preset neural network model based on the training data to obtain a trained neural network model.
Here, in the embodiment of the present application, a convolutional neural network model may be utilized, and further, a U-Net neural network model may be utilized.
When the step S24 is implemented, the training data may be used to train the preset neural network model, and when the neural network model can learn how to change from the source text image to the training text image, the trained neural network model may be considered to be obtained.
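Since the paragraph above mentions that a U-Net may be used as the preset neural network model, the following is a minimal PyTorch sketch of a U-Net-style encoder-decoder that takes an RGBA text image and outputs an RGBA prediction; the depth, channel widths and sigmoid output are illustrative assumptions rather than details taken from the patent.

```python
import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    """A small U-Net-style network taking an RGBA text image (4 channels) and
    producing an RGBA output (4 channels): 3 color channels + 1 transparency channel."""

    def __init__(self, in_ch=4, out_ch=4, base=32):
        super().__init__()
        self.enc1 = self._block(in_ch, base)
        self.enc2 = self._block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = self._block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = self._block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = self._block(base * 2, base)
        self.head = nn.Conv2d(base, out_ch, kernel_size=1)

    @staticmethod
    def _block(in_ch, out_ch):
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        e1 = self.enc1(x)                       # skip connection 1
        e2 = self.enc2(self.pool(e1))           # skip connection 2
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))     # RGBA prediction in [0, 1]
```

The input height and width are assumed to be divisible by 4 so that the two pooling and upsampling stages line up.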
In some embodiments, the neural network model includes at least: a first skeleton extraction model, a second skeleton extraction model, a source text restoration model, a target text generation model and a transparency generation model. Accordingly, step S24 shown in fig. 4 may be implemented by the following steps:
step S241, training the first skeleton extraction model, the second skeleton extraction model and the source character restoration model according to the actual color characteristics of the first source character image and the actual color characteristics of the first training character image in the training data so as to realize shape retention;
here, step S241 may be implemented by:
step S2411, inputting the actual color features of the first source text image in the training data into a first skeleton extraction model to obtain source text skeleton information.
Here, the source text skeleton information may be outline information of each stroke in the text characters included in the source text image, or may be centerline information of each stroke.
Step S2412, inputting the actual color features of the first training text image in the training data into a second skeleton extraction model to obtain first training text skeleton information.
Here, the first source text image and the first training text image are corresponding, that is to say the text characters in the first source text image are identical to the text characters in the first training image. For example, if the text character in the source text image is "yes", then the text character in the first training text image is also "yes", that is, the artistic word "yes" is included in the first training text image.
Correspondingly, the training text skeleton information can be outline information of each stroke in text characters contained in the training text image, and also can be midline information of each stroke. However, it should be noted that the source text skeleton information and the training text skeleton information are corresponding, that is, if the source text skeleton information is outline information of each stroke in the text characters included in the source text image, the training text skeleton information is outline information of each stroke in the text characters included in the training text image; if the source text skeleton information is the centerline information of each stroke in the text characters contained in the source text image, the training text skeleton information is the centerline information of each stroke in the text characters contained in the training text image.
Step S2413, performing prediction processing based on the source text skeleton information and the first training text skeleton information by using a source text restoration model to obtain a predicted color feature of the first source text image.
Here, in the implementation of step S2413, the source text skeleton information and the first training text skeleton information are respectively input into the source text restoration model, and the predicted color features of the first source text image are respectively obtained.
Step S2414, the difference between the predicted color feature and the actual color feature of the first source text image is counter-propagated in the first skeleton extraction model, the second skeleton extraction model and the source text recovery model to update the parameters of the first skeleton extraction model, the second skeleton extraction model and the source text recovery model.
Here, during training, the difference between the predicted color features and the actual color features of the first source text image is back-propagated through the first skeleton extraction model, the second skeleton extraction model and the source text restoration model to update their parameters (e.g., weights and thresholds), until the difference meets a preset first optimization target. Through steps S2411 to S2414, a first skeleton extraction model, a second skeleton extraction model and a source text restoration model that realize shape retention are obtained.
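A PyTorch-style sketch of one shape-retention update (steps S2411 to S2414) is given below; the L1 loss, the single shared optimizer and the tensor shapes are assumptions made for illustration, not requirements of the patent.

```python
import torch.nn.functional as F

def shape_retention_step(skeleton_src, skeleton_train, restorer, optimizer,
                         source_color, train_color):
    """One shape-retention update over the first skeleton extraction model (skeleton_src),
    the second skeleton extraction model (skeleton_train) and the source text restoration
    model (restorer); source_color / train_color are the actual color features of the
    first source / first training text image as 1 x 3 x H x W tensors."""
    src_skel = skeleton_src(source_color)        # S2411: source text skeleton information
    train_skel = skeleton_train(train_color)     # S2412: first training text skeleton information
    pred_from_src = restorer(src_skel)           # S2413: predicted color features of the
    pred_from_train = restorer(train_skel)       #        first source text image (both inputs)
    loss = (F.l1_loss(pred_from_src, source_color) +
            F.l1_loss(pred_from_train, source_color))   # S2414: difference to back-propagate
    optimizer.zero_grad()
    loss.backward()                              # updates all three models via the shared optimizer
    optimizer.step()
    return loss.item()
```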
Step S242, training the first skeleton extraction model, the second skeleton extraction model and the target text generation model according to the actual color features of the first source text image, the actual color features of the first training text image and the actual color features of the second training text image in the training data, so as to realize texture generation;
here, step S242 may be implemented by:
step S2421, the color characteristics of the second training text image are input into the second skeleton extraction model to obtain the second training text skeleton information.
Here, the second training text image is an image different from the first training text image. The second skeleton extraction model in step S2421 may be the second skeleton extraction model trained in step S241.
Step S2422, performing prediction processing based on the source text skeleton information and the second training text skeleton information by using a target text generation model, to obtain a predicted color feature of the first training text image.
Here, when step S2422 is implemented, the source text skeleton information and the second training text skeleton information are spliced, and the spliced skeleton information is input into the target text generation model, so as to obtain the predicted color feature of the first training text image.
Because the source text skeleton information and the second training text skeleton information are presented in the form of matrixes, the source text skeleton information and the second training text skeleton information are spliced, namely, the two matrixes are spliced, and correspondingly, the obtained spliced skeleton information is a spliced matrix obtained by splicing the two matrixes.
Step S2423, the difference between the predicted color feature and the actual color feature of the first training text image is back propagated in the first skeleton extraction model, the second skeleton extraction model and the target text generation model, so as to update the parameters of the first skeleton extraction model, the second skeleton extraction model and the target text generation model.
Here, when step S2423 is implemented, the difference between the predicted color features and the actual color features of the first training text image is back-propagated through the first skeleton extraction model, the second skeleton extraction model and the target text generation model to update their parameters (e.g., weights and thresholds), until the difference meets a preset second optimization target. Through steps S2421 to S2423, a first skeleton extraction model, a second skeleton extraction model and a target text generation model that realize texture generation are obtained.
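Similarly, one texture-generation update (steps S2421 to S2423) can be sketched as follows; treating the "splicing" of the two skeleton matrices as a channel-wise concatenation and using an L1 loss are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def texture_generation_step(skeleton_src, skeleton_train, generator, optimizer,
                            source_color, train1_color, train2_color):
    """One texture-generation update; train1_color / train2_color are the actual color
    features of the first / second training text images (1 x 3 x H x W tensors)."""
    src_skel = skeleton_src(source_color)         # skeleton of the first source text image
    train2_skel = skeleton_train(train2_color)    # S2421: second training text skeleton information
    spliced = torch.cat([src_skel, train2_skel], dim=1)   # S2422: splice the two skeleton matrices
    pred_train1_color = generator(spliced)        #          predicted color of the first training image
    loss = F.l1_loss(pred_train1_color, train1_color)     # S2423: difference to back-propagate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```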
Step S243, training the first skeleton extraction model, the second skeleton extraction model and the target character generation model according to the actual color characteristics of the first source character image, the actual color characteristics and the actual transparency characteristics of the first training character image so as to realize transparency information generation.
Here, step S243 may be implemented by:
step S2431, performing prediction processing based on the actual color feature of the first source text image and the actual color feature of the first training text image through a transparency generation model, and performing back propagation on the obtained difference between the predicted transparency feature of the first training text image and the actual transparency feature of the first training text image in the transparency generation model to update the parameters of the transparency generation model.
Here, when implementing step S2431, the actual color features of the first source text image and the actual color features of the first training text image are spliced and synthesized to obtain synthesized actual color features, and then the synthesized actual color features are input into the transparency generation model for prediction processing to obtain predicted transparency features of the first training text image; and then, the difference value between the predicted transparency characteristic and the actual transparency characteristic of the first training text image is back-propagated in the transparency generation model to update the parameters of the transparency generation model until the difference value between the predicted transparency characteristic and the actual transparency characteristic of the first training text image reaches a third optimization target.
Step S2432 obtains a first associated feature based on the predicted color feature and the predicted transparency feature of the first training text image.
Here, since the predicted color feature and the predicted transparency feature of the first training text image are both in matrix form, step S2432 may be implemented by performing an element-by-element product operation on the predicted color matrix corresponding to the predicted color feature and the transparency matrix corresponding to the predicted transparency feature of the first training text image, to obtain the first correlation matrix, that is, the first correlation feature. It should be noted that the dimensions of the predicted color matrix and the predicted transparency matrix are the same.
Step S2433 obtains a second associated feature based on the actual color feature and the actual transparency feature of the first training text image.
Here, when implementing step S2433, performing an element-by-element product operation on the actual color matrix corresponding to the actual color feature of the first training text image and the actual transparency matrix corresponding to the actual transparency feature, to obtain a second correlation matrix, that is, the second correlation feature.
And step S2434, back-propagating the first skeleton extraction model, the second skeleton extraction model and the target character generation model based on the difference value of the first association feature and the second association feature so as to update parameters of the first skeleton extraction model, the second skeleton extraction model and the target character generation model.
Here, when implementing step S2434, the difference value between the first associated feature and the second associated feature may be back-propagated in the first skeleton extraction model, the second skeleton extraction model, and the target text generation model, so as to update the parameters of the first skeleton extraction model, the second skeleton extraction model, and the target text generation model, until the difference value between the first associated feature and the second associated feature meets the fourth optimization objective. The first skeleton extraction model and the second skeleton extraction model used in step S2434 are the first skeleton extraction model and the second skeleton extraction model obtained through step S241 and step S242.
A neural network model capable of changing the source text image to the training text image is obtained through steps S241 to S243, and the obtained training text image is an RGBA image having transparency information.
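Finally, one transparency-generation update (steps S2431 to S2434) can be sketched as below. The element-by-element products follow steps S2432 and S2433; the L1 losses, the two separate optimizers, the "splicing" of color features as channel concatenation, and the detaching of the predicted transparency before the second back-propagation are assumptions made so that the sketch runs as written.

```python
import torch
import torch.nn.functional as F

def transparency_generation_step(alpha_model, alpha_opt, gen_opt,
                                 source_color, train_color, train_alpha,
                                 predict_train_color):
    """One transparency-generation update. `predict_train_color` is a callable that
    re-runs the skeleton/generation models and returns a fresh predicted color feature
    of the first training text image; tensors are 1 x C x H x W in [0, 1]."""
    # S2431: splice the two actual color features, predict the transparency of the
    # first training text image and update the transparency generation model.
    spliced_color = torch.cat([source_color, train_color], dim=1)
    pred_alpha = alpha_model(spliced_color)
    alpha_loss = F.l1_loss(pred_alpha, train_alpha)
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()

    # S2432: first associated feature = predicted color * predicted transparency (element-wise).
    pred_color = predict_train_color()
    first_assoc = pred_color * pred_alpha.detach()
    # S2433: second associated feature = actual color * actual transparency (element-wise).
    second_assoc = train_color * train_alpha
    # S2434: back-propagate the difference through the skeleton and generation models.
    assoc_loss = F.l1_loss(first_assoc, second_assoc)
    gen_opt.zero_grad()
    assoc_loss.backward()
    gen_opt.step()
    return alpha_loss.item(), assoc_loss.item()
```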
In some embodiments, when the preset neural network is trained to obtain the trained neural network model, the first skeleton extraction model, the second skeleton extraction model, the source text restoration model, the target text generation model and the transparency generation model may be trained in batch using the actual color features and actual transparency features of the source text images and of the training text images. That is, the training data required by all the models is read in at once, and the parameters of each model are adjusted by the training algorithm until the whole neural network model converges to the specified precision. This overcomes the phenomenon of network forgetting and speeds up training.
An RGBA-format target text image is obtained through steps S101 to S104; it can be output as a single image or added to a background image for eye-catching display. Thus, in some embodiments, as shown in fig. 3B, after step S104 the method further includes:
step S105, obtaining the placement positions of the background image and the target text image.
And step S106, based on the placement position, overlapping the target text image into the background image to obtain a composite image.
Here, since the target text image is an RGBA image having transparency information, it is sufficient to directly superimpose the target text image on the background image at the time of image synthesis without performing processing such as matting.
Step S107, outputting the synthesized image.
Here, the composite image may be output by being displayed in a graphical interface of an image processing apparatus, or may be transmitted to a terminal.
Through steps S105 to S107, the generated target text image can be directly superimposed on the background image to obtain the composite image, without performing matting processing on the target text image, so that the efficiency of image synthesis can be improved.
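As an illustration of the direct superposition described above, the following is a minimal sketch of alpha-over compositing of an RGBA target text image onto an RGB background at a given placement position; the numpy layout (H, W, C arrays with values in 0-255) and the function name are assumptions of this sketch and are not prescribed by the embodiment.

```python
import numpy as np

def composite(background_rgb, text_rgba, top, left):
    """Blend text_rgba (H, W, 4) over background_rgb (H', W', 3) at (top, left).
    Bounds checking is omitted: the text image is assumed to fit inside the background."""
    out = background_rgb.astype(np.float32).copy()
    h, w = text_rgba.shape[:2]
    region = out[top:top + h, left:left + w, :]
    rgb = text_rgba[..., :3].astype(np.float32)
    alpha = text_rgba[..., 3:4].astype(np.float32) / 255.0
    region[:] = alpha * rgb + (1.0 - alpha) * region   # standard alpha-over blend, no matting needed
    return out.astype(np.uint8)
```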
Based on the foregoing embodiments, the embodiments of the present application further provide an image processing method, and fig. 5 is a schematic implementation flow chart of the image processing method of the embodiments of the present application, as shown in fig. 5, where the method includes:
step S301, the terminal acquires word information to be processed and a template word image.
Here, the template text image may be a manually designed text image having a certain artistic effect. The text information to be processed may be the text characters that are to be converted to the same display effect as the template text image, or, in some embodiments, a text image of those characters with a conventional display effect.
Step S302, the terminal sends the word information to be processed and the template word image to the server.
The terminal sends the text information to be processed and the template text image to the server so as to request the server to convert text characters in the text information to be processed into the text image with the same effect as the template text image.
Step S303, the server acquires the color features and the transparency features of the character image to be processed and the color features and the transparency features of the template character image.
Here, when the terminal sends text characters to be processed (rather than a character image) to the server, the server needs to convert the characters to be processed into the character image to be processed based on a preset conversion condition before step S303.
The server performs feature extraction on the character image to be processed and the template character image to obtain the color features and transparency features of the character image to be processed and the color features and transparency features of the template character image.
Step S304, the server acquires a trained neural network model.
Here, the trained neural network model is obtained by the server through training at least with training images having transparency features; it may also be obtained by another computer device training with training images having transparency features.
And step S305, the server performs migration learning on the trained neural network model based on the color characteristics and the transparency characteristics of the template text image to obtain an adjusted neural network model.
Here, when implementing step S305, the server may perform one shot learning or few shot learning on the trained neural network model based on the color feature and the transparency feature of the template text image, to obtain the adjusted neural network model.
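A minimal sketch of such one-shot or few-shot migration learning is given below; the optimizer, learning rate, number of iterations and loss function are all assumptions of this sketch rather than parameters specified by the embodiment.

```python
import torch

def few_shot_adapt(model, template_input, template_target, loss_fn,
                   steps=50, lr=1e-4):
    """Iterate a small number of times on the single template sample to adjust parameters."""
    model.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):                       # a few iterations on one (or few) samples
        optimizer.zero_grad()
        prediction = model(template_input)
        loss = loss_fn(prediction, template_target)
        loss.backward()                          # back-propagate the difference
        optimizer.step()
    return model                                 # the adjusted neural network model
```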
And step S306, the server processes the color characteristics and the transparency characteristics of the character image to be processed through the adjusted neural network model to obtain a target character image.
Here, since the adjusted neural network model is obtained by performing migration learning on the trained neural network model using the template text image, the target text image obtained by processing the color features and the transparency features of the text image to be processed through the adjusted neural network model has the same display effect as the template text image.
In step S307, the server transmits the target text image to the terminal.
Here, after receiving the target text image, the terminal may directly output and display the target text image, or further synthesize the target text image with the background image through steps S308 to S309 to obtain a synthesized image, so that the target text in the synthesized image is more striking.
In step S308, the terminal acquires the background image and the placement position of the target text image.
Step S309, the terminal synthesizes the target text image and the background image based on the position information to obtain a synthesized image;
in step S310, the terminal outputs the composite image.
It should be noted that the same steps or concepts in the above-described image processing method as those in the other embodiments may be explained with reference to the descriptions in the other embodiments.
In the image processing method provided by the embodiment of the application, after the terminal acquires the text information to be processed and the template text image, it sends them to the server. The server performs migration learning on the trained neural network model according to the template text image to obtain an adjusted neural network model, and processes the text image to be processed through the adjusted neural network model, thereby obtaining a target text image with the same display effect as the template text image. Because the target text image is an RGBA-format image, it can be output and displayed on its own, or directly superimposed onto a background image to obtain a composite image that does not block the background. Text images with an artistic display effect can therefore be generated rapidly in batches and superimposed directly onto background images, which reduces processing difficulty and improves processing efficiency.
Based on the foregoing embodiments, an embodiment of the present application further provides an image processing method, which is applied to the network architecture shown in fig. 1E, where the method includes:
and step 41, the terminal acquires the character image to be processed and the template character image.
Here, the text image to be processed acquired by the terminal may be obtained according to the text character to be processed input by the user. For example, the text characters to be processed input by the user may be converted into the text images to be processed according to a preset conversion rule.
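As one illustration of such a preset conversion rule, the following sketch rasterises the input characters onto a plain canvas with Pillow; the font path, canvas size and layout are hypothetical, since the embodiment does not specify the rule.

```python
from PIL import Image, ImageDraw, ImageFont

def text_to_image(text, font_path="font.ttf", size=(256, 256), font_size=160):
    """Render text characters to a conventional-display-effect text image."""
    canvas = Image.new("RGB", size, "white")             # plain background canvas
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(font_path, font_size)      # font_path is a placeholder
    draw.text((20, 40), text, font=font, fill="black")   # conventional display effect
    return canvas
```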
And step 42, the terminal acquires the color features and the transparency features of the character image to be processed and the color features and the transparency features of the template character image.
The terminal performs feature extraction on the character image to be processed and the template character image to obtain the color features and transparency features of the character image to be processed and the color features and transparency features of the template character image.
And step 43, the terminal acquires the trained neural network model from the server.
Here, the trained neural network model is obtained by training the server at least using a training image having transparency characteristics, and the training process may refer to steps S21 to S24.
And step 44, the terminal performs migration learning on the trained neural network model based on the color characteristics and the transparency characteristics of the template text image to obtain an adjusted neural network model.
Here, when step 44 is implemented, the terminal may perform one shot learning or few shot learning on the trained neural network model based on the color feature and the transparency feature of the template text image, to obtain an adjusted neural network model.
And 45, the terminal processes the color characteristics and the transparency characteristics of the character image to be processed through the adjusted neural network model to obtain a target character image.
Here, since the adjusted neural network model is obtained by performing migration learning on the trained neural network model using the template text image, the target text image obtained by processing the color features and the transparency features of the text image to be processed through the adjusted neural network model has the same display effect as the template text image.
Similarly, after the terminal generates the target text image, the target text image may be directly output and displayed, or the target text image and the background image may be further synthesized through steps 46 to 48 to obtain a synthesized image, and output.
And step 46, the terminal acquires the background image and the position information of the target text image in the background image.
Step 47, the terminal synthesizes the target text image and the background image based on the position information to obtain a synthesized image;
the terminal outputs the composite image, step 48.
It should be noted that the same steps or concepts in the above-described image processing method as those in the other embodiments may be explained with reference to the descriptions in the other embodiments.
In the image processing method provided by the embodiment of the application, after acquiring the text image to be processed and the template text image, the terminal extracts the color features and the transparency features of both images, performs migration learning on the trained neural network model acquired from the server using the template text image to obtain an adjusted neural network model, and processes the text image to be processed through the adjusted neural network model, thereby obtaining a target text image with the same display effect as the template text image. Because the target text image is an RGBA-format image, it can be output and displayed on its own, or directly superimposed onto a background image to obtain a composite image that does not block the background. Text images with an artistic display effect can therefore be generated rapidly in batches and superimposed directly onto background images, which reduces processing difficulty and further improves processing efficiency.
In some embodiments, the terminal may send the template text image to the server. The server obtains the color features and the transparency features of the template text image, adjusts the trained neural network model according to these features to obtain an adjusted neural network model, and sends the adjusted neural network model to the terminal. The terminal can then convert the text image to be processed, determined by the user, into a target text image with the same display effect as the template text image through the adjusted neural network model. The obtained target text image may be output alone, or further superimposed onto a background image to obtain a composite image that is then output.
In the following, an exemplary application of the embodiments of the present application in a practical application scenario will be described.
The embodiment of the application provides a technical scheme that can intelligently generate RGBA artistic words and is superior to traditional methods in terms of the RGB-dimension effect. In the embodiment of the application, Alpha channel extraction and generation are performed using an Attention mechanism on top of an existing RGB artistic word generation model; the artistic word contour is then reshaped according to the extracted Alpha and enhanced with additional data. With the method provided by the embodiment of the application, RGBA artistic words can be generated directly and intelligently, speckle noise outside the artistic word is reduced, generation errors of certain artistic words are avoided, and artistic words that can actually be used are generated in batches, which reduces manpower and material costs and provides significant product differentiation and competitiveness for related intelligent design products.
Fig. 6 is a schematic diagram of an implementation flow of an art word generating method according to an embodiment of the present application, as shown in fig. 6, where the method includes the following steps:
step S601, an artistic word effect diagram in RGBA format is obtained.
Here, the artistic word effect diagram in RGBA format includes the RGB information of the artistic word and the transparency feature A. In the embodiment of the application, the RGB information of the artistic word in RGBA format is denoted y, and the corresponding transparency feature is denoted y_m. Meanwhile, the original text corresponding to the artistic word also needs to be obtained, and the RGB information and the transparency feature of the original text are denoted x and x_m respectively. In this way, x, x_m, y and y_m form the training data.
In the embodiments of the present application, it is desirable to build a model that can learn the mapping from [x, x_m] to [y, y_m]. The artistic word effect diagram in RGBA format in step S601 may be downloaded from the Internet, generated by some algorithm, or made manually; this embodiment does not limit the source of the RGBA artistic word effect diagram. The number of obtained artistic word effect diagrams is also not limited, as long as a trained model can be obtained from the training data; in the embodiment of the application, more than 30,000 artistic word effect diagrams in RGBA format are used.
Step S602, shape retention.
In combination with the general U-net, three neural network models can be defined: a first skeleton extraction model, a second skeleton extraction model and a source text restoration model. The first skeleton extraction model and the second skeleton extraction model extract the main skeleton information of the original text x and the artistic word y respectively, and the source text restoration model restores the original text from either of the extracted skeletons. The parameters of the respective neural network models are trained with the training data to reduce the difference between the restored text and the actual original text x, until this difference meets the optimization objective.
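For illustration, the following is a minimal PyTorch sketch of this shape-retention step; the tiny convolutional networks stand in for the U-net style models mentioned above, and the layer sizes, learning rate and L1 objective are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def tiny_net(in_ch, out_ch):
    # stand-in for a U-net style network
    return nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, out_ch, 3, padding=1))

skeleton_x = tiny_net(3, 1)   # first skeleton extraction model (for original text x)
skeleton_y = tiny_net(3, 1)   # second skeleton extraction model (for artistic word y)
restore = tiny_net(1, 3)      # source text restoration model

params = (list(skeleton_x.parameters()) + list(skeleton_y.parameters())
          + list(restore.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)
loss_fn = nn.L1Loss()

def shape_retention_step(x, y):
    """x, y: (batch, 3, H, W) colour features of the original text and the artistic word."""
    optimizer.zero_grad()
    restored_from_x = restore(skeleton_x(x))   # restore the original text from its own skeleton
    restored_from_y = restore(skeleton_y(y))   # restore the original text from the artistic word's skeleton
    loss = loss_fn(restored_from_x, x) + loss_fn(restored_from_y, x)
    loss.backward()
    optimizer.step()
    return loss.item()
```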
In step S603, a texture is generated.
Given the original text x, another original text x', and the artistic word y' corresponding to x', step S603 can be implemented by splicing the skeleton of x extracted by the first skeleton extraction model with the skeleton of y' extracted by the second skeleton extraction model to obtain a spliced matrix, and using the spliced matrix as the input of the target text generation model to generate the artistic word corresponding to the original text x. By continuously training the neural network models, the error between the generated artistic word and the real artistic word y is reduced, and the trained first skeleton extraction model, second skeleton extraction model and target text generation model are finally obtained; the artistic word corresponding to an original text can then be generated through the trained neural network models.
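A minimal sketch of the splicing operation in this texture-generation step is shown below; the single-layer networks are stand-ins for the skeleton extraction models and the target text generation model, and all shapes are assumptions of this sketch.

```python
import torch
import torch.nn as nn

# stand-in single-layer networks; in practice these would be U-net style models
skeleton_x = nn.Conv2d(3, 1, 3, padding=1)   # first skeleton extraction model
skeleton_y = nn.Conv2d(3, 1, 3, padding=1)   # second skeleton extraction model
generator = nn.Conv2d(2, 3, 3, padding=1)    # target text generation model

x = torch.rand(1, 3, 64, 64)       # colour features of the original text x
y_ref = torch.rand(1, 3, 64, 64)   # artistic word y' of another original text x'
spliced = torch.cat([skeleton_x(x), skeleton_y(y_ref)], dim=1)  # spliced matrix (channel-wise)
y_pred = generator(spliced)        # predicted artistic word corresponding to x
```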
In step S604, transparency is generated.
Here, step S604 may be implemented by a transparency generation model that generates the predicted transparency feature of the artistic word, and the transparency generation model is trained to continuously decrease the gap between the predicted transparency feature and the actual transparency feature y_m. In addition, the networks are trained to reduce the error between the element-wise product of the generated artistic word and its predicted transparency feature and the element-wise product of the real artistic word y and the actual transparency feature y_m.
Step S605, one-shot learning.
Through the implementation of steps S601 to S604, a certain amount of artistic words of the corresponding style is required as training data for each style of artistic word to be generated. In practice, however, a designer typically provides only one or two samples of each style. The neural network model obtained in steps S601 to S604 by training on more than 30,000 images can be saved; when one or a few samples of a new style are available, the saved neural network model can be iterated several times on these samples to adjust its parameters, and the adjusted neural network model can then be used to generate artistic words of that style.
Step S606, outputting a generation result.
Fig. 7A and 7B are artistic word effect diagrams generated using the artistic word generation method provided in the embodiments of the present application. In fig. 7A and 7B, 701 and 711 are manually designed artistic words, and 702 and 712 are artistic words generated by the method provided by the embodiment of the present application. Comparing 701 with 702 and 711 with 712 shows that the artistic words generated by the method provided by the embodiment of the present application have no redundant noise points, achieve the same effect as the manually designed artistic words, and contain no generation errors.
In addition, the artistic word effect diagrams 702 and 712 generated by the embodiment of the application are RGBA effect diagrams with transparency features, so they can be directly dragged onto a Banner background image for non-occluding display. Fig. 8 is an effect diagram obtained by adding the generated artistic word "water chestnut-containing bag" with a flame effect to a Banner background image.
Existing AI Banner design products, such as Alibaba's Luban system, do not provide a function for making artistic words, whereas the embodiment of the application provides a method for generating RGBA artistic words, laying an automated, batch, intelligent foundation for subsequent intelligent design and providing significant product differentiation and competitiveness for design products. Moreover, with this method, artistic words can be produced in batches quickly and effectively, greatly reducing manpower and material resources. In addition, because the artistic words generated by the embodiment of the application carry transparency features, the artistic word making tool derived from the embodiment of the application can be used as an independent product or can enable intelligent Banner design, bringing considerable economic benefit.
An exemplary architecture of software modules is described below, and in some embodiments, as shown in fig. 2, the software modules in the apparatus 440, that is, the image processing apparatus 80, may include:
a first obtaining module 81, configured to obtain color features and transparency features of a text image to be processed and color features and transparency features of a template text image;
a second acquisition module 82 for
The adjustment module 83 is configured to perform migration learning on the trained neural network model based on the color features and the transparency features of the template text image, so as to obtain an adjusted neural network model;
and the processing module 84 is configured to process the color feature and the transparency feature of the text image to be processed through the adjusted neural network model to obtain a target text image, where the display effect of the target text image is the same as the display effect of the template text image.
In some embodiments, the apparatus further comprises:
the third acquisition module is used for acquiring the actual color characteristics and the actual transparency characteristics of a plurality of training text images, wherein the display effects of the training text images are the same;
a fourth obtaining module, configured to obtain source text images corresponding to the plurality of training text images, and obtain actual color features and actual transparency features of the source text images;
The first determining module is used for determining the actual color characteristics and the actual transparency characteristics of the training text images and the actual color characteristics and the actual transparency characteristics of the source text images as training data;
and the training module is used for training a preset neural network model based on the training data to obtain a trained neural network model.
In some embodiments, the neural network model includes at least: a first skeleton extraction model, a second skeleton extraction model, a source character recovery model, a target character generation model and a transparency generation model; the training module is further configured to:
training the first skeleton extraction model, the second skeleton extraction model and the source character recovery model according to the actual color characteristics of the first source character image and the actual color characteristics of the first training character image in the training data so as to realize shape retention;
training the first skeleton extraction model, the second skeleton extraction model and the target character generation model according to the actual color characteristics of the first source character image, the actual color characteristics of the first training character image and the actual color characteristics of the second training character image in the training data so as to realize texture generation;
Training the first skeleton extraction model, the second skeleton extraction model and the target character generation model according to the actual color characteristics of the first source character image, the actual color characteristics and the actual transparency characteristics of the first training character image so as to realize transparency information generation.
In some embodiments, the training module is further to:
inputting the actual color characteristics of a first source text image in the training data into a first skeleton extraction model to obtain source text skeleton information;
inputting the actual color characteristics of the first training text image in the training data into a second skeleton extraction model to obtain first training text skeleton information;
performing prediction processing based on the source character skeleton information and the first training character skeleton information through a source character restoration model to obtain a predicted color characteristic of the first source character image;
and carrying out back propagation on the difference value between the predicted color characteristic and the actual color characteristic of the first source character image in the first skeleton extraction model, the second skeleton extraction model and the source character restoration model so as to update the parameters of the first skeleton extraction model, the second skeleton extraction model and the source character restoration model.
In some embodiments, the training module is further to:
inputting the color characteristics of the second training text image into a second skeleton extraction model to obtain second training text skeleton information;
performing prediction processing based on the source character skeleton information and the second training character skeleton information through a target character generation model to obtain a predicted color characteristic of the first training character image;
and carrying out back propagation on the difference value between the predicted color characteristic and the actual color characteristic of the first training text image in the first framework extraction model, the second framework extraction model and the target text generation model so as to update the parameters of the first framework extraction model, the second framework extraction model and the target text generation model.
In some embodiments, the training module is further to:
performing prediction processing based on the actual color characteristics of the first source character image and the actual color characteristics of the first training character image through a transparency generation model, and performing back propagation in the transparency generation model to update parameters of the transparency generation model by using the obtained difference value between the predicted transparency characteristics of the first training character image and the actual transparency characteristics of the first training character image;
Obtaining a first associated feature based on the predicted color feature and the predicted transparency feature of the first training text image;
obtaining a second associated feature based on the actual color feature and the actual transparency feature of the first training text image;
and back-propagating the first skeleton extraction model, the second skeleton extraction model and the target character generation model based on the difference value of the first association characteristic and the second association characteristic so as to update parameters of the first skeleton extraction model, the second skeleton extraction model and the target character generation model.
In some embodiments, the apparatus further comprises:
a fifth acquisition module for acquiring a background image;
the synthesizing module is used for superposing the target text image into the background image to obtain a synthesized image;
and the output module is used for outputting the synthesized image.
As an example of a hardware implementation of the method provided by the embodiments of the present application, the method provided by the embodiments of the present application may be performed directly by the processor 410 in the form of a hardware decoding processor, e.g., by one or more application specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field programmable gate arrays (FPGA, Field-Programmable Gate Array), or other electronic components.
The present embodiments provide a storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform the methods provided by the embodiments of the present application, for example, the methods shown in fig. 3A, 3B, 4, and 5.
In some embodiments, the storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; but may be a variety of devices including one or any combination of the above memories.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a hypertext markup language (HTML, hyper Text Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or, alternatively, distributed across multiple sites and interconnected by a communication network.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and scope of the present application are intended to be included within the scope of the present application.

Claims (8)

1. An image processing method, the method comprising:
acquiring actual color features and actual transparency features of a plurality of training text images, wherein the display effects of the plurality of training text images are the same;
acquiring source character images corresponding to the training character images, and acquiring actual color features and actual transparency features of the source character images;
determining the actual color features and the actual transparency features of the training text images and the actual color features and the actual transparency features of the source text images as training data;
training a preset neural network model based on the training data to obtain a trained neural network model, wherein the neural network model at least comprises: a first skeleton extraction model, a second skeleton extraction model, a source character recovery model, a target character generation model and a transparency generation model; the training a preset neural network model based on the training data to obtain a trained neural network model comprises: training the first skeleton extraction model, the second skeleton extraction model and the source character recovery model according to the actual color characteristics of the first source character image and the actual color characteristics of the first training character image in the training data; training the first skeleton extraction model, the second skeleton extraction model and the target character generation model according to the actual color characteristics of the first source character image, the actual color characteristics of the first training character image and the actual color characteristics of the second training character image in the training data; training the first skeleton extraction model, the second skeleton extraction model and the target character generation model according to the actual color characteristics of the first source character image, the actual color characteristics and the actual transparency characteristics of the first training character image;
Acquiring color characteristics and transparency characteristics of a character image to be processed and color characteristics and transparency characteristics of a template character image;
obtaining a trained neural network model, wherein the trained neural network model is obtained by training at least using a training image with transparency characteristics;
performing migration learning on the trained neural network model based on the color characteristics and the transparency characteristics of the template text image to obtain an adjusted neural network model;
and processing the color characteristics and the transparency characteristics of the character image to be processed through the adjusted neural network model to obtain a target character image, wherein the display effect of the target character image is the same as that of the template character image.
2. The method of claim 1, wherein training the first skeleton extraction model, the second skeleton extraction model, and the source text recovery model based on the actual color features of the first source text image and the actual color features of the first training text image in the training data comprises:
inputting the actual color characteristics of a first source text image in the training data into a first skeleton extraction model to obtain source text skeleton information;
Inputting the actual color characteristics of the first training text image in the training data into a second skeleton extraction model to obtain first training text skeleton information;
performing prediction processing based on the source character skeleton information and the first training character skeleton information through a source character restoration model to obtain a predicted color characteristic of the first source character image;
and carrying out back propagation on the difference value between the predicted color characteristic and the actual color characteristic of the first source character image in the first skeleton extraction model, the second skeleton extraction model and the source character restoration model so as to update the parameters of the first skeleton extraction model, the second skeleton extraction model and the source character restoration model.
3. The method of claim 2, wherein training the first skeleton extraction model, the second skeleton extraction model, and the target word generation model based on the actual color features of the first source word image, the actual color features of the first training word image, and the actual color features of the second training word image in the training data comprises:
inputting the color characteristics of the second training text image into a second skeleton extraction model to obtain second training text skeleton information;
Performing prediction processing based on the source character skeleton information and the second training character skeleton information through a target character generation model to obtain a predicted color characteristic of the first training character image;
and carrying out back propagation on the difference value between the predicted color characteristic and the actual color characteristic of the first training text image in the first framework extraction model, the second framework extraction model and the target text generation model so as to update the parameters of the first framework extraction model, the second framework extraction model and the target text generation model.
4. A method as claimed in claim 3, wherein the training of the first skeleton extraction model, the second skeleton extraction model, and the target character generation model based on the actual color features of the first source character image, the actual color features of the first training character image, and the actual transparency features comprises:
performing prediction processing based on the actual color characteristics of the first source character image and the actual color characteristics of the first training character image through a transparency generation model, and performing back propagation in the transparency generation model to update parameters of the transparency generation model by using the obtained difference value between the predicted transparency characteristics of the first training character image and the actual transparency characteristics of the first training character image;
Obtaining a first associated feature based on the predicted color feature and the predicted transparency feature of the first training text image;
obtaining a second associated feature based on the actual color feature and the actual transparency feature of the first training text image;
and back-propagating the first skeleton extraction model, the second skeleton extraction model and the target character generation model based on the difference value of the first association characteristic and the second association characteristic so as to update parameters of the first skeleton extraction model, the second skeleton extraction model and the target character generation model.
5. The method according to any one of claims 1 to 4, further comprising:
acquiring a background image and a placement position of the target text image;
based on the placement position, overlapping the target text image into the background image to obtain a composite image;
and outputting the synthesized image.
6. An image processing apparatus, characterized in that the apparatus comprises:
the third acquisition module is used for acquiring the actual color characteristics and the actual transparency characteristics of a plurality of training text images, wherein the display effects of the training text images are the same;
a fourth obtaining module, configured to obtain source text images corresponding to the plurality of training text images, and obtain actual color features and actual transparency features of the source text images;
The first determining module is used for determining the actual color characteristics and the actual transparency characteristics of the training text images and the actual color characteristics and the actual transparency characteristics of the source text images as training data;
the training module is used for training a preset neural network model based on the training data to obtain a trained neural network model, and the neural network model at least comprises: a first skeleton extraction model, a second skeleton extraction model, a source character recovery model, a target character generation model and a transparency generation model; the training module is further configured to: training the first skeleton extraction model, the second skeleton extraction model and the source character recovery model according to the actual color characteristics of the first source character image and the actual color characteristics of the first training character image in the training data so as to realize shape retention; training the first skeleton extraction model, the second skeleton extraction model and the target character generation model according to the actual color characteristics of the first source character image, the actual color characteristics of the first training character image and the actual color characteristics of the second training character image in the training data so as to realize texture generation; training the first skeleton extraction model, the second skeleton extraction model and the target character generation model according to the actual color characteristics of the first source character image, the actual color characteristics and the actual transparency characteristics of the first training character image so as to realize transparency information generation;
The first acquisition module is used for acquiring the color characteristics and the transparency characteristics of the character image to be processed and the color characteristics and the transparency characteristics of the template character image;
the second acquisition module is used for acquiring a trained neural network model, wherein the trained neural network model is obtained by training at least by using a training image with transparency characteristics;
the adjustment module is used for performing migration learning on the trained neural network model based on the color characteristics and the transparency characteristics of the template text image to obtain an adjusted neural network model;
and the processing module is used for processing the color characteristics and the transparency characteristics of the character image to be processed through the adjusted neural network model to obtain a target character image, wherein the display effect of the target character image is the same as that of the template character image.
7. An image processing apparatus, characterized by comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 5 when executing executable instructions stored in said memory.
8. A storage medium having stored thereon executable instructions for causing a processor to perform the method of any one of claims 1 to 5.
CN201910829520.6A 2019-09-03 2019-09-03 Image processing method, device and storage medium Active CN110544218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910829520.6A CN110544218B (en) 2019-09-03 2019-09-03 Image processing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910829520.6A CN110544218B (en) 2019-09-03 2019-09-03 Image processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110544218A CN110544218A (en) 2019-12-06
CN110544218B true CN110544218B (en) 2024-02-13

Family

ID=68711190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910829520.6A Active CN110544218B (en) 2019-09-03 2019-09-03 Image processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110544218B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177449B (en) * 2019-12-30 2021-11-05 深圳市商汤科技有限公司 Multi-dimensional information integration method based on picture and related equipment
CN111176531A (en) * 2020-01-03 2020-05-19 京东方科技集团股份有限公司 Electronic device, interaction method thereof and computer-readable storage medium
CN111369481B (en) * 2020-02-28 2020-11-20 当家移动绿色互联网技术集团有限公司 Image fusion method and device, storage medium and electronic equipment
CN111931928B (en) * 2020-07-16 2022-12-27 成都井之丽科技有限公司 Scene graph generation method, device and equipment
CN111968048B (en) * 2020-07-30 2024-03-26 国网智能科技股份有限公司 Method and system for enhancing image data of less power inspection samples
CN113450267B (en) * 2021-05-14 2022-08-19 桂林电子科技大学 Transfer learning method capable of rapidly acquiring multiple natural degradation image restoration models
CN113299250B (en) * 2021-05-14 2022-05-27 漳州万利达科技有限公司 Image display method and device and display equipment
CN113450282B (en) * 2021-07-12 2023-01-06 上海交通大学 Method and system for beautifying image
CN113421214A (en) * 2021-07-15 2021-09-21 北京小米移动软件有限公司 Special effect character generation method and device, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025457A (en) * 2017-03-29 2017-08-08 腾讯科技(深圳)有限公司 A kind of image processing method and device
CN109410141A (en) * 2018-10-26 2019-03-01 北京金山云网络技术有限公司 A kind of image processing method, device, electronic equipment and storage medium
CN109829925A (en) * 2019-01-23 2019-05-31 清华大学深圳研究生院 A kind of method and model training method for extracting clean prospect in scratching figure task

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110029635A1 (en) * 2009-07-30 2011-02-03 Shkurko Eugene I Image capture device with artistic template design
US20110029914A1 (en) * 2009-07-30 2011-02-03 Whitby Laura R Apparatus for generating artistic image template designs
US8289340B2 (en) * 2009-07-30 2012-10-16 Eastman Kodak Company Method of making an artistic digital template for image display
EP3479729B1 (en) * 2016-06-29 2020-08-05 Panasonic Intellectual Property Management Co., Ltd. Image processing device and image processing method
CN106682424A (en) * 2016-12-28 2017-05-17 上海联影医疗科技有限公司 Medical image adjusting method and medical image adjusting system
CN109285112A (en) * 2018-09-25 2019-01-29 京东方科技集团股份有限公司 Image processing method neural network based, image processing apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025457A (en) * 2017-03-29 2017-08-08 腾讯科技(深圳)有限公司 A kind of image processing method and device
WO2018177237A1 (en) * 2017-03-29 2018-10-04 腾讯科技(深圳)有限公司 Image processing method and device, and storage medium
CN109410141A (en) * 2018-10-26 2019-03-01 北京金山云网络技术有限公司 A kind of image processing method, device, electronic equipment and storage medium
CN109829925A (en) * 2019-01-23 2019-05-31 清华大学深圳研究生院 A kind of method and model training method for extracting clean prospect in scratching figure task

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Font style transfer method based on generative adversarial networks; Bai Haijuan; Zhou Wei; Wang Cunrui; Wang Lei; Journal of Dalian Minzu University (Issue 03); full text *
Handwritten digit recognition based on neural networks; Chen Yuanyuan; Yuan Huanli; Shi Qishuang; Intelligent Computer and Applications (Issue 03); full text *

Also Published As

Publication number Publication date
CN110544218A (en) 2019-12-06

Similar Documents

Publication Publication Date Title
CN110544218B (en) Image processing method, device and storage medium
US11907637B2 (en) Image processing method and apparatus, and storage medium
Li et al. Layoutgan: Synthesizing graphic layouts with vector-wireframe adversarial networks
CN113191375B (en) Text-to-multi-object image generation method based on joint embedding
CN110033054B (en) Personalized handwriting migration method and system based on collaborative stroke optimization
CN113761153A (en) Question and answer processing method and device based on picture, readable medium and electronic equipment
CN115908639A (en) Transformer-based scene image character modification method and device, electronic equipment and storage medium
CN116958323A (en) Image generation method, device, electronic equipment, storage medium and program product
Bende et al. VISMA: A Machine Learning Approach to Image Manipulation
CN115393872A (en) Method, device and equipment for training text classification model and storage medium
CN111540032A (en) Audio-based model control method, device, medium and electronic equipment
Yu et al. Mask-guided GAN for robust text editing in the scene
Zhang et al. The Application of Folk Art with Virtual Reality Technology in Visual Communication.
Rastgoo et al. All You Need In Sign Language Production
CN111507259B (en) Face feature extraction method and device and electronic equipment
CN117033609A (en) Text visual question-answering method, device, computer equipment and storage medium
CN110097615B (en) Stylized and de-stylized artistic word editing method and system
CN116796287A (en) Pre-training method, device, equipment and storage medium for graphic understanding model
CN116978057A (en) Human body posture migration method and device in image, computer equipment and storage medium
CN112634456B (en) Real-time high-realism drawing method of complex three-dimensional model based on deep learning
CN114399708A (en) Video motion migration deep learning system and method
CN116721185A (en) Image processing method, apparatus, device, storage medium, and computer program product
Xu Immersive display design based on deep learning intelligent VR technology
Lim Advanced Technology of the Fourth Industrial Revolution and Korean Ancient History-Study on the use of artificial intelligence to decipher Wooden Tablets and the restoration of ancient historical remains using virtual reality and augmented reality
Jaminet et al. Serlio and Artificial Intelligence: Problematizing the Image-to-Object Workflow

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant