CN115050035A - Image processing method and device, computer equipment and storage medium

Info

Publication number: CN115050035A
Application number: CN202110253249.3A
Authority: CN
Inventors: 徐培; 黄珊
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-03-08
Filing date: 2021-03-08
Publication date: 2022-09-13

Abstract

The application discloses an image processing method and apparatus, a computer device, and a storage medium. In the method, a computer device applies an angle transformation to a Mongolian image to be recognized and recognizes the angle-transformed image by using a trained target Mongolian recognition model to obtain a Mongolian recognition result. The target Mongolian recognition model is trained on a first Mongolian sample image set and a second Mongolian sample image set: each first Mongolian sample image in the first set is constructed from a Mongolian character string and a background image, and each second Mongolian sample image in the second set is obtained by image cropping of a Mongolian text object.

Description

Image processing method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to an image processing method and apparatus, a computer device, and a storage medium.

Background

Recognizing Mongolian text in images has been a relatively popular research topic in recent years. The task is technically difficult because of the particular properties of Mongolian character strings and of the Mongolian writing format.

At present, there are several approaches to Mongolian image recognition. One approach cuts a real Mongolian text line image into words at the blank spaces, takes each whole word image as input, and treats each visually distinct Mongolian word form as a separate category. Another approach recognizes word or text line images with a sequence recognition method. Both approaches train the model on real Mongolian images. If only a small number of real Mongolian images is available, the trained model does not recognize accurately enough. Therefore, how to improve the accuracy of Mongolian image recognition is a technical problem that urgently needs to be solved.

Disclosure of Invention

The embodiment of the application provides an image processing method, an image processing device, computer equipment and a storage medium, which can improve the accuracy of Mongolian image recognition.

An embodiment of the present application discloses an image processing method, which includes:

obtaining a Mongolian image to be recognized, and carrying out angle transformation on the Mongolian image to be recognized;

recognizing the angle-transformed Mongolian image to be recognized by using a target Mongolian recognition model to obtain a Mongolian recognition result;

wherein the target Mongolian recognition model is trained on a first Mongolian sample image set and a second Mongolian sample image set, each first Mongolian sample image in the first Mongolian sample image set is constructed from a Mongolian character string and a background image, and each second Mongolian sample image in the second Mongolian sample image set is obtained by image cropping of a Mongolian text object.

An embodiment of the present application discloses an image processing apparatus, which includes:

the acquisition unit is used for acquiring a Mongolian image to be recognized;

the processing unit is used for carrying out angle transformation on the Mongolian image to be recognized;

the processing unit is also used for recognizing the angle-transformed Mongolian image to be recognized by using a target Mongolian recognition model to obtain a Mongolian recognition result;

wherein the target Mongolian recognition model is trained on a first Mongolian sample image set and a second Mongolian sample image set, each first Mongolian sample image in the first Mongolian sample image set is constructed from a Mongolian character string and a background image, and each second Mongolian sample image in the second Mongolian sample image set is obtained by image cropping of a Mongolian text object.

In one aspect, an embodiment of the present application discloses a computer device, which includes:

a processor adapted to implement one or more instructions; and

a computer storage medium storing one or more instructions adapted to be loaded and executed by the processor to perform the image processing method described above.

In one aspect, an embodiment of the present application discloses a computer-readable storage medium storing a computer program which, when executed by a processor, performs the image processing method described above.

In one aspect, embodiments of the present application disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image processing method described above.

In the embodiments of the present application, a computer device recognizes an angle-transformed Mongolian image to be recognized by using a trained target Mongolian recognition model to obtain a Mongolian recognition result. The target Mongolian recognition model is trained on a first Mongolian sample image set and a second Mongolian sample image set; each first Mongolian sample image in the first set is constructed from a Mongolian character string and a background image, and each second Mongolian sample image in the second set is obtained by image cropping of a Mongolian text object. Training the target Mongolian recognition model in this way increases the amount of training data, and fusing the text into background images enriches the styles of the Mongolian images. A target Mongolian recognition model trained in this way can therefore recognize Mongolian more accurately when it is tested on real Mongolian images.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a block diagram of an image processing system according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart diagram of an image processing method disclosed in an embodiment of the present application;

FIG. 3 is a schematic diagram of a prediction flow of a target Mongolian prediction model disclosed in an embodiment of the present application;

FIG. 4 is a schematic flowchart of another image processing method disclosed in the embodiments of the present application;

FIG. 5 is a schematic flow chart of a synthesized Mongolian text disclosed in an embodiment of the present application;

FIG. 6 is a schematic flow chart of another synthetic Mongolian text disclosed in the embodiments of the present application;

FIG. 7 is a schematic diagram of a fused Mongolian image disclosed in an embodiment of the present application;

FIG. 8 is a schematic diagram of another fused Mongolian image disclosed in embodiments of the present application;

FIG. 9 is a schematic flow chart diagram illustrating a further image processing method disclosed in an embodiment of the present application;

FIG. 10 is a schematic structural diagram of an image processing apparatus disclosed in an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a computer device disclosed in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.

The application relates to computer vision technology and machine learning, both of which belong to artificial intelligence technology. Computer Vision (CV) is the science of how to make machines see: cameras and computers replace human eyes to recognize, track, and measure targets, and the resulting images are further processed so that they are better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies the theories and techniques needed to build artificial intelligence systems that can extract information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition. Machine Learning (ML) is an interdisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to keep improving its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

The scheme provided by the embodiment of the application relates to the computer vision technology of artificial intelligence, the machine learning technology and the like, and is specifically explained by the following embodiments:

A Mongolian image to be recognized is acquired and angle-transformed, and the angle-transformed Mongolian image to be recognized is recognized by using the target Mongolian recognition model to obtain a Mongolian recognition result. The target Mongolian recognition model is trained, based on computer vision technology and machine learning technology, on a first Mongolian sample image set and a second Mongolian sample image set; each first Mongolian sample image in the first set is constructed from a Mongolian character string and a background image, and each second Mongolian sample image in the second set is obtained by image cropping of a Mongolian text object. Training the target Mongolian recognition model in this way increases the amount of training data, and fusing the text into background images enriches the styles of the Mongolian images, so that a model trained in this way can recognize Mongolian more accurately when it is tested on real Mongolian images.

Referring to FIG. 1, FIG. 1 is a schematic architecture diagram of an image processing system according to an embodiment of the present disclosure. As shown in FIG. 1, the image processing system 100 may include a terminal device 101 and a computer device 102, and the terminal device 101 and the computer device 102 may establish a communication connection.

In one possible implementation manner, the terminal device 101 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart car, and the like; the computer device 102 may be a server, which may be an independent physical server, a server cluster or a distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. Optionally, in this embodiment of the application, the device for implementing the function of the terminal device 101 may be a smart phone or other devices; it may also be a device, such as a chip system, capable of supporting the terminal device to implement the function, and the device may be installed in the terminal device.

In one possible implementation, the computer device 102 is specifically configured to: obtain a Mongolian image to be recognized and carry out angle transformation on it; and recognize the angle-transformed Mongolian image to be recognized by using the target Mongolian recognition model to obtain a Mongolian recognition result. The target Mongolian recognition model is trained on a first Mongolian sample image set and a second Mongolian sample image set; each first Mongolian sample image in the first set is constructed from a Mongolian character string and a background image, and each second Mongolian sample image in the second set is obtained by image cropping of a Mongolian text object. Training the target Mongolian recognition model in this way increases the amount of training data, and fusing the text into background images enriches the styles of the Mongolian images, so that a model trained in this way can recognize Mongolian more accurately when it is tested on real Mongolian images.

In a possible implementation manner, the terminal device 101 is mainly configured to send to the computer device 102 the second Mongolian sample images obtained by image cropping of Mongolian text objects, as well as the Mongolian image to be recognized that is used for testing. Optionally, the terminal device 101 is further configured to receive the Mongolian recognition result returned by the computer device 102.

Based on the image processing system, an embodiment of the present application discloses an image processing method. Referring to FIG. 2, which is a schematic flowchart of the image processing method disclosed in the embodiment of the present application, the image processing method may be executed by a computer device and may specifically include the following steps:

S201, obtaining a Mongolian image to be recognized, and carrying out angle transformation on the Mongolian image to be recognized.

The Mongolian image to be recognized may be obtained from a database, or may be captured from an article or a web image when a user wants to recognize Mongolian text.

In a possible implementation manner, after the computer device acquires the Mongolian image to be recognized, it performs angle transformation on the image. Mongolian is written vertically, so commonly acquired Mongolian images display the text vertically; the Mongolian image therefore needs to be angle-transformed before it is recognized. Specifically, the Mongolian image is rotated, and the rotation angle may be 90 degrees counterclockwise.
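As a non-limiting illustration, the angle transformation of step S201 may be sketched in Python as follows. The sketch assumes that the image is held as a NumPy array of shape (height, width) or (height, width, channels); the function name rotate_90_ccw is hypothetical and is not part of the disclosed method.

```python
import numpy as np


def rotate_90_ccw(image: np.ndarray) -> np.ndarray:
    """Rotate an image array by 90 degrees counterclockwise."""
    # With k=1, np.rot90 rotates counterclockwise in the plane of the
    # first two axes, so any channel axis is left untouched.
    return np.rot90(image, k=1)


# A vertically displayed text line is tall and narrow (assumed size).
vertical = np.zeros((200, 32, 3), dtype=np.uint8)
horizontal = rotate_90_ccw(vertical)
print(horizontal.shape)  # (32, 200, 3)
```

np.rot90 returns a view of its input, so np.ascontiguousarray may be applied to the result if a downstream model requires contiguous memory.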

S202, recognizing the angle-transformed Mongolian image to be recognized by using the target Mongolian recognition model to obtain a Mongolian recognition result.

In a possible implementation manner, after obtaining the angle-transformed Mongolian image to be recognized, the computer device inputs it into the target Mongolian recognition model, and a recognition module of the target Mongolian recognition model recognizes it to obtain a Mongolian recognition result. The Mongolian recognition result may be a Mongolian character string in the Mongolian image, or a single Mongolian word in the Mongolian image.

The target Mongolian recognition model is trained on a first Mongolian sample image set and a second Mongolian sample image set. The first Mongolian sample images in the first set are constructed from Mongolian character strings and background images; that is, they are synthetic (non-real) data. The second Mongolian sample images in the second set are obtained by image cropping of Mongolian text objects; that is, they are real data. A Mongolian text object may be a text line object or a character string object.

Steps S201 to S202 above correspond to the prediction process after the trained target Mongolian recognition model has been obtained. As shown in FIG. 3, two Mongolian images to be recognized are obtained, namely Mongolian image 1 and Mongolian image 2, both displayed vertically. Each image is angle-transformed, that is, rotated by 90 degrees counterclockwise, so that angle-transformed Mongolian image 1 and angle-transformed Mongolian image 2 are displayed horizontally. The two angle-transformed images are then input into the target Mongolian recognition model, which outputs recognition result 1 and recognition result 2, respectively. Recognizing the Mongolian images to be recognized with the target Mongolian recognition model can improve the recognition accuracy.

In the embodiment of the application, the computer device acquires a Mongolian image to be recognized, carries out angle transformation on it, and recognizes the angle-transformed Mongolian image to be recognized by using the target Mongolian recognition model to obtain a Mongolian recognition result. The target Mongolian recognition model is trained on a first Mongolian sample image set and a second Mongolian sample image set; each first Mongolian sample image in the first set is constructed from a Mongolian character string and a background image, and each second Mongolian sample image in the second set is obtained by image cropping of a Mongolian text object. Training the target Mongolian recognition model in this way increases the amount of training data, and fusing the text into background images enriches the styles of the Mongolian images, so that a model trained in this way can effectively improve the accuracy of Mongolian recognition when it is tested on real Mongolian images.

Based on the image processing system and the image processing method, an embodiment of the present application discloses another image processing method. Referring to FIG. 4, which is a schematic flowchart of this other image processing method, the method may be executed by a computer device and may specifically include the following steps:

S401, a first Mongolian sample image set and a second Mongolian sample image set are obtained, wherein the first Mongolian sample image set comprises a plurality of first Mongolian sample images, and the second Mongolian sample image set comprises a plurality of second Mongolian sample images.

In a possible implementation manner, the first Mongolian sample images in the first Mongolian sample image set are constructed by the computer device from Mongolian character strings and background images. Constructing a first Mongolian sample image involves two parts: the first determines a synthesized Mongolian text from a Mongolian character string, and the second fuses the synthesized Mongolian text into a background image to be fused to obtain the first Mongolian sample image. The two parts are described in turn below:

First, the computer device determines the synthesized Mongolian text from the Mongolian character string. The text may be synthesized by a data synthesis module. The data synthesis module uses the Linux text rendering tool pango-view, whose underlying layer is the OpenType smart font technology, and can therefore render the contextual glyph shaping that occurs when Mongolian characters are joined: once the Mongolian character string and its attribute parameters are determined, the OpenType smart font technology joins the characters directly. The specific implementation process may include the following: the computer device obtains a Mongolian character string to be synthesized and synthesis parameters, and synthesizes the string according to the synthesis parameters to obtain the synthesized Mongolian text. The Mongolian character string to be synthesized is obtained from a Mongolian character string database as follows: the computer device selects one of the word-count ranges [2, 4], [4, 12], [12, 20], and [20, 30] with probabilities [0.15, 0.45, 0.3, 0.1], respectively, and takes any integer L in the selected range as the word length; it then randomly selects a text line from the Mongolian text line material and takes L consecutive words from that line as the generated text. If the line contains fewer than L words, all of its words are used as the generated text.
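As a non-limiting illustration, the selection of the text to be generated may be sketched in Python as follows. The sketch assumes that words in the text line material are separated by whitespace, that L is drawn uniformly from the selected range with both endpoints included, and that the starting word is chosen uniformly; these choices, and the names LENGTH_RANGES, RANGE_PROBS, and sample_text, are assumptions of the sketch rather than details stated above.

```python
import random
from typing import Sequence

# Word-count ranges and their selection probabilities, as listed in the text.
LENGTH_RANGES = [(2, 4), (4, 12), (12, 20), (20, 30)]
RANGE_PROBS = [0.15, 0.45, 0.3, 0.1]


def sample_text(lines: Sequence[str], rng: random.Random) -> str:
    """Pick a random text line and return up to L consecutive words from it."""
    low, high = rng.choices(LENGTH_RANGES, weights=RANGE_PROBS, k=1)[0]
    length = rng.randint(low, high)  # assumption: uniform, endpoints included
    words = rng.choice(lines).split()  # assumption: whitespace-separated words
    if len(words) <= length:
        return " ".join(words)  # line too short: use every word
    start = rng.randint(0, len(words) - length)  # assumption: uniform start
    return " ".join(words[start:start + length])


# Placeholder corpus; a real corpus would contain Mongolian text lines.
corpus = ["w0 w1 w2 w3 w4 w5 w6 w7 w8 w9", "a b"]
print(sample_text(corpus, random.Random(0)))
```

Passing an explicit random.Random instance keeps the sampling reproducible when a fixed seed is used.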

Further, the computer device adjusts the attributes of the Mongolian character string to be synthesized according to the synthesis parameters, and synthesizes the attribute-adjusted string to obtain the synthesized Mongolian text. The synthesis parameters include one or more of a font parameter, a color parameter, a rotation angle parameter, and a text format parameter, and adjusting the attributes of the Mongolian character string to be synthesized includes one or more of the following:

The computer device adjusts the font of the Mongolian character string to be synthesized according to the font (Font) parameter: it randomly selects a Mongolian font from a Mongolian font library, where the random parameter obeys a uniform distribution on [0, 1], and sets the Mongolian character string to be synthesized in the selected font.

The computer device adjusts the color of the Mongolian character string to be synthesized according to the color parameter. The color may refer to the font color of the string, namely the foreground color (Foreground), or to the background color (Background) of the string. The background color is set to the RGB value (255, 255, 255). The foreground color is set as follows: first, an R value x is drawn from a 90 × Beta(2, 5) distribution; then a G value y is obtained by adding a random offset in the range [-10, 10] to the R value, and a B value z is obtained in the same way; finally, the RGB value (x, y, z) is used as the foreground color. Here, RGB denotes the red, green, and blue color channels, where R represents red, G represents green, and B represents blue.

The computer device adjusts the placement of the Mongolian character string to be synthesized according to the rotation angle (Rotate) parameter. Because Mongolian is displayed vertically, the string needs to be rotated by an angle close to 90 degrees. Specifically, the computer device randomly selects a decimal in the range [88, 92], where the random parameter obeys a uniform distribution on [0, 1].

The computer device adjusts the text format of the Mongolian character string to be synthesized according to the text format parameter, where the text format may refer to line margins, font size, character spacing, and the like. The line margins comprise a top, bottom, left, and right margin; each is a decimal randomly selected in the range [0, 4], where the random parameter obeys a uniform distribution on [0, 1].
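As a non-limiting illustration, the sampling of the synthesis parameters described above may be sketched in Python as follows. The sketch assumes that the B value is offset from the R value in the same way as the G value, that channel values are rounded and clamped to [0, 255], and that the font library is a list of font names; the names SynthesisParams and sample_params and the font names in the usage line are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class SynthesisParams:
    font: str
    foreground: Tuple[int, int, int]
    background: Tuple[int, int, int]
    rotate: float  # degrees
    margins: Tuple[float, ...]  # top, bottom, left, right


def _clamp(value: float) -> int:
    # Assumption: channel values are rounded and clamped to the 8-bit range.
    return max(0, min(255, int(round(value))))


def sample_params(fonts: Sequence[str], rng: random.Random) -> SynthesisParams:
    r = 90 * rng.betavariate(2, 5)  # R value from 90 x Beta(2, 5)
    g = r + rng.uniform(-10, 10)    # G value: R plus an offset in [-10, 10]
    b = r + rng.uniform(-10, 10)    # assumption: B is also offset from R
    return SynthesisParams(
        font=rng.choice(fonts),                    # uniform choice of font
        foreground=(_clamp(r), _clamp(g), _clamp(b)),
        background=(255, 255, 255),                # fixed white background
        rotate=rng.uniform(88, 92),                # close to 90 degrees
        margins=tuple(rng.uniform(0, 4) for _ in range(4)),
    )


print(sample_params(["FontA", "FontB"], random.Random(0)))
```

Because Beta(2, 5) concentrates its mass at small values, the sampled foreground colors are dark, which contrasts with the white background.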

The attributes of the Mongolian character string to be synthesized can thus be adjusted in several ways, and the adjustments can be chosen according to actual conditions when a Mongolian text is actually constructed. A Mongolian text synthesized in this way may be as shown in FIG. 5: the Mongolian character string to be synthesized and the synthesis parameters are input into the data synthesis module, which outputs the synthesized Mongolian text. Because the method uses the OpenType font technology, it supports the contextual glyph shaping that occurs when Mongolian characters are joined into a string. By contrast, a method based on image pixel rendering renders the characters of the string pixel by pixel, one character at a time, and can hardly support the change in glyph shapes that Mongolian writing rules require when several characters are joined into a string; as shown in FIG. 6, a text line synthesized in that way is not displayed correctly.
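As a non-limiting illustration, a pango-view invocation carrying such synthesis parameters may be assembled in Python as follows. The sketch only builds the argument list and does not run the tool. The option names are those listed by pango-view --help in common Pango releases and should be checked against the installed version; the function name pango_view_command, the font description, the single-number margin, and the output path are assumptions of the sketch.

```python
from typing import List, Tuple


def pango_view_command(
    text: str,
    font: str,
    foreground: Tuple[int, int, int],
    rotate: float,
    margin: int,
    output_path: str,
) -> List[str]:
    """Build an argument list that renders `text` to an image file."""
    fg = "#{:02x}{:02x}{:02x}".format(*foreground)
    return [
        "pango-view",
        "--no-display",             # render to a file without opening a window
        f"--font={font}",
        f"--foreground={fg}",
        "--background=#ffffff",     # white background, as in the text
        f"--rotate={rotate:.2f}",   # degrees
        f"--margin={margin}",       # assumption: one value for all four sides
        f"--text={text}",
        f"--output={output_path}",
    ]


cmd = pango_view_command(
    text="ᠮᠣᠩᠭᠣᠯ",
    font="Noto Sans Mongolian 24",  # example font description (assumed installed)
    foreground=(30, 40, 25),
    rotate=90.5,
    margin=2,
    output_path="line.png",
)
print(cmd)
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where pango-view and a Mongolian OpenType font are installed.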

Second, the computer device fuses the synthesized Mongolian text into a background image to obtain the first Mongolian sample image. This part augments the style of the Mongolian text, producing Mongolian text on complex and varied backgrounds. It may be implemented by a style augmentation module and includes: acquiring a background image to be fused from a background image library according to a reference size; adjusting the size of the background image to be fused according to the size of the synthesized Mongolian text; fusing the resized background image to be fused with the synthesized Mongolian text to obtain a fused Mongolian image; and blurring the fused Mongolian image to obtain the first Mongolian sample image. The specific implementation process may be as follows: the computer device randomly selects an image from an initialized background image library as a reference background image, which has a width bg_w and a height bg_h. Assuming that the synthesized Mongolian text has a width w and a height h, the background image to be fused is the region [start_x:end_x, start_y:end_y] selected from the reference background image according to the reference size, where the start x coordinate start_x is an integer selected with equal probability in the range [0, bg_w//2] (// denotes integer division), the end x coordinate is end_x = min(bg_w, start_x + w), the start y coordinate start_y is an integer selected with equal probability in the range [0, bg_h//2], and the end y coordinate is end_y = min(bg_h, start_y + h). The image region [start_x:end_x, start_y:end_y] is determined to be the background image to be fused; |end_x - start_x| is the width of the background image to be fused, and |end_y - start_y| is its height.
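As a non-limiting illustration, the selection of the background region may be sketched in Python as follows. The sketch assumes that the start y coordinate is drawn from [0, bg_h//2] by analogy with the x coordinate, and that images are NumPy arrays of shape (height, width, channels), so that rows are indexed by y and columns by x; the names sample_crop_box and crop are hypothetical.

```python
import random
from typing import Tuple

import numpy as np


def sample_crop_box(
    bg_w: int, bg_h: int, w: int, h: int, rng: random.Random
) -> Tuple[int, int, int, int]:
    """Return (start_x, end_x, start_y, end_y) of the region to be fused."""
    start_x = rng.randint(0, bg_w // 2)
    end_x = min(bg_w, start_x + w)
    start_y = rng.randint(0, bg_h // 2)  # assumption: same rule as for x
    end_y = min(bg_h, start_y + h)
    return start_x, end_x, start_y, end_y


def crop(background: np.ndarray, box: Tuple[int, int, int, int]) -> np.ndarray:
    start_x, end_x, start_y, end_y = box
    # NumPy images are indexed [row, column], that is, [y, x].
    return background[start_y:end_y, start_x:end_x]


background = np.zeros((60, 100, 3), dtype=np.uint8)  # bg_h=60, bg_w=100
box = sample_crop_box(100, 60, w=80, h=20, rng=random.Random(0))
patch = crop(background, box)
print(box, patch.shape)
```

Because end_x and end_y are capped at the image border, the cropped region can be smaller than the synthesized text, which is why a resizing step follows.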

Because the background image to be fused obtained according to the reference size does not necessarily have the same size as the synthesized Mongolian text, its size needs to be adjusted. Specifically, the opencv resize method is used to adjust the background image to be fused to the same size as the synthesized Mongolian text, that is, the width |end_x - start_x| is adjusted to w and the height |end_y - start_y| is adjusted to h. The computer device then fuses the size-adjusted background image to be fused with the synthesized Mongolian text to obtain a fused Mongolian image.

In a possible implementation manner, the computer device may fuse the size-adjusted background image to be fused with the synthesized Mongolian text by using a transparency superposition method to obtain a fused Mongolian image. The transparency superposition method, referred to as Addup, mainly uses the addWeighted function of opencv to superimpose a pure white mask, with a weighting coefficient in the range [0.3, 0.6], on the size-adjusted background image to be fused, and then uses the addWeighted function again to superimpose the synthesized Mongolian text, with a weighting coefficient in the range [0.5, 0.9], on the size-adjusted background image to be fused, obtaining a fused Mongolian image with the background fused in. FIG. 7 shows multiple fused Mongolian images with different backgrounds.

In a possible implementation manner, the computer device may fuse the size-adjusted background image to be fused with the synthesized Mongolian text by using an image Poisson fusion method to obtain a fused Mongolian image. The image Poisson fusion method, referred to as Mixup, mainly uses the seamlessClone function of opencv to perform Poisson fusion of the synthesized Mongolian text and the size-adjusted background image to be fused in MIXED_CLONE mode. FIG. 8 shows multiple fused Mongolian images obtained in this way, with different backgrounds.

When actually constructing the first Mongolian sample image, either of the above two methods can be selected to fuse the background image to be fused with the synthesized Mongolian text.

In a possible implementation manner, in order to obtain smoother edges of the Mongolian characters in the fused Mongolian image, the computer device further performs Gaussian blur processing on the fused Mongolian image to obtain a first Mongolian sample image. The Gaussian blur processing, referred to as GaussBlur, uses the GaussianBlur function of opencv: a value is randomly selected from [3, 5, 7, 9] as the Gaussian kernel size (kernel_size), and the Gaussian blur operation is applied to the fused Mongolian image to obtain a first Mongolian sample image with smoother character edges. Optionally, the Gaussian blur processing may be skipped and the fused Mongolian image used directly as the first Mongolian sample image, although the result is generally better when Gaussian blur is applied.

The purpose of fusing the synthesized Mongolian text with the background image to be fused is to perform online augmentation and randomly enrich the styles while the target Mongolian recognition model is being trained. This effectively addresses the problems that the background style of synthesized Mongolian text data is simple and uniform and differs greatly from real data.

In a possible implementation manner, the second Mongolian sample images included in the second Mongolian sample image set acquired by the computer device are obtained by image interception (cropping) of Mongolian text objects. That is, the data in the second Mongolian sample image set are all real data, which can be understood as Mongolian text recorded in online articles or books; the corresponding Mongolian text may be a text line or a character string.

S402, performing preliminary training on the initial Mongolian recognition model by using the first Mongolian sample image set to obtain the preliminarily trained Mongolian recognition model.

In a possible implementation manner, the computer device performs preliminary training on an initial Mongolian recognition model by using the first Mongolian sample image set to obtain a preliminarily trained Mongolian recognition model. The initial Mongolian recognition model may be based on a Convolutional Recurrent Neural Network (CRNN) architecture, which has the following advantages: it can learn directly from sequence labels (e.g., words) without detailed annotations (e.g., characters); like a deep convolutional neural network, it learns informative representations directly from image data and needs neither hand-crafted features nor preprocessing steps such as binarization/segmentation or component localization; like a recurrent neural network, it can produce a sequence of labels; and it places no constraint on the length of sequence-like objects, requiring only height normalization in the training and testing stages. Specifically, the model architecture comprises, from bottom to top, a convolutional layer, a recurrent layer and a transcription layer: the convolutional layer automatically extracts a feature sequence from each input image, the recurrent layer makes a prediction for each frame of the feature sequence output by the convolutional layer, and the transcription layer at the top converts the per-frame predictions of the recurrent layer into a label sequence.
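To illustrate what the transcription layer does, the sketch below converts per-frame predictions into a label sequence with greedy CTC-style decoding. The text does not name the decoding rule; CTC is what the original CRNN design uses, so its use here is an assumption, as are the function name, the convention that class 0 is the blank, and the Latin letters standing in for Mongolian characters.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(frame_scores, alphabet):
    """Turn per-frame class scores into a label string.

    frame_scores: one list of scores per frame; class 0 is the blank
    and class i (i >= 1) corresponds to alphabet[i - 1].
    """
    # 1. Take the best-scoring class in every frame.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # 2. Collapse runs of the same class into a single prediction.
    collapsed = [k for k, _ in groupby(best)]
    # 3. Drop blanks and map the remaining classes to characters.
    return "".join(alphabet[k - 1] for k in collapsed if k != BLANK)


if __name__ == "__main__":
    one_hot = {0: [1, 0, 0], 1: [0, 1, 0], 2: [0, 0, 1]}
    frames = [one_hot[k] for k in (1, 1, 0, 1, 2, 2, 0)]
    print(ctc_greedy_decode(frames, "ab"))
```

The blank between the two runs of class 1 is what allows a repeated character to survive the collapsing step.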

Therefore, in the embodiment of the present application, the constructed first Mongolian sample image set is input into the CRNN model described above for preliminary training, so as to obtain a preliminarily trained Mongolian recognition model.

S403, retraining the preliminarily trained Mongolian recognition model by using the second Mongolian sample image set, and adjusting model parameters of the preliminarily trained Mongolian recognition model to obtain the target Mongolian recognition model.

Because the data used for training in step S402 is constructed and differs to some extent from real data, real data needs to be used to fine-tune the preliminarily trained Mongolian recognition model obtained in step S402, so as to obtain the target Mongolian recognition model.

In a possible implementation manner, the computer device retrains the preliminarily trained Mongolian recognition model by using the acquired second Mongolian sample image set (real data) in combination with the first Mongolian sample image set, so that the recognition results on the first Mongolian sample image set become similar to those on the second Mongolian sample image set. The model parameters of the preliminarily trained Mongolian recognition model are adjusted during this training, and the parameter-adjusted model is used as the target Mongolian recognition model.
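One way to combine the two sample sets during retraining is to draw mixed batches, as sketched below. This is only an illustration: the text does not specify how the sets are combined, so the function name mixed_batches, the real_fraction parameter and the one-pass-over-real-data schedule are all hypothetical.

```python
import random


def mixed_batches(real, synthetic, batch_size, real_fraction=0.5, rng=random):
    """Yield fine-tuning batches that mix real and synthesized samples.

    Every real sample is used exactly once per pass; synthesized
    samples are drawn at random to fill the rest of each batch.
    """
    n_real = max(1, round(batch_size * real_fraction))
    n_syn = batch_size - n_real
    real = list(real)
    synthetic = list(synthetic)
    rng.shuffle(real)
    for i in range(0, len(real), n_real):
        batch = real[i:i + n_real]
        batch += rng.sample(synthetic, min(n_syn, len(synthetic)))
        rng.shuffle(batch)
        yield batch


if __name__ == "__main__":
    real = [("real", i) for i in range(6)]
    synthetic = [("syn", i) for i in range(20)]
    for batch in mixed_batches(real, synthetic, batch_size=4):
        print(batch)
```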

Specifically, as shown in fig. 9, steps S401 to S403 may be implemented by a data synthesis module 901, a style augmentation module 902, and a recognition module 903. The data synthesis module 901 is mainly used to obtain the synthesized Mongolian text, the style augmentation module 902 is mainly used to obtain the first Mongolian sample images, and the recognition module 903 is mainly used to recognize the first Mongolian sample images and the second Mongolian sample images, so as to train the model.

S404, obtaining a Mongolian image to be recognized, and carrying out angle transformation on the Mongolian image to be recognized.

S405, identifying the Mongolian image to be identified after the angle transformation by using the target Mongolian identification model to obtain a Mongolian identification result.

Some possible implementations related to steps S404 to S405 may refer to the description of the relevant steps in the embodiment in fig. 2, and are not described herein again.

In the embodiment of the present application, the computer device acquires a first Mongolian sample image set including a plurality of first Mongolian sample images and a second Mongolian sample image set including a plurality of second Mongolian sample images. It performs preliminary training on the initial Mongolian recognition model by using the first Mongolian sample image set to obtain a preliminarily trained Mongolian recognition model, then retrains that model by using the second Mongolian sample image set, adjusting its model parameters to obtain the target Mongolian recognition model. The target Mongolian recognition model is used to recognize the Mongolian image to be recognized to obtain a Mongolian recognition result. Because the target Mongolian recognition model is trained on both the first Mongolian sample image set and the second Mongolian sample image set, the accuracy of Mongolian recognition can be improved.

Based on the above method embodiments, an embodiment of the present application further provides an image processing apparatus. FIG. 10 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing apparatus 1000 shown in fig. 10 may run the following units:

an acquisition unit 1001, configured to acquire a Mongolian image to be recognized;

a processing unit 1002, configured to perform angle transformation on the Mongolian image to be recognized;

the processing unit 1002 is further configured to recognize the angle-transformed Mongolian image to be recognized by using the target Mongolian recognition model, so as to obtain a Mongolian recognition result;

In a possible implementation manner, the acquisition unit 1001 is further configured to acquire a first Mongolian sample image set and a second Mongolian sample image set, where the first Mongolian sample image set includes a plurality of first Mongolian sample images, and the second Mongolian sample image set includes a plurality of second Mongolian sample images;

the processing unit 1002 is further configured to:

carrying out preliminary training on an initial Mongolian recognition model by utilizing the first Mongolian sample image set to obtain a preliminarily trained Mongolian recognition model;

and retraining the primarily trained Mongolian recognition model by using the second Mongolian sample image set, and adjusting model parameters of the primarily trained Mongolian recognition model to obtain a target Mongolian recognition model.

In a possible implementation manner, the acquisition unit 1001 is further configured to acquire a Mongolian character string to be synthesized and a synthesis parameter;

the processing unit 1002 is further configured to synthesize the Mongolian character string to be synthesized according to the synthesis parameter, so as to obtain a synthesized Mongolian text;

the acquisition unit 1001 is further configured to acquire a background image to be fused from a background image library;

the processing unit 1002 is further configured to perform fusion processing on the background image to be fused and the synthesized Mongolian text to obtain a fused Mongolian image;

a determining unit 1003, configured to determine a first Mongolian sample image according to the fused Mongolian image.

In a possible implementation manner, the processing unit 1002 synthesizes the Mongolian character string to be synthesized according to the synthesis parameter to obtain a synthesized Mongolian text, and is specifically configured to:

adjusting the attribute of the Mongolian character string to be synthesized according to the synthesis parameters;

synthesizing the attribute-adjusted Mongolian character string to be synthesized to obtain a synthesized Mongolian text;

the synthesis parameters comprise one or more of font parameters, color parameters, rotation angle parameters and text format parameters, and the adjustment of the attributes of the Mongolian character strings to be synthesized comprises one or more of the following steps:

adjusting the font of the Mongolian character string to be synthesized according to the font parameters, adjusting one or more of the font color and the background color of the Mongolian character string to be synthesized according to the color parameters, adjusting the placement position of the Mongolian character string to be synthesized according to the rotation angle parameters, and adjusting the character format of the Mongolian character string to be synthesized according to the text format parameters.

In a possible implementation manner, the acquisition unit 1001 acquires a background image to be fused from a background image library, and the processing unit 1002 performs fusion processing on the background image to be fused and the synthesized Mongolian text to obtain a fused Mongolian image, which includes:

acquiring a background image to be fused from a background image library according to the reference size;

adjusting the size of the background image to be fused according to the size of the synthesized Mongolian text;

and carrying out fusion processing on the background image to be fused after the size adjustment and the synthesized Mongolian text to obtain a fused Mongolian image.

In a possible implementation manner, the processing unit 1002 performs a fusion process on the background image to be fused after the size adjustment and the synthesized Mongolian text to obtain a fused Mongolian image, including:

fusing the background image to be fused and the synthesized Mongolian text after the size adjustment by using a transparency superposition method to obtain a fused Mongolian image; or, carrying out fusion processing on the background image to be fused after the size adjustment and the synthesized Mongolian text by using an image Poisson fusion method to obtain a fused Mongolian image.

In a possible implementation manner, the determining unit 1003 determines, according to the fused Mongolian image, a first Mongolian sample image, including:

and carrying out Gaussian blur processing on the fused Mongolian image to obtain a first Mongolian sample image.

According to an embodiment of the present application, the steps involved in the image processing methods shown in fig. 2 and 4 may be performed by units in the image processing apparatus shown in fig. 10. For example, step S201 in the image processing method shown in fig. 2 may be performed by the acquisition unit 1001 in the image processing apparatus shown in fig. 10, and step S202 may be performed by the processing unit 1002 in the image processing apparatus shown in fig. 10; as another example, steps S401 and S404 in the image processing method shown in fig. 4 may be performed by the acquisition unit 1001 in the image processing apparatus shown in fig. 10, and steps S402 to S403 and S405 may be performed by the processing unit 1002 in the image processing apparatus shown in fig. 10.

According to another embodiment of the present application, the units in the image processing apparatus shown in fig. 10 may be respectively or entirely combined into one or several other units to form the image processing apparatus, or some unit(s) may be further split into multiple units with smaller functions to form the image processing apparatus, which may achieve the same operation without affecting the achievement of the technical effects of the embodiments of the present application. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present application, the image processing apparatus may also include other units, and in practical applications, these functions may also be implemented by being assisted by other units, and may be implemented by cooperation of a plurality of units.

According to another embodiment of the present application, the image processing apparatus shown in fig. 10 may be constructed, and the image processing method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 2 and fig. 4 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, and loaded into and executed by the above computing device via the computer-readable storage medium.

In the embodiment of the present application, the acquisition unit 1001 acquires a Mongolian image to be recognized, and the processing unit 1002 performs angle transformation on the Mongolian image to be recognized and recognizes the angle-transformed image by using the target Mongolian recognition model to obtain a Mongolian recognition result. The target Mongolian recognition model is trained with a first Mongolian sample image set and a second Mongolian sample image set; each first Mongolian sample image in the first Mongolian sample image set is constructed from a Mongolian character string and a background image, and each second Mongolian sample image in the second Mongolian sample image set is obtained by image interception of a Mongolian text object. Training the target Mongolian recognition model in this way increases the amount of training data and, by fusing in background images, enriches the style of the Mongolian images, so that the trained target Mongolian recognition model recognizes real Mongolian images with effectively improved accuracy.

Based on the above method and apparatus embodiments, an embodiment of the present application provides a computer device. FIG. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device 1100 shown in fig. 11 comprises at least a processor 1101, an input interface 1102, an output interface 1103, a computer storage medium 1104, and a memory 1105. The processor 1101, the input interface 1102, the output interface 1103, the computer storage medium 1104, and the memory 1105 may be connected by a bus or other means.

A computer storage medium 1104 may be stored in the memory 1105 of the computer device 1100, the computer storage medium 1104 being for storing a computer program, the computer program comprising program instructions, the processor 1101 being for executing the program instructions stored by the computer storage medium 1104. The processor 1101 (or CPU) is a computing core and a control core of the computer device 1100, and is adapted to implement one or more instructions, and in particular to load and execute one or more computer instructions to implement corresponding method flows or corresponding functions.

Embodiments of the present application also provide a computer storage medium (Memory), which is a Memory device in the computer device 1100 and is used to store programs and data. It is understood that the computer storage media herein can include both built-in storage media in the computer device 1100 and, of course, extended storage media supported by the computer device 1100. The computer storage media provides storage space that stores an operating system for the computer device 1100. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), suitable for being loaded and executed by the processor 1101. The computer storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory; and optionally at least one computer storage medium located remotely from the processor.

In one embodiment, one or more instructions stored in the computer storage medium may be loaded and executed by the processor 1101 to implement the corresponding steps of the image processing method shown in fig. 2 and 4. In particular implementations, one or more instructions in the computer storage medium are loaded by the processor 1101 to perform the following steps:

acquiring a Mongolian image to be recognized, and carrying out angle transformation on the Mongolian image to be recognized;

In one possible implementation, the processor 1101 is further configured to:

obtaining a first Mongolian sample image set and a second Mongolian sample image set, wherein the first Mongolian sample image set comprises a plurality of first Mongolian sample images, and the second Mongolian sample image set comprises a plurality of second Mongolian sample images;

and utilizing the second Mongolian sample image set to train the primarily trained Mongolian recognition model again, and adjusting model parameters of the primarily trained Mongolian recognition model to obtain a target Mongolian recognition model.

In one possible implementation, the processor 1101 is further configured to:

acquiring a Mongolian character string to be synthesized and synthesis parameters;

synthesizing the Mongolian character string to be synthesized according to the synthesis parameters to obtain a synthesized Mongolian text;

acquiring a background image to be fused from a background image library, and fusing the background image to be fused and the synthesized Mongolian text to obtain a fused Mongolian image;

and determining a first Mongolian sample image according to the fused Mongolian image.

In a possible implementation manner, the processor 1101 synthesizes the Mongolian character string to be synthesized according to the synthesis parameter to obtain a synthesized Mongolian text, including:

In a possible implementation manner, the processor 1101 obtains a background image to be fused from a background image library, and performs fusion processing on the background image to be fused and the synthesized Mongolian text to obtain a fused Mongolian image, including:

In a possible implementation manner, the processor 1101 performs a fusion process on the resized background image to be fused and the synthesized Mongolian text to obtain a fused Mongolian image, including:

In one possible implementation, the processor 1101 determines a first Mongolian sample image from the fused Mongolian image, including:

In the embodiment of the present application, the processor 1101 acquires a Mongolian image to be recognized, performs angle transformation on it, and recognizes the angle-transformed image by using the target Mongolian recognition model to obtain a Mongolian recognition result. The target Mongolian recognition model is trained with a first Mongolian sample image set and a second Mongolian sample image set; each first Mongolian sample image in the first Mongolian sample image set is constructed from a Mongolian character string and a background image, and each second Mongolian sample image in the second Mongolian sample image set is obtained by image interception of a Mongolian text object. Training the target Mongolian recognition model in this way increases the amount of training data and, by fusing in background images, enriches the style of the Mongolian images, so that the trained target Mongolian recognition model recognizes real Mongolian images with effectively improved accuracy.

According to an aspect of the present application, the present application also provides a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor 1101 reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device 1100 performs the image processing method shown in fig. 2 and 4.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the above-described modules is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed.

The above description covers only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims

1. An image processing method, characterized in that the method comprises:

2. The method of claim 1, further comprising:

3. The method according to claim 1 or 2, characterized in that the method further comprises:

4. The method of claim 3, wherein the synthesizing the Mongolian character string to be synthesized according to the synthesis parameters to obtain a synthesized Mongolian text comprises:

synthesizing the Mongolian character string to be synthesized after the attribute is adjusted to obtain a synthesized Mongolian text;

5. The method according to claim 3, wherein the obtaining a background image to be fused from a background image library, and performing fusion processing on the background image to be fused and the synthesized Mongolian text to obtain a fused Mongolian image comprises:

6. The method according to claim 5, wherein the fusing the resized background image to be fused and the synthesized Mongolian text to obtain a fused Mongolian image comprises:

fusing the background image to be fused and the synthesized Mongolian text after the size adjustment by using a transparency superposition method to obtain a fused Mongolian image; or,

and carrying out fusion processing on the background image to be fused after the size adjustment and the synthesized Mongolian text by using an image Poisson fusion method to obtain a fused Mongolian image.

7. The method of claim 3, wherein determining a first Mongolian sample image from the fused Mongolian image comprises:

8. An image processing apparatus, characterized in that the apparatus comprises:

the acquisition unit is used for acquiring a Mongolian image to be recognized;

9. A computer device, characterized in that the computer device comprises:

a processor adapted to implement one or more instructions; and,

a computer storage medium having stored thereon one or more instructions adapted to be loaded by the processor and to perform the image processing method according to any of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the image processing method according to any one of claims 1 to 7.