CN117152022A - Image processing method and electronic equipment - Google Patents


Info

Publication number
CN117152022A
Authority
CN
China
Prior art keywords
image
text
information
characters
electronic device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311392512.2A
Other languages
Chinese (zh)
Inventor
孙佳男 (Sun Jianan)
姚通 (Yao Tong)
陈铎 (Chen Duo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202311392512.2A
Publication of CN117152022A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/18 Extraction of features or characteristics of the image
    • G06V 30/18143 Extracting features based on salient regional features, e.g. scale invariant feature transform [SIFT] keypoints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/11 Technique with transformation invariance effect

Abstract

The application provides an image processing method and an electronic device. The method comprises the following steps: receiving a first image and performing text detection on the first image; when first text is detected in the first image, extracting first structure information corresponding to the first text; determining a first pose corresponding to the first text according to the first structure information; generating a first text image according to the first pose and the first image; and performing fusion processing on the first text image and the first image to obtain a first target image corresponding to the first image. In this way, the sharpness of the text in the image can be improved while the texture details of the image are well preserved, thereby improving the quality of images that contain text.

Description

Image processing method and electronic equipment
Technical Field
The present application relates to the field of terminal devices, and in particular, to an image processing method and an electronic device.
Background
Photographing is a common and frequently used function of current electronic devices. Due to volume limitations, electronic devices often do not include a sophisticated anti-shake mechanism. As a result, the pictures taken by such devices are often affected by shake, resulting in blurred images.
Currently, to eliminate the influence of shake on an image, a global deblurring operation is performed on the image by a deblur algorithm. This can solve the shake problem in most scenes; however, the output of the deblur algorithm tends to expose unnatural traces in the image, which affects the visual effect of the whole image. The problem is particularly severe in scenes where the image contains text.
Disclosure of Invention
In order to solve the above technical problems, the application provides an image processing method and an electronic device, which can improve the sharpness of text in an image while better preserving the texture details of the image, thereby improving the quality of images that contain text.
In a first aspect, the present application provides an image processing method applied to an electronic device. The method comprises the following steps: receiving a first image and performing text detection on the first image; when first text is detected in the first image, extracting first structure information corresponding to the first text; determining a first pose corresponding to the first text according to the first structure information; generating a first text image according to the first pose and the first image; and performing fusion processing on the first text image and the first image to obtain a first target image corresponding to the first image. In this way, the sharpness of the text in the image can be improved while the texture details of the image are well preserved, thereby improving the quality of images that contain text.
According to the first aspect, extracting the first structure information corresponding to the first text includes: performing key point detection on the first text in the first image to obtain key point information of the first text, and using the key point information as the first structure information corresponding to the first text.
According to the first aspect, performing key point detection on the first text in the first image to obtain key point information of the first text includes: performing key point detection on the first text in the first image by means of corner detection to obtain the key point information of the first text.
According to the first aspect, performing key point detection on the first text in the first image to obtain key point information of the first text includes: performing key point detection on the first text in the first image by means of feature extraction in the scale-invariant feature transform (SIFT) to obtain the key point information of the first text.
According to the first aspect, the pose includes the direction of the text and the connectivity relationship between key points in the text.
According to the first aspect, determining the first pose corresponding to the first text according to the first structure information includes: inputting the first structure information into a trained pose estimation model, and outputting, by the pose estimation model, the first pose corresponding to the first text.
According to the first aspect, extracting the first structure information corresponding to the first text includes: performing stroke detection on the first text in the first image to obtain stroke information of the first text, and using the stroke information as the first structure information corresponding to the first text.
According to the first aspect, performing fusion processing on the first text image and the first image to obtain the first target image corresponding to the first image includes: replacing the pixel value of a second pixel point in the first image, which is located at the same position as a first pixel point where text is located in the first text image, with the pixel value of the first pixel point, to obtain the first target image.
According to the first aspect, the first image is an original image captured by a high-magnification camera.
According to the first aspect, the electronic device is a mobile phone or a tablet.
In a second aspect, the present application provides an electronic device comprising: a memory and a processor, the memory coupled to the processor; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the image processing method of any one of the first aspects.
In a third aspect, the present application provides a computer readable storage medium comprising a computer program which, when run on an electronic device, causes the electronic device to perform the image processing method of any one of the preceding first aspects.
Drawings
Fig. 1 is a schematic structural diagram of an exemplary electronic device 100;
Fig. 2 is a software architecture block diagram of the electronic device 100 according to an embodiment of the present application;
Fig. 3 is an exemplary diagram of a picture taken with a mobile phone;
Fig. 4A is a schematic diagram of an image processing procedure using blind deblurring in this embodiment;
Fig. 4B is a schematic diagram of a deblurring image processing procedure using an event-based camera in this embodiment;
Fig. 5 is a flowchart of an exemplary image processing method in this embodiment;
Fig. 6 is an exemplary diagram of the image processing procedure in this embodiment;
Fig. 7 is an exemplary diagram of the high-definition text generation process in the image processing procedure of this embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A exists alone, both A and B exist, or B exists alone.
The terms first and second and the like in the description and in the claims of embodiments of the application, are used for distinguishing between different objects and not necessarily for describing a particular sequential order of objects. For example, the first target object and the second target object, etc., are used to distinguish between different target objects, and are not used to describe a particular order of target objects.
The embodiment provides an image processing method, which can be applied to electronic devices such as mobile phones and tablets. Of course, not limited to these electronic devices.
In this embodiment, the structure of the electronic device may be as shown in fig. 1.
Fig. 1 is a schematic diagram of an exemplary illustrated electronic device 100. It should be understood that the electronic device 100 shown in fig. 1 is only one example of an electronic device, and that the electronic device 100 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in fig. 1 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
Referring to fig. 1, an electronic device 100 may include: processor 110, internal memory 121, universal serial bus (universal serial bus, USB) interface 130, charge management module 140, power management module 141, battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headset interface 170D, sensor module 180, indicator 192, camera 193, etc.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through a UART interface, to implement a function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as a display 194, a camera 193, and the like. The MIPI interfaces include camera serial interfaces (camera serial interface, CSI), display serial interfaces (display serial interface, DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the photographing functions of electronic device 100. The processor 110 and the display 194 communicate via a DSI interface to implement the display functionality of the electronic device 100.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transfer data between the electronic device 100 and a peripheral device. And can also be used for connecting with a headset, and playing audio through the headset. The interface may also be used to connect other electronic devices, such as AR devices, etc.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (OLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: dynamic picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
Of course, in addition to the above components, the electronic device 100 may include other hardware components, which are not listed here.
It should be understood that the components in the hardware configuration shown in fig. 1 do not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or less hardware components than those illustrated, and the application is not limited.
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the application, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
Fig. 2 is a software structural block diagram of the electronic device 100 of the exemplary embodiment of the present application.
The layered architecture of the electronic device 100 divides the software into several layers, each with a distinct role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system may include an application layer, an application framework layer, a system library, a kernel layer, and the like.
The application layer may include a series of application packages.
As shown in fig. 2, the application package may include camera, gallery, map, WLAN, music, short message, talk, navigation, bluetooth, video, etc. applications. Of course, these applications are merely exemplary, and in other embodiments, the application program layer may or may not include applications not shown in FIG. 2.
The camera application can comprise an image processing module, wherein the image processing module is used for executing the image processing method of the embodiment of the application. In one example, the image processing module may be located in a camera application. Of course, the above-mentioned positions of the image processing module are merely exemplified, and in other embodiments, the image processing module may be disposed at other positions according to practical application requirements, which is not limited in this embodiment.
As shown in FIG. 2, the application framework layer may include a window manager, resource manager, view system, phone manager, content provider, notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the electronic device 100. Such as the management of call status (including on, hung-up, etc.).
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The notification manager allows the application to display notification information in a status bar, can be used to communicate notification type messages, can automatically disappear after a short dwell, and does not require user interaction. Such as notification manager is used to inform that the download is complete, message alerts, etc. The notification manager may also be a notification in the form of a chart or scroll bar text that appears on the system top status bar, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, a text message is prompted in a status bar, a prompt tone is emitted, the electronic device vibrates, and an indicator light blinks, etc.
Android Runtime includes a core library and a virtual machine. The Android Runtime is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part contains the functional interfaces that the Java language needs to call, and the other part is the Android core library.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), two-dimensional graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
A two-dimensional graphics engine is a drawing engine for 2D (two-dimensional) drawing.
The kernel layer is a layer between hardware and software.
As shown in fig. 2, the kernel layer may include a display driver, wi-Fi driver, audio driver, sensor driver, bluetooth driver, etc.
It will be appreciated that the layers and components contained in the layers in the software structure shown in fig. 2 do not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer layers than shown and may include more or fewer components per layer, as the application is not limited.
Some electronic devices (such as mobile phones and tablet computers, but not limited to these devices) often cannot contain a precise anti-shake device due to volume limitations, so the pictures captured by such devices are often affected by shake and appear blurred.
Fig. 3 is an exemplary diagram of a picture taken with a mobile phone. Referring to fig. 3, the user's hand shook while the picture was being taken, which caused the camera of the mobile phone to shake and produced the picture shown in fig. 3. As can be seen from fig. 3, the text in the picture is blurred, giving the user a poor visual experience.
In some embodiments, for cases where the text in the image is not clear, as shown in fig. 3, a global deblurring operation may be performed by an algorithm. Although the global deblurring operation can solve the shake problem of most scenes, the result of the algorithm exposes unnatural traces, thereby affecting the visual effect of the whole image. This is particularly severe in text scenes.
The global deblurring operation may be performed in several ways:
(1) Blind deblurring
Fig. 4A is a schematic diagram illustrating a procedure of the image processing method using blind deblurring in the present embodiment.
Referring to fig. 4A, in this approach the whole blurred image is deblurred directly to obtain a clear image.
(2) Event camera based deblurring
An event-based camera is a novel, biologically inspired visual sensor, sometimes referred to as an event vision sensor (EVS).
Fig. 4B is a schematic diagram of the deblurring image processing approach using an event-based camera in this embodiment. Referring to fig. 4B, in this approach, with the assistance of the EVS information, the deblurring operation is performed on the entire blurred image to obtain a clear image.
The above image processing methods, which perform the deblur operation on the entire image, are affected by artifacts. Moreover, a full-image deblur algorithm tends to produce overly smooth output; even with the assistance of EVS information, since the resolution of the EVS information is often much lower than that of the RGB image, the recovery of high-frequency content such as text is poor.
Although performing the deblurring operation on the entire image can improve the overall subjective impression of the image, it can produce an unnatural appearance for easily recognized high-frequency details such as text. Because the deblur algorithm must also keep the detail texture looking natural, its capability is limited.
It can be seen that both blind deblurring and event camera-based deblur are full-image deblur algorithms that destroy the texture details of the image.
In one example, for the blurred picture containing text as shown in fig. 3, the following process may also be performed:
reference-based text enhancement
In this approach, a text reference database is established, and the similarity between the reference text patch and the data in the database is used to guide text generation.
This method of using an additional database to assist text generation has obvious limitations: the final effect depends on the richness of the reference database, and because similarity matching and enhancement are performed block by block during processing, visible seams are likely to appear between different blocks.
In order to solve the above-described problems, the present embodiment proposes an image processing scheme as follows.
Although the deblur algorithm may make a text scene look unnatural, it is usually still possible to roughly guess from the image what the text is. This means that the distorted text still contains enough structural information to help identify it. Therefore, in this embodiment, the structure information of the distorted text (the connections between key points) is extracted first, then the specific text is regenerated from the Raw image and the structure information, and finally the distorted text in the original image is replaced according to the corresponding key points.
Fig. 5 is a flowchart illustrating an exemplary image processing method in the present embodiment. The flow of the image processing method can be applied to electronic equipment such as mobile phones, tablets and the like, and is certainly not limited to the method.
Referring to fig. 5, in the present embodiment, the flow of the image processing method may include the following steps:
s501, receiving a first image, and performing text detection on the first image.
Wherein the first image may be an original RGB (red green blue) image.
The first image may be an image of an article photographed by a user using an electronic device such as a mobile phone or a tablet, for example, a picture of an express package with address information, a picture of a test paper with a large amount of text content, and the like. The first image may be an image captured by a high-magnification camera.
When the electronic device does not include a precise anti-shake device, the user cannot fully prevent the device from shaking while taking a photo, so the image captured by the electronic device is often affected by shake and appears blurred. This is especially problematic when the image contains text: blurred text gives the user a poor experience, particularly when the text is what the user cares about. Therefore, when the first image contains text, the subsequent steps make the text in the first image clear, thereby improving the user experience.
In this step, the purpose of text detection on the first image is to detect whether text exists in the image, and if text exists in the image, S502 is executed. If no text exists in the image, the subsequent steps are not performed.
Any text detection mode in the related art may be used for text detection, and details are not repeated here.
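Purely as an illustration (the application leaves the text detector unspecified), the following Python sketch shows one possible text-detection step based on OpenCV's classical MSER region detector; the function name detect_text_regions and the size thresholds are assumptions of this sketch, not part of the application.

    import cv2

    def detect_text_regions(image_bgr):
        """Return candidate text bounding boxes (x, y, w, h) for the first image.

        Illustrative stand-in for the unspecified text detection of S501; any
        text detection mode in the related art could be used instead.
        """
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        mser = cv2.MSER_create()                    # maximally stable extremal regions
        regions, bboxes = mser.detectRegions(gray)  # candidate character-like regions
        # Keep only plausibly character-sized regions (thresholds are assumptions).
        return [tuple(b) for b in bboxes if 8 < b[2] < 200 and 8 < b[3] < 200]

    # Example usage: run detection and decide whether S502 should be executed.
    # text_found = len(detect_text_regions(cv2.imread("first_image.jpg"))) > 0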
S502, when first text is detected in the first image, first structure information corresponding to the first text is extracted.
The first image may be an original RGB (red green blue) image acquired by the electronic device.
The first text is text in a first image.
For example, when a language exam paper is photographed with a mobile phone, the photographed picture of the exam paper is the first image, and all the text in the picture of the exam paper is the first text.
When there are a plurality of characters in the first image, the structure information of each character is extracted.
The first structure information is information for representing a text structure. Such structural information may be, for example, key point information, stroke information, and the like. Of course, not limited thereto.
In one example, extracting first structural information corresponding to a first text includes:
performing key point detection on the first text in the first image to obtain key point information of the first text, and using the key point information as the first structure information corresponding to the first text.
The manner of detecting the key points is not limited in this embodiment.
For example, performing key point detection on the first text in the first image to obtain the key point information of the first text may include:
performing key point detection on the first text in the first image by means of corner detection to obtain the key point information of the first text.
Corner detection is one method used to obtain image features. A corner point is generally defined as the intersection of two edges. Strictly speaking, the local neighborhood of a corner point should contain two distinct regions whose boundaries run in different directions. In this embodiment, a corner point may include not only the intersection of two edges but also other image points with specific features; for example, image points with certain gradient features, or image points whose gray level is a local maximum or minimum, may all serve as corner points.
The algorithm used for corner detection may be, for example, corner detection based on gray scale images, corner detection based on binary images, corner detection based on contour curves, and the like.
In this embodiment, when the original image is an RGB image and the key points of the original image are obtained by using the corner detection method, the original RGB image may be converted into a gray scale image, and then the key points in the image are obtained by using the corner detection based on the gray scale image.
Alternatively, the original RGB image may be converted into a binary image, and then the key points in the image may be obtained by detecting the corner points based on the binary image.
For the detailed process of corner detection, please refer to the description in the related art, and the detailed description is omitted here.
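As a hedged illustration of corner-based key point detection, the following Python sketch converts the image to gray scale and applies OpenCV's Shi-Tomasi corner detector (cv2.goodFeaturesToTrack) inside one detected text box; the helper name and the parameter values are assumptions of this sketch and are not specified in the application.

    import cv2

    def detect_text_keypoints(image_bgr, text_box):
        """Corner-based key point detection for one text region (illustrative sketch).

        text_box is an (x, y, w, h) rectangle from the text-detection step.
        """
        x, y, w, h = text_box
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        roi = gray[y:y + h, x:x + w]
        corners = cv2.goodFeaturesToTrack(roi, maxCorners=64,
                                          qualityLevel=0.01, minDistance=3)
        if corners is None:
            return []
        # Shift corner coordinates back into full-image coordinates.
        return [(float(cx) + x, float(cy) + y) for cx, cy in corners.reshape(-1, 2)]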
As another example, performing key point detection on the first text in the first image to obtain the key point information of the first text may include:
performing key point detection on the first text in the first image by means of feature extraction in the scale-invariant feature transform (SIFT) to obtain the key point information of the first text.
The SIFT algorithm is an algorithm based on local points of interest, insensitive to the scale of the image and to the rotation of the image. Moreover, the SIFT algorithm has better robustness on the influence of noise, illumination and the like.
The SIFT algorithm may mainly comprise the following steps:
(1) Scale-space extremum detection
This step identifies potential interest points that are invariant to scale and orientation by searching image locations over all scales using a difference-of-Gaussians function.
(2) Key point localization
At each candidate location, a fine model is fitted to determine the location and scale; key points are selected according to their stability.
(3) Orientation assignment (a direction is assigned to each key point)
In this step, one or more orientations are assigned to each key point position based on the local image gradient directions. All subsequent operations on the image data are performed relative to the orientation, scale and location of the key points, thereby providing invariance to orientation and scale.
(4) Key point descriptor
This step computes local image gradients at the selected scale within the neighborhood of each key point; these gradients are transformed into a representation that tolerates relatively large local shape deformations and illumination variations.
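For comparison, a minimal SIFT-based variant of the key point extraction is sketched below using OpenCV's SIFT implementation; whether this exact implementation matches the feature extraction referred to in the application is an assumption of this sketch.

    import cv2

    def detect_text_keypoints_sift(image_bgr, text_box):
        """SIFT-based key point detection for one text region (illustrative sketch)."""
        x, y, w, h = text_box
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        roi = gray[y:y + h, x:x + w]
        sift = cv2.SIFT_create()
        keypoints, descriptors = sift.detectAndCompute(roi, None)
        # Return key point coordinates in full-image coordinates plus their descriptors.
        points = [(kp.pt[0] + x, kp.pt[1] + y) for kp in keypoints]
        return points, descriptors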
In another example, extracting the first structural information corresponding to the first text may include:
and carrying out stroke detection on the first text in the first image to obtain stroke information of the first text, and taking the stroke information as first structure information corresponding to the first text.
S503, determining a first pose corresponding to the first text according to the first structure information.
In one example, the first structure information may be the key point information of the text.
In another example, the first structure information may be the stroke information of the text.
The first pose may include the direction of the text and the connectivity relationship between the key points in the text.
For example, suppose key point detection determines that a certain character has nine key points A1, A2, A3, ..., A9. The connectivity relationship between these key points records which points are connected to which points and which are not. Taking point A1 as an example, if A1 is connected to A3 and to no other point, a connecting line is added between A1 and A3; the remaining points are processed in the same way. Once the connectivity relationship between the key points of a character is determined, the pose of the character is determined. A simple data-structure sketch of such a pose is given below.
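Purely for illustration, the following Python sketch shows one way such a pose could be represented in code; the class name TextPose and its fields are assumptions of this sketch and are not defined in the application.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class TextPose:
        """Illustrative container for the pose of one character: its key points,
        the connectivity (edges) between them, and an overall text direction."""
        keypoints: List[Tuple[float, float]]                         # (x, y) positions A1..An
        edges: List[Tuple[int, int]] = field(default_factory=list)   # pairs of connected key point indices
        direction_deg: float = 0.0                                   # estimated text direction in degrees

    # Example matching the text above: nine key points, with only A1 and A3 connected.
    # pose = TextPose(keypoints=nine_xy_pairs, edges=[(0, 2)])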
In one example, determining, according to the first structure information, the first pose corresponding to the first text may include:
inputting the first structure information into a trained pose estimation model, and outputting, by the pose estimation model, the first pose corresponding to the first text.
The pose estimation model is a model that has already been trained.
The pose estimation model may be, for example, a machine learning model, a deep learning model, or the like.
In this embodiment, the pose estimation model is used to obtain the pose information corresponding to the text, which can improve the accuracy of pose recognition.
The process of generating the pose estimation model may include the following steps:
creating a first model, which may be, for example, a convolutional neural network (CNN) model, and setting initial parameters of the first model;
collecting sample data;
training the first model with the sample data to obtain a trained first model, and using the trained first model as the pose estimation model.
The sample data may be collected, for example, as follows:
for a known character, the key points of the character are labeled manually and the connectivity relationship between the key points is determined; the connectivity relationship between the key points is taken as the pose of the character, and the pair (character key points, character pose) is taken as one group of sample data. In this way, a large amount of sample data corresponding to characters can be collected.
Training the first model with the sample data may be performed as follows:
taking the first model trained on the i-th group (i is a natural number) as the first model corresponding to the (i+1)-th group, where the first model corresponding to the 1st group is the initial first model (i.e., the first model created with the initial parameters), and performing the following operations for each group:
inputting the character key points in the group of sample data into the first model corresponding to the group, so that this first model outputs pose information;
determining the difference between the pose information output by the first model corresponding to the group and the character pose in the group of sample data;
adjusting the parameter values of the first model according to the difference to obtain the first model trained on this group;
judging whether the training convergence condition is currently satisfied; if so, taking the first model trained on this group as the trained first model; otherwise, continuing the training with the next group of sample data.
The training convergence condition may be, for example, that a preset number of training iterations has been reached, or that the difference between the pose information output by the first model corresponding to the group and the character pose in the group of sample data is smaller than a preset threshold. A minimal training-loop sketch following this procedure is given below.
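The application does not specify a concrete model architecture or training framework. The following Python (PyTorch) sketch is therefore only an assumption-laden illustration of the training procedure described above: it assumes a small fully connected model that maps key point coordinates to a key point connectivity (adjacency) matrix, with a fixed number of epochs as the convergence condition. All names, sizes and hyperparameters are choices of this sketch, not of the patent.

    import torch
    import torch.nn as nn

    class PoseEstimationModel(nn.Module):
        """Toy pose model: maps N key point coordinates to an N x N connectivity score matrix."""
        def __init__(self, num_keypoints: int = 16):
            super().__init__()
            self.num_keypoints = num_keypoints
            self.net = nn.Sequential(
                nn.Linear(num_keypoints * 2, 128), nn.ReLU(),
                nn.Linear(128, num_keypoints * num_keypoints),
            )

        def forward(self, keypoints):                  # keypoints: (B, N, 2)
            logits = self.net(keypoints.flatten(1))    # (B, N * N)
            return logits.view(-1, self.num_keypoints, self.num_keypoints)

    def train_pose_model(samples, num_keypoints=16, epochs=10, lr=1e-3):
        """samples: list of (keypoints (N, 2) float tensor, adjacency (N, N) 0/1 float tensor)."""
        model = PoseEstimationModel(num_keypoints)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.BCEWithLogitsLoss()               # measures the difference from the labeled pose
        for _ in range(epochs):                        # convergence condition: fixed number of epochs
            for kps, adjacency in samples:
                optimizer.zero_grad()
                pred = model(kps.unsqueeze(0))         # output pose (connectivity) information
                loss = loss_fn(pred, adjacency.unsqueeze(0))
                loss.backward()
                optimizer.step()                       # adjust parameters according to the difference
        return model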
S504, generating a first text image according to the first pose and the first image.
For example, according to the first pose of a certain character, the pixel points belonging to that character (hereinafter referred to as text pixel points) are determined in the first image. A first text image containing only the text and the background is then generated, in which the pixel values of the text pixel points are set to values that differ greatly from the pixel values of the background pixel points. For example, if the pixel values of the background pixel points in the first text image are small, the pixel values of the text pixel points are set to large values; conversely, if the pixel values of the background pixel points are large, the pixel values of the text pixel points are set to small values. In this way, the first text image is obtained.
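As a minimal sketch of this step, the following Python function builds such a high-contrast text image from a boolean mask of text pixels. How the mask is derived from the pose is not shown here, and the helper name render_text_image is an assumption of this sketch.

    import numpy as np

    def render_text_image(first_image, text_mask):
        """Build a simple first text image from a boolean mask of text pixels.

        first_image: (H, W, 3) uint8 image; text_mask: (H, W) bool array marking text pixels.
        """
        h, w = text_mask.shape
        background_level = int(first_image.mean())          # rough brightness of the scene
        # Choose a text value far from the background value, as described above.
        text_level = 255 if background_level < 128 else 0
        text_image = np.full((h, w, 3), background_level, dtype=np.uint8)
        text_image[text_mask] = text_level
        return text_image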
S505, performing fusion processing on the first text image and the first image to obtain a first target image corresponding to the first image.
In one example, performing fusion processing on the first text image and the first image to obtain the first target image corresponding to the first image may include:
replacing the pixel value of a second pixel point in the first image, which is located at the same position as a first pixel point where text is located in the first text image, with the pixel value of that first pixel point, to obtain the first target image.
For example, take pixel points C and D in the first image. The pixel point in the first text image at the same position as pixel point C is pixel point C', and the pixel point in the first text image at the same position as pixel point D is pixel point D'.
Assuming that pixel point C' is a pixel point where text is located in the first text image (i.e., a text pixel point), the pixel value of pixel point C in the first image is replaced with the pixel value of pixel point C' in the first text image.
Assuming that pixel point D' is not a pixel point where text is located in the first text image but a pixel point of the background portion, the pixel value of pixel point D in the first image remains unchanged.
The above is an exemplary description of the fusion processing of the first text image and the first image. In addition, other suitable manners may be adopted for fusion in this embodiment, which will not be described herein.
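A minimal sketch of the pixel-replacement fusion described above is shown below in Python; the mask argument and the function name fuse_text_image are assumptions of this sketch.

    import numpy as np

    def fuse_text_image(first_image, text_image, text_mask):
        """Replace the pixels of first_image at text positions with the pixels of text_image.

        first_image, text_image: (H, W, 3) uint8 arrays of the same size;
        text_mask: (H, W) bool array, True where the first text image contains text pixels.
        """
        first_target_image = first_image.copy()
        first_target_image[text_mask] = text_image[text_mask]   # replace only text pixels
        return first_target_image                                # background pixels are unchanged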
Because the image processing method of this embodiment does not perform a deblurring operation on the image, the texture details of the image are better preserved, so the high-frequency details in the image, namely the text, look more natural.
To explain the image processing procedure of this embodiment more intuitively, fig. 6 provides another exemplary diagram of the procedure.
Fig. 6 is an exemplary diagram of an image processing procedure in the present embodiment exemplarily shown. As shown in fig. 6, in the present embodiment, the image processing process may include the steps of:
receiving an original RGB image (Raw 2 RGB);
detecting a text scene based on the received original RGB image;
performing key point detection and pose estimation on the text;
according to the detected text key points and the estimated pose, combined with the original Raw image (namely, the received original RGB image), obtaining a clear-text scene RGB image (namely, an RGB image in which the text is clear);
fusing the RGB image with clear text with the original RGB image to obtain a processed image with clear text corresponding to the original RGB image.
These steps may be sequentially performed in the direction of the arrow shown in fig. 6.
Fig. 7 is an exemplary diagram of a high definition text obtaining process in the image processing process of the present embodiment exemplarily shown. As shown in fig. 7, in the image processing process of the present embodiment, the high definition text obtaining process includes the following steps:
detecting text in the original image through text detection;
then performing key point detection on the detected text to obtain the text key points;
then determining the text pose according to the text key points;
and finally obtaining the high-definition text image according to the text pose and the original image (the original Raw image).
The image processing method of the embodiment of the application can be applied to shooting scenes that contain a lot of text, and can ensure the readability of the text while improving the rate of usable shots. In this way, a user who cares about the text content in the image can obtain a good experience.
In one example, an embodiment of the present application further provides an apparatus, which may include: the processor and transceiver/transceiver pins, optionally, also include a memory.
The various components of the device are coupled together by buses, including a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are referred to in the figures as buses.
Alternatively, the memory may be used to store instructions in the image processing method embodiments described above. The processor is operable to execute instructions in the memory and control the receive pin to receive signals and the transmit pin to transmit signals.
The apparatus may be a chip inside the electronic device in the above-described image processing method embodiment.
All relevant contents of each step in any of the foregoing embodiments may be cited to the functional descriptions of the corresponding functional modules, which are not described herein.
The embodiment of the application also provides an electronic device, which comprises a memory and a processor, wherein the memory is coupled with the processor and stores program instructions, and when the program instructions are executed by the processor, the electronic device is caused to execute the image processing method described above.
It will be appreciated that the electronic device, in order to achieve the above-described functions, includes corresponding hardware and/or software modules that perform the respective functions. The present application can be implemented in hardware or a combination of hardware and computer software, in conjunction with the example algorithm steps described in connection with the embodiments disclosed herein. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application in conjunction with the embodiments, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The present embodiment also provides a computer storage medium having stored therein computer instructions which, when executed on an electronic device, cause the electronic device to perform the above-described related method steps to implement the image processing method in the above-described embodiments.
The present embodiment also provides a computer program product which, when run on a computer, causes the computer to perform the above-described related steps to implement the image processing method in the above-described embodiments.
In addition, the embodiment of the application also provides a device, which can be a chip, a component or a module, and can comprise a processor and a memory which are connected; the memory is used for storing computer-executable instructions, and when the device is running, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the image processing method in each method embodiment.
The electronic device, the computer storage medium, the computer program product, or the chip provided in this embodiment are used to execute the corresponding methods provided above, so that the beneficial effects thereof can be referred to the beneficial effects in the corresponding methods provided above, and will not be described herein.
It will be appreciated by those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts shown as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Any of the various embodiments of the application, as well as any of the same embodiments, may be freely combined. Any combination of the above is within the scope of the application.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read Only Memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The steps of a method or algorithm described in connection with the present disclosure may be embodied in hardware, or may be embodied in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (12)

1. An image processing method, applied to an electronic device, comprising:
receiving a first image, and performing text detection on the first image;
when first text is detected in the first image, extracting first structural information corresponding to the first text;
determining a first gesture corresponding to the first text according to the first structural information;
generating a first text image according to the first gesture and the first image; and
performing fusion processing on the first text image and the first image to obtain a first target image corresponding to the first image.
2. The method of claim 1, wherein extracting the first structural information corresponding to the first text comprises:
performing keypoint detection on the first text in the first image to obtain keypoint information of the first text, and taking the keypoint information as the first structural information corresponding to the first text.
3. The method of claim 2, wherein performing keypoint detection on the first text in the first image to obtain keypoint information of the first text comprises:
performing keypoint detection on the first text in the first image by using a corner detection method to obtain the keypoint information of the first text.
4. The method of claim 2, wherein performing keypoint detection on the first text in the first image to obtain keypoint information of the first text comprises:
performing keypoint detection on the first text in the first image by using the feature extraction method of the scale-invariant feature transform (SIFT) to obtain the keypoint information of the first text.
5. The method of claim 1, wherein the first gesture comprises a text direction and a connection relationship between keypoints in the text.
6. The method of claim 1, wherein determining the first gesture corresponding to the first text according to the first structural information comprises:
inputting the first structural information into a trained gesture estimation model, and outputting, by the gesture estimation model, the first gesture corresponding to the first text.
7. The method of claim 1, wherein extracting the first structural information corresponding to the first text comprises:
performing stroke detection on the first text in the first image to obtain stroke information of the first text, and taking the stroke information as the first structural information corresponding to the first text.
8. The method of claim 1, wherein performing fusion processing on the first text image and the first image to obtain the first target image corresponding to the first image comprises:
replacing the pixel value of a second pixel point in the first image, which is located at the same position as a first pixel point where the text is located in the first text image, with the pixel value of the first pixel point, to obtain the first target image.
9. The method of claim 1, wherein the first image is an original RGB image captured by a high-magnification camera.
10. The method of claim 1, wherein the electronic device is a mobile phone or a tablet computer.
11. An electronic device, comprising:
a memory and a processor, the memory coupled with the processor;
the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the image processing method of any one of claims 1 to 10.
12. A computer readable storage medium comprising a computer program, characterized in that the computer program, when run on an electronic device, causes the electronic device to perform the image processing method according to any one of claims 1 to 10.
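As a rough, non-authoritative illustration of the keypoint-based structure extraction described in claims 2 to 4, the following Python sketch uses corner detection and SIFT from OpenCV as the keypoint detectors. The function name extract_text_structure and its parameters are hypothetical; the choice of Shi-Tomasi corners and the SIFT settings are assumptions for illustration and are not disclosed in this publication.

import cv2
import numpy as np

def extract_text_structure(text_region: np.ndarray, use_sift: bool = False) -> np.ndarray:
    # Return (N, 2) keypoint coordinates for a cropped text region (BGR or grayscale).
    gray = cv2.cvtColor(text_region, cv2.COLOR_BGR2GRAY) if text_region.ndim == 3 else text_region
    if use_sift:
        # Claim 4: keypoint detection via the SIFT feature extraction method.
        sift = cv2.SIFT_create()
        keypoints = sift.detect(gray, None)
        return np.array([kp.pt for kp in keypoints], dtype=np.float32)
    # Claim 3: keypoint detection via a corner detection method (Shi-Tomasi, assumed).
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01, minDistance=3)
    if corners is None:
        return np.empty((0, 2), dtype=np.float32)
    return corners.reshape(-1, 2)

The resulting keypoint coordinates would then serve as the first structural information of claim 2.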
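The gesture estimation model of claim 6 is not described in detail in this publication. The following PyTorch sketch is only a minimal stand-in showing the claimed input/output relationship; the architecture, the flattened-keypoint input layout, and the two outputs (a text-direction value and a keypoint-connectivity matrix, loosely matching the elements listed in claim 5) are all assumptions.

import torch
import torch.nn as nn

class TextGestureEstimator(nn.Module):
    # Hypothetical model: maps keypoint coordinates to a text direction and
    # pairwise keypoint-connectivity logits.
    def __init__(self, num_keypoints: int = 32):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.backbone = nn.Sequential(
            nn.Linear(num_keypoints * 2, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.direction_head = nn.Linear(128, 1)
        self.connectivity_head = nn.Linear(128, num_keypoints * num_keypoints)

    def forward(self, keypoints: torch.Tensor):
        # keypoints: (batch, num_keypoints, 2) normalized coordinates.
        features = self.backbone(keypoints.flatten(1))
        direction = self.direction_head(features)
        connectivity = self.connectivity_head(features).view(-1, self.num_keypoints, self.num_keypoints)
        return direction, connectivity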
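Claim 8 fuses the first text image and the first image by position-wise pixel replacement. The following NumPy sketch is a minimal illustration of that step under an assumed convention: text pixels are taken to be every non-background pixel of the rendered text image, which this publication does not specify; the function name and background_value parameter are hypothetical.

import numpy as np

def fuse_text_image(first_image: np.ndarray, text_image: np.ndarray, background_value: int = 255) -> np.ndarray:
    # Replace pixels of the first image, at the positions occupied by text in the
    # first text image, with the corresponding text-image pixel values (claim 8).
    # Assumes both images are 3-channel arrays of identical shape.
    assert first_image.shape == text_image.shape
    target_image = first_image.copy()
    text_mask = np.any(text_image != background_value, axis=-1)  # assumed text mask
    target_image[text_mask] = text_image[text_mask]
    return target_image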
CN202311392512.2A 2023-10-25 2023-10-25 Image processing method and electronic equipment Pending CN117152022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311392512.2A CN117152022A (en) 2023-10-25 2023-10-25 Image processing method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311392512.2A CN117152022A (en) 2023-10-25 2023-10-25 Image processing method and electronic equipment

Publications (1)

Publication Number Publication Date
CN117152022A true CN117152022A (en) 2023-12-01

Family

ID=88904594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311392512.2A Pending CN117152022A (en) 2023-10-25 2023-10-25 Image processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN117152022A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548169A (en) * 2016-11-02 2017-03-29 重庆中科云丛科技有限公司 Fuzzy literal Enhancement Method and device based on deep neural network
US11227009B1 (en) * 2019-09-30 2022-01-18 Amazon Technologies, Inc. Text de-obfuscation with image recognition of text
CN114049641A (en) * 2022-01-13 2022-02-15 中国电子科技集团公司第十五研究所 Character recognition method and system based on deep learning
CN115115679A (en) * 2022-06-02 2022-09-27 华为技术有限公司 Image registration method and related equipment
CN115294055A (en) * 2022-08-03 2022-11-04 维沃移动通信有限公司 Image processing method, image processing device, electronic equipment and readable storage medium
CN115578267A (en) * 2021-06-17 2023-01-06 奇酷软件(深圳)有限公司 Image restoration method, system, storage medium and computer device
CN116233626A (en) * 2023-05-05 2023-06-06 荣耀终端有限公司 Image processing method and device and electronic equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548169A (en) * 2016-11-02 2017-03-29 重庆中科云丛科技有限公司 Fuzzy literal Enhancement Method and device based on deep neural network
US11227009B1 (en) * 2019-09-30 2022-01-18 Amazon Technologies, Inc. Text de-obfuscation with image recognition of text
CN115578267A (en) * 2021-06-17 2023-01-06 奇酷软件(深圳)有限公司 Image restoration method, system, storage medium and computer device
CN114049641A (en) * 2022-01-13 2022-02-15 中国电子科技集团公司第十五研究所 Character recognition method and system based on deep learning
CN115115679A (en) * 2022-06-02 2022-09-27 华为技术有限公司 Image registration method and related equipment
CN115294055A (en) * 2022-08-03 2022-11-04 维沃移动通信有限公司 Image processing method, image processing device, electronic equipment and readable storage medium
CN116233626A (en) * 2023-05-05 2023-06-06 荣耀终端有限公司 Image processing method and device and electronic equipment

Similar Documents

Publication Publication Date Title
US20210358523A1 (en) Image processing method and image processing apparatus
KR20150059466A (en) Method and apparatus for recognizing object of image in electronic device
CN115689963B (en) Image processing method and electronic equipment
US20230224574A1 (en) Photographing method and apparatus
CN114926351B (en) Image processing method, electronic device, and computer storage medium
EP3617990B1 (en) Picture processing method and apparatus, computer readable storage medium, and electronic device
CN116916151B (en) Shooting method, electronic device and storage medium
CN116009802A (en) Page display method, electronic device and computer readable storage medium
EP4303815A1 (en) Image processing method, electronic device, storage medium, and program product
US20230014272A1 (en) Image processing method and apparatus
CN115442517B (en) Image processing method, electronic device, and computer-readable storage medium
CN116433696A (en) Matting method, electronic equipment and computer readable storage medium
CN115580690B (en) Image processing method and electronic equipment
CN115661912A (en) Image processing method, model training method, electronic device and readable storage medium
CN114399622A (en) Image processing method and related device
CN117152022A (en) Image processing method and electronic equipment
CN117036206B (en) Method for determining image jagged degree and related electronic equipment
CN115623318B (en) Focusing method and related device
CN116757963B (en) Image processing method, electronic device, chip system and readable storage medium
CN116091572B (en) Method for acquiring image depth information, electronic equipment and storage medium
CN116546274B (en) Video segmentation method, selection method, synthesis method and related devices
CN116453131B (en) Document image correction method, electronic device and storage medium
CN116363017B (en) Image processing method and device
US20240046504A1 (en) Image processing method and electronic device
WO2024082976A1 (en) Ocr recognition method for text image, electronic device and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination