CN112882678A - Image-text processing method, display method, device, equipment and storage medium

Info

Publication number
CN112882678A
Authority
CN
China
Prior art keywords
text
character recognition
recognition result
target picture
picture
Prior art date
Legal status
Granted
Application number
CN202110276188.2A
Other languages
Chinese (zh)
Other versions
CN112882678B (en)
Inventor
龙云翔
姚刚
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110276188.2A
Publication of CN112882678A
Application granted
Publication of CN112882678B
Legal status: Active

Classifications

    • G06F 3/147 - Digital output to a display device using display panels; cooperation and interconnection of the display device with other functional units
    • G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G06F 3/0485 - Interaction techniques based on graphical user interfaces [GUI]: scrolling or panning
    • G06F 40/279 - Handling natural language data: recognition of textual entities
    • G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 30/153 - Character recognition: segmentation of character regions using recognition of characters or words


Abstract

The application discloses an image-text processing method, an image-text display method, a device, equipment, and a storage medium, relating to the technical field of image processing and in particular to artificial intelligence and computer vision technology. The scheme is implemented as follows: a target picture in which images and text are mixed is sliced into at least two sub-pictures according to the screen display size; character recognition is performed on the text in the sub-pictures to obtain a character recognition result; and a positional correspondence is established between the character recognition result and the position of the text in the target picture, so that while the target picture is scrolled on a screen, the character recognition result is displayed according to that correspondence. The technical scheme of the embodiments of the disclosure can effectively process pictures in which images and text are mixed and display the character recognition results alongside them.

Description

Image-text processing method, display method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular to artificial intelligence and computer vision technologies.
Background
Intelligent terminal devices are now ubiquitous in daily life and work and are used to browse information in many forms, such as text, audio, video, and pictures. Applications may also provide information to the user in different formats as needed.
Picture-type information raises the problem of adapting to the screen size of the terminal device. For example, when a picture is displayed on a small-screen device, it is typically compressed or otherwise adjusted to fit the small screen.
However, compression makes the picture's content hard for the user to see clearly, degrading the browsing experience, while re-laying-out the picture content adds extra processing and becomes impractical when a large volume of pictures must be displayed.
Disclosure of Invention
The disclosure provides an image-text processing method, an image-text display method, a device, equipment, and a storage medium.
According to an aspect of the present disclosure, there is provided an image-text processing method, including:
slicing a target picture in which images and text are mixed into at least two sub-pictures according to the screen display size;
performing character recognition on the text in the sub-pictures to obtain a character recognition result; and
establishing a positional correspondence between the character recognition result and the position of the text in the target picture, the character recognition result being displayed according to that correspondence while the target picture is scrolled on a screen.
According to another aspect of the present disclosure, there is provided an image-text display method applied to a client, the method including:
loading a target picture in which images and text are mixed, a character recognition result of the target picture, and the positional correspondence between the character recognition result and the target picture; and
scrolling the target picture on a screen of the terminal where the client runs, and displaying the character recognition result according to the positional correspondence while the target picture scrolls.
According to another aspect of the present disclosure, there is provided an image-text processing apparatus, including:
a slicing module, configured to slice a target picture in which images and text are mixed into at least two sub-pictures according to the screen display size;
a character recognition module, configured to perform character recognition on the text in the sub-pictures to obtain a character recognition result; and
a position relationship establishing module, configured to establish a positional correspondence between the character recognition result and the position of the text in the target picture, the character recognition result being displayed according to that correspondence while the target picture is scrolled on a screen.
According to another aspect of the present disclosure, there is provided an image-text display apparatus configured at a client, the apparatus including:
a data loading module, configured to load a target picture in which images and text are mixed, a character recognition result of the target picture, and the positional correspondence between the character recognition result and the target picture; and
a data display module, configured to scroll the target picture on a screen of the terminal where the client runs and to display the character recognition result according to the positional correspondence while the target picture scrolls.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform an image-text processing method or an image-text display method provided in any embodiment of the disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform an image-text processing method or an image-text display method provided in any embodiment of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements an image-text processing method or an image-text display method provided in any embodiment of the present disclosure.
The technical scheme of the embodiments of the disclosure can effectively process pictures in which images and text are mixed and display the character recognition results alongside them.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a flowchart of an image-text processing method according to an embodiment of the present application;
fig. 2A is a flowchart of another image-text processing method provided in the embodiment of the present application;
FIG. 2B is a schematic diagram of a target picture according to an embodiment of the present disclosure;
fig. 3 is a flowchart of another image-text processing method provided in the embodiment of the present application;
fig. 4 is a flowchart of another image-text processing method provided in the embodiment of the present application;
fig. 5 is a flowchart of an image-text displaying method according to an embodiment of the present application;
fig. 6 is a block diagram of an image-text processing apparatus according to an embodiment of the present disclosure;
fig. 7 is a block diagram of a graphic display device according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of an electronic device used to implement an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of an image-text processing method according to an embodiment of the present application. The embodiment is suitable for processing a picture in which images and text are mixed so that it can be displayed on the screen of a terminal device, and is particularly suitable for browsing a long picture on a small-size screen. The embodiment may be implemented by an image-text processing apparatus realized in hardware and/or software, which may be configured on an electronic device with data processing capability. The electronic device may be a server, which processes pictures in advance before a terminal loads and displays them; or it may be the terminal itself, which processes a long picture after loading it locally and then displays it.
As shown in fig. 1, the method includes:
s110, carrying out image cutting processing on the target image in image-text mixed arrangement according to the screen display size to form at least two sub-images;
the screen display size refers to a size of a display window of the terminal screen for displaying the picture content, and is generally rectangular and can be represented by a length and a width. According to the habit of viewing the content of the user, the user usually scrolls up and down in a vertical screen state to view the content in the display window, the picture is usually in a long picture mode, namely the length of the picture is far larger than the height of the screen, and the picture is browsed by the up-and-down scrolling of the user. Therefore, the screen display size is mainly considered to be the height size of the screen. Of course, it will be understood by those skilled in the art that other dimensions of the screen presentation, such as width, may be considered if the scrolling is lateral.
The embodiment of the application mainly aims at target pictures with mixed pictures and texts, such as advertisement pictures, cartoon pictures and the like, and the samples of picture and text combination exist.
Before character recognition is needed to be carried out on a target picture, because the recognition area is limited, picture cutting processing needs to be carried out firstly, and the target picture is cut into a plurality of sub-pictures. Specifically, the target picture can be segmented from the height dimension according to the height of the screen display size, and the height of the sub-picture is slightly greater than, the same as or slightly smaller than the height of the screen display size.
Optionally, the slicing operation specifically includes: during the loading of the target picture, slicing the already-loaded portion of the target picture according to the screen display size to form at least two sub-pictures.
That is, in this embodiment slicing may proceed while the target picture is still loading; there is no need to wait for the whole picture. The slicing rule could be based on the picture's own size, for example dividing its total length evenly, but this embodiment preferably derives the rule from the screen display size: as soon as the loaded portion of the target picture satisfies the slicing condition, it can be sliced, without waiting for the full download. This is particularly suitable when a client loads pictures from a server for display. The client loads and displays the picture at the same time; if the user loses interest partway through, browsing stops and loading can stop as well. Processing during loading therefore delivers sub-pictures to the client promptly instead of only after the whole picture has loaded.
Many specific slicing rules are possible in this embodiment. Optionally, slicing the image-text mixed target picture according to the screen display size to form at least two sub-pictures includes:
slicing the target picture according to a slice height determined from the screen display size to form at least two sub-pictures,
where the slice height is a set value that may be greater than, equal to, or smaller than the screen display size.
This operation is essentially picture cropping: once an original target picture, especially a long one, is obtained, cropping and compression are applied to speed up loading on the device and to match the small screen size. The long picture, a comic for instance, is then equivalent to N sub-pictures spliced in sequence. Within a single slicing pass, the slice heights may be the same or may differ.
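For illustration only, a minimal Python sketch of this slicing step (the Pillow library, the 240 px default slice height, and vertical-only cutting are assumptions of this sketch, not values fixed by the disclosure):

```python
# Minimal sketch of S110: cut a tall image-text picture into sub-pictures
# whose height tracks the screen display size. Pillow and the 240 px
# default are illustrative assumptions.
from PIL import Image

def slice_long_image(path: str, slice_height: int = 240):
    img = Image.open(path)
    width, height = img.size
    slices, top = [], 0
    while top < height:
        bottom = min(top + slice_height, height)
        # crop box is (left, upper, right, lower) in pixel coordinates
        slices.append(img.crop((0, top, width, bottom)))
        top = bottom
    return slices  # a genuine long picture yields at least two sub-pictures
```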
S120: performing character recognition on the text in the sub-pictures to obtain a character recognition result.
Character recognition is performed on each sub-picture after slicing to extract its text. The recognized text itself can serve directly as the character recognition result, or it can be processed further as needed; for example, the text can be converted into audio, which also counts as a character recognition result. Text-to-speech (TTS) can be done in two ways: (1) synthesized offline and stored in advance; or (2) synthesized and played by TTS in real time as the picture is browsed. A minimal conversion sketch follows the steps below.
Specifically, performing character recognition on the text in the sub-picture to obtain a character recognition result includes:
performing character recognition on the text in the sub-picture to obtain recognized characters; and
converting the recognized characters to speech to obtain the character recognition result in audio form.
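A minimal sketch of the offline text-to-audio variant; pyttsx3 is assumed here purely as an example of an offline TTS engine, since the disclosure does not name one:

```python
# Offline text-to-audio sketch; the pyttsx3 engine and the file name
# are illustrative assumptions.
import pyttsx3

def text_to_audio(recognized_text: str, out_path: str) -> None:
    engine = pyttsx3.init()
    engine.save_to_file(recognized_text, out_path)  # e.g. "s0001.wav"
    engine.runAndWait()
```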
Various character recognition means are available, for example Optical Character Recognition (OCR), and a character recognition interface provided by another program can also be called to perform the recognition. When OCR software is called, its interface returns, for each recognized line, the line coordinates (i.e. the coordinates of the upper-left corner of the rectangle bounding that line) and the characters the line contains. OCR can be run on sub-pictures cut from the original picture, or the picture can first be compressed to a resolution suited to small-screen devices, saving data traffic.
For text recognition, a text box in the sub-picture is generally determined first, for example the line box containing the text, the minimum rectangle enclosing it, or another bounding outline; text that lies far apart may be divided into separate text boxes. The characters inside each text box are then recognized.
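As a hedged illustration, the sketch below uses pytesseract as a stand-in for the OCR interface described above; pytesseract returns word-level boxes, so a real pipeline would merge boxes sharing a line number to recover the per-line coordinates:

```python
# OCR sketch: recognize text in a sub-picture and keep the upper-left
# coordinates of each box, in the sub-picture's own frame. pytesseract
# and the language pack are illustrative assumptions.
import pytesseract
from pytesseract import Output

def recognize_text_boxes(sub_image):
    data = pytesseract.image_to_data(sub_image, lang="chi_sim",
                                     output_type=Output.DICT)
    boxes = []
    for i, word in enumerate(data["text"]):
        if word.strip():
            boxes.append({
                "text": word,
                "line": (data["block_num"][i], data["line_num"][i]),
                "x": data["left"][i],
                "y": data["top"][i],
                "w": data["width"][i],
                "h": data["height"][i],
            })
    return boxes  # merge entries sharing "line" to obtain line boxes
```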
S130: establishing a positional correspondence between the character recognition result and the position of the text in the target picture.
The character recognition result is displayed according to this correspondence while the target picture is scrolled on a screen.
The text position corresponding to a character recognition result may be an absolute coordinate in the target picture or a relative position; the embodiments of the application do not limit this, as long as the positional relationship between the original text and the target picture can be expressed. For example, if the character recognition result corresponds to a text box, the correspondence may be the position of that text box in the target picture.
The character recognition result is meant to accompany the target picture on screen: when the picture scrolls to the corresponding position, the result is displayed, so the user acquires it in sync while browsing, which effectively supplements or enhances the browsing experience.
The technical scheme of the embodiments of the application is particularly suitable for terminal devices with small screens (for example, smaller than 3 inches), such as wearable smart devices like smartwatches and children's smart toys, whose displays are typically smaller than a phone screen. If a long picture is shown on such a screen with its width fitted to the screen width, the picture is usually shrunk so much that the user cannot make out its content; if instead the picture is kept large, zooming around to read the text is inconvenient. Re-laying-out every long picture to suit the screen size would require an enormous amount of computation over a massive corpus of pictures, which conflicts with the need to display pictures efficiently and quickly. In the scheme of the embodiments of the application, the target picture is fitted to the screen width and scrolled, and the character recognition result is displayed in sync during scrolling, either as extra subtitles or as speech generated from the text, helping the user follow both picture and text. Because the scheme standardizes slicing, character recognition, and position correspondence, it applies universally to long pictures of any content, has a low processing cost, and can process long pictures quickly for delivery to users.
When a server provides the loading and display service for target pictures, it may run the scheme of this embodiment in advance to pre-compute character recognition results, or it may run character recognition in real time during loading. Alternatively, the scheme may be carried out by an image-text processing plug-in configured in the client: when the client loads the target picture locally, the plug-in is called to perform slicing, character recognition, and position matching, so that the character recognition result is displayed while the picture is browsed.
Fig. 2A is a flowchart of another image-text processing method provided in an embodiment of the present application. Building on the technical scheme of the foregoing embodiment, this embodiment introduces an optional implementation of the slicing process.
In this embodiment, an overlapping redundant area exists between two adjacent sub-pictures, located at the upper edge and/or the lower edge of a sub-picture. After slicing, some text may straddle two sub-pictures: as shown in fig. 2B, a long picture is sliced into 4 sub-pictures, and the text "watch cartoon" straddles the 1st and 2nd sub-pictures. Split this way, that text is difficult to recognize correctly in either the 1st or the 2nd sub-picture. This embodiment therefore adds a redundant area at the upper and/or lower edge of each sub-picture that overlaps the adjacent sub-picture. As shown in fig. 2B, a 20-pixel redundant area at the lower edge of the 1st sub-picture belongs to the 1st sub-picture, so the 1st sub-picture contains the "watch cartoon" text in full and it can be recognized accurately.
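A sketch of the slicing loop extended with a redundant area; the 20-pixel default matches the figure's example, while the real size would come from step S210 below:

```python
# Slicing with a redundant area: each sub-picture keeps `overlap` extra
# pixels at its lower edge so text straddling a cut line stays whole in
# at least one slice. Returns (vertical offset, sub-image) pairs.
from PIL import Image

def slice_with_overlap(img: Image.Image, slice_height: int, overlap: int = 20):
    width, height = img.size
    slices, top = [], 0
    while top < height:
        bottom = min(top + slice_height + overlap, height)
        slices.append((top, img.crop((0, top, width, bottom))))
        top += slice_height  # adjacent slices share `overlap` pixels
    return slices
```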
As shown in fig. 2A, the present embodiment includes:
s210, determining the size of the redundant area according to the content type and/or the text font size of the target picture;
the redundant area is set to avoid the characters from crossing two sub-pictures, so the redundant area can be set according to the size of the text font. Since the text in the picture may have various sizes, it may be considered that the size of the redundant area is set, for example, according to the maximum text font. The content type of the target picture can also indirectly reflect the size of the text font. For example, the font size of text in caricatures is generally different from the font size of text in children's training textbooks. Therefore, the size of the redundant area can also be set according to the content type of the target picture. The content type and/or the text font size may be dynamically identified after the picture is obtained, but it is preferable that the target picture generally has a preset content tag, and the size of the redundant area can be directly determined according to the preset content tag.
S220: slicing the image-text mixed target picture into at least two sub-pictures according to the screen display size;
S230: performing character recognition on the text in the sub-pictures to obtain a character recognition result;
S240: establishing a positional correspondence between the character recognition result and the position of the text in the target picture, the character recognition result being displayed according to that correspondence while the target picture is scrolled on a screen.
By reserving the redundant area, the technical scheme of this embodiment fully safeguards the accuracy of character recognition.
Fig. 3 is a flowchart of another image-text processing method according to an embodiment of the present application. Building on the foregoing embodiments, this embodiment introduces a way to determine the positional correspondence between the character recognition result and the target picture. The embodiment includes the following steps:
S310: slicing the image-text mixed target picture into at least two sub-pictures according to the screen display size;
S320: performing character recognition on the text in the sub-pictures to obtain a character recognition result, the result being displayed according to the positional correspondence while the target picture is scrolled on a screen;
S330: adjusting the absolute coordinate position of the text of the character recognition result in the sub-picture to its absolute coordinate position in the target picture, according to the position of each sub-picture in the target picture;
For the target picture, coordinate values can identify where content sits in the picture. The origin (0,0) of the coordinate system may be placed, for example, at the upper-left corner of the target picture, with pixels as the coordinate unit. Referring to fig. 2B, the x-axis runs rightward from the origin and the y-axis downward, so every position in the picture has an absolute coordinate in this system. The position of each sub-picture in the target picture can be expressed by the absolute coordinate of its upper-left corner plus its height; in fig. 2B, for example, the 2nd sub-picture has its upper-left corner at (0,60) and a height of 60 pixels.
A character recognition result produced from a sub-picture has a position relative to that sub-picture. In fig. 2B, for example, the text box of a recognition result in the 3rd sub-picture has its upper-left corner at (10,10), measured with the 3rd sub-picture's own upper-left corner as origin (0,0). From that relative position (10,10) and the 3rd sub-picture's position (0,120) in the target picture, the absolute coordinate of the recognition result in the target picture works out to (10,130). Every character recognition result is thereby expressed in the uniform coordinates of the target picture.
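The adjustment is a single vertical translation per sub-picture; a minimal sketch (the helper name and box layout are illustrative):

```python
# Restore a box from sub-picture coordinates to absolute coordinates in
# the target picture by adding the slice's vertical offset.
def to_absolute(box, slice_offset_y):
    x, y, w, h = box
    return (x, y + slice_offset_y, w, h)

# The example above: a box at (10, 10) in the 3rd slice, whose top edge
# sits at y = 120 in the target picture, lands at (10, 130).
assert to_absolute((10, 10, 80, 20), 120)[:2] == (10, 130)
```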
S340: clustering the character recognition results according to the absolute coordinate positions of their texts in the target picture;
Before the positional correspondence is determined, the text also needs to be clustered on a semantic basis. OCR, for example, recognizes text line by line, with no notion of semantics, so lines must be aggregated into semantically related sentences, and a sentence may span several lines. In an image-text mixed picture the text boxes can sit anywhere: two boxes may lie side by side yet belong to different sentences, as with the dialogue bubbles in a comic, which carry semantically distinct text. Clustering is therefore required.
S350: determining, as the positional correspondence, the attribution relationship between the clustered character recognition result and a sub-picture, according to the absolute coordinate position of the clustered result's text in the target picture.
The clustered text box may be rectangular, and its absolute coordinate position in the target picture can be expressed by its upper-left corner plus size information such as height and width. From this absolute position, the sub-picture to which the character recognition result belongs is determined. Optionally, the clustered character recognition result is attributed to the sub-picture that contains the largest share of its text box's area; alternatively, a text box can be attributed to a sub-picture once the portion of the box inside that sub-picture exceeds half of the box's total area.
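Because slicing is purely vertical, the largest-overlap rule reduces to a one-dimensional overlap test; a sketch with illustrative names:

```python
# Attribute a clustered text box to the sub-picture holding the largest
# share of its area; slice_ranges lists each slice's (top, bottom).
def assign_to_slice(box_top, box_bottom, slice_ranges):
    def overlap(a_top, a_bot, b_top, b_bot):
        return max(0, min(a_bot, b_bot) - max(a_top, b_top))
    shares = [overlap(box_top, box_bottom, t, b) for t, b in slice_ranges]
    return shares.index(max(shares))  # index of the owning sub-picture
```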
Once the owning sub-picture of each character recognition result is determined, the results belonging to the currently displayed sub-picture are shown while the sub-pictures of the target picture scroll on screen. That is, as the target picture is scrolled and browsed, whenever a sub-picture is detected entering the screen range, the character recognition results attributed to it are displayed. A sub-picture can be deemed to have entered the screen once the portion of it on screen reaches a set proportion, for example one half.
Text located in the redundant area is contained in both adjacent sub-pictures. In fig. 2B, for example, "watch cartoon" belongs to the 1st and 2nd sub-pictures simultaneously, so the text appears in the recognition results of both, producing two recognition results for "watch cartoon". After the absolute coordinate positions of the two results' text boxes are determined, de-duplication can be performed on those absolute positions, before or after clustering: when two text boxes have highly similar outlines and positions, they are treated as the same text box and the duplicate recognition result is removed.
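A sketch of that de-duplication; treating boxes as identical within a 5 px tolerance is an assumption of the sketch:

```python
# Drop repeated recognition results arising from the redundant area:
# two boxes whose absolute positions and sizes nearly coincide are one.
def is_same_box(a, b, tol=5):
    return all(abs(p - q) <= tol for p, q in zip(a["box"], b["box"]))

def dedupe_results(results):
    kept = []
    for r in results:  # each result carries an absolute "box" (x, y, w, h)
        if not any(is_same_box(r, k) for k in kept):
            kept.append(r)
    return kept
```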
With the above scheme, the positions of character recognition results can be expressed and adjusted effectively, and semantic clustering and sub-picture attribution are carried out on that basis, so the character recognition results are displayed in sync as the sub-pictures scroll.
Fig. 4 is a flowchart of another image-text processing method provided in an embodiment of the present application. Building on the foregoing embodiments, this embodiment introduces an implementation of semantic clustering of the text. The method includes the following steps:
S410: slicing the image-text mixed target picture into at least two sub-pictures according to the screen display size;
S420: performing character recognition on the text in the sub-pictures, taking a set unit area as the object, to obtain the recognized characters within each set unit area;
The set unit area is a row, a column, or an area of set shape and size. Optionally, text boxes in the sub-picture are determined in units of lines and character recognition is performed on them.
S430: adjusting the absolute coordinate position of the text of the character recognition result in the sub-picture to its absolute coordinate position in the target picture, according to the position of each sub-picture in the target picture;
S440: clustering at least two set unit areas according to the absolute coordinate positions of the texts of the character recognition results in the target picture;
Taking a line as the set unit area, each recognition result at this point is a line of text, and lines whose mutual distance satisfies a set requirement can be clustered according to their positions in the target picture. In general, text that sits close together can be taken to express one connected idea.
Optionally, clustering at least two set unit areas according to the absolute coordinate positions of the texts in the target picture includes:
traversing the character recognition results of the set unit areas, starting from a set start position in the target picture, according to the absolute coordinate positions of their texts; and
clustering set unit areas whose separation in a set direction is smaller than a distance threshold, where the set direction includes at least one of the horizontal, vertical, and diagonal directions.
In this scheme, for example, lines of text separated by small distances are clustered, the distance being measured horizontally, vertically, or diagonally; that is, a sentence can be taken to consist of the several recognized lines that lie closest to one another.
S450: merging the recognized characters of the clustered set unit areas and obtaining a character recognition result from the merged characters;
For example, all text lines in the OCR results are organized, by their restored absolute coordinates, into a KD-tree (k-dimensional tree), a high-dimensional index structure for nearest-neighbor and approximate nearest-neighbor lookups over large high-dimensional data sets. The set start position may be the coordinate origin (0,0). The line closest to (0,0) (measured, say, at the upper-left corner of its text box) becomes the first line a of the first sentence S1 (at this point a is also the last line of S1). With a as the current line, the closest line b is found; if the distance between a and b does not exceed a threshold t (adjustable to the comic's font size, e.g. t = 15 px), line b is deemed to belong to sentence S1 and becomes its last line. This continues until the line closest to S1's last line is farther than t away, or no lines remain, at which point sentence S1 is complete. The process repeats until every text-line node in the KD-tree has been visited and all sentences are obtained. All sentences are then sorted, with each sentence positioned by the coordinates of its first line, typically top-to-bottom and left-to-right.
S460: determining, as the positional correspondence, the attribution relationship between the clustered character recognition result and a sub-picture, according to the absolute coordinate position of the clustered result's text in the target picture.
The character recognition result is displayed according to this correspondence while the target picture is scrolled on a screen.
This scheme clusters scattered text so that, in step with the display of the sub-pictures, the text can conveniently be presented together.
From the character recognition results and their correspondences with the target picture, the results can be summarized into structured data, including: the length and width of each sub-picture slice, the sub-picture to which each character recognition result (caption or speech audio) belongs, and the positioning coordinates of each result in the target picture. This structured data can be stored together with the target picture or separately, for the client to load and display.
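One possible JSON-like shape for this structured data; every field name here is an assumption made for illustration, not a format mandated by the disclosure:

```python
# Illustrative structured-data record bundling slices, sentences, and
# their positions in the target picture.
display_data = {
    "slices": [
        {"index": 0, "offset_y": 0, "width": 320, "height": 260},
        {"index": 1, "offset_y": 240, "width": 320, "height": 260},
    ],
    "sentences": [
        {
            "text": "watch cartoon",
            "audio": "s0001.wav",     # optional TTS output
            "slice_index": 0,         # sub-picture the sentence belongs to
            "box": [10, 30, 96, 20],  # x, y, w, h in the target picture
        },
    ],
}
```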
Fig. 5 is a flowchart of an image-text display method according to an embodiment of the present application. The method is applied at a client and suits the case where the client displays a processed image-text mixed picture together with its character recognition results. The embodiment may be implemented by an image-text display apparatus, realized in hardware and/or software, which may be configured in a terminal device as a client or as a client plug-in. As shown in fig. 5, the method includes:
s510, loading a target picture in image-text mixed arrangement, a character recognition result of the target picture and a corresponding position relation between the character recognition result and the target picture;
in this embodiment, the text recognition result and the corresponding position relationship can be obtained by using the image-text processing method provided in the embodiment of the present application. The character recognition result and the corresponding position relation can be generated at the server side, and the character recognition result and the corresponding position relation can also be generated at the client side.
When the client needs to display the target picture under the control of the user, the data of the target picture is loaded, and at the moment, the character recognition result and the corresponding position relation can be loaded simultaneously or asynchronously.
When a target picture with mixed pictures and texts is loaded through a third-party application, a picture and text processing plug-in is called to process the target picture so as to generate a text recognition result of the target picture and a corresponding position relation between the text recognition result and the target picture. The client is, for example, a third-party application, and the third-party application may be configured with a plug-in capable of performing graphics-text processing, or the third-party application may call a plug-in with graphics-text processing installed in the terminal to generate and display a text recognition result and a corresponding position relationship.
Preferably, the client may display the loaded portion of the target picture in a scrolling manner in a screen of a terminal where the client is located during the process of loading the target picture. That is, when the client does not completely record the completed target picture, the target picture can be displayed.
S520: scrolling the target picture on the screen of the terminal where the client runs, and displaying the character recognition result according to the positional correspondence while the picture scrolls.
This operation may optionally proceed as follows:
while the target picture is scrolled, determining the corresponding character recognition result according to the positional correspondence; and
displaying the character recognition result in a set subtitle area of the screen, and/or playing the character recognition result in speech form.
In practice, as the target picture scrolls, the position of each sub-picture, or an absolute coordinate range, relative to the screen can be tracked. When a sub-picture or coordinate range meets a set display condition, the corresponding character recognition result is determined from the positional correspondence and displayed. For example, a sub-picture may be considered to have entered the screen once more than half of its area is within the screen.
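A client-side sketch of that rule; the scroll bookkeeping, the one-half ratio, and the function shapes are illustrative assumptions:

```python
# Decide which sub-picture counts as having entered the screen while the
# user scrolls, then surface the sentences attributed to it as captions.
def visible_slice(scroll_y, screen_h, slice_ranges, ratio=0.5):
    view_top, view_bottom = scroll_y, scroll_y + screen_h
    for idx, (top, bottom) in enumerate(slice_ranges):
        shown = max(0, min(bottom, view_bottom) - max(top, view_top))
        if shown >= (bottom - top) * ratio:
            return idx
    return None

def captions_for_scroll(scroll_y, screen_h, slice_ranges, sentences):
    idx = visible_slice(scroll_y, screen_h, slice_ranges)
    return [s["text"] for s in sentences if s.get("slice_index") == idx]
```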
The character recognition result can be shown in a set subtitle area, which may or may not overlap the target picture; in the overlapping case, the result can be superimposed on the picture like a bullet-screen comment. The font size of the displayed characters can be kept above a set threshold to match the user's reading habits.
The character recognition result can also be played as speech, and subtitle display and speech playback can run simultaneously. To produce natural pauses between sentences during speech synthesis, all sentences of each sub-picture are separated by line breaks.
With the scheme of the embodiments of the application, an image-text mixed long picture is automatically sliced, its characters are automatically recognized, encoded as real-time subtitles, and positioned, so that as the user browses, the subtitles of the current picture area are shown automatically and the text content can be voice-broadcast. Existing image-text long pictures, such as comic resources, can thus be plugged directly into a client for browsing without manual re-editing.
Fig. 6 is a block diagram of an image-text processing apparatus according to an embodiment of the present disclosure. The apparatus can carry out the image-text processing method of the embodiments of the disclosure, with the corresponding functions and benefits. It includes: a slicing module 610, a character recognition module 620, and a position relationship establishing module 630.
The slicing module 610 is configured to slice a target picture in which images and text are mixed into at least two sub-pictures according to the screen display size;
the character recognition module 620 is configured to perform character recognition on the text in the sub-pictures to obtain a character recognition result;
the position relationship establishing module 630 is configured to establish a positional correspondence between the character recognition result and the position of the text in the target picture, the character recognition result being displayed according to that correspondence while the target picture is scrolled on a screen.
Optionally, the slicing module is specifically configured to:
during the loading of the target picture, slice the already-loaded portion of the target picture according to the screen display size to form at least two sub-pictures.
Optionally, the slicing module is specifically configured to:
slice the image-text mixed target picture according to a slice height determined from the screen display size to form at least two sub-pictures,
where the slice height is a set value that may be greater than, equal to, or smaller than the screen display size.
Optionally, an overlapping redundant area exists between two adjacent sub-pictures, located at the upper edge and/or the lower edge of a sub-picture.
Optionally, the apparatus further includes:
a redundant area determining module, configured to determine the size of the redundant area according to the content type and/or text font size of the target picture before the target picture is sliced into at least two sub-pictures.
Optionally, the position relationship establishing module includes:
a position adjusting unit, configured to adjust the absolute coordinate position of the text of the character recognition result in the sub-picture to its absolute coordinate position in the target picture, according to the position of each sub-picture in the target picture;
a clustering unit, configured to cluster the character recognition results according to the absolute coordinate positions of their texts in the target picture; and
an attribution determining unit, configured to determine, as the positional correspondence, the attribution relationship between the clustered character recognition result and a sub-picture, according to the absolute coordinate position of the clustered result's text in the target picture.
Optionally, the character recognition module is specifically configured to perform character recognition on the text in the sub-picture, taking a set unit area as the object, to obtain the recognized characters within each set unit area, where the set unit area is a row, a column, or an area of set shape and size.
Correspondingly, the clustering unit includes:
an area clustering subunit, configured to cluster at least two set unit areas according to the absolute coordinate positions of the texts of the character recognition results in the target picture; and
a character merging subunit, configured to merge the recognized characters of the clustered set unit areas and obtain a character recognition result from the merged characters.
Optionally, the area clustering subunit is specifically configured to:
traverse the character recognition results of the set unit areas, starting from a set start position in the target picture, according to the absolute coordinate positions of their texts; and
cluster set unit areas whose separation in a set direction is smaller than a distance threshold, where the set direction includes at least one of the horizontal, vertical, and diagonal directions.
Optionally, the attribution determining unit is specifically configured to:
attribute the text of the clustered character recognition result to the sub-picture containing the largest share of its area, according to the absolute coordinate position of that text in the target picture.
Optionally, while the sub-pictures of the target picture scroll on a screen, the character recognition results belonging to the currently displayed sub-picture are shown.
Optionally, the character recognition module is specifically configured to:
perform character recognition on the text in the sub-picture to obtain recognized characters; and
convert the recognized characters to speech to obtain the character recognition result in audio form.
Optionally, the apparatus is configured in a server, or is an image-text processing plug-in configured in a client.
Because the scheme standardizes slicing, character recognition, and position correspondence, it applies universally to long pictures of any content, has a low processing cost, and can process long pictures quickly for delivery to users.
Fig. 7 is a block diagram of an image-text display apparatus provided in an embodiment of the present application. The apparatus may be configured at a client and can carry out the image-text display method of the embodiments of the application, with the corresponding functions and benefits. It includes: a data loading module 710 and a data display module 720.
The data loading module 710 is configured to load a target picture in which images and text are mixed, the character recognition result of the target picture, and the positional correspondence between the character recognition result and the target picture;
the data display module 720 is configured to scroll the target picture on a screen of the terminal where the client runs and to display the character recognition result according to the positional correspondence while the picture scrolls.
Optionally, the data loading module is specifically configured to:
scroll the already-loaded portion of the target picture on the screen of the terminal where the client runs while the target picture is loading.
Optionally, the data display module is specifically configured to:
determine the corresponding character recognition result according to the positional correspondence while the target picture is scrolled; and
display the character recognition result in a set subtitle area of the screen, and/or play the character recognition result in speech form.
Optionally, the data loading module is specifically configured to:
call an image-text processing plug-in, when the image-text mixed target picture is loaded through a third-party application, to process the target picture and generate the character recognition result of the target picture and the positional correspondence between that result and the target picture.
With this scheme, pictures in which images and text are mixed can be browsed together with their character recognition results, meeting the user's browsing needs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store the various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to one another by a bus 804, to which an input/output (I/O) interface 805 is also connected.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 801 performs the methods and processes described above, such as the image-text processing method or the image-text display method. For example, in some embodiments, the image-text processing method or the image-text display method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the image-text processing method or the image-text display method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the image-text processing method or the image-text display method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system that overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS (virtual private server) services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. An image-text processing method, comprising:
performing, according to a screen display size, image cutting processing on a target picture with mixed image and text content to form at least two sub-pictures;
performing character recognition on text in the sub-pictures to obtain a character recognition result; and
establishing a corresponding position relationship between the character recognition result and the text position of the text in the target picture, wherein the character recognition result is used for being displayed according to the corresponding position relationship while the target picture is scroll-displayed on a screen.
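As a rough, non-authoritative illustration of claim 1's slicing step, a tall picture can be cut into screen-height sub-pictures with Pillow; the slice height and the optional overlap between adjacent slices (see claim 4 below) are parameters chosen by the caller, and the function name is hypothetical.

```python
from PIL import Image  # third-party: Pillow

def slice_picture(path: str, slice_height: int, overlap: int = 0) -> list:
    """Cut a tall mixed image-text picture into sub-pictures of roughly
    screen height. Adjacent slices share `overlap` pixels so that text
    straddling a cut line appears whole in at least one slice."""
    assert 0 <= overlap < slice_height
    img = Image.open(path)
    width, height = img.size
    slices, top = [], 0
    while top < height:
        bottom = min(top + slice_height, height)
        slices.append(img.crop((0, top, width, bottom)))
        if bottom == height:
            break
        top = bottom - overlap  # step back by the redundant area
    return slices
```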
2. The method of claim 1, wherein performing image cutting processing on the target picture with mixed image and text content according to the screen display size to form at least two sub-pictures comprises:
during loading of the target picture, performing image cutting processing on the already-loaded part of the target picture according to the screen display size to form at least two sub-pictures.
3. The method of claim 1 or 2, wherein performing image cutting processing on the target picture with mixed image and text content according to the screen display size to form at least two sub-pictures comprises:
performing image cutting processing on the target picture according to a slice capture height determined from the screen display size to form at least two sub-pictures;
wherein the slice capture height is a set value that may be greater than, equal to, or smaller than the screen display size.
4. The method of claim 1 or 2, wherein an overlapping redundant area exists between two adjacent sub-pictures, the redundant area being located at the upper edge and/or the lower edge of a sub-picture.
5. The method of claim 4, wherein before performing image cutting processing on the target picture with mixed image and text content according to the screen display size to form at least two sub-pictures, the method further comprises:
determining the size of the redundant area according to the content type and/or the text font size of the target picture.
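A simple way to realize claim 5's sizing rule, purely as an assumed heuristic: scale the redundant area with the rendered line height and reserve more lines for text-dense content. The multiplier and per-type line counts below are illustrative, not from the patent.

```python
def redundant_area_height(font_size_px: int, content_type: str) -> int:
    """Overlap two text lines for dense content, one line otherwise.
    The 1.2 line-height factor and the content-type mapping are assumptions."""
    lines = 2 if content_type in ("article", "novel") else 1
    return int(font_size_px * 1.2 * lines)

# e.g. a 16 px article gets a 38 px redundant area
print(redundant_area_height(16, "article"))
```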
6. The method of claim 1, wherein establishing the corresponding position relationship between the character recognition result and the text position of the text in the target picture comprises:
adjusting, according to the position of each sub-picture in the target picture, the coordinate position of the text of the character recognition result in the sub-picture to an absolute coordinate position in the target picture;
clustering the character recognition result according to the absolute coordinate position of its text in the target picture; and
determining, according to the absolute coordinate position of the text of the clustered character recognition result in the target picture, the attribution relationship between the clustered character recognition result and the sub-pictures as the corresponding position relationship.
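The first step of claim 6 is essentially an offset shift: a box recognized inside sub-picture i is moved down by that slice's top edge in the target picture. A minimal sketch, assuming uniform slices with a fixed overlap as in the earlier slicing example (all names hypothetical):

```python
def to_absolute(box: tuple, slice_index: int,
                slice_height: int, overlap: int) -> tuple:
    """Map an OCR box (x, y, w, h) from sub-picture coordinates to
    target-picture coordinates, given uniform slices with a fixed overlap."""
    x, y, w, h = box
    slice_top = slice_index * (slice_height - overlap)
    return (x, y + slice_top, w, h)
```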
7. The method of claim 6, wherein performing character recognition on the text in the sub-pictures to obtain the character recognition result comprises:
performing character recognition on the text in the sub-pictures with a set unit area as the unit of recognition, to obtain recognized characters in each set unit area, wherein the set unit area is a row, a column, or an area of set shape and size;
correspondingly, clustering the character recognition result according to the absolute coordinate position of its text in the target picture comprises:
clustering at least two set unit areas according to the absolute coordinate positions of their texts in the target picture; and
merging the recognized characters of the set unit areas in each cluster, and obtaining the character recognition result from the merged recognized characters.
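Claim 7's merge step can be pictured as concatenating the per-unit-area text of one cluster in reading order. A hypothetical sketch, assuming each item carries its absolute box and recognized text:

```python
def merge_cluster(cluster: list) -> str:
    """Concatenate unit-area texts top-to-bottom, then left-to-right,
    to form a single character recognition result for the cluster."""
    ordered = sorted(cluster, key=lambda item: (item["box"][1], item["box"][0]))
    return "".join(item["text"] for item in ordered)

print(merge_cluster([{"box": (0, 40, 200, 20), "text": "world"},
                     {"box": (0, 20, 200, 20), "text": "hello "}]))
```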
8. The method of claim 7, wherein clustering at least two set unit areas according to the absolute coordinate positions of their texts in the target picture comprises:
traversing the character recognition results of the set unit areas from a set starting position in the target picture, according to the absolute coordinate positions of their texts in the target picture; and
clustering set unit areas whose spacing in a set direction is smaller than a distance threshold, wherein the set direction includes at least one of a horizontal direction, a vertical direction, and an oblique direction.
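One plausible reading of claim 8, for the vertical direction only, is a greedy single-pass grouping of unit areas sorted top-to-bottom: start a new cluster whenever the gap to the previous box reaches the threshold. A sketch under those assumptions:

```python
def cluster_by_vertical_gap(boxes: list, threshold: int) -> list:
    """Group OCR unit-area boxes (x, y, w, h) whose vertical spacing is
    below `threshold`; horizontal and oblique directions, which the claim
    also allows, would need analogous distance tests."""
    clusters = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if clusters:
            prev = clusters[-1][-1]
            if box[1] - (prev[1] + prev[3]) < threshold:
                clusters[-1].append(box)
                continue
        clusters.append([box])
    return clusters
```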
9. The method of claim 6, wherein determining the attribution relationship between the clustered character recognition result and the sub-pictures according to the absolute coordinate position of the text of the clustered character recognition result in the target picture, as the corresponding position relationship, comprises:
determining, according to the absolute coordinate position of the text of the clustered character recognition result in the target picture, that the text belongs to the sub-picture in which it occupies the largest area.
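Claim 9's attribution rule amounts to an argmax over overlap areas. A hypothetical sketch, with sub-picture bounds expressed as vertical (top, bottom) ranges in target-picture coordinates:

```python
def owning_sub_picture(text_top: int, text_bottom: int,
                       slice_bounds: list) -> int:
    """Return the index of the sub-picture whose vertical range overlaps
    the clustered text block most."""
    def overlap(a_top, a_bot, b_top, b_bot):
        return max(0, min(a_bot, b_bot) - max(a_top, b_top))
    return max(range(len(slice_bounds)),
               key=lambda i: overlap(text_top, text_bottom, *slice_bounds[i]))

# A block spanning 750-900 px belongs to the slice covering 700-1500 px.
print(owning_sub_picture(750, 900, [(0, 800), (700, 1500)]))
```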
10. The method of claim 1 or 6, wherein the character recognition result is used such that, when a sub-picture of the target picture is scroll-displayed on the screen, the character recognition result belonging to the currently displayed sub-picture is displayed.
11. The method of claim 1, wherein performing character recognition on the text in the sub-pictures to obtain the character recognition result comprises:
performing character recognition on the text in the sub-pictures to obtain recognized characters; and
performing voice conversion on the recognized characters to obtain the character recognition result in voice form.
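Claim 11's voice conversion can be delegated to any off-the-shelf TTS engine; the patent names none, so the choice of pyttsx3 below is purely an assumption for illustration.

```python
import pyttsx3  # third-party offline TTS engine; an illustrative choice only

def recognition_result_to_speech(text: str, out_path: str) -> None:
    """Convert a character recognition result into an audio file
    that the client can later play alongside the scrolled picture."""
    engine = pyttsx3.init()
    engine.save_to_file(text, out_path)
    engine.runAndWait()
```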
12. The method of claim 1, wherein the method is executed by an image-text processing plug-in configured in a server or a client.
13. An image-text display method, applied to a client, the method comprising:
loading a target picture with mixed image and text content, a character recognition result of the target picture, and a corresponding position relationship between the character recognition result and the target picture; and
scroll-displaying the target picture on a screen of the terminal where the client is located, and displaying the character recognition result according to the corresponding position relationship while the target picture is being scroll-displayed.
14. The method of claim 13, wherein scroll-displaying the target picture on the screen of the terminal where the client is located comprises:
during loading of the target picture, scroll-displaying the already-loaded part of the target picture on the screen of the terminal where the client is located.
15. The method of claim 13, wherein displaying the character recognition result according to the corresponding position relationship while the target picture is being scroll-displayed comprises:
while the target picture is being scroll-displayed, determining the corresponding character recognition result according to the corresponding position relationship; and
displaying the character recognition result in a set subtitle area of the screen, and/or playing the character recognition result in voice form.
16. The method of claim 13, wherein loading the target picture with mixed image and text content, the character recognition result of the target picture, and the corresponding position relationship between the character recognition result and the target picture comprises:
when the target picture is loaded through a third-party application, invoking an image-text processing plug-in to process the target picture so as to generate the character recognition result of the target picture and the corresponding position relationship between the character recognition result and the target picture.
17. An image-text processing apparatus, comprising:
an image cutting module, configured to perform image cutting processing on a target picture with mixed image and text content according to a screen display size, to form at least two sub-pictures;
a character recognition module, configured to perform character recognition on text in the sub-pictures to obtain a character recognition result; and
a position relationship establishing module, configured to establish a corresponding position relationship between the character recognition result and the text position of the text in the target picture, wherein the character recognition result is used for being displayed according to the corresponding position relationship while the target picture is scroll-displayed on a screen.
18. An image-text display apparatus, configured at a client, the apparatus comprising:
a data loading module, configured to load a target picture with mixed image and text content, a character recognition result of the target picture, and a corresponding position relationship between the character recognition result and the target picture; and
a data display module, configured to scroll-display the target picture on a screen of the terminal where the client is located, and to display the character recognition result according to the corresponding position relationship while the target picture is being scroll-displayed.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image-text processing method of any one of claims 1-12 or the image-text display method of any one of claims 13-16.
20. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the image-text processing method of any one of claims 1-12 or the image-text display method of any one of claims 13-16.
21. A computer program product comprising a computer program which, when executed by a processor, implements the image-text processing method of any one of claims 1-12 or the image-text display method of any one of claims 13-16.
CN202110276188.2A 2021-03-15 2021-03-15 Image-text processing method, image-text processing display method, image-text processing device, image-text processing equipment and storage medium Active CN112882678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110276188.2A CN112882678B (en) 2021-03-15 2021-03-15 Image-text processing method, image-text processing display method, image-text processing device, image-text processing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112882678A (en) 2021-06-01
CN112882678B (en) 2024-04-09

Family

ID=76042643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110276188.2A Active CN112882678B (en) 2021-03-15 2021-03-15 Image-text processing method, image-text processing display method, image-text processing device, image-text processing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112882678B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110090397A1 (en) * 2008-06-30 2011-04-21 William Gibbens Redmann Method and apparatus for dynamic displays for digital cinema
US20130002678A1 (en) * 2011-06-30 2013-01-03 Google Inc. Rendering a text image following a line
CN103268185A (en) * 2013-04-26 2013-08-28 Allwinner Technology Co Ltd Text display method and text display device for e-book reader
CN105404686A (en) * 2015-12-10 2016-03-16 Hunan University of Science and Technology Method for matching place name and address in news event based on geographical feature hierarchical segmented words
CN106484266A (en) * 2016-10-18 2017-03-08 Beijing Chuizi Digital Technology Co Ltd Text handling method and device
CN108156209A (en) * 2016-12-06 2018-06-12 Tencent Technology (Beijing) Co Ltd Media push method and system
CN109658427A (en) * 2017-10-11 2019-04-19 ZTE Corp Image processing method and device
CN108921855A (en) * 2018-05-31 2018-11-30 Shanghai Aiyouwei Software Development Co Ltd Image processing method and system based on information
WO2021042505A1 (en) * 2019-09-03 2021-03-11 Ping An Technology (Shenzhen) Co Ltd Note generation method and apparatus based on character recognition technology, and computer device
CN111586237A (en) * 2020-04-30 2020-08-25 Vivo Mobile Communication Co Ltd Image display method and electronic equipment
CN111783645A (en) * 2020-06-30 2020-10-16 Beijing Baidu Netcom Science and Technology Co Ltd Character recognition method and device, electronic equipment and computer readable storage medium
CN111832551A (en) * 2020-07-15 2020-10-27 NetEase Youdao Information Technology (Beijing) Co Ltd Text image processing method and device, electronic scanning equipment and storage medium
CN112199545A (en) * 2020-11-23 2021-01-08 Hunan Yifang Software Co Ltd Keyword display method and device based on picture character positioning and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436604A (en) * 2021-06-22 2021-09-24 Beijing Baidu Netcom Science and Technology Co Ltd Method and device for broadcasting content, electronic equipment and storage medium
CN115035360A (en) * 2021-11-22 2022-09-09 Honor Device Co Ltd Character recognition method for image, electronic device and storage medium
TWI799275B (en) * 2022-05-25 2023-04-11 Realtek Semiconductor Corp System on chip and display system
CN115033148A (en) * 2022-06-13 2022-09-09 Beijing Zitiao Network Technology Co Ltd Document display method and device, electronic equipment and storage medium
CN115033148B (en) * 2022-06-13 2024-04-19 Beijing Zitiao Network Technology Co Ltd Document display method, device, electronic equipment and storage medium
CN114998885A (en) * 2022-06-23 2022-09-02 Xiaomi Automobile Technology Co Ltd Page data processing method and device, vehicle and storage medium
CN115171110A (en) * 2022-06-30 2022-10-11 Beijing Baidu Netcom Science and Technology Co Ltd Text recognition method, apparatus, device, medium, and product
CN115171110B (en) * 2022-06-30 2023-08-22 Beijing Baidu Netcom Science and Technology Co Ltd Text recognition method and device, equipment, medium and product

Also Published As

Publication number Publication date
CN112882678B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN112882678A (en) Image-text processing method, display method, device, equipment and storage medium
US11861919B2 (en) Text recognition method and device, and electronic device
EP4053802A1 (en) Video classification method and apparatus, device and storage medium
EP4080469A2 (en) Method and apparatus of recognizing text, device, storage medium and smart dictionary pen
US20210350541A1 (en) Portrait extracting method and apparatus, and storage medium
CN112036373B (en) Method for training video text classification model, video text classification method and device
CN113705300A (en) Method, device and equipment for acquiring phonetic-to-text training corpus and storage medium
CN113780276A (en) Text detection and identification method and system combined with text classification
CN112101353A (en) Text information extraction method and device, electronic equipment and storage medium
CN113538450A (en) Method and device for generating image
CN113378511B (en) Page display method and device, electronic equipment and storage medium
CN112989112B (en) Online classroom content acquisition method and device
CN113313066A (en) Image recognition method, image recognition device, storage medium and terminal
CN116259064B (en) Table structure identification method, training method and training device for table structure identification model
CN114880498B (en) Event information display method and device, equipment and medium
CN113900620B (en) Interaction method, device, electronic equipment and storage medium
CN115834930A (en) Video frame transmission method and device, electronic equipment and storage medium
CN113486881B (en) Text recognition method, device, equipment and medium
CN115019321A (en) Text recognition method, text model training method, text recognition device, text model training equipment and storage medium
CN111368553B (en) Intelligent word cloud image data processing method, device, equipment and storage medium
CN113362426A (en) Image editing method and image editing device
CN113038184A (en) Data processing method, device, equipment and storage medium
CN113360636B (en) Content display method, device, equipment and storage medium
CN112836712B (en) Picture feature extraction method and device, electronic equipment and storage medium
CN113762223B (en) Question splitting model training method, question splitting method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant