CN112199545A

CN112199545A - Keyword display method and device based on picture character positioning and storage medium

Info

Publication number: CN112199545A
Application number: CN202011316753.5A
Authority: CN
Inventors: 吴俊洋; 王晓斌
Original assignee: Hunan Eefung Software Co ltd
Current assignee: Hunan Eefung Software Co ltd
Priority date: 2020-11-23
Filing date: 2020-11-23
Publication date: 2021-01-08
Anticipated expiration: 2040-11-23
Also published as: CN112199545B

Abstract

The invention provides a keyword display method based on picture character positioning, which comprises the following steps: acquiring a picture to be detected; identifying characters in the picture to be detected to obtain an identification result, wherein the identification result comprises the identified characters and coordinates corresponding to each character area; matching in the recognition result based on the target keyword to obtain a matching result; if the characters corresponding to the target keywords are matched in the matching result, acquiring a target character area containing the matching result from the picture to be detected; and displaying the target keyword area by calculation based on a preset display rule and the coordinate corresponding to the target character area. The user can judge whether the picture to be detected contains the target keyword or not and can quickly find the position of the target keyword.

Description

Keyword display method and device based on picture character positioning and storage medium

Technical Field

The invention relates to the technical field of picture display, in particular to a keyword display method and device based on picture character positioning and a storage medium.

Background

In recent years, as the number of users of social platforms such as microblogs and Instagram has grown rapidly, people have become eager to publish and forward their own life experiences and interests on these platforms in the form of pictures. The number of pictures circulating on these platforms has reached a massive scale, and pictures carrying text information have become a new kind of blog carrier whose influence matches that of traditional blog posts; the picture form can even be easier to work with and more attractive than a traditional post. However, the quality of the text in such pictures is uneven and the information is well concealed. Although text spread in picture form does not propagate as efficiently as plain text, it easily evades review by the relevant departments, which increases the spread of harmful information and can steer public opinion on some trending events. In an era of diversified channels for public opinion, how to quickly screen key pictures out of massive picture collections and quickly find the key information in them is a direction worth attention.

Although current OCR applications can extract the text in a picture, they can only locate and recognize all of the characters in the target picture. On the one hand, the workload of platform administrators, network security departments and similar bodies keeps growing, and pictures containing sensitive text need to be monitored effectively; on the other hand, most social platforms cannot search by keyword for blog posts that contain only pictures, and cannot quickly find the key information contained in a target picture. As a result, users miss a great deal of valuable information when retrieving content. Having the user enter content and terms of interest in a front-end search box, and having the system return all matching pictures with the key information marked in them, is a requirement that current OCR applications do not fully meet, and is the problem the invention sets out to solve.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a keyword display method, a keyword display device and a storage medium based on picture character positioning, aiming at enabling a user to judge whether a picture to be detected contains a target keyword and quickly find the position of the target keyword.

The invention is realized by the following steps:

the invention provides a keyword display method based on picture character positioning, which comprises the following steps:

acquiring a picture to be detected;

identifying characters in the picture to be detected to obtain an identification result, wherein the identification result comprises the identified characters and coordinates corresponding to each character area;

matching in the recognition result based on the target keyword to obtain a matching result;

if the characters corresponding to the target keywords are matched in the matching result, acquiring a target character area containing the matching result from the picture to be detected;

and displaying the target keyword area by calculation based on a preset display rule and the coordinate corresponding to the target character area.

In one implementation manner, the step of identifying the characters in the picture to be detected and obtaining the identification result includes:

segmenting the picture to be detected to obtain at least one detection frame, and outputting coordinates of a character area of each detection frame to form a coordinate result set;

cutting the character area of each detection frame along the detection frame, and rotating to a detection direction, wherein the detection direction is horizontal or vertical;

performing character recognition on the character area of each detection frame, and outputting recognition characters line by line; and the recognized characters and coordinates are combined into a recognition result.
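The recognition result described above pairs each detection frame's coordinates with the characters recognized inside it. A minimal sketch of one possible representation follows; the names `TextRegion` and `build_recognition_result` are hypothetical and do not appear in the original disclosure.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class TextRegion:
    """One detection frame and the characters recognized inside it."""
    box: List[Point]  # four corner points of the detection frame
    text: str         # characters recognized for this frame (one text line)


def build_recognition_result(boxes: Sequence[Sequence[Point]],
                             texts: Sequence[str]) -> List[TextRegion]:
    """Combine the coordinate result set and the recognized lines, frame by frame."""
    if len(boxes) != len(texts):
        raise ValueError("each detection frame needs exactly one recognized line")
    return [TextRegion(box=list(b), text=t) for b, t in zip(boxes, texts)]
```

Keeping the coordinates and the characters in one record means a later keyword match can be traced back to its detection frame without a second lookup.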

In one implementation, after the step of matching in the recognition result based on the target keyword to obtain a matching result, the method further includes:

if the characters corresponding to the target keywords are matched in the matching result, determining that the picture to be detected is a key picture;

otherwise, filtering the picture to be detected and acquiring a new picture to be detected again.

In one implementation manner, the step of displaying the target keyword region by calculation based on a preset display rule and a coordinate corresponding to the target text region includes:

acquiring a coordinate point of a detection frame of the target character area;

and determining a target area corresponding to the target keyword based on the frame of the detection box.

In one implementation, the method further comprises:

drawing the coordinate points corresponding to the target keywords in the to-be-detected picture in a highlight mode;

and displaying the picture to be detected with the highlighted keyword.

In one implementation, the step of performing character recognition on the character area of each detection frame, outputting the recognized characters line by line, and combining the recognized characters and coordinates into a recognition result includes:

performing character recognition on the character area of each detection box by adopting an OCR (optical character recognition), and outputting recognized characters line by line; and the recognized characters and coordinates are combined into a recognition result.

In addition, the invention also discloses a keyword display device based on picture character positioning, which comprises a processor and a memory connected with the processor through a communication bus; wherein,

the memory is used for storing a keyword display program based on picture character positioning;

the processor is configured to execute the keyword display program based on the picture character positioning to implement any one of the keyword display steps based on the picture character positioning.

The invention also discloses a computer storage medium storing one or more programs, the one or more programs being executable by one or more processors to cause the one or more processors to perform any of the keyword display steps based on picture character positioning.

The keyword display method based on the picture character positioning has the following beneficial effects: the user can find the key information wanted by the user from the massive pictures. On the basis of obtaining the position of the character area and the recognition result, the invention screens out pictures containing key information by matching the user-defined key words, calculates the position coordinates of the key words in each key picture, and highlights the key words on the pictures.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of a keyword display method based on image character positioning according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of OCR text detection and coordinate output of a detection box according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an OCR detection box cropping and rotation according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of OCR text recognition according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of screening key pictures based on user-defined keywords according to an embodiment of the present invention, showing both a successful match, which is marked as a key picture, and an unsuccessful match;

FIG. 6 is a schematic diagram of keyword coordinate calculation and highlight visualization based on coordinates of a detection box according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of the principle and flow of calculating keyword coordinates according to an embodiment of the present invention; figs. 6 and 7 both illustrate locating the keyword "city A" and extracting the text of the line in which it appears.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, a keyword display method based on picture character positioning, the method comprising:

and S101, acquiring a picture to be detected.

It should be noted that there may be a plurality of pictures to be detected, so as to meet the matching requirements of the user, for example, a huge number of pictures on the social media.

S102, identifying the characters in the picture to be detected, and acquiring an identification result, wherein the identification result comprises the identified characters and the coordinates corresponding to each character area.

It should be noted that, firstly, the characters on the picture to be detected need to be recognized, and specifically, the step of character recognition includes, but is not limited to:

1) Character detection: character detection is performed on the picture to be detected, and the coordinates of each character area are output to form a coordinate result set. The character area in each detection frame is then cut out of the picture along the detection frame and rotated to horizontal (or to vertical when the text is written vertically). Character recognition is then performed on each cropped picture, and the recognition results are output line by line to form a recognition result set.

In one implementation, algorithms such as DBNet are used to detect all text regions in a picture to be detected, and the text regions are framed by region coordinate points. The whole algorithm implementation process can be roughly divided into the following steps:

the first step: inputting the picture to be detected and learning features with a CNN to obtain several feature maps;

the second step: fusing the features, the fused feature map being 1/4 the size of the original image;

the third step: performing feature regression to regress a segmentation map and a threshold map, which are weighted and fused into the final feature map;

the fourth step: applying the final feature map to the original image to obtain the character detection result and output the coordinates, referring to fig. 2.
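The third step, fusing the segmentation map and the threshold map, can be illustrated with the differentiable binarization formula from the published DBNet design, B = 1 / (1 + exp(-k (P - T))). The patent text does not state this formula, so the sketch below is an assumption about how the fusion could be done; the function name and the default k = 50 (the value used in the DBNet paper) are not taken from the disclosure.

```python
import math
from typing import List


def differentiable_binarization(prob_map: List[List[float]],
                                thresh_map: List[List[float]],
                                k: float = 50.0) -> List[List[float]]:
    """Fuse a probability (segmentation) map P and a threshold map T element-wise.

    Assumed formula (from the DBNet paper): B = 1 / (1 + exp(-k * (P - T))).
    Both maps are expected to hold values in [0, 1] and have the same shape.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]
```

Pixels whose probability is well above the local threshold map to values near 1, pixels well below map to values near 0, and the text regions are then taken from the near-1 areas.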

2) Cutting and correcting: the text area is cut out along the detection frame in the picture to be detected, a plurality of text line pictures are obtained, and the pictures are rotated to be horizontal (vertical if the area is vertically expressed, such as couplet), referring to fig. 3.
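The correction in this step amounts to rotating each detection frame until its text direction is horizontal. The sketch below only computes the geometry on the corner points; it assumes the first two points of the frame lie along the reading direction, and the function names are hypothetical. A vertical text line would be handled the same way with a target angle of 90 degrees.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def box_rotation_angle(box: Sequence[Point]) -> float:
    """Angle in degrees between the edge from point 1 to point 2 and the x axis."""
    (a1, b1), (a2, b2) = box[0], box[1]
    return math.degrees(math.atan2(b2 - b1, a2 - a1))


def deskew_box(box: Sequence[Point]) -> List[Point]:
    """Rotate all corners about the first corner so that edge 1-2 becomes horizontal."""
    angle = math.radians(box_rotation_angle(box))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    ox, oy = box[0]
    rotated = []
    for x, y in box:
        dx, dy = x - ox, y - oy
        # Rotation by -angle: undoes the tilt of the text line.
        rotated.append((ox + dx * cos_a + dy * sin_a,
                        oy - dx * sin_a + dy * cos_a))
    return rotated
```

In practice the same angle would be applied to the cropped pixels by an image library; only the corner arithmetic is shown here.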

3) Character recognition: the characters in each text line picture are recognized with an algorithm such as CRNN and output line by line. The whole algorithm implementation process can be roughly divided into the following steps:

the first step: a convolutional layer, which uses a CNN to extract a feature sequence from the text line picture;

the second step: a recurrent layer, which uses a bidirectional LSTM to relate context and predict the label distribution of the feature sequence;

the third step: a transcription layer, which combines CTC to compute the output probability of the labels and obtain the character result, referring to fig. 4.
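The last step turns per-frame label distributions into characters. One common simplification is greedy CTC decoding: take the most probable label in each frame, merge consecutive repeats, then drop the blank label. The sketch below assumes that simplification and assumes label index 0 is the blank; the function name is hypothetical and the patent does not specify a decoding strategy.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC decoding.

    frame_probs[t][0] is the blank probability at frame t (assumed convention);
    frame_probs[t][i] for i >= 1 is the probability of alphabet[i - 1].
    """
    # Most probable label index for every frame.
    best: List[int] = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    previous = None
    for idx in best:
        # Keep a label only when it differs from the previous frame and is not blank.
        if idx != previous and idx != 0:
            chars.append(alphabet[idx - 1])
        previous = idx
    return "".join(chars)
```

A blank between two identical labels is what allows a doubled character to survive the merging of repeats.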

S103, matching is carried out in the recognition result based on the target keyword so as to obtain a matching result.

And S104, if the characters corresponding to the target keywords are matched in the matching result, acquiring a target character area containing the matching result in the picture to be detected.

It should be noted that the target keyword may be a user-defined keyword input by the user, and the target keyword is matched with the recognition result set in step S102, and if the result set includes the keyword, the picture to be detected is marked as a key picture, and is added to the key picture set.

And S105, displaying the target keyword area through calculation based on a preset display rule and the corresponding coordinate of the target character area.

And extracting the coordinate points of the target character area contained in the coordinate result set, using the coordinate points as known conditions, calculating the coordinates of the target keyword through a slope formula, and forming a keyword coordinate set corresponding to the target keyword.

And drawing each coordinate point in the keyword coordinate set on the original drawing, and connecting the corresponding coordinate points through straight lines to enable the target keyword to be highlighted on the original drawing.

The invention narrows the positioning process down to the user-defined keywords, thereby screening out the pictures that are valuable to the user and enabling the user to quickly find the key information they contain.

A user enters the keyword to be queried in the front-end search box. After the back end receives the keyword, it determines whether the recognition result of the target picture contains the keyword; if so, the picture to be detected (together with its matching result) is marked as a key picture and added to the key picture set, which serves as the input to the subsequent steps, as shown in fig. 5.
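The screening described here can be sketched as a simple substring match over each picture's recognition result. The data layout (a dictionary from a picture identifier to a list of (box, text) pairs) and the function name are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Region = Tuple[Sequence[Point], str]  # (detection frame corners, recognized text)


def screen_key_pictures(recognition_results: Dict[str, List[Region]],
                        keyword: str) -> Dict[str, List[Region]]:
    """Return only the pictures whose recognized text contains the keyword.

    Each key picture keeps the matching regions, so the later coordinate
    calculation has both the detection frame and the recognized line.
    """
    if not keyword:
        raise ValueError("keyword must not be empty")
    key_pictures: Dict[str, List[Region]] = {}
    for picture_id, regions in recognition_results.items():
        hits = [(box, text) for box, text in regions if keyword in text]
        if hits:
            key_pictures[picture_id] = hits  # matched: mark as key picture
        # Pictures without a match are filtered out.
    return key_pictures
```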

The whole coordinate value calculation process can be roughly divided into the following steps (taking the calculation of the coordinates of a keyword as an example here):

1) according to the input custom keyword, extracting from the key picture the coordinates of the detection box containing the keyword, namely box = [[a1, b1], [a2, b2], [a3, b3], [a4, b4]], which are regarded as known conditions;

2) the value of each coordinate point is obtained by list indexing, such as a1 = box[0][0], b1 = box[0][1], and so on, and the 8 extracted values serve as the known conditions for the coordinate calculation;

3) calculating the width p_w and the height p_h of the detection frame (distance formula between two points) from the coordinate values obtained in step 2);

4) obtaining the character length of the recognition result in the detection box and the character length of the target keyword through len(recognition result) and len(keyword) respectively, and obtaining the starting position pid of the keyword in the recognition result through an index function, i.e. (recognition result).index(keyword, search start position) (if the keyword occurs more than once, a loop is added to traverse all of the pids);

5) let w1 be the distance from the head of the keyword detection box to the head of the original detection box, and w2 the distance from the tail of the keyword detection box to the head of the original detection box; then:

w1 = p_w * pid / len(recognition result),

w2 = p_w * (pid + len(keyword)) / len(recognition result);

6) since the keyword is only part of the recognition result of the whole sentence (the characters corresponding to the target keyword are a subset of the characters in the sentence shown in the picture to be detected), and it forms an angle with the virtual coordinate axis, let the coordinates of the highlight box of the keyword be [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]. They are calculated from the trigonometric relations "sin α = opposite side / hypotenuse, cos α = adjacent side / hypotenuse", in the same way as the slope formula, as follows:

(x1 - a1)/w1 = (a2 - a1)/p_w, so x1 = w1*(a2 - a1)/p_w + a1,

(x2 - a1)/w2 = (a2 - a1)/p_w, so x2 = w2*(a2 - a1)/p_w + a1,

and by analogy all values in the coordinates of the keyword highlight box are calculated, referring to the first two steps of fig. 6;

7) if a plurality of keywords exist in one picture, putting all the obtained coordinates in the step 6) into a list.
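Steps 1) to 7) can be sketched as follows. The formulas for x1 and x2 are taken from the text; the y values and the two bottom corners are obtained "by analogy", which here is assumed to mean interpolating along the edge from point 4 to point 3, with the corners ordered clockwise from the head of the text line. That ordering, the equal-width-per-character assumption already implicit in w1 and w2, and the function name are assumptions of this sketch. The height p_h is not needed because the bottom corners are interpolated directly on the bottom edge.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def keyword_highlight_boxes(box: Sequence[Point], text: str,
                            keyword: str) -> List[List[Point]]:
    """Return one four-point highlight box per occurrence of keyword in text."""
    (a1, b1), (a2, b2), (a3, b3), (a4, b4) = box       # steps 1) and 2)
    p_w = math.hypot(a2 - a1, b2 - b1)                  # step 3): frame width
    n = len(text)
    if n == 0 or not keyword or p_w == 0:
        return []

    def along(start: Point, end: Point, w: float) -> Point:
        # Same form as x1 = w1*(a2 - a1)/p_w + a1, applied to x and y.
        (sx, sy), (ex, ey) = start, end
        return (w * (ex - sx) / p_w + sx, w * (ey - sy) / p_w + sy)

    boxes: List[List[Point]] = []
    pid = text.find(keyword)                            # step 4): first position
    while pid != -1:
        w1 = p_w * pid / n                              # step 5)
        w2 = p_w * (pid + len(keyword)) / n
        boxes.append([                                  # step 6)
            along((a1, b1), (a2, b2), w1),
            along((a1, b1), (a2, b2), w2),
            along((a4, b4), (a3, b3), w2),
            along((a4, b4), (a3, b3), w1),
        ])
        pid = text.find(keyword, pid + 1)               # traverse all positions
    return boxes                                        # step 7): list of boxes
```

For an axis-aligned frame 100 units wide holding ten characters, a two-character keyword starting at position 3 yields a box spanning 30 to 50 along the width.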

The calculation route of the entire keyword coordinates refers to fig. 7.

A virtual coordinate axis is established with the center of the original picture as the origin, the coordinate points of the keywords (4 coordinate points per keyword) are plotted on it, and adjacent points are then connected by straight lines of a custom color to form a rectangle, so that the keyword is framed and highlighted by the rectangle, referring to the last two steps of fig. 6.
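Connecting adjacent points into a closed outline can be sketched as below. The sketch only produces the line segments and a color for each keyword box; the actual drawing would be done by whatever imaging library is in use, which the patent does not name. The function names and the default color are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def box_outline_segments(points: Sequence[Point]) -> List[Segment]:
    """Connect each point to the next one, and the last back to the first."""
    if len(points) < 3:
        raise ValueError("an outline needs at least three points")
    count = len(points)
    return [(points[i], points[(i + 1) % count]) for i in range(count)]


def outlines_for_keywords(keyword_boxes: Sequence[Sequence[Point]],
                          color: Tuple[int, int, int] = (255, 0, 0)) -> List[Dict]:
    """Pair every keyword box outline with the custom line color."""
    return [{"color": color, "segments": box_outline_segments(b)}
            for b in keyword_boxes]
```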

The foregoing embodiments are merely illustrative of the principles of the invention and its efficacy, and are not to be construed as limiting the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. A keyword display method based on picture character positioning is characterized by comprising the following steps:

acquiring a picture to be detected;

displaying a target keyword area by calculation based on a preset display rule and a coordinate corresponding to the target character area;

wherein the displaying of the target keyword area by calculation based on the preset display rule and the coordinates corresponding to the target character area comprises: acquiring the coordinate points of the detection frame of the target character area; and determining the target area corresponding to the target keyword based on the border of the detection frame.

2. The method for displaying keywords based on image text positioning according to claim 1, wherein the step of identifying the characters in the image to be detected and obtaining the identification result comprises:

3. The method for displaying keywords based on graphic text orientation as claimed in claim 1, wherein after the step of matching in the recognition result based on the target keyword to obtain a matching result, the method further comprises:

4. The method for displaying keywords based on graphic text positioning as claimed in claim 1, further comprising:

and displaying the picture to be detected with the highlighted keyword.

5. The method for displaying keywords based on image text positioning according to claim 2, characterized in that the text area of each detection box is subjected to text recognition, and recognition characters are output line by line; and the step of forming a recognition result by the recognized characters and coordinates includes:

6. A keyword display device based on picture character positioning is characterized by comprising a processor and a memory connected with the processor through a communication bus; wherein,

the processor is configured to execute the keyword display program based on photo text positioning to implement the keyword display step based on photo text positioning according to any one of claims 1 to 5.

7. A computer storage medium, characterized in that the computer storage medium stores one or more programs executable by one or more processors to cause the one or more processors to perform the keyword display step based on picture text positioning according to any one of claims 1 to 5.