CN114328804A

CN114328804A - Method and system for searching key words containing character pictures

Info

Publication number: CN114328804A
Application number: CN202011029418.7A
Authority: CN
Inventors: 邓裕强; 朱志
Original assignee: Guangzhou Jiubang Digital Technology Co Ltd
Current assignee: Guangzhou Jiubang Digital Technology Co Ltd
Priority date: 2020-09-27
Filing date: 2020-09-27
Publication date: 2022-04-12

Abstract

The invention discloses a keyword retrieval method and system for pictures containing text. Through three stages (text recognition, text retrieval, and target-word positioning), it retrieves the keywords a user needs, and the specific pages on which they appear, from non-editable PDF documents and pictures. This solves the problem that users cannot locate specific pages and specific content by keyword in large picture sets and non-editable PDF documents, and makes it easy to locate the required page by keyword, for example while studying outdoors.

Description

Method and system for searching key words containing character pictures

Technical Field

The invention relates to the field of character recognition and retrieval, and in particular to a method and system for retrieving keywords from pictures containing text.

Background

With the rise of reading on handheld devices, many working people habitually spend their commuting time on online learning, reading novels, and the like. However, existing application software and mobile terminals cannot jump to the corresponding page and mark it on the basis of a keyword at any time and place. In a .doc format file, a user can use the "Find" function to locate and mark content that exactly matches a keyword, long sentence, punctuation mark, or number. Reading-terminal software, PDF files, and the like, however, contain picture-type content and non-editable documents, so the user cannot use the "Find" function and cannot directly enter a keyword search string as in a search engine, which causes considerable inconvenience.

Disclosure of Invention

The invention provides a method and system for retrieving keywords from pictures containing text, which solves the above problems, saves the user's time, and brings convenience to the user.

The technical scheme disclosed by the invention is as follows:

A method and system for retrieving keywords from pictures containing text, characterized in that a plurality of pictures are taken with the camera of a mobile terminal or other device, or a non-editable PDF file is provided; the text elements in the pictures are recognized by OCR text recognition technology and a deep-learning-based system; and the keywords required by the user are retrieved.

The method has three stages. In the first stage, OCR (optical character recognition) technology identifies the element content on each page of the picture set to be processed or of the non-editable PDF document, and arranges that content in sequence. In the second stage, the required keywords are retrieved from the resulting text document through a deep learning network. In the third stage, the keywords are located in the pictures and in the content of the non-editable PDF document by means of positioning coordinates and page numbers.

Further, the OCR character recognition model is trained continuously on similar pictures through a deep learning network to recognize them and to extract element information such as characters, numbers, and letters from the pictures; the extracted information is saved in a .doc or .docx document.

In the first stage, OCR character recognition comprises the following steps:

Step 1: Read the elements of the picture set to be recognized in sequence, and perform character and format correction and removal of interfering elements.

The picture set may be a set of jpg or png pictures, PDF document pictures, and the like.

Step 2: Mark and record the coordinates of each element in the picture.

Step 3: Generate a .doc or .docx document and mark page numbers on it, converting the pictures into document pages in the order of the picture set.

Furthermore, each document page number corresponds to exactly one picture.
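The output of the first stage can be pictured as a list of recognized elements, each tagged with its coordinates and a document page number. The following is a minimal Python sketch of that idea, not the implementation of the invention: the `Element` record, the `(x, y, width, height)` box layout, the `build_document` function, and the file names are all assumptions introduced for illustration, and a stub (`fake_ocr`) stands in for a real OCR engine.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Box = tuple[int, int, int, int]  # assumed layout: x, y, width, height in pixels

# Hypothetical record for one recognized element; field names are assumptions.
@dataclass(frozen=True)
class Element:
    text: str
    box: Box
    page: int      # document page number, starting at 1
    picture: str   # file name of the source picture

# Type of an OCR callable: picture path -> iterable of (text, box) pairs.
OcrFn = Callable[[str], Iterable[tuple[str, Box]]]

def build_document(pictures: list[str], run_ocr: OcrFn) -> list[Element]:
    """Read pictures in order; page N of the document is picture N."""
    elements: list[Element] = []
    for page, picture in enumerate(pictures, start=1):
        for text, box in run_ocr(picture):
            elements.append(Element(text, box, page, picture))
    return elements

# Stub OCR so the sketch runs without an OCR engine installed.
def fake_ocr(picture: str):
    canned = {
        "scan_a.png": [("keyword retrieval", (10, 20, 200, 30))],
        "scan_b.png": [("target word positioning", (15, 40, 260, 30))],
    }
    return canned.get(picture, [])

doc = build_document(["scan_a.png", "scan_b.png"], fake_ocr)
pages = {e.page: e.picture for e in doc}  # one picture per page number
```

Numbering pages with `enumerate` in picture order is what keeps each page number tied to exactly one picture in this sketch.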

In the second stage, keywords are retrieved using the deep learning network, through the following steps:

Step 1: The deep learning network recognizes and accurately memorizes the characters in the .doc or .docx document.

Step 2: Determine the keywords to be retrieved, check them, and input them into the deep learning network model.

Step 3: Locate the specific keywords through the trained deep learning network, mark them, and record the keyword coordinates and document page numbers.

Furthermore, each keyword has at least one corresponding coordinate; each recorded document page number contains at least one keyword coordinate; and each keyword is located on at least one document page.
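The bookkeeping of the second stage (which pages and coordinates a keyword maps to) can be illustrated without a trained model. The sketch below deliberately swaps the deep learning network for plain case-insensitive substring matching, so it shows only the shape of the result, not the retrieval technique of the invention. The `(page, box, text)` record layout, the sample records, and the `find_keyword` function are assumptions made for this example; the empty-keyword check is one possible reading of the "check" in Step 2.

```python
from collections import defaultdict

# Each record is (page, box, text); this layout is an assumption of the sketch.
records = [
    (1, (10, 20, 200, 30), "Keyword retrieval in scanned pages"),
    (1, (10, 60, 180, 30), "Text recognition comes first"),
    (2, (15, 40, 260, 30), "A second keyword appears here"),
]

def find_keyword(records, keyword):
    """Return {page: [box, ...]} for every element whose text contains keyword."""
    needle = keyword.strip().casefold()
    if not needle:
        raise ValueError("keyword must not be empty")
    hits = defaultdict(list)
    for page, box, text in records:
        # Substring match stands in for the trained network's decision.
        if needle in text.casefold():
            hits[page].append(box)
    return dict(hits)

hits = find_keyword(records, "keyword")
```

Because a page is added to the result only when a match is appended, every page number in `hits` carries at least one coordinate, which mirrors the constraint stated above.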

In the third stage, the original picture is located using the coordinates and the page number, through the following steps:

Step 1: Identify the keyword coordinates and the document page number, and locate the original picture from the document page number through a deep learning network.

Step 2: Locate the position of the keyword in the original picture from the keyword coordinates.
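The two steps of the third stage amount to a lookup from page number to picture, followed by turning the recorded coordinates into a region of that picture. A hedged Python sketch follows. It uses a simple list index where the description calls for a deep learning network, and the `locate` function, the `(x, y, width, height)` box layout, the picture sizes, and the clamping to the picture edges are all assumptions added for illustration.

```python
def locate(hit_page, hit_box, pictures, sizes):
    """Map a (page, box) hit back to a picture and a pixel rectangle.

    pictures: picture file names in document order (page 1 is pictures[0]).
    sizes: {file name: (width, height)} of each original picture.
    Returns (file name, (left, top, right, bottom)).
    """
    if not 1 <= hit_page <= len(pictures):
        raise IndexError(f"page {hit_page} has no picture")
    picture = pictures[hit_page - 1]
    width, height = sizes[picture]
    x, y, w, h = hit_box
    # Clamp so the rectangle never extends past the picture edges.
    left, top = max(0, x), max(0, y)
    right, bottom = min(width, x + w), min(height, y + h)
    return picture, (left, top, right, bottom)

pictures = ["scan_a.png", "scan_b.png"]
sizes = {"scan_a.png": (800, 1200), "scan_b.png": (250, 1200)}
target = locate(2, (15, 40, 260, 30), pictures, sizes)
```

The returned rectangle could then be passed to an image viewer or cropping routine to highlight the keyword in the original picture.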

To implement the method, the invention also discloses a keyword retrieval system for pictures containing text, which comprises three modules:

An OCR recognition module: identifies the element content on each page of the picture set to be processed or of the non-editable PDF document using OCR (optical character recognition) technology, and arranges that content in sequence.

A keyword retrieval module: retrieves the required keywords from the text document through a deep learning network.

A keyword positioning module: locates the keywords in the pictures and in the content of the non-editable PDF document by means of positioning coordinates and page numbers.

The OCR recognition module, which performs OCR character recognition, mainly comprises the following modules:

An element acquisition module: reads the elements of the picture set to be recognized in sequence, and performs character and format correction and removal of interfering elements.

The picture set may be a set of jpg or png pictures, PDF document pictures, and the like.

A coordinate marking module: marks and records the coordinates of each element in the picture.

A document generation module: generates a .doc or .docx document and marks page numbers on it, converting the pictures into document pages in the order of the picture set.

Furthermore, each document page number corresponds to exactly one picture.

The keyword retrieval module, which retrieves keywords using a deep learning network, comprises the following modules:

A deep learning training module: the deep learning network recognizes and accurately memorizes the characters in the .doc or .docx document.

A checking module: determines the keywords to be retrieved, checks them, and inputs them into the deep learning network model.

A keyword tagging module: locates the specific keywords through the trained deep learning network, marks them, and records the keyword coordinates and document page numbers.

The keyword positioning module, which locates the original picture using coordinates and page numbers, comprises the following modules:

A page number positioning module: identifies the keyword coordinates and the document page number, and locates the original picture from the document page number through a deep learning network.

A word coordinate positioning module: locates the position of the keyword in the original picture from the keyword coordinates.
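One way to picture how the three modules fit together is as three pluggable callables wired into a single object. The Python sketch below is only an illustration of that composition: the `RetrievalSystem` class, its field names, and the stand-in functions are hypothetical, and the stand-ins use trivial logic in place of a real OCR engine and a trained network.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RetrievalSystem:
    """Wires three pluggable modules; each field is a plain callable."""
    recognize: Callable  # pictures -> list of (page, box, text)
    retrieve: Callable   # (records, keyword) -> list of (page, box)
    locate: Callable     # (pictures, page, box) -> (picture, box)

    def search(self, pictures, keyword):
        records = self.recognize(pictures)
        return [self.locate(pictures, page, box)
                for page, box in self.retrieve(records, keyword)]

# Stand-in modules so the sketch runs; real ones would wrap an OCR engine
# and whatever retrieval model is chosen.
def recognize(pictures):
    return [(i, (0, 0, 100, 20), f"text from {name}")
            for i, name in enumerate(pictures, start=1)]

def retrieve(records, keyword):
    return [(page, box) for page, box, text in records if keyword in text]

def locate(pictures, page, box):
    return pictures[page - 1], box

system = RetrievalSystem(recognize, retrieve, locate)
results = system.search(["a.png", "b.png"], "b.png")
```

Keeping the modules as separate callables means any one of them could be replaced (for example, a different OCR engine) without touching the other two.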

Drawings

Fig. 1 shows a flowchart of the keyword retrieval method for pictures containing text according to the present invention.

Fig. 2 shows a flowchart of the keyword retrieval method for pictures containing text according to the present invention.

Fig. 3 shows a flowchart of the keyword retrieval system for pictures containing text according to the present invention.

Detailed Description

With the spread of reading software, more and more electronic books have come into public view. Some are served to the application terminal by a software back end in formats such as txt, HTML, and HLP, and can be edited, copied, pasted, annotated, and so on through the reading software. However, if a paper text is scanned and stored as a PDF file, its content, especially text scanned as a picture, cannot be searched by such a reader, and the pages and content containing a given keyword cannot all be retrieved from the PDF document by that keyword.

Similarly, when students study from PPT slides in class, or when users browse news on a microblog website, some information exists only as pictures containing text, and the end user can only save those pictures to a local folder. With a small number of pictures the content to be searched is manageable, but with tens, hundreds, or thousands of pictures, finding the target picture in a huge picture set is the problem the invention sets out to solve.

As shown in Fig. 1, in the method and system for retrieving keywords from pictures containing text, a plurality of pictures are taken with the camera of a mobile terminal or other device, or a non-editable PDF file is provided; the text elements in the pictures are recognized by OCR text recognition technology and a deep-learning-based system; and the keywords required by the user are retrieved.

Further, the OCR character recognition model is trained continuously on similar pictures through a deep learning network to recognize them and to extract element information such as characters, numbers, and letters from the pictures; the extracted information is saved in a .doc or .docx document.

As shown in Fig. 2, in the first stage, OCR character recognition includes the following steps:

S101: Read the elements of the picture set to be recognized in sequence, and perform character and format correction and removal of interfering elements.

The picture set may be a set of jpg or png pictures, PDF document pictures, and the like.

S102: Mark and record the coordinates of each element in the picture.

S103: Generate a .doc or .docx document and mark page numbers on it, converting the pictures into document pages in the order of the picture set.

Furthermore, each document page number corresponds to exactly one picture.
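The description does not specify what the character and format correction or the removal of interfering elements in S101 consists of. As one possible interpretation only, the Python sketch below cleans raw OCR text by normalizing character forms, dropping invisible control characters, and collapsing whitespace. The `clean_ocr_text` function and the choice of these three operations are assumptions of this example, not steps taken from the invention.

```python
import re
import unicodedata

def clean_ocr_text(raw: str) -> str:
    """Normalize character forms, drop control characters, tidy whitespace."""
    # NFKC folds full-width letters and digits to their ordinary forms.
    text = unicodedata.normalize("NFKC", raw)
    # Remove control/format characters (Unicode categories Cc and Cf),
    # keeping newlines and tabs so they can be collapsed below.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

# Sample input: full-width "Key", a zero-width space, a form feed,
# stray whitespace, and full-width digits "2020".
raw = "\uff2b\uff45\uff59\u200bword\x0c  search \n \uff12\uff10\uff12\uff10"
cleaned = clean_ocr_text(raw)  # "Keyword search 2020"
```

Normalizing before the keyword search matters because a full-width or invisibly split word would otherwise fail to match the keyword the user types.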

In the second stage, keywords are retrieved using the deep learning network, through the following steps:

S201: The deep learning network recognizes and accurately memorizes the characters in the .doc or .docx document.

S202: Determine the keywords to be retrieved, check them, and input them into the deep learning network model.

S203: Locate the specific keywords through the trained deep learning network, mark them, and record the keyword coordinates and document page numbers.

In the third stage, the original picture is located using the coordinates and the page number, through the following steps:

S301: Identify the keyword coordinates and the document page number, and locate the original picture from the document page number through a deep learning network.

S302: Locate the position of the keyword in the original picture from the keyword coordinates.

An OCR recognition module: identifies the element content on each page of the picture set to be processed or of the non-editable PDF document using OCR (optical character recognition) technology, and arranges that content in sequence.

The picture set may be a set of jpg or png pictures, PDF document pictures, and the like.

Furthermore, each document page number corresponds to exactly one picture.

The keyword retrieval module, which retrieves keywords using the deep learning network, comprises the following modules.

A keyword tagging module: locates the specific keywords through the trained deep learning network, marks them, and records the keyword coordinates and document page numbers.

The embodiments described above express only several embodiments of the present invention. Their description is relatively specific and detailed, but is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. A method for retrieving keywords from pictures containing text, comprising the following steps:

S1: identifying the element content on each page of the picture set to be processed or of the non-editable PDF document using OCR (optical character recognition) technology, and arranging that content in sequence;

S2: retrieving the required keywords from the text document through a deep learning network;

S3: locating the keywords in the pictures and in the content of the non-editable PDF document by means of positioning coordinates and page numbers.

2. The method as claimed in claim 1, wherein the step S1 comprises the following steps:

S101: reading the elements of the picture set to be recognized in sequence, and performing character and format correction and removal of interfering elements, wherein the picture set may be a set of jpg or png pictures, PDF document pictures, and the like;

S102: marking and recording the coordinates of each element in the picture;

S103: generating a .doc or .docx document and marking page numbers on it, converting the pictures into document pages in the order of the picture set; furthermore, each document page number corresponds to exactly one picture.

3. The method as claimed in claim 1, wherein the step S2 comprises the following steps:

S201: recognizing and accurately memorizing, by the deep learning network, the characters in the .doc or .docx document;

S202: determining the keywords to be retrieved, checking them, and inputting them into the deep learning network model;

S203: locating the specific keywords through the trained deep learning network, marking them, and recording the keyword coordinates and document page numbers; furthermore, each keyword has at least one corresponding coordinate; each recorded document page number contains at least one keyword coordinate; and each keyword is located on at least one document page.

4. The method as claimed in claim 1, wherein the step S3 comprises the following sub-steps:

S301: identifying the keyword coordinates and the document page number, and locating the original picture from the document page number through a deep learning network;

5. A keyword retrieval system for pictures containing text, the system comprising:

an OCR recognition module: identifying the element content on each page of the picture set to be processed or of the non-editable PDF document using OCR (optical character recognition) technology, and arranging that content in sequence;

a keyword retrieval module: retrieving the required keywords from the text document through a deep learning network;

a keyword positioning module: locating the keywords in the pictures and in the content of the non-editable PDF document by means of positioning coordinates and page numbers.

6. The system of claim 5, wherein the OCR recognition module comprises:

an element acquisition module: reading the elements of the picture set to be recognized in sequence, and performing character and format correction and removal of interfering elements; the picture set may be a set of jpg or png pictures, PDF document pictures, and the like;

a coordinate marking module: marking and recording the coordinates of each element in the picture;

7. The system of claim 5, wherein the keyword retrieval module comprises the following modules:

a deep learning training module: recognizing and accurately memorizing, by the deep learning network, the characters in the .doc or .docx document;

a checking module: determining the keywords to be retrieved, checking them, and inputting them into the deep learning network model;

a keyword tagging module: locating the specific keywords through the trained deep learning network, marking them, and recording the keyword coordinates and document page numbers; furthermore, each keyword has at least one corresponding coordinate; each recorded document page number contains at least one keyword coordinate; and each keyword is located on at least one document page.

8. The system of claim 5, wherein the keyword positioning module comprises the following modules:

a page number positioning module: identifying the keyword coordinates and the document page number, and locating the original picture from the document page number through a deep learning network;