CN113361518A - Method and device for quickly fetching words and searching - Google Patents

Method and device for quickly fetching words and searching

Info

Publication number
CN113361518A
Authority
CN
China
Prior art keywords
words
area
ocr recognition
closed
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110732452.9A
Other languages
Chinese (zh)
Inventor
黄泉彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Readboy Education Technology Co Ltd
Original Assignee
Readboy Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Readboy Education Technology Co Ltd
Priority to CN202110732452.9A
Publication of CN113361518A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method and a device for quickly fetching words and searching. With this method, a user who encounters content to be searched only needs to frame it on the screen, and related resources are obtained by combining the captured picture with OCR (optical character recognition) technology, so that the user can look up unknown characters and words on any page and learning efficiency is greatly improved.

Description

Method and device for quickly fetching words and searching
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a device for quickly fetching words and searching.
Background
At present, most learning applications on the market have no word-fetching function. When users encounter unknown characters or words in the course of learning, they must exit the current application and open a browser to search. The resources returned by the browser are disorganized, and users must spend considerable time and energy screening them, so the process is cumbersome and inefficient. Younger students in particular may not know how to search for a Chinese character they do not recognize, and so cannot fully understand some knowledge points.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a method and a device for quickly fetching words and searching.
In order to achieve the purpose, the invention adopts the following technical scheme:
A method for quickly fetching words and searching comprises the following specific process:
when a user encounters unknown characters, words or sentences while reading on an electronic device, framing the content to be searched on the screen of the electronic device with a stylus to form a closed area;
taking the minimum rectangular picture that can contain the closed area, and filling the part outside the closed area with blank;
performing OCR recognition on the minimum rectangular picture;
preprocessing the OCR recognition result in two aspects: on the one hand separating each Chinese character or word in the OCR recognition result, and on the other hand performing word segmentation on the OCR recognition result to extract the common words in it;
de-duplicating the characters and words obtained by the two-aspect preprocessing to remove duplicated characters and words;
searching for related resources using the de-duplicated characters and words together with the original OCR recognition result;
and classifying the search results, and displaying the classified search results to the user in list form.
Furthermore, the maximum errors tolerated in the horizontal and vertical directions between the start point and the end point of a framing stroke that fails to form a closed area are preset and denoted maxXOffset and maxYOffset respectively; the sliding track of the stylus on the screen is recorded, with start point (X1, Y1) and end point (X2, Y2); whether the sliding track of the stylus contains a closed area is judged, and if not, the absolute differences between the end point and the start point along the X axis and the Y axis are computed, namely xOffset = |X2 - X1| and yOffset = |Y2 - Y1|; if xOffset < maxXOffset and yOffset < maxYOffset, the area currently drawn by the stylus is also regarded as a closed area.
Further, a minimum width and a minimum height are preset and denoted minWidth and minHeight respectively; if the sliding track of the stylus contains a closed area, the maximum width and the maximum height of the closed area are calculated and denoted areaWidth and areaHeight respectively; if areaWidth > minWidth and areaHeight > minHeight, it is determined that the user intends to search the content within the current closed area.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the above-mentioned method.
An electronic device comprising a processor and a memory for storing a computer program; the processor is configured to implement the above method when executing the computer program.
The beneficial effects of the invention are as follows: with this method, a user who encounters content to be searched only needs to frame it on the screen, and related resources are obtained by a search that combines the captured picture with OCR technology, so that the user can look up unknown characters and words on any page and learning efficiency is greatly improved.
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawing. It should be noted that this embodiment is based on the above technical solution and provides a detailed implementation and a specific operating process, but the protection scope of the present invention is not limited to this embodiment.
This embodiment provides a method for quickly fetching words and searching. As shown in FIG. 1, the specific process is as follows:
When a user encounters an unknown character, word or sentence while reading on an electronic device, the user frames the content to be searched (the unknown character, word or sentence, which may be in Chinese or a foreign language) on the screen of the electronic device with a stylus to form a closed area.
Because operation on the screen may involve some error, the maximum errors tolerated in the horizontal and vertical directions between the start point and the end point of a framing stroke that fails to form a closed area are preset and denoted maxXOffset and maxYOffset respectively. The sliding track of the stylus on the screen is recorded, with start point (X1, Y1) and end point (X2, Y2). Whether the sliding track of the stylus contains a closed area is judged; if not, the absolute differences between the end point and the start point along the X axis and the Y axis are computed, namely xOffset = |X2 - X1| and yOffset = |Y2 - Y1|. If xOffset < maxXOffset and yOffset < maxYOffset, the area currently drawn by the stylus is also regarded as a closed area.
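The tolerance comparison described above can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation: the function name is_closed_by_tolerance and the representation of the sliding track as a list of (x, y) tuples are assumptions. The sketch covers only the fallback comparison that applies when the track does not already contain a closed area; detecting a genuinely closed track is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def is_closed_by_tolerance(
    track: List[Point], max_x_offset: float, max_y_offset: float
) -> bool:
    """Treat a stroke as closed when its end point lands near its start point.

    `track` is a hypothetical list of sampled stylus positions; the first
    entry plays the role of (X1, Y1) and the last entry of (X2, Y2).
    """
    if len(track) < 2:
        return False  # a single sample cannot enclose anything
    (x1, y1), (x2, y2) = track[0], track[-1]
    x_offset = abs(x2 - x1)  # xOffset = |X2 - X1|
    y_offset = abs(y2 - y1)  # yOffset = |Y2 - Y1|
    return x_offset < max_x_offset and y_offset < max_y_offset
```

Both comparisons are strict, matching the conditions xOffset < maxXOffset and yOffset < maxYOffset in the text.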
It should be noted that, to guard against accidental operation by the user, the method of this embodiment sets a minimum width and a minimum height, denoted minWidth and minHeight respectively. If the sliding track of the stylus contains a closed area, the maximum width and the maximum height of the closed area are calculated and denoted areaWidth and areaHeight respectively. If areaWidth > minWidth and areaHeight > minHeight, it is determined that the user intends to search the content within the current closed area.
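The minimum-size check above can be sketched in Python as follows. The function names and the use of the extent of the track points as the maximum width and height of the closed area are assumptions made for illustration; the embodiment does not specify how these two values are measured.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bounding_size(points: List[Point]) -> Tuple[float, float]:
    """Return (width, height) of the axis-aligned extent of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return max(xs) - min(xs), max(ys) - min(ys)


def is_intentional_selection(
    points: List[Point], min_width: float, min_height: float
) -> bool:
    """Accept the closed area only if it exceeds both minimum dimensions."""
    if not points:
        return False
    area_width, area_height = bounding_size(points)
    return area_width > min_width and area_height > min_height
```

A very small loop, such as one produced by an accidental tap, fails both comparisons and is ignored.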
The minimum rectangular picture that can contain the closed area is then taken, and the part outside the closed area is filled with blank.
OCR recognition is performed on the minimum rectangular picture. Further, to reduce the computational load on the electronic device, the minimum rectangular picture can be uploaded to a server, which runs the OCR recognition and returns the OCR recognition result to the electronic device.
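The cropping and blank-filling step that precedes OCR recognition can be sketched in Python as follows. Several things here are assumptions: the screen capture is modelled as a list of rows of grayscale values, the closed area as a polygon of (x, y) vertices, and the blank value as 255. The embodiment does not say how membership in the closed area is decided; the even-odd ray-casting test used below is one common choice, not necessarily the one used by the applicant.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def crop_and_blank(
    image: List[List[int]], polygon: Sequence[Point], blank: int = 255
) -> List[List[int]]:
    """Crop to the polygon's bounding rectangle and blank pixels outside it."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    # Bounding rectangle in pixel indices, clamped to the image.
    left = max(int(min(xs)), 0)
    top = max(int(min(ys)), 0)
    right = min(int(max(xs)), len(image[0]) - 1)
    bottom = min(int(max(ys)), len(image) - 1)
    cropped = []
    for row in range(top, bottom + 1):
        out_row = []
        for col in range(left, right + 1):
            # Test the pixel centre against the closed area.
            if point_in_polygon(col + 0.5, row + 0.5, polygon):
                out_row.append(image[row][col])
            else:
                out_row.append(blank)
        cropped.append(out_row)
    return cropped
```

Blanking the pixels outside the closed area keeps text that merely falls inside the bounding rectangle from being recognized along with the framed content.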
The OCR recognition result is preprocessed in two aspects: on the one hand, each Chinese character (for Chinese content) or word (for English content) in the OCR recognition result is separated; on the other hand, the OCR recognition result is segmented to extract the common words in it. It should be noted that common words may be extracted by matching against an existing common-word lexicon.
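The two-aspect preprocessing can be sketched in Python as follows. The function names, the tiny example lexicon and the max_len parameter are assumptions for illustration. The embodiment says only that common words are matched against an existing lexicon; the forward maximum matching used here is one simple way to do that and is not taken from the source. The character class covers the basic CJK Unified Ideographs block only.

```python
import re
from typing import List, Set

# One CJK ideograph, or one run of Latin letters (optionally with an apostrophe).
_UNIT_PATTERN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+(?:'[A-Za-z]+)?")


def split_units(text: str) -> List[str]:
    """Aspect one: separate each Chinese character or English word."""
    return _UNIT_PATTERN.findall(text)


def extract_common_words(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Aspect two: forward maximum matching against a common-word lexicon."""
    words = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first; only multi-character words count.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        if match:
            words.append(match)
            i += len(match)
        else:
            i += 1
    return words
```

For example, split_units("床前明月光") yields the five individual characters, while extract_common_words("床前明月光", {"明月", "月光"}) yields ["明月"], because the match on "明月" consumes the character that "月光" would need.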
The characters and words obtained by the two-aspect preprocessing are then de-duplicated to remove duplicated characters and words.
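The de-duplication step can be sketched in Python as follows. The function name deduplicate is an assumption, as is the choice to preserve first-seen order, which the embodiment does not require.

```python
from typing import Iterable, List


def deduplicate(*groups: Iterable[str]) -> List[str]:
    """Merge several term lists, keeping the first occurrence of each term."""
    seen = set()
    result = []
    for group in groups:
        for item in group:
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result
```

Called as deduplicate(["明", "月", "明"], ["明月", "月"]), it returns ["明", "月", "明月"].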
Related resources are searched using the de-duplicated characters and words together with the original OCR recognition result. It should be noted that the de-duplicated characters and words and the original OCR recognition result may be uploaded to a server, which searches for the related resources and returns them to the electronic device.
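One way to package the search input for upload is sketched below in Python. The embodiment does not define a request format, so the JSON encoding and the field names "terms" and "ocr_text" are purely hypothetical.

```python
import json
from typing import Iterable


def build_search_payload(terms: Iterable[str], ocr_text: str) -> str:
    """Serialize the de-duplicated terms and the raw OCR text as JSON.

    The field names are hypothetical; a real server would define its own.
    """
    payload = {"terms": list(terms), "ocr_text": ocr_text}
    # ensure_ascii=False keeps Chinese characters readable in the output.
    return json.dumps(payload, ensure_ascii=False)
```

Sending the original OCR recognition result alongside the individual terms lets the server match whole sentences, such as a line of a poem, as well as single characters and words.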
The search results are classified by category, such as characters, words, ancient poems and compositions, and the classified results are displayed to the user in list form.
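The classification and list display can be sketched in Python as follows. The result structure (dictionaries with "category" and "title" keys), the category labels and the text rendering are all assumptions for illustration; the embodiment names the categories but not the data format or the layout.

```python
from collections import defaultdict
from typing import Dict, List

# Display order for the categories named in the embodiment (labels are assumed).
CATEGORY_ORDER = ["character", "word", "ancient poem", "composition"]


def group_results(results: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group result titles by category, known categories first."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result["category"]].append(result["title"])
    ordered = {}
    for category in CATEGORY_ORDER:
        if category in grouped:
            ordered[category] = grouped[category]
    for category in grouped:  # any category not listed above goes last
        if category not in ordered:
            ordered[category] = grouped[category]
    return ordered


def render_list(grouped: Dict[str, List[str]]) -> List[str]:
    """Flatten the grouped results into display lines."""
    lines = []
    for category, titles in grouped.items():
        lines.append(f"[{category}]")
        lines.extend(f"  - {title}" for title in titles)
    return lines
```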
Various corresponding changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims (5)

1. A method for quickly fetching words and searching is characterized by comprising the following specific processes:
when a user encounters unknown characters, words or sentences in the process of reading by using the electronic equipment, framing contents to be searched on a screen of the electronic equipment by using a touch pen to form a closed area;
taking the minimum rectangular picture that can contain the closed area, and filling the part outside the closed area with blank;
performing OCR recognition on the minimum rectangular picture;
preprocessing the OCR recognition result in two aspects, namely separating each Chinese character or word in the OCR recognition result on one hand, and performing word segmentation processing on the OCR recognition result on the other hand to extract common words in the OCR recognition result;
de-duplicating the characters and words obtained by the two-aspect preprocessing to remove duplicated characters and words;
searching related resources by using the de-duplicated characters, words and original OCR recognition results;
and classifying the searched results, and displaying the classified searched results to the user in a list form.
2. The method according to claim 1, characterized in that the maximum errors tolerated in the horizontal and vertical directions between the start point and the end point of a framing stroke that fails to form a closed area are preset and denoted maxXOffset and maxYOffset respectively; the sliding track of the stylus on the screen is recorded, with start point (X1, Y1) and end point (X2, Y2); whether the sliding track of the stylus contains a closed area is judged, and if not, the absolute differences between the end point and the start point along the X axis and the Y axis are computed, namely xOffset = |X2 - X1| and yOffset = |Y2 - Y1|; if xOffset < maxXOffset and yOffset < maxYOffset, the area currently drawn by the stylus is also regarded as a closed area.
3. The method according to claim 1 or 2, characterized in that a minimum width and a minimum height are preset and denoted minWidth and minHeight respectively; if the sliding track of the stylus contains a closed area, the maximum width and the maximum height of the closed area are calculated and denoted areaWidth and areaHeight respectively; if areaWidth > minWidth and areaHeight > minHeight, it is determined that the user intends to search the content within the current closed area.
4. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 3.
5. An electronic device comprising a processor and a memory, the memory for storing a computer program; the processor is adapted to carry out the method of any one of claims 1 to 3 when executing the computer program.
CN202110732452.9A 2021-06-29 2021-06-29 Method and device for quickly fetching words and searching Pending CN113361518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110732452.9A CN113361518A (en) 2021-06-29 2021-06-29 Method and device for quickly fetching words and searching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110732452.9A CN113361518A (en) 2021-06-29 2021-06-29 Method and device for quickly fetching words and searching

Publications (1)

Publication Number Publication Date
CN113361518A (en) 2021-09-07

Family

ID=77537246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110732452.9A Pending CN113361518A (en) 2021-06-29 2021-06-29 Method and device for quickly fetching words and searching

Country Status (1)

Country Link
CN (1) CN113361518A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116027946A (en) * 2023-03-28 2023-04-28 深圳市人马互动科技有限公司 Picture information processing method and device in interactive novel

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239313A (en) * 2013-06-09 2014-12-24 百度在线网络技术(北京)有限公司 Method for searching for characters displayed in screen and based on mobile terminal and mobile terminal
CN106780316A (en) * 2017-01-25 2017-05-31 宇龙计算机通信科技(深圳)有限公司 A kind of image cropping method, image cropping device and mobile terminal
CN106814964A (en) * 2016-12-19 2017-06-09 广东小天才科技有限公司 A kind of method and content search device that content search is carried out in mobile terminal
CN110471599A (en) * 2019-08-14 2019-11-19 广东小天才科技有限公司 Screen word-selecting searching method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107656922B (en) Translation method, translation device, translation terminal and storage medium
US8275604B2 (en) Adaptive pattern learning for bilingual data mining
US8645184B2 (en) Future technology projection supporting apparatus, method, program and method for providing a future technology projection supporting service
Wilkinson et al. Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections
US20130156302A1 (en) Handwritten word spotter system using synthesized typed queries
CN111832403A (en) Document structure recognition method, and model training method and device for document structure recognition
CN110770735A (en) Transcoding of documents with embedded mathematical expressions
CN112434690A (en) Method, system and storage medium for automatically capturing and understanding elements of dynamically analyzing text image characteristic phenomena
CN104182381A (en) character input method and system
Jindal Generating image captions in Arabic using root-word based recurrent neural networks and deep neural networks
CN111695518B (en) Method and device for labeling structured document information and electronic equipment
CN114187595A (en) Document layout recognition method and system based on fusion of visual features and semantic features
CN112784009A (en) Subject term mining method and device, electronic equipment and storage medium
CN113361518A (en) Method and device for quickly fetching words and searching
CN107463624A (en) A kind of method and system that city interest domain identification is carried out based on social media data
Hsueh Interactive text recognition and translation on a mobile device
CN114579796B (en) Machine reading understanding method and device
Wang Feature Extraction Method of Machine Translation Equivalent Pairs in Chinese-English Comparable Corpus based OCR Recognition
Ueki et al. Survey on deep learning-based Kuzushiji recognition
CN114238689A (en) Video generation method, video generation device, electronic device, storage medium, and program product
CN113449504A (en) Intelligent marking method and system
Henke Building and Improving an OCR Classifier for Republican Chinese Newspaper Text
Puigcerver et al. Advances in handwritten keyword indexing and search technologies
Rai et al. MyOcrTool: visualization system for generating associative images of Chinese characters in smart devices
Malkadi et al. Improving code extraction from coding screencasts using a code-aware encoder-decoder model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210907