CN113361518A - Method and device for quickly fetching words and searching

Info

Publication number: CN113361518A
Application number: CN202110732452.9A
Authority: CN
Inventors: 黄泉彪
Original assignee: Readboy Education Technology Co Ltd
Current assignee: Readboy Education Technology Co Ltd
Priority date: 2021-06-29
Filing date: 2021-06-29
Publication date: 2021-09-07

Abstract

The invention discloses a method and a device for quickly fetching words and searching. With the method, a user who encounters content to be searched only needs to frame it on the screen; related resources are then obtained by combining the framed picture with OCR (optical character recognition) technology. The user can therefore look up unknown characters and words on any page, which greatly improves learning efficiency.

Description

Method and device for quickly fetching words and searching

Technical Field

The invention relates to the technical field of data processing, in particular to a method and a device for quickly fetching words and searching.

Background

At present, most learning applications on the market have no word-fetching function. When users encounter unknown characters or words while studying, they must exit the current application and open a browser to search. The resources returned by the browser are disorganized, and users must spend considerable time and energy screening them, so the process is cumbersome and inefficient. Younger students in particular, who may not know how to search for an unknown Chinese character, cannot fully understand some knowledge points when they encounter one.

Disclosure of Invention

In view of the defects of the prior art, the invention aims to provide a method and a device for quickly fetching words and searching.

To achieve this purpose, the invention adopts the following technical scheme:

A method for quickly fetching words and searching comprises the following specific processes:

when a user encounters unknown characters, words or sentences in the process of reading by using the electronic equipment, framing contents to be searched on a screen of the electronic equipment by using a touch pen to form a closed area;

taking the minimum rectangular picture capable of containing the closed area, and filling the part of the picture outside the closed area with blank;

performing OCR recognition on the minimum rectangular picture;

preprocessing the OCR recognition result in two ways: on the one hand, separating each Chinese character or word in the OCR recognition result; on the other hand, performing word segmentation on the OCR recognition result to extract the common words in it;

de-duplicating the characters and words obtained from the two kinds of preprocessing to remove repeated characters and words;

searching related resources by using the de-duplicated characters, words and original OCR recognition results;

and classifying the searched results, and displaying the classified searched results to the user in a list form.
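The step of taking the minimum rectangular picture and blanking everything outside the closed area can be sketched as follows. This is only an illustrative sketch, not the implementation described by the invention: it assumes the screen content is a grayscale image stored as a list of rows, that the stylus track lies within the image, that "blank" means white (255), and it uses an even-odd ray-casting test to decide which pixels are inside the closed area. The names `point_in_polygon`, `crop_and_blank` and `BLANK` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]
BLANK = 255  # assumption: "blank" is rendered as white in a grayscale image


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd ray casting: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def crop_and_blank(image: List[List[int]], track: List[Point]) -> List[List[int]]:
    """Crop the minimum rectangle containing the track and blank the outside.

    Assumes every track point lies within the image bounds.
    """
    xs = [p[0] for p in track]
    ys = [p[1] for p in track]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    cropped = []
    for y in range(top, bottom + 1):
        row = []
        for x in range(left, right + 1):
            # Keep pixels inside the closed area, blank all others.
            row.append(image[y][x] if point_in_polygon(x, y, track) else BLANK)
        cropped.append(row)
    return cropped
```

Blanking the outside keeps neighbouring text that happens to fall inside the rectangle, but outside the framed area, from reaching the OCR step.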

Furthermore, maximum errors in the horizontal and vertical directions, within which a frame whose start point and end point do not meet is still treated as a closed area, are preset and recorded as maxXOffset and maxYOffset respectively. The sliding track of the stylus on the screen is recorded, with start point (X1, Y1) and end point (X2, Y2). Whether the sliding track contains a closed area is judged; if not, the absolute differences between the end point and the start point along the X axis and the Y axis are calculated, namely xOffset = |X2 - X1| and yOffset = |Y2 - Y1|. If xOffset < maxXOffset and yOffset < maxYOffset, the area currently drawn by the stylus is also considered to be a closed area.

Further, a minimum width and a minimum height are preset and recorded as minWidth and minHeight respectively. If the sliding track of the stylus contains a closed area, the maximum width and the maximum height of the closed area are calculated and recorded as areaWidth and areaHeight respectively. If areaWidth > minWidth and areaHeight > minHeight, it is determined that the user intends to search for the content within the current closed area.

A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the above-mentioned method.

An electronic device comprising a processor and a memory for storing a computer program; the processor is configured to implement the above method when executing the computer program.

The beneficial effects of the invention are as follows: with the method, a user who encounters content to be searched only needs to frame it on the screen, and related resources are obtained by a search that combines the framed picture with OCR technology. The user can therefore look up unknown characters and words on any page, which greatly improves learning efficiency.

Drawings

FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings. It should be noted that the present embodiment is based on the above technical solution and provides a detailed implementation and a specific operation process, but the protection scope of the present invention is not limited to the present embodiment.

The embodiment provides a method for quickly fetching words and searching. As shown in FIG. 1, the specific process is as follows:

When a user encounters an unknown character, word or sentence while reading on the electronic equipment, the content to be searched (the unknown character, word or sentence, which may be Chinese or a foreign language) is framed on the screen of the electronic equipment with the stylus to form a closed area.

Because operation on the screen may involve a certain error, maximum errors in the horizontal and vertical directions, within which a frame whose start point and end point do not meet is still treated as a closed area, are set in advance and recorded as maxXOffset and maxYOffset respectively. The sliding track of the stylus on the screen is recorded, with start point (X1, Y1) and end point (X2, Y2). Whether the sliding track contains a closed area is judged; if not, the absolute differences between the end point and the start point along the X axis and the Y axis are calculated, namely xOffset = |X2 - X1| and yOffset = |Y2 - Y1|. If xOffset < maxXOffset and yOffset < maxYOffset, the area currently drawn by the stylus is also considered to be a closed area.
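The tolerance check for a frame whose ends do not quite meet can be sketched as below. It is a minimal illustration of the two comparisons only; how the device detects a closed area inside the track in the first place is not specified in the text and is not covered here. The function name `is_treated_as_closed` and the example threshold values are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def is_treated_as_closed(start: Point, end: Point,
                         max_x_offset: float, max_y_offset: float) -> bool:
    """Return True if the gap between start and end is within both tolerances."""
    x1, y1 = start
    x2, y2 = end
    x_offset = abs(x2 - x1)  # xOffset = |X2 - X1|
    y_offset = abs(y2 - y1)  # yOffset = |Y2 - Y1|
    # Both comparisons are strict, matching the inequalities in the text.
    return x_offset < max_x_offset and y_offset < max_y_offset


# Example with assumed thresholds of 20 pixels in each direction:
# a gap of 8 px horizontally and 5 px vertically still counts as closed.
closed = is_treated_as_closed((100, 100), (108, 95), 20, 20)
```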

It should be noted that, in order to guard against accidental operation by the user, the method of this embodiment sets a minimum width and a minimum height, recorded as minWidth and minHeight respectively. If the sliding track of the stylus contains a closed area, the maximum width and the maximum height of the closed area are calculated and recorded as areaWidth and areaHeight respectively. If areaWidth > minWidth and areaHeight > minHeight, it is determined that the user intends to search for the content within the current closed area.
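The size check against accidental touches might look like the following sketch. It assumes the closed area is available as a list of (x, y) track points and takes its maximum width and height from the extreme coordinates; the function name `is_intentional_selection` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def is_intentional_selection(region: List[Point],
                             min_width: float, min_height: float) -> bool:
    """Return True if the closed area exceeds both minimum dimensions."""
    xs = [p[0] for p in region]
    ys = [p[1] for p in region]
    area_width = max(xs) - min(xs)    # maximum width of the closed area
    area_height = max(ys) - min(ys)   # maximum height of the closed area
    # Strict comparisons, as in the text: both dimensions must exceed the minimum.
    return area_width > min_width and area_height > min_height
```

A tiny loop drawn by a resting palm or a stray tap would fail one of the two comparisons and would not trigger a search.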

The minimum rectangular picture capable of containing the closed area is taken, the part of the picture outside the closed area is filled with blank, and OCR recognition is performed on the minimum rectangular picture. Further, in order to reduce the computational burden on the electronic device, the minimum rectangular picture can be uploaded to a server, which runs the OCR recognition and returns the OCR recognition result to the electronic device.

The OCR recognition result is preprocessed in two ways: on the one hand, each Chinese character (for Chinese content) or word (for English content) in the OCR recognition result is separated; on the other hand, word segmentation is performed on the OCR recognition result to extract the common words in it. It should be noted that the common words may be extracted by matching against an existing common-word lexicon.
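The two kinds of preprocessing can be illustrated with the sketch below. The text only says that common words are extracted by matching against an existing lexicon; the forward maximum matching used here is one simple way to do that and is an assumption, as are the function names, the tiny example lexicon, and the choice to treat the basic CJK range U+4E00 to U+9FFF as "Chinese characters".

```python
import re
from typing import List, Set

# One CJK character, or one run of Latin letters (an English word).
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+")


def split_units(text: str) -> List[str]:
    """First preprocessing: separate each Chinese character or English word."""
    return TOKEN.findall(text)


def extract_common_words(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Second preprocessing: forward maximum matching against a lexicon.

    At each position the longest candidate (at least 2 characters) found in
    the lexicon is taken; otherwise the scan advances by one character.
    """
    words = []
    i = 0
    while i < len(text):
        matched = False
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                words.append(candidate)
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return words


# Hypothetical lexicon and OCR result, for illustration only.
lexicon = {"明月", "月光"}
units = split_units("床前明月光 hello")
common = extract_common_words("床前明月光", lexicon)
```

A production system would more likely rely on an established segmenter and a full lexicon; the sketch only shows how the two outputs differ.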

The characters and words obtained from the two kinds of preprocessing are then de-duplicated to remove repeated characters and words.
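De-duplication across the two preprocessing outputs can be sketched as follows. Preserving the order of first appearance is an assumption made here so that the search terms stay in reading order; the text does not specify an order. The function name `deduplicate` is hypothetical.

```python
from typing import Iterable, List


def deduplicate(*groups: Iterable[str]) -> List[str]:
    """Merge several term lists, keeping only the first occurrence of each term."""
    seen = set()
    result = []
    for group in groups:
        for term in group:
            if term not in seen:
                seen.add(term)
                result.append(term)
    return result


# Example: characters from the first preprocessing, words from the second.
terms = deduplicate(["明", "月", "明"], ["明月", "月"])
```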

Related resources are then searched using the de-duplicated characters and words together with the original OCR recognition result. It should be noted that the de-duplicated characters and words and the original OCR recognition result may be uploaded to a server, which searches for the related resources and returns them to the electronic device.

The search results are classified by category, such as characters, words, ancient poems and compositions, and the classified results are displayed to the user in list form.
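Grouping the search results for display could be sketched as below. The result format (a dict with `category` and `title` keys), the category labels, the fixed display order and the catch-all `other` group are all assumptions for illustration; the text only names characters, words, ancient poems and compositions as example categories.

```python
from typing import Dict, List, Tuple

# Assumed display order, following the categories named in the text.
CATEGORY_ORDER = ["character", "word", "ancient poem", "composition"]


def classify_results(results: List[Dict[str, str]]) -> List[Tuple[str, List[Dict[str, str]]]]:
    """Group results by category and return non-empty groups in display order."""
    grouped: Dict[str, List[Dict[str, str]]] = {c: [] for c in CATEGORY_ORDER}
    for item in results:
        category = item.get("category", "")
        if category not in CATEGORY_ORDER:
            category = "other"  # anything unrecognised goes to a trailing group
        grouped.setdefault(category, []).append(item)
    # dicts preserve insertion order, so "other" (if present) comes last.
    return [(category, items) for category, items in grouped.items() if items]


# Hypothetical search results, for illustration only.
sample = [
    {"category": "word", "title": "明月"},
    {"category": "character", "title": "明"},
    {"category": "video", "title": "x"},
    {"category": "word", "title": "月光"},
]
listing = classify_results(sample)
```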

Various corresponding changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims

1. A method for quickly fetching words and searching is characterized by comprising the following specific processes:

performing OCR recognition on the minimum rectangular picture;

2. The method according to claim 1, characterized in that maximum errors in the horizontal and vertical directions, within which a frame whose start point and end point do not meet is still treated as a closed area, are set in advance and recorded as maxXOffset and maxYOffset respectively; the sliding track of the stylus on the screen is recorded, with start point (X1, Y1) and end point (X2, Y2); whether the sliding track of the stylus contains a closed area is judged, and if not, the absolute differences between the end point and the start point along the X axis and the Y axis are calculated, namely xOffset = |X2 - X1| and yOffset = |Y2 - Y1|; if xOffset < maxXOffset and yOffset < maxYOffset, the area currently drawn by the stylus is also considered to be a closed area.

3. The method according to claim 1 or 2, characterized in that a minimum width and a minimum height, recorded as minWidth and minHeight respectively, are preset; if the sliding track of the stylus contains a closed area, the maximum width and the maximum height of the closed area are calculated and recorded as areaWidth and areaHeight respectively; if areaWidth > minWidth and areaHeight > minHeight, it is determined that the user intends to search for the content within the current closed area.

4. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 3.

5. An electronic device comprising a processor and a memory, the memory for storing a computer program; the processor is adapted to carry out the method of any one of claims 1 to 3 when executing the computer program.