WO2022194180A1 - Method for recognizing touch-to-read text, and electronic device


Info

Publication number
WO2022194180A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
finger
text
gesture
user
Prior art date
Application number
PCT/CN2022/081042
Other languages
French (fr)
Chinese (zh)
Inventor
Zhang Honglei (张红蕾)
Li Lijun (李力骏)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2022194180A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Definitions

  • the present application relates to the field of terminal artificial intelligence (AI) and to the field of character recognition, and in particular to a method and electronic device for recognizing point-and-read text.
  • electronic devices with a point-reading function include reading pens, tablet computers, robots, and the like.
  • a reading pen can be used to assist users in reading picture books.
  • however, a reading pen can only recognize the text in a specific picture book.
  • some electronic devices, such as tablet computers and robots, can only recognize the text in electronic picture books. This limits the user's learning resources.
  • in addition, the accuracy of recognizing, according to the user's gesture, the text that the user wants read aloud is not high.
  • the present application provides a method and an electronic device for recognizing point-and-read characters. Through the method for recognizing point-and-read characters, the electronic device can more accurately recognize the characters specified by a user in a book.
  • the present application provides a method for recognizing point-and-read text. The method may include: the electronic device 100 starts to capture images in response to a first operation of the user, where the images captured by the electronic device 100 include the user's finger and the book, both located in the target area of the electronic device; the electronic device 100 recognizes the user's point-reading gesture according to the movement of the finger's position recognized from the captured images; the electronic device 100 determines the target text in the book content in the captured images according to the position and trajectory of the point-reading gesture; and the electronic device 100 broadcasts the recognized target text.
  • the electronic device can accurately determine the target text from the image collected by the electronic device in combination with the user's gesture. Therefore, the electronic device can accurately identify the target text. In this way, user experience can be improved.
  • the point-to-read gesture includes one or more of the following: dots, dashes, and circles.
  • the electronic device recognizes the user's point-reading gesture according to the movement of the finger's position recognized from the collected images, including: after the electronic device detects the user's finger in the collected images, if it detects that a first position of the finger in the collected images moves less than a first preset distance within a first preset duration, the electronic device records the first position as the starting point of the point-reading gesture; after the electronic device records the starting point, if it detects that a second position of the finger in the collected images moves less than a second preset distance within a second preset duration, the electronic device records the second position as the end point of the point-reading gesture; the electronic device then recognizes the point-reading gesture according to its starting point and end point.
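The stationary-finger rule above can be sketched in code. This is a minimal illustration, not the patent's implementation: the (t, x, y) sample format, the 0.5 s window, and the 10-pixel threshold are stand-ins for the preset durations and distances.

```python
import math

def find_stationary_points(samples, window_s=0.5, max_move=10.0):
    """Return indices where the finger stayed within max_move pixels
    for at least window_s seconds. samples: list of (t, x, y) tuples,
    ordered by time. The first such index corresponds to the gesture's
    starting point, a later one to its end point."""
    stationary = []
    for i, (t0, x0, y0) in enumerate(samples):
        held = False
        for t1, x1, y1 in samples[i + 1:]:
            if math.hypot(x1 - x0, y1 - y0) >= max_move:
                break  # finger moved too far before the window elapsed
            if t1 - t0 >= window_s:
                held = True  # stayed close for the whole window
                break
        if held:
            stationary.append(i)
    return stationary
```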
  • the electronic device 100 may start recording the coordinates of the finger at the starting point from the starting point of the pointing gesture, and finish recording the coordinates of the finger at the end point when the pointing gesture ends. In this way, the electronic device can more accurately determine the starting point and the ending point of the user's reading gesture trajectory.
  • the electronic device recognizes the point-reading gesture according to its starting point and end point, including: if the distance between the starting point and each finger position recorded between the starting point and the end point is less than a third preset distance, the electronic device recognizes the point-reading gesture as a dot; if the coordinates of the finger positions recorded between the starting point and the end point are linearly correlated, the electronic device recognizes the point-reading gesture as a dash; if the distance between the starting point and the end point is less than a fourth preset distance while the distance between the starting point and some finger position between them is greater than a fifth preset distance, the electronic device recognizes the point-reading gesture as a circle.
  • the electronic device can accurately determine the specific type of the point-to-read gesture.
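The dot/dash/circle rules can be sketched as follows. This is an illustrative reading of the claim, not the patent's implementation: the thresholds are arbitrary stand-ins for the third, fourth, and fifth preset distances, and Pearson correlation is one possible test for "linearly correlated" coordinates.

```python
import math

def _pearson(points):
    """Pearson correlation of the x and y coordinates of a trajectory."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # a perfectly horizontal or vertical stroke is degenerate: treat as a line
    return sxy / (sx * sy) if sx and sy else 1.0

def classify_gesture(points, dot_r=15.0, line_tol=0.95, close_r=20.0, span_r=40.0):
    """Classify a recorded trajectory (list of (x, y)) as 'dot', 'dash',
    or 'circle': all points near the start -> dot; points strongly linearly
    correlated -> dash; start and end close together while the trajectory
    sweeps far from the start -> circle."""
    x0, y0 = points[0]
    xe, ye = points[-1]
    dists = [math.hypot(x - x0, y - y0) for x, y in points]
    if max(dists) < dot_r:
        return "dot"
    if abs(_pearson(points)) >= line_tol:
        return "dash"
    if math.hypot(xe - x0, ye - y0) < close_r and max(dists) > span_r:
        return "circle"
    return "unknown"
```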
  • the electronic device determines the target text in the book content in the collected images according to the position and trajectory of the point-reading gesture, including: the electronic device determines the position of the text area in the book content according to the collected images; the electronic device then determines the target text in the book content according to the position and trajectory of the point-reading gesture and the position of the text area.
  • the electronic device can more accurately determine the target text, that is, the text that the user needs to recognize and broadcast in the selected book.
  • the electronic device determines the target text in the book content according to the position and trajectory of the point-reading gesture and the position of the text area, including: the electronic device determines a first text area according to the trajectory of the point-reading gesture and the positions of the text areas in the first book, where the first text area contains a first trajectory, and the first trajectory is a part of the point-reading gesture's trajectory that is greater than or equal to a preset ratio of the whole; the electronic device then determines the target text in the book content according to the first trajectory, the point-reading gesture, and the first text area.
  • the electronic device determines that the user needs to recognize the text in the first text region only when most of the trajectory of the user's reading gesture falls within the first text region. In this way, when a part of the user's gesture track falls in the first text area and a part falls in the second text area, the electronic device can also correctly determine the target text.
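The preset-ratio rule can be illustrated with a small sketch. The rectangle representation of text areas and the 0.5 default ratio are assumptions for illustration:

```python
def pick_text_region(trajectory, regions, min_ratio=0.5):
    """Choose the text region containing at least min_ratio of the
    trajectory points. trajectory: list of (x, y); regions: dict mapping a
    region name to an axis-aligned rectangle (x0, y0, x1, y1). Returns the
    name of the best-covered qualifying region, or None."""
    best = None
    for name, (x0, y0, x1, y1) in regions.items():
        inside = sum(1 for x, y in trajectory if x0 <= x <= x1 and y0 <= y <= y1)
        ratio = inside / len(trajectory)
        if ratio >= min_ratio and (best is None or ratio > best[1]):
            best = (name, ratio)
    return best[0] if best else None
```

A trajectory that mostly underlines text in region B but briefly strays into region A still resolves to B, matching the behavior described above.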
  • the electronic device determines the target text in the book content according to the first trajectory, the point-reading gesture, and the first text area, including: if the point-reading gesture is a dot, the electronic device determines that the text in the first text area with the smallest distance from the first trajectory is the target text; if the point-reading gesture is a dash, the electronic device determines that the text above the first trajectory in the first text area is the target text; if the point-reading gesture is a circle, the electronic device determines that the text inside the first trajectory in the first text area is the target text.
  • the electronic device adopts different strategies to determine the target text, which can improve the accuracy of the electronic device in determining the target text in the captured image. Therefore, the accuracy with which the electronic device identifies the target text can be improved.
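The three per-gesture strategies can be sketched as follows. The word-center representation, the y-axis orientation (growing downward, so "above" means smaller y), and the bounding-box approximation of the circle's interior are assumptions for illustration:

```python
import math

def select_target_text(gesture, trajectory, words):
    """Pick the target words per the per-gesture strategy: dot -> nearest
    word; dash -> words directly above the underline; circle -> words
    enclosed by the stroke. trajectory: list of (x, y); words: list of
    (text, cx, cy) word centers inside the chosen text area."""
    if gesture == "dot":
        # the word with the smallest distance to any trajectory point
        def dist(w):
            _, cx, cy = w
            return min(math.hypot(cx - x, cy - y) for x, y in trajectory)
        return [min(words, key=dist)[0]]
    if gesture == "dash":
        # words whose center lies within the line's x-span and above it
        xs = [x for x, _ in trajectory]
        y_top = min(y for _, y in trajectory)
        return [t for t, cx, cy in words
                if min(xs) <= cx <= max(xs) and cy < y_top]
    if gesture == "circle":
        # approximate the circle's interior by its bounding box
        xs = [x for x, _ in trajectory]
        ys = [y for _, y in trajectory]
        return [t for t, cx, cy in words
                if min(xs) <= cx <= max(xs) and min(ys) <= cy <= max(ys)]
    return []
```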
  • the first preset distance is equal to the second preset distance
  • the second preset duration is equal to the first preset duration
  • the present application further provides a method for recognizing point-and-read text. The method may include: in response to a first operation, the electronic device collects images of a first book; when the distance between the finger coordinates in the image frame collected at a first moment and the finger coordinates in the image frame collected a first preset duration earlier is less than a first preset distance, the electronic device starts recording the finger coordinates in the image frames; when the distance between the finger coordinates in the image frame collected at a second moment and the finger coordinates in the image frame collected a second preset duration earlier is less than a second preset distance, the electronic device stops recording the finger coordinates, where the second moment is after the first moment; the electronic device determines the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment; the electronic device recognizes and broadcasts the text to be recognized.
  • the text to be recognized may be referred to as a target text, that is, the text specified by the user in the book to be recognized.
  • the electronic device uses the finger coordinates at the two moments when the finger is stationary as the starting point and the end point of the user's point-reading trajectory, respectively.
  • in this way, the electronic device can accurately determine the position at which the user starts point-reading. Therefore, the electronic device can accurately determine the text to be recognized according to the trajectory coordinates between the two moments at which the finger was at rest. The point-reading accuracy of the electronic device can thus be improved, thereby improving user experience.
  • the electronic device can recognize the characters in any book, and there is no need to customize the book.
  • the method further includes: when the electronic device detects the finger in the image frame collected at a third moment, it starts to obtain the coordinates of the finger in the collected image frames, where the third moment is a moment before the first moment.
  • the electronic device may take the moment when the finger appears in the image as the start of the user's point-reading, and collect the coordinates of the finger in the image frames only while the user is pointing. In this way, the electronic device avoids performing the subsequent point-reading steps when no finger is detected. Therefore, the calculation load of the electronic device can be reduced, and power consumption can be saved.
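The record-only-while-pointing behavior can be sketched as a small state machine. The per-frame fingertip input (None when no finger is detected) is an assumed interface to the finger-detection step:

```python
class PointReadTracker:
    """Coordinates are recorded only while a finger is visible; when the
    finger disappears, the recorded trajectory is emitted and recording
    stops, so the point-reading steps are skipped on idle frames."""

    def __init__(self):
        self.recording = False
        self.track = []

    def on_frame(self, fingertip):
        """fingertip: (x, y) from the finger-detection model, or None.
        Returns the completed trajectory when the gesture ends, else None."""
        if fingertip is not None:
            self.recording = True
            self.track.append(fingertip)
            return None
        if self.recording:  # finger left the view: the gesture is over
            done, self.track = self.track, []
            self.recording = False
            return done
        return None  # idle: nothing to do until the user points again
```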
  • the electronic device determines the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment, which specifically includes: the electronic device determines the pointing gesture of the finger according to the finger coordinates recorded from the first moment to the second moment; the electronic device determines the positions of the text areas in the first book according to the image of the first book; the electronic device determines the text to be recognized in the first book according to the finger coordinates, the pointing gesture, and the positions of the text areas in the first book.
  • the electronic device can more accurately determine the target text, that is, the text that the user needs to recognize and broadcast in the selected book.
  • the electronic device determines the pointing gesture of the finger according to the finger coordinates recorded from the first moment to the second moment, specifically including: if the electronic device determines that the distance between any one of the recorded coordinates and each of the other recorded coordinates is less than a third preset distance, the electronic device determines that the pointing gesture is the first gesture; if the electronic device determines that the finger coordinates recorded from the first moment to the second moment are linearly correlated, the electronic device determines that the pointing gesture is the second gesture; if the electronic device determines that the distance between the finger coordinates recorded at the first moment and those recorded at the second moment is less than a fourth preset distance, while the distance between the finger coordinates recorded at a fourth moment and those recorded at the first moment is greater than a fifth preset distance, the electronic device determines that the pointing gesture is the third gesture.
  • the electronic device can accurately determine the specific type of the point-to-read gesture.
  • the electronic device determines the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment, the pointing gesture, and the positions of the text areas in the first book, which specifically includes: the electronic device connects the finger coordinates recorded from the first moment to the second moment in recording order to obtain a first finger trajectory; the electronic device determines a first text area according to the first finger trajectory and the positions of the text areas in the first book, where the first text area contains a second finger trajectory, and the second finger trajectory is a part of the first finger trajectory that is greater than or equal to a preset ratio of the whole; the electronic device determines the text to be recognized according to the second finger trajectory, the pointing gesture, and the first text area.
  • the electronic device determines that the user needs to recognize the text in the first text region only when most of the trajectory of the user's reading gesture falls within the first text region. In this way, when a part of the user's gesture track falls in the first text area and a part falls in the second text area, the electronic device can also correctly determine the target text.
  • the electronic device determines the text to be recognized according to the second finger trajectory, the pointing gesture, and the first text area, which specifically includes: if the pointing gesture is the first gesture, the electronic device determines that the text in the first text area with the smallest distance from the second finger trajectory is the text to be recognized; if the pointing gesture is the second gesture, the electronic device determines that the text above the second finger trajectory in the first text area is the text to be recognized; if the pointing gesture is the third gesture, the electronic device determines that the text inside the second finger trajectory in the first text area is the text to be recognized.
  • the first gesture is a dot
  • the second gesture is a line
  • the third gesture is a circle.
  • the electronic device adopts different strategies to determine the target text, which can improve the accuracy of the electronic device in determining the target text in the captured image. Therefore, the accuracy with which the electronic device identifies the target text can be improved.
  • the electronic device determines the pointing gesture of the finger according to the finger coordinates recorded from the first moment to the second moment, which specifically includes: when the electronic device does not detect the finger in the image collected at a fifth moment, the electronic device determines the pointing gesture of the finger according to the finger coordinates recorded from the first moment to the second moment.
  • the electronic device can determine the end time of the user's reading.
  • the electronic device may stop performing the point-reading steps (e.g., determining the coordinates of the finger in the image) until the user starts pointing again. In this way, the power consumption of the electronic device can be saved.
  • the electronic device uses the same condition to determine that the finger is stationary in images collected at different moments, which can reduce the calculation load of the electronic device.
  • an electronic device comprising: one or more processors and a memory; the memory is coupled to the one or more processors, the memory is used to store computer program code, the computer program code includes computer instructions,
  • the one or more processors invoke the computer instructions to cause the electronic device to execute the method for recognizing point-and-read text in any possible implementation manner of the first aspect or any possible implementation manner of the second aspect.
  • the embodiments of the present application provide a computer storage medium, including computer instructions, which, when run on an electronic device, cause the electronic device to perform the method for recognizing point-and-read text in any possible implementation of any of the above aspects.
  • an embodiment of the present application provides a computer program product, which, when run on an electronic device, enables the electronic device to execute the method for recognizing point-and-read text in any possible implementation of any one of the above aspects.
  • FIG. 1A is a schematic diagram of an application scenario of a robot that can be used for point reading provided by an embodiment of the present application;
  • FIG. 1B is a schematic diagram of another application scenario of the robot that can be used for point reading provided by an embodiment of the present application;
  • FIG. 1C is a schematic diagram of another application scenario of the robot that can be used for point reading provided by an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a method for recognizing point-and-read text provided by an embodiment of the present application;
  • FIGS. 3A-3D are schematic diagrams of a set of user interfaces of the electronic device 100 provided by the embodiments of the present application;
  • FIG. 3E is a schematic diagram of the electronic device 100 collecting a picture book image provided by an embodiment of the present application;
  • FIG. 3F is a schematic diagram of content area division of a picture book provided by an embodiment of the present application;
  • FIG. 4A is a schematic diagram of an image frame collected by the electronic device 100 when a user points and reads, according to an embodiment of the present application;
  • FIG. 4B is a schematic diagram of finger detection performed by the electronic device 100 on a collected image frame provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a group of image frames collected by the electronic device 100 during a user's point reading according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a trajectory when a user points and reads provided by an embodiment of the present application;
  • FIGS. 7A-7C are schematic diagrams of polar coordinate plots corresponding to different finger trajectories when a user points and reads according to an embodiment of the present application;
  • FIG. 8 is a schematic diagram combining the picture book layout analysis result and the trajectory of the user's finger provided by an embodiment of the present application;
  • FIGS. 9A-9B are schematic diagrams of a group of text detections provided by an embodiment of the present application;
  • FIG. 10 is a schematic flowchart of a method for recognizing point-and-read text provided by an embodiment of the present application;
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
  • FIG. 12 is a schematic diagram of a software architecture of an electronic device provided by an embodiment of the present application;
  • FIG. 13 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the terms “first” and “second” are used for descriptive purposes only and should not be construed as indicating relative importance or implying the number of indicated technical features. Therefore, a feature defined as “first” or “second” may explicitly or implicitly include one or more such features. In the description of the embodiments of the present application, unless otherwise specified, “multiple” means two or more.
  • "point reading” may mean that the electronic device can recognize and play the text specified by the user in the picture book by voice.
  • the user specifies the text in the picture book. For example, the user's finger points below the text to be recognized in the picture book, or the user's finger draws a line below the text to be recognized in the picture book, and the user's finger draws a circle to delineate the text to be recognized, and so on.
  • the user may use a finger to point under the text (ie, “cat”) to be recognized in the picture book 102 .
  • the robot 100 may include a camera 103 and a camera 104 .
  • the picture book 102 is placed in the photographing field of view area 101 of the camera 103 and the camera 104 .
  • the camera 103 and/or the camera 104 of the robot 100 may capture an image of the picture book 102 in which the user's finger is below the word “cat”.
  • the robot 100 can recognize the character "cat” and play the character "cat”.
  • the user can use a finger to draw a line (or cross) under the text (ie, “cat and mouse”) to be recognized in the picture book 102 .
  • the robot 100 can recognize the text “cat and mouse” and play the text "cat and mouse”.
  • the user may draw a circle with a finger in the picture book 102 to delineate the desired identification text (ie, "cat and mouse”).
  • the robot 100 can recognize the text “cat and mouse” and play the text "cat and mouse”.
  • the gestures of the user when reading can be divided into: “dot”, “line”, “circle” and so on. It can be understood that the categories and names of gestures in the embodiments of the present application are not limited.
  • the electronic device may be the robot 100 shown in FIG. 1A, FIG. 1B and FIG. 1C, or may be a terminal device with a camera, such as a tablet computer or a smart phone; the electronic device may also be a point-reading device composed of a camera and a terminal with a character recognition function. This is not limited in this embodiment of the present application.
  • An embodiment of the present application provides a method for recognizing point-and-read text. The method may include: the electronic device 100 continuously collects images of a first picture book; when a finger appears in the collected images, the electronic device determines that the user has started point-reading; the electronic device analyzes the image of the first picture book to obtain a text analysis result; the electronic device determines the user's trajectory and point-reading gesture according to the finger coordinates in the collected multi-frame images; the electronic device determines, according to the point-reading gesture and the text analysis result, the text area Q containing the text to be recognized; and the electronic device recognizes and voice-broadcasts the text to be recognized in text area Q.
  • FIG. 2 exemplarily shows a flowchart of a method for recognizing point-to-read characters provided by an embodiment of the present application.
  • a method for recognizing point-and-read characters provided by the present application may include the following steps:
  • the camera of the electronic device 100 starts to capture the image of the book B.
  • the electronic device 100 may receive the user's first operation.
  • the first operation of the user may be to turn on the electronic device 100 or to turn on the reading APP in the electronic device 100 .
  • the electronic device 100 may start capturing images by the camera of the electronic device 100 in response to the user's first operation.
  • the electronic device 100 continuously collects multiple frames of images.
  • the electronic device 100 may be the robot 100 shown in FIGS. 3A-3E
  • the book B may be the book 102 shown in FIG. 3E .
  • the embodiment of the present application does not limit the book B. That is, in the method provided by the embodiment of the present application, the electronic device 100 can recognize the text in any book that the user clicks to read.
  • the electronic device 100 may be the robot 100 shown in FIG. 3A .
  • the electronic device 100 may include a camera 103 and a camera 104 and a display screen 105 .
  • the icon 106 of the point-reading APP can be displayed on the display screen 105 .
  • the user's first operation may be to click the icon 106 of the point-reading APP.
  • the camera 103 and the camera 104 of the robot 100 start to capture images.
  • the display screen 105 of the robot 100 can display the book display area 1051 and the prompt text 1052 .
  • the book display area 1051 can display the images collected by the camera 103 and the camera 104 .
  • the prompt text 1052 may prompt the user to place the book to be learned in the shooting area of the camera 103 and the camera 104 .
  • the content of the prompt text 1052 may be "please put the book in the shooting area", and the specific content of the prompt text 1052 is not limited here.
  • the prompt text 1052 may be displayed in the book display area 1051, or may be displayed outside the book display area 1051, and the specific position of the prompt text 1052 is not limited here.
  • the display screen 105 of the robot 100 may further include a control 1053 .
  • the control 1053 is used to trigger the robot 100 to perform layout analysis and finger detection on the collected images.
  • the image 1021 of the book 102 may be displayed in the book display area 1051 of the display screen 105 .
  • the user can adjust the position of the book according to the image 1021 in the display area 1051 . For example, if the user sees that only the right half of the book 102 is displayed in the display area 1051 , the user can move the book to the right so that the book 102 moves into the shooting field 101 of the camera 103 and the camera 104 . After the user sees the complete image of the book 102 in the display area 1051, the user can click on the control 1053.
  • the electronic device may display the gestures available for point-reading text on the display screen 105 .
  • the user may click below the text to be recognized, may also draw a line below the text to be recognized, or may draw a circle to delineate the text to be recognized. In this way, the user can be prompted to use a gesture recognizable by the electronic device 100 to read.
  • the electronic device 100 performs layout analysis on the image of the book B, and determines the type of the book content in the book B and the position corresponding to the book content.
  • the electronic device 100 may perform layout analysis on a frame of image of the book B collected by the camera, and obtain the position of the text area in the current page of the book B on the book page.
  • the electronic device 100 may store a layout analysis model, the electronic device 100 inputs a frame of image into the layout analysis model, and the model can output the type of book content (text, drawing, table, etc.) contained in the image and the book content corresponding location.
  • through the layout analysis model, the electronic device 100 can determine that the image of the book B may include one or more of a text area, a drawing area, and a table area.
  • the text area may refer to an area that only contains text in a frame of image.
  • the drawing area can refer to the area of an image that contains drawing.
  • the table area may refer to an area of a frame image that contains a table. It will be appreciated that the plot area and table area may also contain text.
  • One frame of image of the book B collected by the electronic device 100 may include one or more text regions, and/or drawing regions, and/or table regions.
  • the image 1021 of the book 102 may include area A, area B, and area C.
  • Area A and Area C are drawing areas, and area B is text area.
  • the area A may be a rectangular area with vertices A1(xa1, ya1, za1), A2(xa2, ya2, za2), A3(xa3, ya3, za3), A4(xa4, ya4, za4).
  • Region B may be a rectangular region with vertices B1 (xb1, yb1, zb1), B2 (xb2, yb2, zb2), B3 (xb3, yb3, zb3), B4 (xb4, yb4, zb4).
  • Region C may be a rectangular region with vertices C1(xc1, yc1, zc1), C2(xc2, yc2, zc2), C3(xc3, yc3, zc3), C4(xc4, yc4, zc4).
  • the shape of the text area and the drawing area obtained by the electronic device 100 by performing layout analysis on the image of the book B is not limited to a rectangle, and may also be other shapes, such as polygons, circles, and the like.
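Since the regions produced by layout analysis may be rectangles or other polygons, a generic point-in-polygon test is useful for deciding which region a finger coordinate falls in. The ray-casting sketch below, which works on the in-plane (x, y) components of the vertex coordinates, is illustrative; the patent does not specify how region membership is computed:

```python
def point_in_polygon(pt, vertices):
    """Ray-casting test: is pt = (x, y) inside the polygon given by its
    vertices in order? Works for the rectangular regions A, B, C of the
    layout result and for arbitrary polygonal regions alike."""
    x, y = pt
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # edge crosses the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # crossing to the right toggles parity
    return inside
```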
  • the electronic device 100 may use the upper left vertex of the photographing field of view of the electronic device 100 as the origin to establish a coordinate system.
  • the origin O of the coordinate system XYZ is the upper left vertex of the camera field 101 of the robot.
  • the electronic device 100 may input the image 1021 of the book 102 shown in FIG. 3F into the layout analysis model for layout analysis, and obtain the type of content contained in the image 1021 and the location of the content as shown in Table 1 below.
  • the electronic device 100 performs layout analysis on the image 1021, and can determine the drawings and characters contained in the image 1021, as well as the positions of the drawings and characters.
  • the contents included in the area A and the area C in the image 1021 are drawings, and the contents included in the area B are characters.
  • the coordinates of the four vertices of the area A are (xa1, ya1, za1), (xa2, ya2, za2), (xa3, ya3, za3), (xa4, ya4, za4), respectively.
  • the area A in Table 1 may be the area A shown in FIG. 3F .
  • the coordinates of the four vertices of the region B are (xb1, yb1, zb1), (xb2, yb2, zb2), (xb3, yb3, zb3), (xb4, yb4, zb4) respectively.
  • Region B in Table 1 may be Region B shown in FIG. 3F.
  • the coordinates of the four vertices of the region C are (xc1, yc1, zc1), (xc2, yc2, zc2), (xc3, yc3, zc3), (xc4, yc4, zc4) respectively.
  • Region C in Table 1 may be Region C shown in FIG. 3F .
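The layout analysis result summarized in Table 1 can be modeled as a small data structure. A minimal Python sketch, not the patent's implementation: the class and field names are hypothetical, and the vertex coordinates are placeholders standing in for (xa1, ya1, za1) through (xc4, yc4, zc4):

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera coordinate system

@dataclass
class LayoutRegion:
    name: str              # e.g. "A", "B", "C"
    kind: str              # "text", "drawing", or "table"
    vertices: List[Point]  # four vertices of the rectangular region

# Placeholder coordinates standing in for the Table 1 vertex values
layout = [
    LayoutRegion("A", "drawing", [(0, 0, 0), (4, 0, 0), (4, 3, 0), (0, 3, 0)]),
    LayoutRegion("B", "text",    [(0, 3, 0), (4, 3, 0), (4, 6, 0), (0, 6, 0)]),
    LayoutRegion("C", "drawing", [(0, 6, 0), (4, 6, 0), (4, 9, 0), (0, 9, 0)]),
]

# Point reading usually only needs the text areas
text_regions = [r for r in layout if r.kind == "text"]
```

A later step can then intersect the finger trajectory with `text_regions` only, which is the filtering the patent describes for text-only point reading.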
  • the electronic device 100 can also detect the text and the position of the text in the drawing area when performing layout analysis.
  • the electronic device 100 may detect the inclination angle of the text in the text area.
  • the electronic device 100 detects a finger in the image collected at time T10, and starts to determine the coordinates of the finger in the collected image.
  • the image captured by the electronic device 100 may include the user's finger.
  • the electronic device 100 can detect the finger in the captured image.
  • the electronic device 100 may store a finger detection model, and the electronic device 100 inputs the collected image into the finger detection model, and the finger detection model can determine whether the input image contains a finger or does not contain a finger.
  • the electronic device 100 inputs the image 401 in FIG. 4A into the finger detection model, and the finger detection model can output the image 402 as shown in FIG. 4B .
  • the finger detection model can label the detected fingers with the finger detection box 4022 .
  • the finger detection model can also label fingertips 4021.
  • the electronic device 100 continuously collects multiple frames of images, and the electronic device 100 can sequentially input each frame of images collected into the finger detection model for finger detection.
  • If the electronic device 100 detects a finger in one frame of image, the electronic device 100 can begin to determine the coordinates of the finger in that frame of image.
  • the electronic device 100 may take the coordinates of the fingertip as the coordinates of the finger. If the electronic device 100 does not detect a finger in one frame of image, the electronic device 100 can check whether the next frame of image (or an image frame collected after a preset time interval) contains a finger, until the electronic device 100 detects a finger in the image collected at time T10 and starts to determine the coordinates of the finger in the collected images.
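The frame-by-frame detection loop described above can be sketched as follows. `detect_finger` is a hypothetical stand-in for the finger detection model; here it simply reads a precomputed fingertip from a dict so that only the control flow is illustrated:

```python
def detect_finger(frame):
    # Hypothetical stand-in for the finger detection model:
    # returns the (x, y) fingertip coordinates, or None if no finger is found.
    return frame.get("fingertip")

def collect_finger_coords(frames):
    """Skip frames until a finger first appears (time T10), then record the
    fingertip coordinates of every subsequent frame until the finger leaves."""
    coords = []
    for frame in frames:
        tip = detect_finger(frame)
        if tip is None and not coords:
            continue           # no finger detected yet: check the next frame
        if tip is None:
            break              # the finger has left the field of view
        coords.append(tip)     # fingertip coordinates stand for the finger
    return coords
```

In the patent's flow the later steps (S204/S205) decide when recording stops; stopping when the finger disappears is a simplification for this sketch.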
  • (a) to (i) of FIG. 5 are exemplary image frames captured by the electronic device 100 at times t1 to t9.
  • the image frames captured from time t1 to time t9 can show the complete process of the user pointing at the character "cat" to be recognized: first the user lowers a finger and points at the text "cat" in the picture book, and then moves the finger away from the picture book.
  • (a) of FIG. 5 is an image frame captured by the electronic device 100 at time t1.
  • (b) of FIG. 5 shows the image frame captured by the electronic device 100 at time t2. No finger is detected in the image frames captured by the electronic device 100 at time t1 and time t2.
  • (c) of FIG. 5 is the image frame captured by the electronic device 100 at time t3, and the electronic device 100 can detect a finger in the image frame at time t3.
  • the electronic device 100 starts to acquire the coordinates of the finger in the image frame.
  • (d) to (h) of FIG. 5 show the image frames captured by the electronic device 100 at times t4 to t8, respectively. The electronic device 100 may acquire the coordinates of the finger in each of the image frames captured at times t4 to t8.
  • (i) of FIG. 5 shows the image frame captured by the electronic device 100 at time t9. No finger is detected in the image frame captured by the electronic device 100 at time t9.
  • the electronic device 100 acquires an image frame at time t1.
  • the electronic device 100 performs finger detection on the image frame captured at time t1, and no finger is detected.
  • the electronic device 100 does not execute the steps after step S203.
  • the electronic device 100 continues to perform finger detection on the next frame of image.
  • the electronic device 100 may perform finger detection on the image captured at time t2. If no finger is detected, the electronic device 100 performs finger detection on the image captured at time t3.
  • the electronic device 100 detects a finger in the image frame captured at time t3.
  • the electronic device 100 can determine the coordinates of the finger in the image frame captured at time t3. Specifically, it may be the coordinates of the fingertip.
  • Time T10 may be time t3 shown in FIG. 5 .
  • the image frames captured at time t1 to the image frames captured at time t9 shown in FIG. 5 may be consecutive image frames captured by the electronic device 100 .
  • the interval is related to the frame rate at which the electronic device 100 captures images.
  • the image frames captured at time t1 to the image frames captured at time t9 shown in FIG. 5 may also be image frames captured by the electronic device 100 at preset time intervals. That is, each pair of adjacent times among t1 to t9 (t1 and t2, t2 and t3, and so on up to t8 and t9) may be separated by the preset time interval.
  • the preset time interval may be configured by the system of the electronic device 100 .
  • the electronic device 100 can sequentially perform finger detection on each frame of images collected. In this way, the finger in the image frame can be detected in time, so that the time when the user starts to read can be accurately determined.
  • the electronic device 100 may also perform finger detection on images collected at preset time intervals. In this way, the power of the electronic device can be saved.
  • When the electronic device 100 detects that a finger appears in the image frame and the vertical distance between the finger in the image frame and the book B has decreased to the preset vertical distance D01, the electronic device 100 starts to acquire the coordinates of the finger in the captured image frames.
  • When the electronic device 100 detects that a finger appears in the image frame and the vertical distance between the finger in the image frame and the book B gradually decreases, the electronic device 100 starts to acquire the coordinates of the finger in the captured image frames.
  • the electronic device 100 may record the coordinates of the finger in the image collected at time T11 and use the coordinates as the starting point of the user's reading track.
  • the electronic device 100 may determine that the user's finger is in a stationary state at time T11.
  • the preset duration T21 may be 0.5 seconds, may be 1 second, or may be 2 seconds, which is not limited here.
  • the preset distance D1 may be 10 pixels, 5 pixels, or 15 pixels, and the specific value of the preset distance D1 is not limited in this embodiment of the present application.
  • the preset duration T21 and the preset distance D1 may be configured by the system of the electronic device 100 .
  • When a user needs the electronic device 100 to assist in learning a character to be recognized, the user will generally point a finger at the character to be recognized and hold it there for a while before moving the finger. For example, as shown in (d) of FIG. 5, the user points the finger at the text "cat" for a period of time (for example, 0.5 seconds or 1 second, which is not limited here), and then moves the finger from the position in (d) to the position of the finger in (e).
  • the electronic device 100 determines that the finger is in a stationary state in the image captured at time T11.
  • the coordinate point of the finger in the image collected at time T11 is the starting point of the user's reading track. That is, the user starts to point the finger on the text to be recognized, and starts to select the text to be recognized.
  • the electronic device 100 records the coordinates of the finger in the image frame.
  • the electronic device 100 detects that there is a finger in the image frame, and then acquires the coordinates of the finger.
  • the electronic device 100 may temporarily store the coordinates in the memory, and after the coordinates of the finger in the image frame are used to calculate the distance from the coordinates of the finger in the next frame of image, the electronic device releases the stored coordinates of the finger in the image frame.
  • the electronic device 100 may record the coordinates of the finger in the image frame captured at time T11.
  • the coordinates of the finger in the image frame captured at time T11 can be recorded in the memory for recording the point reading track.
  • After the electronic device 100 calculates the distance between the coordinates of the finger in the image frame captured at time T11 and the coordinates of the finger in the image frame captured after the preset duration T21, the electronic device 100 still keeps the coordinates of the finger in the image frame captured at time T11 in the memory used for recording the point-reading track.
  • S205: When the distance between the coordinates of the finger in the image collected by the electronic device 100 at time T12 and the coordinates of the finger in the image frame collected the preset duration T22 earlier is less than the preset distance D2, the electronic device 100 stops recording the coordinates of the finger in the collected images.
  • the electronic device 100 detects that the user's finger is in a stationary state again. That is, the electronic device 100 determines that the distance between the coordinates of the finger in the image collected at time T12 and the coordinates of the finger in the image frame collected the preset duration T22 earlier is smaller than the preset distance D2.
  • the electronic device 100 stops recording the coordinates of the finger in the captured image. That is, the electronic device 100 takes the coordinates of the finger in the image collected at time T12 as the coordinates of the end point of the user's reading track.
  • T22 may be greater than T21, and may also be less than or equal to T21, which is not limited here.
  • D2 may be greater than D1, and may also be less than or equal to D1, which is not limited here.
  • the preset duration T22 and the preset distance D2 may be configured by the system of the electronic device 100 .
  • the distance between the coordinates of the finger in the image frame captured at time t7 in (g) of FIG. 5 and the coordinates of the finger in the image frame captured at time t6 in (f) is smaller than the preset distance D2, so the electronic device 100 stops saving the coordinates of the finger in the captured images.
  • the electronic device 100 saves the coordinates of the finger in the images collected between time T11 and time T12, that is, the trajectory coordinates of one point-reading action of the user's finger in the picture book.
  • the coordinate trajectory of the finger in the image collected between time T11 and time T12 may be shown as line segment P3P4 in FIG. 6 .
  • the electronic device 100 may store the coordinates of the points between the line segments P3P4.
  • the line segment P3P4 is the trajectory of the user's finger in the picture book, and the finger trajectory is used to select the text to be recognized in the picture book.
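The stationary-start (time T11) and stationary-end (time T12, step S205) conditions can be sketched as a single pass over timestamped fingertip samples. This is a simplified illustration, not the patent's implementation: `t21`, `d1`, `t22`, and `d2` correspond to the preset duration T21, preset distance D1, preset duration T22, and preset distance D2, and the comparison sample is simply the latest sample at least the preset duration older:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def point_read_track(samples, t21, d1, t22, d2):
    """samples: list of (t, (x, y)) fingertip coordinates in capture order.
    Returns the recorded track between the first stationary moment (T11)
    and the next stationary moment (T12)."""
    start_found = False
    track = []
    for i, (t, p) in enumerate(samples):
        duration = t21 if not start_found else t22
        threshold = d1 if not start_found else d2
        # latest earlier sample captured at least `duration` before this one
        earlier = [q for (tq, q) in samples[:i] if t - tq >= duration]
        if not earlier:
            if start_found:
                track.append(p)
            continue
        if dist(p, earlier[-1]) < threshold:
            if not start_found:
                start_found = True
                track = [p]        # T11: starting point of the reading track
            else:
                track.append(p)    # T12: end point, stop recording
                return track
        elif start_found:
            track.append(p)
    return track
```

Run against a synthetic sequence (still, move right, still again), the returned track corresponds to the line segment P3P4 above: it begins at the resting start position and ends at the second resting position.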
  • the electronic device 100 determines the pointing gesture G of the finger according to the coordinates of the finger saved from time T11 to time T12.
  • the electronic device 100 may determine that the user has selected the text to be recognized.
  • the text to be recognized is the text selected by the finger from time T11 to time T12.
  • the electronic device 100 may determine the text to be recognized according to the coordinates of the fingers in the image frames captured from the time T11 to the time T12, the gestures clicked by the user, and the layout analysis result.
  • the electronic device 100 may determine the gesture G when the user clicks according to the coordinates of the fingers in the multiple image frames captured from the time T11 to the time T12.
  • If the distance between the finger coordinates in any two image frames captured from time T11 to time T12 is smaller than the preset distance D10 (for example, the distance between the finger coordinates in the image frame captured at time T11 and those in the image frame captured at time T12 is smaller than the preset distance D10), the electronic device 100 determines that the pointing gesture of the finger is "point". D10 is less than or equal to D11.
  • If the coordinates of the finger in the image frames captured from time T11 to time T12 are linearly correlated, the electronic device 100 may determine that the pointing gesture of the finger is "drawing a line".
  • the electronic device 100 may determine that the pointing gesture of the finger is "drawing a circle".
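The three distance rules above can be sketched as a rule-of-thumb classifier. This is an illustrative simplification rather than the patent's method: the "line" case is taken as the fallback instead of being tested for linear correlation, and `d10` plays the role of the preset distance D10:

```python
import math

def classify_gesture(track, d10):
    """Classify a recorded fingertip track as "point", "circle", or "line"."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    # "point": every pair of recorded coordinates stays within d10
    if all(dist(p, q) < d10 for p in track for q in track):
        return "point"
    # "circle": the track is closed (start and end are close) but not tiny
    if dist(track[0], track[-1]) < d10:
        return "circle"
    # otherwise treat the stroke as "line"
    return "line"
```

The patent's model-based approach (convex hull fitting plus a gesture recognition model, below) replaces these hand-written thresholds with learned classification.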
  • the electronic device 100 performs convex hull fitting on the finger coordinate points in the image frames captured from time T11 to time T12, and converts the sampling points into polar coordinate points after uniform sampling to obtain a polar coordinate map.
  • the electronic device 100 inputs the polar coordinate graph into the gesture recognition model, and after the gesture recognition model recognizes the polar coordinate graph, it outputs the gesture type corresponding to the polar coordinate graph.
  • old(x,y) is the coordinates of the finger determined from the image frames collected by the electronic device 100 from time T11 to time T12
  • New(x,y) is the coordinates of the finger after convex hull fitting is performed on the coordinates of the finger.
  • the electronic device 100 can determine the center point M(xm, ym) in the sampling points:
  • the electronic device 100 can calculate the coordinates of the relative position of each convex hull fitting point relative to the center point as:
  • the electronic device 100 can convert the convex hull fitting points into polar coordinates, where the origin of the polar coordinates is the center point calculated by the above Formula 2. The electronic device 100 can determine the polar coordinates of each convex hull fitting point according to the relative position of the fitting point and the center point, with reference to the following formulas:
  • the electronic device 100 can convert the sampling points into polar coordinate points according to Formula 4 and Formula 5, and then save the plurality of polar coordinate points as a polar coordinate map.
  • the electronic device 100 inputs the polar coordinate graph into the gesture recognition model, and can obtain the gesture type corresponding to the polar coordinate graph.
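The conversion described above (center point, relative positions, polar coordinates) can be sketched as follows, under the assumption that the center point M(xm, ym) is the mean of the sampled convex hull fitting points; the patent's Formulas 2, 4 and 5 are not reproduced in this text, so this is only a plausible reading of the steps:

```python
import math

def to_polar_map(points):
    """Convert convex-hull fitting points to (r, theta) polar coordinates
    about their center point, as a list to be rendered as a polar map."""
    n = len(points)
    xm = sum(x for x, _ in points) / n   # assumed center point M(xm, ym)
    ym = sum(y for _, y in points) / n
    polar = []
    for x, y in points:
        dx, dy = x - xm, y - ym          # relative position to the center
        r = math.hypot(dx, dy)           # radial distance
        theta = math.atan2(dy, dx)       # polar angle
        polar.append((r, theta))
    return polar
```

A closed curve in this polar map (FIG. 7A) then corresponds to "circle", a tight cluster (FIG. 7B) to "point", and a polyline (FIG. 7C) to "drawing a line", which is what the gesture recognition model classifies.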
  • FIG. 7A exemplarily shows a polar coordinate diagram.
  • the coordinates of the finger in the polar coordinate diagram can be sequentially connected according to the time sequence in which the electronic device 100 obtains the coordinates of the finger, and a closed curve can be formed.
  • the electronic device 100 may input the polar coordinate diagram shown in FIG. 7A into the gesture recognition model, and the gesture recognition model may output the gesture type of the finger corresponding to the polar coordinate diagram.
  • the gesture type is "circle".
  • FIG. 7B exemplarily shows another polar coordinate diagram.
  • the coordinates of the finger in the polar coordinate graph may be sequentially stored in the polar coordinate graph according to the time sequence in which the electronic device 100 obtains the coordinates of the finger, and the coordinate points of the finger are concentrated in a certain area.
  • the electronic device 100 may input the polar coordinate diagram shown in FIG. 7B into the gesture recognition model, and the gesture recognition model may output the gesture type of the finger corresponding to the polar coordinate diagram.
  • the gesture type is "point".
  • FIG. 7C exemplarily shows yet another polar coordinate diagram.
  • the coordinates of the finger in the polar coordinate diagram can be sequentially connected according to the time sequence in which the electronic device 100 obtains the coordinates of the finger, and a polyline can be formed.
  • the electronic device 100 can input the polar coordinate diagram shown in FIG. 7C into the gesture recognition model, and the gesture recognition model can output the gesture type of the finger corresponding to the polar coordinate diagram.
  • the gesture type is "drawing a line".
  • the electronic device determines the text region Q to be recognized according to the coordinates of the finger saved from time T11 to time T12, the gesture G, and the multiple text regions in the book B and their positions.
  • the electronic device can determine the text area Q to be recognized according to the coordinates of the finger recorded from time T11 to time T12 (that is, the trajectory of the user's finger in the picture book), the gesture G, and the layout analysis result.
  • the electronic device 100 uses the text area where the track recorded from time T11 to time T12 is located as the text area to be recognized.
  • the electronic device 100 takes the text with the smallest distance from the coordinates of the finger stored from time T11 to time T12 in the text area to be recognized as the text to be recognized specified by the user.
  • the electronic device 100 uses the text region intersecting with the track recorded from time T11 to time T12 as the text region Q to be recognized. It can be understood that, the intersection of the track and the text area may be that all the track is within the text area, or a preset proportion of the track is within the text area (for example, half of the track is within the text area A).
  • the electronic device 100 may take the characters above the track in the character area Q as the characters to be recognized designated by the user.
  • the electronic device may use the text region that overlaps with the track recorded from time T11 to time T12 as the text region Q to be recognized.
  • the electronic device may use the text in the track recorded from time T11 to time T12 in the text area Q as the text to be recognized selected by the user.
  • the user may configure the electronic device 100 to perform point reading only on text areas. That is, when the electronic device 100 determines that the track formed by the coordinates of the finger saved from time T11 to time T12 is in a text area of the picture book, the electronic device 100 determines the text area Q to be recognized and executes step S208. When the electronic device 100 determines that the track formed by the coordinates of the finger saved from time T11 to time T12 is in a drawing area or table area of the picture book, the electronic device 100 does not execute step S208.
  • FIG. 8 exemplarily shows a picture book 800 .
  • the picture book 800 may include a drawing area 801 , a table area 802 , a text area 803 and a text area 804 .
  • the track formed by the coordinates of the finger stored by the electronic device 100 from time T11 to time T12 may be finger track 807 or finger track 809 in FIG. 8 .
  • the electronic device 100 may determine that the finger trace 807 is in the drawing area 801 , or the finger trace 809 is in the table area 802 .
  • the electronic device 100 may prompt the user on the display screen that the current point-reading area does not conform to the configured point-reading area.
  • the electronic device does not perform step S208.
  • the electronic device 100 may determine the text region to be recognized according to the finger track and the text region, and execute step S208.
  • the electronic device 100 can recognize and broadcast the text to be recognized selected by the user.
  • When the electronic device 100 performs layout analysis, the position information of the characters contained in the drawing area can be obtained. In this way, the electronic device 100 can determine the text to be recognized selected by the user according to the user's finger trajectory and the position information of the text in the drawing area, and can therefore recognize and broadcast that text.
  • the electronic device 100 can detect whether there is a finger in the captured image, and if a finger is detected, step S203 is performed. If the electronic device 100 does not detect a finger in the captured image within a preset time, the electronic device 100 may close the "point reading" APP. Alternatively, the electronic device 100 may enter a standby state. In this way, the power of the electronic device 100 can be saved and the power consumption can be reduced.
  • When the electronic device 100 determines that the part of the trajectory formed by the coordinates of the finger saved from time T11 to time T12 that falls within a text area is greater than a preset threshold, the electronic device determines the text area Q to be recognized and executes step S208. Otherwise, the electronic device 100 neither determines the text area Q to be recognized nor executes step S208.
  • the preset threshold may be 50%, or 55%, 60%, etc., which is not limited here. For example, as shown in the finger track 808 shown in FIG. 8 , about 20% of the finger track falls in the text area. If the preset threshold is 50%, the electronic device 100 does not perform determining the text area Q to be recognized and step S208 .
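The threshold check above can be sketched as follows. The text area is modeled as an axis-aligned rectangle for simplicity (the patent also allows polygons and circles), and 0.5 stands in for the 50% preset threshold:

```python
def fraction_in_region(track, region):
    """Fraction of track points inside region = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = region
    inside = sum(1 for x, y in track if xmin <= x <= xmax and ymin <= y <= ymax)
    return inside / len(track)

def should_recognize(track, region, threshold=0.5):
    """Proceed to recognition only when enough of the track is in the text area."""
    return fraction_in_region(track, region) >= threshold
```

With the finger track 808 of FIG. 8, where roughly 20% of the points fall in the text area, this check fails against a 50% threshold, matching the behavior described above.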
  • the electronic device 100 can determine that the part of the finger track 805 or the finger track 806 that falls within a text area is larger than the preset threshold. The electronic device can then determine the text region to be recognized according to the finger track and the text region.
  • If the track falls within the range of two or more text areas, the electronic device 100 may take the text area that contains the larger part of the track as the text area Q to be recognized.
  • For example, the electronic device 100 takes the text area 803, determined by the finger track 806, as the final text area Q to be recognized.
  • the electronic device 100 may determine the text region Q to be recognized according to the coordinates of the finger saved from time T11 to time T12, the gesture G, and the multiple text regions in the book B and their positions.
  • the electronic device 100 can delineate the text to be detected in the to-be-recognized area Q through the text detection frame.
  • FIG. 9A may include a text area 900 to be recognized.
  • the electronic device 100 can determine that the character to be recognized is the character “cat” delineated by the character detection frame 902 according to the coordinates of the finger.
  • the electronic device 100 moves the character detection frame according to the offset S0, and takes the characters enclosed by the moved character detection frame as the characters to be recognized in the character area Q.
  • the inclination angle of the characters in the character area of the picture book can be obtained.
  • the electronic device 100 may acquire the inclination angle of the finger when detecting the finger.
  • the electronic device can obtain the angle between the finger and the text in the text area to be recognized according to these two inclination angles.
  • the text detection frame 903 in FIG. 9B is the text detection frame obtained after the electronic device 100 moves the text detection frame 902 in FIG. 9A according to the offset.
  • the text detection frame 903 delineates the text "sum" as the text to be recognized. In this way, the electronic device 100 can more accurately determine the text to be recognized specified by the user.
  • the electronic device 100 can multiply the offset S0 by an offset coefficient to obtain the offset S1. The electronic device moves the text detection frame according to the offset S1, and the electronic device 100 takes the text enclosed by the moved text detection frame as the text to be recognized in the text area Q.
  • the offset coefficient may be configured by the system of the electronic device 100.
  • the value range of the offset coefficient may be [0.2, 2].
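The frame shift described above reduces to a small geometric operation. A sketch with assumed representations: the box as (x, y, w, h), the offset S0 as a 2-D vector, and `alpha` as the offset coefficient taken from the stated range [0.2, 2]:

```python
def move_text_box(box, s0, alpha):
    """Shift a text detection frame by the offset S0 scaled by the
    offset coefficient alpha (S1 = alpha * S0); the size is unchanged."""
    x, y, w, h = box
    dx, dy = s0
    return (x + alpha * dx, y + alpha * dy, w, h)
```

For example, shifting a frame at (10, 10) by S0 = (5, -2) with alpha = 0.5 moves it to (12.5, 9.0) while keeping its width and height.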
  • the electronic device 100 may record the included angle between the finger and the text during point reading within a preset time period, and the offset corresponding to the angle.
  • the electronic device 100 may establish a mapping relationship between the angle between the finger and the text and the offset. In this way, after the electronic device 100 determines the angle between the finger and the character, a mapping relationship can be established according to the angle between the finger and the character and the offset to find the offset corresponding to the angle. In this way, the calculation amount of the electronic device can be reduced.
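The angle-to-offset mapping can be sketched as a quantized lookup table: record the offset observed for a given finger/text angle, then reuse it for nearby angles instead of recomputing. The class and method names are hypothetical, as is the 5-degree quantization step:

```python
class OffsetCache:
    """Maps a quantized finger/text angle to a previously used offset."""

    def __init__(self, step_deg=5):
        self.step = step_deg   # bucket width in degrees (an assumption)
        self.table = {}

    def _key(self, angle_deg):
        # snap the angle to the nearest bucket center
        return round(angle_deg / self.step) * self.step

    def record(self, angle_deg, offset):
        self.table[self._key(angle_deg)] = offset

    def lookup(self, angle_deg):
        # returns the cached offset for a nearby angle, or None on a miss
        return self.table.get(self._key(angle_deg))
```

On a miss the device would compute the offset as usual and `record` it; on a hit the computation is skipped, which is the calculation-saving behavior described above.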
  • When the electronic device 100 determines that the vertical distance from the finger in the captured image frame to the book B is greater than the preset vertical distance D11, the electronic device 100 determines the pointing gesture G of the finger according to the coordinates of the finger saved from time T11 to time T12.
  • the electronic device 100 recognizes and broadcasts the text in the text region Q to be recognized.
  • the electronic device 100 can recognize the text detected in the text region Q to be recognized. After the electronic device recognizes the text, it broadcasts the text by voice. For example, as shown in FIG. 1A, the electronic device 100 broadcasts the text "cat" designated by the user.
  • characters specified by the user in the embodiments of the present application include, but are not limited to, characters in different forms such as Chinese characters, Japanese, Korean, and English.
  • step S202 can be executed after step S203 and before step S207.
  • the electronic device 100 can continuously collect images of the first picture book. When a finger appears in an image collected by the electronic device, the electronic device determines that the user starts point reading. The electronic device analyzes the image of the first picture book to obtain a text analysis result. When the distance between the coordinates of the finger in the image frame currently collected by the electronic device and the coordinates of the finger in the image frame collected the preset duration earlier is less than the preset distance, the electronic device determines that the finger in the image frame is stationary, and the electronic device can record the track coordinates between the two times the finger is stationary.
  • the electronic device may determine the to-be-recognized text area Q and the to-be-recognized text in the text area Q according to the track coordinates between the two times the finger is stationary.
  • the electronic device recognizes and broadcasts the text to be recognized.
  • the electronic device 100 takes the coordinates of the finger at the two times it is stationary as the starting point and the end point of the user's point-reading track, respectively.
  • the electronic device 100 can accurately determine the starting position when the user points to read. Therefore, the electronic device 100 can accurately determine the character to be recognized according to the track coordinates between the two times the finger is stationary. In this way, the point-reading accuracy rate of the electronic device can be improved, thereby improving user experience.
  • the electronic device 100 can recognize characters in any book, and does not need to customize the book.
  • FIG. 10 exemplarily shows a flowchart of another method for recognizing point-to-read characters provided by an embodiment of the present application.
  • a method for recognizing point-and-read characters provided by the present application may include the following steps:
  • the electronic device 100 starts to capture an image, wherein the image captured by the electronic device 100 includes the user's finger and the content of the book, and the user's finger and the book are located in the target area of the electronic device.
  • the camera of the electronic device 100 may continuously capture images.
  • Before the electronic device performs step S1002, the foregoing step S202 may also be performed.
  • the electronic device 100 recognizes the pointing gesture of the user according to the position movement of the user's finger recognized by the collected image.
  • the electronic device 100 can identify the user's finger in the captured image, and can determine the position of the user's finger in the frame of image.
  • the electronic device 100 may recognize the user's pointing gesture according to the finger positions in the collected multi-frame images.
  • the point-to-read gesture includes one or more of the following: dots, dashes, and circles.
  • the electronic device recognizes the user's pointing gesture according to the position movement of the user's finger recognized in the collected images. Specifically, after the electronic device detects the user's finger in the collected images, if it detects that the movement of the first position of the finger in the images collected within the first preset duration is less than the first preset distance, the electronic device records the first position as the starting point of the pointing gesture. After the electronic device records the starting point of the pointing gesture, if it detects that the movement of the second position of the finger in the images collected within the second preset duration is less than the second preset distance, the electronic device records the second position as the end point of the pointing gesture. The electronic device recognizes the pointing gesture according to the starting point and the end point of the pointing gesture.
  • the electronic device 100 may start recording the coordinates of the finger at the starting point from the starting point of the pointing gesture, and finish recording the coordinates of the finger at the end point when the pointing gesture ends.
  • the electronic device recognizes the pointing gesture according to its starting point and end point, including: if the distance between the starting point and the position of any finger recorded between the starting point and the end point is less than a third preset distance, the electronic device recognizes the pointing gesture as a point; if the coordinates of the finger positions recorded between the starting point and the end point are linearly related, the electronic device recognizes the pointing gesture as a line; if the distance between the starting point and the end point is less than a fourth preset distance, and the distance between the starting point and a finger position recorded between them is greater than a fifth preset distance, the electronic device recognizes the pointing gesture as a circle.
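A rough sketch of this three-way classification, assuming per-frame fingertip coordinates between the recorded start and end points. All thresholds are hypothetical stand-ins for the third to fifth preset distances, and the collinearity test uses perpendicular distance to the start-end line rather than the patent's unspecified linear-correlation measure.

```python
import math

POINT_MAX_SPREAD = 10.0   # third preset distance (assumed value)
CLOSE_LOOP_MAX = 20.0     # fourth preset distance, start-to-end (assumed)
LOOP_MIN_RADIUS = 40.0    # fifth preset distance (assumed value)

def classify_gesture(track):
    """track: list of (x, y) finger positions from start point to end point."""
    start, end = track[0], track[-1]
    d = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    # Point: every recorded position stays near the starting point.
    if all(d(start, p) < POINT_MAX_SPREAD for p in track):
        return "point"
    # Circle: the path closes on itself but strays far from the start.
    if d(start, end) < CLOSE_LOOP_MAX and max(d(start, p) for p in track) > LOOP_MIN_RADIUS:
        return "circle"
    # Line: positions are approximately collinear with the start-end segment.
    x0, y0 = start
    xe, ye = end
    def off_line(p):
        # perpendicular distance from p to the line through start and end
        num = abs((xe - x0) * (y0 - p[1]) - (x0 - p[0]) * (ye - y0))
        return num / max(d(start, end), 1e-9)
    if all(off_line(p) < POINT_MAX_SPREAD for p in track):
        return "line"
    return "unknown"
```

A nearly static track classifies as "point", a straight sweep as "line", and a closed loop of sufficient radius as "circle".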
  • the gesture G in the above steps S201 to S208 may also be referred to as a pointing gesture.
  • for step S1002, reference may be made to the descriptions in the foregoing steps S203 to S206, which are not repeated here.
  • the electronic device 100 determines the target text in the content of the book in the captured image according to the pointing gesture and the position of the trajectory of the pointing gesture.
  • according to the pointing gesture and the position of its trajectory in the image, the electronic device 100 can determine the target text in the content of the book.
  • the target text is the text selected by the user to be recognized in the book.
  • the electronic device determines the target text in the content of the book in the collected image according to the pointing gesture and the position of its trajectory, including: the electronic device determines the position of the text area in the content of the book according to the collected image; the electronic device then determines the target text in the content of the book according to the pointing gesture, the position of its trajectory, and the position of the text area.
  • the electronic device determines the position of the text area in the content of the book according to the collected image, that is, the electronic device performs layout analysis on the collected image, and then analyzes the position of the text area in the book.
  • the electronic device determines the target text in the content of the book according to the pointing gesture, the position of its trajectory, and the position of the text area, including: the electronic device determines a first text area according to the trajectory of the pointing gesture and the positions of the text areas in the book, where the first text area includes a first trajectory, and the first trajectory is a part of the trajectory of the pointing gesture accounting for a proportion greater than or equal to a preset ratio; the electronic device then determines the target text in the content of the book according to the first trajectory, the pointing gesture, and the first text area.
  • the electronic device determines the target text in the content of the book according to the first trajectory, the pointing gesture, and the first text area, including: if the pointing gesture is a point, the electronic device determines the text in the first text area closest to the first trajectory as the target text; if the pointing gesture is a dash, the electronic device determines the text above the first trajectory in the first text area as the target text; if the pointing gesture is a circle, the electronic device determines the text within the first trajectory in the first text area as the target text.
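The two steps above (choosing the text area that contains at least the preset ratio of the trajectory, then selecting words by gesture type) might be sketched like this. The rectangles and word centers are toy structures standing in for real layout-analysis and OCR output; the ratio and the circle case (approximated by the stroke's bounding box) are assumptions, not the patent's exact method.

```python
import math

AREA_RATIO = 0.5  # preset ratio of trajectory points an area must contain (assumed)

def in_rect(rect, p):
    x0, y0, x1, y1 = rect
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

def pick_text_area(text_areas, track):
    """Return the first text area containing >= AREA_RATIO of the track points."""
    for rect in text_areas:
        inside = sum(in_rect(rect, p) for p in track)
        if inside / len(track) >= AREA_RATIO:
            return rect
    return None

def select_target(words, track, gesture):
    """words: list of (text, (cx, cy)) word centers in the chosen text area.
    Image coordinates: y grows downward, so 'above the stroke' means smaller y."""
    if gesture == "point":
        tip = track[-1]
        return [min(words, key=lambda w: math.dist(tip, w[1]))[0]]
    if gesture == "line":    # underline: words just above the stroke
        top = min(p[1] for p in track)
        return [t for t, (cx, cy) in words if cy < top]
    if gesture == "circle":  # words whose center falls inside the stroke's bbox
        xs = [p[0] for p in track]
        ys = [p[1] for p in track]
        return [t for t, (cx, cy) in words
                if min(xs) <= cx <= max(xs) and min(ys) <= cy <= max(ys)]
    return []
```

With an underline drawn just below a word, the "line" branch picks the word above it; a tap picks the nearest word; a circle picks everything it encloses.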
  • step S1003 may refer to the description in step S207, which will not be repeated here.
  • the electronic device 100 broadcasts the recognized target text.
  • the electronic device 100 can recognize the target text. After the electronic device recognizes the target text, it broadcasts the text by voice. For example, as shown in FIG. 1A, the electronic device 100 broadcasts the text "cat" designated by the user.
  • the text in the to-be-recognized text area Q in step S208 may be referred to as a target text.
  • the electronic device 100 can recognize the text in any book designated by the user.
  • the characters specified by the user include but are not limited to characters in different forms such as Chinese characters, Japanese, Korean, and English.
  • the electronic device 100 starts to collect images in response to the first operation of the user, where the images collected by the electronic device 100 include the user's finger and the content of the book, and the user's finger and the book are located in the target area of the electronic device; the electronic device 100 recognizes the user's pointing gesture according to the position movement of the user's finger recognized in the collected images; the electronic device 100 determines the target text in the content of the book in the captured images according to the pointing gesture and the position of its trajectory; and the electronic device 100 broadcasts the recognized target text. In this way, the reading accuracy of the electronic device can be improved, thereby improving the user experience. In addition, the electronic device 100 can recognize characters in any book, without requiring a customized book.
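The four steps recapped above can be tied together in a small driver. Every helper here (`detect_finger`, `recognize_gesture`, `find_target`, `speak`) is a hypothetical injected callable; the patent does not prescribe concrete implementations for finger tracking, OCR, or text-to-speech.

```python
def point_and_read(frames, detect_finger, recognize_gesture, find_target, speak):
    """S1001-S1004: collect frames, track the fingertip, recognize the
    pointing gesture, locate the target text, and read it aloud."""
    # S1001: per-frame fingertip positions (None when no finger is visible)
    track = [p for p in (detect_finger(f) for f in frames) if p is not None]
    # S1002: classify the gesture (point / line / circle)
    gesture = recognize_gesture(track)
    # S1003: layout analysis + OCR on the latest frame to pick the target text
    target = find_target(frames[-1], track, gesture)
    # S1004: broadcast the recognized text via TTS
    if target:
        speak(target)
    return target
```

With stub callables, the driver simply threads the data through the four stages and returns the spoken text.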
  • FIG. 11 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may have more or fewer components than those shown in the figures, may combine two or more components, or may have different component configurations.
  • the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 2, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and so on.
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, a magnetic sensor 180D, an acceleration sensor 180E, a touch sensor 180K, and the like.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 2, the wireless communication module 160, a modem processor, a baseband processor, and the like.
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and so on.
  • the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193 .
  • when the shutter is opened, light is transmitted through the lens to the camera's photosensitive element, which converts the optical signal into an electrical signal; the photosensitive element transmits the electrical signal to the ISP for processing, and the ISP converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
  • Video codecs are used to compress or decompress digital video.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the internal memory 121 may include one or more random access memories (RAM) and one or more non-volatile memories (NVM).
  • Random access memory may include static random-access memory (SRAM), dynamic random-access memory (DRAM), synchronous dynamic random-access memory (SDRAM), double data rate synchronous dynamic random-access memory (DDR SDRAM; for example, fifth-generation DDR SDRAM is generally called DDR5 SDRAM), and so on.
  • Non-volatile memory may include magnetic disk storage devices and flash memory.
  • Flash memory can be divided by operating principle into NOR flash, NAND flash, 3D NAND flash, etc.; by storage cell potential level, it can include single-level cells (SLC), multi-level cells (MLC), triple-level cells (TLC), quad-level cells (QLC), etc.; and by storage specification, it can include universal flash storage (UFS), embedded multimedia card (eMMC), and so on.
  • the random access memory can be directly read and written by the processor 110, and can be used to store executable programs (eg, machine instructions) of an operating system or other running programs, and can also be used to store data of users and application programs.
  • the non-volatile memory can also store executable programs and store data of user and application programs, etc., and can be loaded into the random access memory in advance for the processor 110 to directly read and write.
  • the electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • the speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be heard by placing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • the user can input a sound signal into the microphone 170C by speaking close to it.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
  • the magnetic sensor 180D includes a Hall sensor.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
  • the touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
  • the keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • FIG. 12 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application. It can be understood that FIG. 12 is only a schematic diagram of an exemplary software structure of the electronic device 100.
  • the software structure of the electronic device 100 in the embodiment of the present application may also be a software structure provided by other operating systems (e.g., the iOS operating system, the Hongmeng (HarmonyOS) operating system, etc.), which is not limited here.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, a runtime (Runtime) and a system library, and a kernel layer.
  • the application layer can include a series of application packages.
  • the application package may include camera, gallery, calendar, call, map, navigation, WLAN, music, video, short message, reading and other applications (also referred to as applications).
  • the point-to-read application program refers to an application program that can implement the method for point-to-read text recognition provided by the embodiments of the present application.
  • the name of the application program may be called "Reading” or “Assisted Learning”, etc.
  • the name of the application program is not limited here.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide the communication function of the electronic device 100 .
  • for example, the management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of a graph or scroll bar text, such as notifications of applications running in the background, and can also display notifications on the screen in the form of a dialog interface. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
  • Runtime includes core libraries and virtual machines. Runtime is responsible for the scheduling and management of the system.
  • the core library consists of two parts: one part is the functions that the programming language (for example, the Java language) needs to call, and the other part is the core library of the system.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes application layer and application framework layer programming files (e.g., Java files) as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
  • a system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the Surface Manager is used to manage the display subsystem and provides a fusion of two-dimensional (2-Dimensional, 2D) and three-dimensional (3-Dimensional, 3D) layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, sensor drivers, and virtual card drivers.
  • when a touch operation is received, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into raw input events (including touch coordinates, timestamps of touch operations, etc.). Raw input events are stored at the kernel layer.
  • the application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. For example, if the touch operation is a click operation and the control corresponding to the click is the camera application icon, the camera application calls the interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer.
  • the camera 193 captures still images or video.
  • FIG. 13 is another exemplary schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may include: a processor 1201 , a camera 1202 , a display screen 1203 , a speaker 1204 and a sensor 1205 .
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • a camera 1202 is used to capture images.
  • the processor 1201 is configured to detect the image captured by the camera 1202 and determine the coordinates of the finger in the image captured by the camera 1202 .
  • the processor 1201 determines the time when the user starts reading and ends the reading according to the image captured by the camera 1202 , and determines the text to be recognized specified by the user according to the image captured by the camera 1202 .
  • the processor 1201 can also convert the recognized text into an audio electrical signal, and send the audio electrical signal to the speaker 1204 .
  • the display screen 1203 can display the image captured by the camera 1202 .
  • the display screen 1203 may also display the icon of the "click to read” APP, and display prompt text.
  • the speaker 1204 can receive the audio electrical signal sent by the processor 1201, and convert the audio electrical signal into a sound signal.
  • the electronic device 100 can broadcast the text read by the user through the speaker 1204 .
  • the sensor 1205 can be a touch sensor, and the touch sensor can be placed on the display screen 1203, and the touch sensor and the display screen 1203 form a touch screen, also called a "touch screen".
  • a touch sensor is used to detect touch operations on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the term “when” may be interpreted to mean “if” or “after” or “in response to determining" or “in response to detecting" depending on the context.
  • the phrases “in determining" or “if detecting (the stated condition or event)” can be interpreted to mean “if determining" or “in response to determining" or “on detecting (the stated condition or event)” or “in response to the detection of (the stated condition or event)”.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented by software, the embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave, etc.).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that includes an integration of one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid state drives), and the like.
  • the process can be completed by instructing the relevant hardware by a computer program, and the program can be stored in a computer-readable storage medium.
  • the program When the program is executed , which may include the processes of the foregoing method embodiments.
  • the aforementioned storage medium includes: ROM or random storage memory RAM, magnetic disk or optical disk and other mediums that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method for recognizing touch-to-read text. The method comprises: in response to a first operation of a user, an electronic device starts to capture images; the electronic device recognizes the user's touch-to-read gesture according to the position movement of the user's finger recognized in the captured images; the electronic device determines target text in the content of a book in the captured images according to the touch-to-read gesture and the position of the trajectory of the touch-to-read gesture; and the electronic device broadcasts the recognized target text. Also disclosed are an electronic device, a computer-readable storage medium, and a computer program product.

Description

Method and Electronic Device for Recognizing Touch-to-Read Text

This application claims priority to Chinese Patent Application No. 202110298494.6, filed with the Chinese Patent Office on March 19, 2021 and entitled "Method and Electronic Device for Recognizing Touch-to-Read Text", which is incorporated herein by reference in its entirety.

Technical Field

This application relates to the field of terminal artificial intelligence (AI) and the field of character recognition, and in particular to a method and an electronic device for recognizing touch-to-read text.

Background

With the development of technology, more and more electronic devices are used in education. For example, electronic devices with a touch-to-read function, such as reading pens, tablet computers, and robots, can be used to assist users in reading picture books. When users encounter unfamiliar words or sentences in a picture book, they can learn with the help of such devices. However, a reading pen can only recognize the text in specially made picture books, and some electronic devices such as tablet computers and robots can only recognize the text in electronic picture books. This limits the user's learning resources. For physical picture books, although some tablet computers and robots can recognize the text that the user points to, the accuracy of identifying, from the user's gesture, the text the user wants to read is not high.

Therefore, how an electronic device can accurately identify the text that a user wants to read in any physical book is a problem to be solved urgently.
Summary of the Invention

This application provides a method and an electronic device for recognizing touch-to-read text. With this method, the electronic device can more accurately recognize the text specified by a user in a book.

According to a first aspect, this application provides a method for recognizing touch-to-read text. The method may include: in response to a first operation of a user, an electronic device 100 starts to capture images, where the images captured by the electronic device 100 include the user's finger and the content of a book, and the user's finger and the book are located in a target area of the electronic device; the electronic device 100 recognizes the user's touch-to-read gesture according to the position movement of the user's finger recognized in the captured images; the electronic device 100 determines target text in the content of the book in the captured images according to the touch-to-read gesture and the position of the trajectory of the touch-to-read gesture; and the electronic device 100 broadcasts the recognized target text.

With the method provided in the first aspect of this application, the electronic device can accurately determine the target text from the captured images in combination with the user's gesture, and can therefore accurately recognize the target text. This improves user experience.

In a possible implementation, the touch-to-read gesture includes one or more of the following: pointing, underlining, and circling.
In a possible implementation, the electronic device recognizing the user's touch-to-read gesture according to the position movement of the user's finger recognized in the captured images includes: after detecting the user's finger in the captured images, if the electronic device detects that the movement of the finger's position in the images captured within a first preset duration is less than a first preset distance, the electronic device records the first position as the starting point of the touch-to-read gesture; after starting to record the starting point of the touch-to-read gesture, if the electronic device detects that the movement of the finger's second position in the images captured within a second preset duration is less than a second preset distance, the electronic device records the second position as the end point of the touch-to-read gesture; and the electronic device recognizes the touch-to-read gesture according to its starting point and end point.

It can be understood that the electronic device 100 may start recording the coordinates of the finger at the starting point of the touch-to-read gesture and stop recording at the end point. In this way, the electronic device can more accurately determine the starting point and the end point of the trajectory of the user's touch-to-read gesture.
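The stillness-based start/end detection described above can be sketched as follows. This is a minimal illustration, not the application's implementation; the sample representation and the threshold values (`still_duration`, `still_distance`) are assumptions.

```python
import math

def detect_trajectory(samples, still_duration=0.5, still_distance=10.0):
    """Return the slice of a touch-to-read trajectory between two stillnesses.

    `samples` is a list of (t, x, y) finger positions from successive frames.
    A sample is "still" when the finger moved less than `still_distance`
    pixels over the preceding `still_duration` seconds.  The first still
    sample is the gesture's starting point; the next still sample after
    movement resumes is its end point.
    """
    def is_still(i):
        t, x, y = samples[i]
        for j in range(i - 1, -1, -1):
            tj, xj, yj = samples[j]
            if t - tj > still_duration:
                break                 # history spans the duration: still
            if math.hypot(x - xj, y - yj) >= still_distance:
                return False          # finger moved too far within window
        else:
            return False              # not enough history yet
        return True

    start = next((i for i in range(len(samples)) if is_still(i)), None)
    if start is None:
        return None
    # Skip the initial still phase, then wait for the next stillness.
    i = start
    while i < len(samples) and is_still(i):
        i += 1
    end = next((j for j in range(i, len(samples)) if is_still(j)), None)
    if end is None:
        return None
    return samples[start:end + 1]
```

For example, a finger that rests, slides across a line, and rests again yields a trajectory whose first and last recorded positions are the two resting positions.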
In a possible implementation, the electronic device recognizing the touch-to-read gesture according to its starting point and end point includes: if the distance between the recorded starting point and the position of the finger at any point between the starting point and the end point is less than a third preset distance, the electronic device recognizes the touch-to-read gesture as pointing; if the coordinates of the finger positions recorded between the starting point and the end point are linearly correlated, the electronic device recognizes the touch-to-read gesture as underlining; if the distance between the recorded starting point and end point is less than a fourth preset distance, and the distance between the starting point and some finger position between the starting point and the end point is greater than a fifth preset distance, the electronic device recognizes the touch-to-read gesture as circling.

In this way, the electronic device can accurately determine the specific type of the touch-to-read gesture.
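The three classification rules above can be illustrated with a small sketch. The threshold values and the Pearson-correlation linearity test are assumptions for illustration; the application only names the preset distances. The circling check is placed before the underlining check here so that a closed loop is not misread as a line.

```python
import math

def classify_gesture(points, point_dist=15.0, close_dist=30.0, far_dist=60.0):
    """Classify a trajectory as 'point', 'underline', or 'circle'.

    `points` is the list of (x, y) finger coordinates from the gesture's
    starting point to its end point.  The thresholds mirror the third,
    fourth, and fifth preset distances of the application.
    """
    sx, sy = points[0]
    ex, ey = points[-1]
    # Pointing: every sample stays near the starting point.
    if all(math.hypot(x - sx, y - sy) < point_dist for x, y in points):
        return "point"
    # Circling: the trajectory returns near its start but strays far from it.
    if (math.hypot(ex - sx, ey - sy) < close_dist
            and any(math.hypot(x - sx, y - sy) > far_dist for x, y in points)):
        return "circle"
    # Underlining: the coordinates are (approximately) linearly correlated.
    if _collinear(points):
        return "underline"
    return "unknown"

def _collinear(points, tol=0.95):
    """Pearson correlation of x and y as a crude linearity test."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0 or syy == 0:          # perfectly horizontal or vertical line
        return True
    return abs(sxy / math.sqrt(sxx * syy)) >= tol
```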
In a possible implementation, the electronic device determining the target text in the content of the book in the captured images according to the touch-to-read gesture and the position of its trajectory includes: the electronic device determines the positions of text regions in the content of the book according to the captured images; and the electronic device determines the target text in the content of the book according to the touch-to-read gesture, the position of its trajectory, and the positions of the text regions.

In this way, by combining the positions of the text regions in the image with the user's touch-to-read gesture and its trajectory, the electronic device can more accurately determine the target text, that is, the text in the book that the user has selected for recognition and broadcasting.
In a possible implementation, the electronic device determining the target text in the content of the book according to the touch-to-read gesture, the position of its trajectory, and the positions of the text regions includes: the electronic device determines a first text region according to the trajectory of the touch-to-read gesture and the positions of the text regions in the first book, where the first text region contains a first trajectory, and the first trajectory is a portion of the trajectory of the touch-to-read gesture that is greater than or equal to a preset proportion of it; and the electronic device determines the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text region.

The electronic device determines that the user wants to recognize text in the first text region only when the majority of the trajectory of the user's touch-to-read gesture falls within that region. In this way, even when part of the user's gesture trajectory falls in the first text region and part falls in a second text region, the electronic device can still correctly determine the target text.
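The majority-overlap rule can be sketched as follows. The 50% threshold and the axis-aligned region boxes are assumptions for illustration; the application only states "greater than or equal to a preset proportion".

```python
def pick_text_region(trajectory, regions, min_ratio=0.5):
    """Pick the text region containing at least `min_ratio` of the trajectory.

    `trajectory` is a list of (x, y) points; `regions` maps a region id to
    an axis-aligned box (left, top, right, bottom).  Returns the id of the
    region holding the largest share of trajectory points, provided that
    share is at least `min_ratio`, else None.
    """
    best_id, best_share = None, 0.0
    for rid, (l, t, r, b) in regions.items():
        inside = sum(1 for x, y in trajectory if l <= x <= r and t <= y <= b)
        share = inside / len(trajectory)
        if share > best_share:
            best_id, best_share = rid, share
    return best_id if best_share >= min_ratio else None
```

A trajectory straddling two regions is thus attributed to the region that contains most of it, and a trajectory that mostly misses every region selects nothing.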
In a possible implementation, the electronic device determining the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text region includes: if the touch-to-read gesture is pointing, the electronic device determines the text in the first text region closest to the first trajectory as the target text; if the touch-to-read gesture is underlining, the electronic device determines the text in the first text region above the first trajectory as the target text; if the touch-to-read gesture is circling, the electronic device determines the text in the first text region enclosed by the first trajectory as the target text.

In this way, the electronic device adopts a different strategy for each gesture to determine the target text, which can improve the accuracy of determining the target text in the captured images and, in turn, the accuracy of recognizing it.
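The per-gesture selection strategy above can be sketched with word bounding boxes. The box format, the distance metric, and the use of the trajectory's bounding box as a stand-in for "enclosed by the circle" are illustrative assumptions, not the application's method.

```python
import math

def select_target_words(gesture, trajectory, words):
    """Select target words from a text region for a recognized gesture.

    `words` maps each word to its box (left, top, right, bottom) in image
    coordinates (y grows downward); `trajectory` is the gesture's list of
    (x, y) points.
    - 'point':     the single word whose box centre is nearest the trajectory;
    - 'underline': every word whose box lies above the trajectory's mean y;
    - 'circle':    every word whose box centre falls inside the trajectory's
                   bounding box (a crude proxy for "inside the circle").
    """
    def centre(box):
        l, t, r, b = box
        return ((l + r) / 2, (t + b) / 2)

    if gesture == "point":
        def dist(box):
            cx, cy = centre(box)
            return min(math.hypot(cx - x, cy - y) for x, y in trajectory)
        return [min(words, key=lambda w: dist(words[w]))]

    if gesture == "underline":
        line_y = sum(y for _, y in trajectory) / len(trajectory)
        # "Above" the underline means a smaller y in image coordinates.
        return [w for w, (l, t, r, b) in words.items() if b <= line_y]

    if gesture == "circle":
        xs = [x for x, _ in trajectory]
        ys = [y for _, y in trajectory]
        return [w for w, box in words.items()
                if min(xs) <= centre(box)[0] <= max(xs)
                and min(ys) <= centre(box)[1] <= max(ys)]
    return []
```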
In a possible implementation, the first preset distance is equal to the second preset distance, and the second preset duration is equal to the first preset duration. In this way, the conditions under which the electronic device determines that the finger is still in images captured at different times are the same, which reduces the computational load of the electronic device.
According to a second aspect, this application provides a method for recognizing touch-to-read text. The method may include: in response to a first operation, an electronic device captures images of a first book; when the distance between the finger coordinates in an image frame captured at a first moment and the finger coordinates in an image frame captured a first preset duration earlier is less than a first preset distance, the electronic device starts to record the finger coordinates in the image frames; when the distance between the finger coordinates in an image frame captured at a second moment and the finger coordinates in an image frame captured a second preset duration earlier is less than a second preset distance, the electronic device stops recording the finger coordinates in the image frames, where the second moment is later than the first moment; the electronic device determines the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment; and the electronic device recognizes and broadcasts the text to be recognized.

The text to be recognized may also be called the target text, that is, the text in the book that the user has specified for recognition.

With the method provided in the second aspect of this application, the finger coordinates at the two moments of stillness serve as the starting point and the end point of the user's touch-to-read trajectory, respectively. In this way, the electronic device can accurately determine where the touch-to-read action starts, and can therefore accurately determine the text to be recognized from the trajectory coordinates between the two moments of stillness. This improves the touch-to-read accuracy of the electronic device and thus the user experience. Moreover, the electronic device can recognize text in any book; no specially made books are required.
In a possible implementation, before the electronic device starts to record the finger coordinates in the image frames when the distance between the finger coordinates in the image frame captured at the first moment and those in the image frame captured a first preset duration earlier is less than the first preset distance, the method further includes: when the electronic device detects a finger in an image frame captured at a third moment, the electronic device starts to obtain the finger coordinates in the captured image frames, where the third moment is earlier than the first moment.

The electronic device may treat the moment when a finger appears in the image as the start of the user's touch-to-read action, and obtain the finger coordinates in the image frames only while the user is reading. This prevents the electronic device from performing the subsequent touch-to-read steps when no finger is detected, thereby reducing its computational load and saving power.
In a possible implementation, the electronic device determining the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment specifically includes: the electronic device determines the finger's touch-to-read gesture according to the finger coordinates recorded from the first moment to the second moment; the electronic device determines the positions of text regions in the first book according to the images of the first book; and the electronic device determines the text to be recognized in the first book according to the recorded finger coordinates, the touch-to-read gesture, and the positions of the text regions in the first book.

In this way, by combining the positions of the text regions in the image with the user's touch-to-read gesture and its trajectory, the electronic device can more accurately determine the target text, that is, the text in the book that the user has selected for recognition and broadcasting.

In a possible implementation, the electronic device determining the finger's touch-to-read gesture according to the finger coordinates recorded from the first moment to the second moment specifically includes: if the electronic device determines that the distance between any one of the recorded coordinates and the other coordinates is less than a third preset distance, the electronic device determines that the touch-to-read gesture is a first gesture; if the electronic device determines that the recorded finger coordinates are linearly correlated, the electronic device determines that the touch-to-read gesture is a second gesture; if the electronic device determines that the distance between the finger coordinates recorded at the first moment and those recorded at the second moment is less than a fourth preset distance, and the distance between the finger coordinates recorded at a fourth moment and those recorded at the first moment is greater than a fifth preset distance, the electronic device determines that the touch-to-read gesture is a third gesture, where the fourth moment is a moment between the first moment and the second moment.

In this way, the electronic device can accurately determine the specific type of the touch-to-read gesture.
In a possible implementation, the electronic device determining the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment, the touch-to-read gesture, and the positions of the text regions in the first book specifically includes: the electronic device connects the finger coordinates recorded from the first moment to the second moment in recording order to obtain a first finger trajectory; the electronic device determines a first text region according to the first finger trajectory and the positions of the text regions in the first book, where the first text region contains a second finger trajectory, and the second finger trajectory is a portion of the first finger trajectory that is greater than or equal to a preset proportion of it; and the electronic device determines the text to be recognized according to the second finger trajectory, the finger gesture, and the first text region.

The electronic device determines that the user wants to recognize text in the first text region only when the majority of the trajectory of the user's touch-to-read gesture falls within that region. In this way, even when part of the user's gesture trajectory falls in the first text region and part falls in a second text region, the electronic device can still correctly determine the target text.

In a possible implementation, the electronic device determining the text to be recognized according to the second finger trajectory, the finger gesture, and the first text region specifically includes: if the finger gesture is the first gesture, the electronic device determines the text in the first text region closest to the second finger trajectory as the text to be recognized; if the finger gesture is the second gesture, the electronic device determines the text in the first text region above the second finger trajectory as the text to be recognized; if the finger gesture is the third gesture, the electronic device determines the text in the first text region enclosed by the second finger trajectory as the text to be recognized.

The first gesture is pointing, the second gesture is underlining, and the third gesture is circling.

In this way, the electronic device adopts a different strategy for each gesture to determine the target text, which can improve the accuracy of determining the target text in the captured images and, in turn, the accuracy of recognizing it.
In a possible implementation, the electronic device determining the finger's touch-to-read gesture according to the finger coordinates recorded from the first moment to the second moment specifically includes: when the electronic device does not detect a finger in an image captured at a fifth moment, the electronic device determines the finger's touch-to-read gesture according to the finger coordinates recorded from the first moment to the second moment.

In this way, the electronic device can determine when the user's touch-to-read action ends. When the electronic device determines that the action has ended, it can stop performing the touch-to-read steps (for example, determining the finger coordinates in the images) until the user starts reading again. This saves power.

In a possible implementation, the first preset duration is equal to the second preset duration, and the first preset distance is equal to the second preset distance. In this way, the conditions under which the electronic device determines that the finger is still in images captured at different times are the same, which reduces the computational load of the electronic device.
According to a third aspect, an electronic device is provided. The electronic device includes one or more processors and a memory. The memory is coupled to the one or more processors and is configured to store computer program code, which includes computer instructions. The one or more processors invoke the computer instructions to cause the electronic device to perform the method for recognizing touch-to-read text in any possible implementation of the first aspect or the second aspect.

According to a fourth aspect, an embodiment of this application provides a computer storage medium, including computer instructions. When the computer instructions are run on an electronic device, the electronic device is caused to perform the method for recognizing touch-to-read text in any possible implementation of any of the above aspects.

According to a fifth aspect, an embodiment of this application provides a computer program product. When the computer program product runs on an electronic device, the electronic device is caused to perform the method for recognizing touch-to-read text in any possible implementation of any of the above aspects.
Description of Drawings
FIG. 1A is a schematic diagram of an application scenario of a touch-to-read robot according to an embodiment of this application;

FIG. 1B is a schematic diagram of another application scenario of the touch-to-read robot according to an embodiment of this application;

FIG. 1C is a schematic diagram of yet another application scenario of the touch-to-read robot according to an embodiment of this application;

FIG. 2 is a schematic flowchart of a method for recognizing touch-to-read text according to an embodiment of this application;

FIG. 3A to FIG. 3D are schematic diagrams of a set of user interfaces of the electronic device 100 according to an embodiment of this application;

FIG. 3E is a schematic diagram of the electronic device 100 capturing an image of a picture book according to an embodiment of this application;

FIG. 3F is a schematic diagram of content region division of a picture book according to an embodiment of this application;

FIG. 4A is a schematic diagram of one image frame captured by the electronic device 100 while a user is reading according to an embodiment of this application;

FIG. 4B is a schematic diagram of the electronic device 100 performing finger detection on a captured image frame according to an embodiment of this application;

FIG. 5 is a schematic diagram of a set of image frames captured by the electronic device 100 while a user is reading according to an embodiment of this application;

FIG. 6 is a schematic diagram of a trajectory of a user's touch-to-read action according to an embodiment of this application;

FIG. 7A to FIG. 7C are schematic diagrams of polar coordinate plots corresponding to different finger trajectories during touch-to-read according to an embodiment of this application;

FIG. 8 is a schematic diagram combining a picture-book layout analysis result with a user's finger trajectory according to an embodiment of this application;

FIG. 9A and FIG. 9B are schematic diagrams of a set of text detection results according to an embodiment of this application;

FIG. 10 is a schematic flowchart of a method for recognizing touch-to-read text according to an embodiment of this application;

FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of this application;

FIG. 12 is a schematic diagram of a software architecture of an electronic device according to an embodiment of this application;

FIG. 13 is another schematic structural diagram of an electronic device according to an embodiment of this application.
具体实施方式Detailed ways
本申请以下实施例中所使用的术语只是为了描述特定实施例的目的,而并非旨在作为对本申请的限制。如在本申请的说明书和所附权利要求书中所使用的那样,单数表达形式“一个”、“一种”、“所述”、“上述”、“该”和“这一”旨在也包括复数表达形式,除非其上下文中明确地有相反指示。还应当理解,本申请中使用的术语“和/或”是指并包含一个或多个所列出项目的任何或所有可能组合。The terms used in the following embodiments of the present application are only for the purpose of describing specific embodiments, and are not intended to be used as limitations of the present application. As used in the specification of this application and the appended claims, the singular expressions "a," "an," "the," "above," "the," and "the" are intended to also Plural expressions are included unless the context clearly dictates otherwise. It will also be understood that, as used in this application, the term "and/or" refers to and includes any and all possible combinations of one or more of the listed items.
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为暗示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征,在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。Hereinafter, the terms "first" and "second" are only used for descriptive purposes, and should not be construed as implying or implying relative importance or implying the number of indicated technical features. Therefore, the features defined as "first" and "second" may explicitly or implicitly include one or more of the features. In the description of the embodiments of the present application, unless otherwise specified, the "multiple" The meaning is two or more.
由于本申请实施例涉及一种识别点读文字的方法的应用,为了便于理解,下面先对本申请实施例涉及的相关术语及概念进行介绍。Since the embodiments of the present application relate to the application of a method for recognizing point-and-read characters, for ease of understanding, related terms and concepts involved in the embodiments of the present application are first introduced below.
1、点读1. Click to read
在本申请实施例中,“点读”可以指,电子设备可以识别并语音播放出用户在绘本中所指定的文字。在本申请实施例中,用户在绘本中指定文字的形式可以有多种。例如,用户手指点在绘本中所需识别文字的下方,或用户手指在绘本中所需识别文字的下方划线,以及用户手指画圈来圈定所需识别文字等等。In this embodiment of the present application, "point reading" may mean that the electronic device can recognize and play the text specified by the user in the picture book by voice. In this embodiment of the present application, there may be various forms in which the user specifies the text in the picture book. For example, the user's finger points below the text to be recognized in the picture book, or the user's finger draws a line below the text to be recognized in the picture book, and the user's finger draws a circle to delineate the text to be recognized, and so on.
For example, as shown in FIG. 1A, the user may point a finger below the text to be recognized (i.e., "cat") in the picture book 102. The robot 100 may include a camera 103 and a camera 104. The picture book 102 is placed within the photographing field of view 101 of the cameras 103 and 104. The camera 103 and/or the camera 104 of the robot 100 can capture the user's finger pointing below the word "cat" in the picture book 102. The robot 100 can then recognize the word "cat" and play it aloud.
For example, as shown in FIG. 1B, the user may use a finger to draw a line below (or across) the text to be recognized (i.e., "cat and mouse") in the picture book 102. The robot 100 can recognize the text "cat and mouse" and play it aloud.
For example, as shown in FIG. 1C, the user may draw a circle with a finger in the picture book 102 to enclose the text to be recognized (i.e., "cat and mouse"). The robot 100 can recognize the text "cat and mouse" and play it aloud.
Correspondingly, in the embodiments of this application, the user's point-reading gestures may be classified as "point", "line", "circle", and so on. It should be understood that the classification and naming of gestures are not limited in the embodiments of this application.
It should be understood that the user may use another object (for example, a pen) instead of a finger to designate the text to be recognized in the picture book; this is not limited in the embodiments of this application.
In the embodiments of this application, the electronic device may be the robot 100 shown in FIG. 1A, FIG. 1B, and FIG. 1C; it may also be a terminal device with a camera, such as a tablet computer or a smartphone; or it may be a point-reading apparatus composed of a camera and a terminal with a text-recognition function. This is not limited in the embodiments of this application.
An embodiment of this application provides a method for recognizing point-read text. The method may include: the electronic device 100 continuously captures images of a first picture book; when a finger appears in a captured image, the electronic device determines that the user has started point reading; the electronic device analyzes an image of the first picture book to obtain a text analysis result; the electronic device determines the user's point-reading trajectory and point-reading gesture from the coordinates of the finger in multiple captured frames; the electronic device determines, from the trajectory, the gesture, and the text analysis result, a text region Q containing the text to be recognized; and the electronic device recognizes the text in region Q and reads it aloud by voice.
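As a rough illustration only, the flow described above can be sketched as an orchestration function. This is a minimal sketch, not the claimed implementation; every callable passed in (`analyze_layout`, `track_finger`, and the rest) is a hypothetical stand-in for a component described in the steps below.

```python
def point_read_pipeline(frames, analyze_layout, track_finger,
                        classify_gesture, locate_text_region,
                        recognize_and_speak):
    """Orchestrate one point-reading interaction.

    Each callable is a stand-in for a component of the method:
    layout analysis (S202), finger tracking (S203-S205), gesture
    classification (S206), and text recognition / voice output.
    """
    layout = analyze_layout(frames[0])      # text/drawing/table regions
    track = track_finger(frames)            # fingertip coordinates, T11..T12
    gesture = classify_gesture(track)       # "point", "line", or "circle"
    region_q = locate_text_region(layout, track, gesture)
    return recognize_and_speak(region_q)    # OCR + voice broadcast
```

The design point is simply that the pipeline is linear: each stage consumes the previous stage's output, so each component can be developed and tested separately.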
The method for recognizing point-read text provided by this application is described in detail below with reference to the accompanying drawings. FIG. 2 exemplarily shows a flowchart of a method for recognizing point-read text provided by an embodiment of this application. As shown in FIG. 2, the method may include the following steps:
S201. In response to a first operation by the user, the camera of the electronic device 100 starts capturing images of a book B.
The electronic device 100 may receive a first operation from the user. The first operation may be turning on the electronic device 100, or opening a point-reading app on the electronic device 100. In response to the first operation, the camera of the electronic device 100 starts capturing images, and the electronic device 100 continuously captures multiple frames. The electronic device 100 may be the robot 100 shown in FIG. 3A to FIG. 3E, and the book B may be the book 102 shown in FIG. 3E. The book B is not limited in the embodiments of this application; that is, with the method provided by the embodiments of this application, the electronic device 100 can recognize text that the user point-reads in any book.
For example, as shown in FIG. 3A, the electronic device 100 may be the robot 100 shown in FIG. 3A. The electronic device 100 may include a camera 103, a camera 104, and a display screen 105, and the display screen 105 may display an icon 106 of the point-reading app.
As shown in FIG. 3B, the user's first operation may be tapping the icon 106 of the point-reading app.
As shown in FIG. 3C, in response to the user tapping the icon 106 of the point-reading app, the camera 103 and the camera 104 of the robot 100 start capturing images. In addition, the display screen 105 of the robot 100 may display a book display area 1051 and a prompt text 1052. The book display area 1051 can show the images captured by the cameras 103 and 104. The prompt text 1052 may prompt the user to place the book to be studied within the shooting area of the cameras 103 and 104. For example, the prompt text 1052 may read "Please place the book in the shooting area"; its specific content is not limited here. The prompt text 1052 may be displayed inside or outside the book display area 1051; its specific position is not limited here.
Optionally, as shown in FIG. 3D, the display screen 105 of the robot 100 may further include a control 1053. The control 1053 is used to trigger the robot 100 to perform layout analysis and finger detection on the captured images.
Optionally, as shown in FIG. 3E, when the user places the book 102 within the shooting field of view 101 of the cameras 103 and 104, an image 1021 of the book 102 may be displayed in the book display area 1051 of the display screen 105. The user can adjust the position of the book according to the image 1021 in the display area 1051. For example, if the user sees that only the right half of the book 102 is displayed in the display area 1051, the user can move the book to the right so that the book 102 falls entirely within the shooting field of view 101 of the cameras 103 and 104. When the user sees the complete image of the book 102 in the display area 1051, the user can tap the control 1053.
Optionally, the electronic device may show on the display screen 105 the gestures available for point reading: for example, the user may point below the text to be recognized, draw a line below it, or draw a circle around it. This prompts the user to point-read with gestures that the electronic device 100 can recognize.
S202. The electronic device 100 performs layout analysis on the image of the book B and determines the types of content in the book B and the positions corresponding to that content.
The electronic device 100 may perform layout analysis on a frame of the image of the book B captured by the camera to obtain the positions, on the current page of the book B, of the text regions of that page. The electronic device 100 may store a layout-analysis model: the electronic device 100 inputs a frame into the model, and the model outputs the types of content contained in the image (text, drawing, table, and so on) and the positions corresponding to that content. Through the layout-analysis model, the electronic device 100 can determine that the image of the book B may include one or more of a text region, a drawing region, and a table region.
In the embodiments of this application, a text region may refer to a region of a frame that contains only text; a drawing region may refer to a region of a frame that contains a drawing; and a table region may refer to a region of a frame that contains a table. It should be understood that drawing regions and table regions may also contain text. A frame of the image of the book B captured by the electronic device 100 may contain one or more text regions, and/or drawing regions, and/or table regions.
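To make the region bookkeeping concrete, the sketch below models each region as an axis-aligned 2D box and looks up which region a given coordinate falls in. This is an illustrative assumption, not the application's data model: the application's regions carry four 3D vertices, and the flattening to 2D boxes, the class name, and the function name are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Region:
    kind: str    # "text", "drawing", or "table", as output by layout analysis
    x0: float    # one corner of the region's bounding box
    y0: float
    x1: float    # the opposite corner
    y1: float

def region_at(regions, x, y):
    """Return the first region whose box contains the point (x, y), else None."""
    for r in regions:
        if r.x0 <= x <= r.x1 and r.y0 <= y <= r.y1:
            return r
    return None
```

Note that a drawing or table region may itself contain text, so a lookup landing in a non-text region does not rule out text recognition there.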
For example, as shown in FIG. 3F, the image 1021 of the book 102 may include a region A, a region B, and a region C. Regions A and C are drawing regions, and region B is a text region. Region A may be a rectangular region with vertices A1 (xa1, ya1, za1), A2 (xa2, ya2, za2), A3 (xa3, ya3, za3), and A4 (xa4, ya4, za4). Region B may be a rectangular region with vertices B1 (xb1, yb1, zb1), B2 (xb2, yb2, zb2), B3 (xb3, yb3, zb3), and B4 (xb4, yb4, zb4). Region C may be a rectangular region with vertices C1 (xc1, yc1, zc1), C2 (xc2, yc2, zc2), C3 (xc3, yc3, zc3), and C4 (xc4, yc4, zc4).
It should be understood that the shapes of the text regions and drawing regions obtained by the layout analysis of the image of the book B are not limited to rectangles; they may also be other shapes, such as polygons, circles, and so on.
In the embodiments of this application, the electronic device 100 may establish a coordinate system with the top-left vertex of its shooting field of view as the origin. For example, in the coordinate system XYZ shown in FIG. 3E, the origin O is the top-left vertex of the robot's shooting field of view 101.
The electronic device 100 may input the image 1021 of the book 102 shown in FIG. 3F into the layout-analysis model for layout analysis; the types of content contained in the image 1021 and the positions of that content may be as shown in Table 1 below.
Table 1

Region | Content type | Position (vertex coordinates)
A | Drawing | (xa1, ya1, za1), (xa2, ya2, za2), (xa3, ya3, za3), (xa4, ya4, za4)
B | Text | (xb1, yb1, zb1), (xb2, yb2, zb2), (xb3, yb3, zb3), (xb4, yb4, zb4)
C | Drawing | (xc1, yc1, zc1), (xc2, yc2, zc2), (xc3, yc3, zc3), (xc4, yc4, zc4)
As shown in Table 1, by performing layout analysis on the image 1021, the electronic device 100 can determine the drawings and text contained in the image 1021, and the positions of the drawings and text. The content in regions A and C of the image 1021 is drawings, and the content in region B is text. The coordinates of the four vertices of region A are (xa1, ya1, za1), (xa2, ya2, za2), (xa3, ya3, za3), and (xa4, ya4, za4); region A in Table 1 may be the region A shown in FIG. 3F. The coordinates of the four vertices of region B are (xb1, yb1, zb1), (xb2, yb2, zb2), (xb3, yb3, zb3), and (xb4, yb4, zb4); region B in Table 1 may be the region B shown in FIG. 3F. The coordinates of the four vertices of region C are (xc1, yc1, zc1), (xc2, yc2, zc2), (xc3, yc3, zc3), and (xc4, yc4, zc4); region C in Table 1 may be the region C shown in FIG. 3F.
Further, if the image of the book includes a drawing region or a table region that itself contains text, the electronic device 100 can, during layout analysis, also detect the text in that region and the position of that text.
Further, in a possible implementation, the electronic device 100 may detect the tilt angle of the text in a text region.
S203. The electronic device 100 detects a finger in an image captured at time T10 and starts determining the coordinates of the finger in the captured images.
When the user begins to point-read text in the book with a finger, the images captured by the electronic device 100 may contain the user's finger, and the electronic device 100 can detect the finger in a captured image. The electronic device 100 may store a finger-detection model: the electronic device 100 inputs a captured image into the model, and the model determines whether or not the input image contains a finger.
In a possible implementation, the electronic device 100 inputs the image 401 in FIG. 4A into the finger-detection model, and the model may output the image 402 shown in FIG. 4B. The finger-detection model can mark the detected finger with a finger detection box 4022, and can also mark the fingertip 4021.
It should be understood that the electronic device 100 continuously captures multiple frames and may input each captured frame into the finger-detection model in sequence. If a finger is detected in a frame, the electronic device can start determining the coordinates of the finger in that frame; the electronic device 100 may take the coordinates of the fingertip as the coordinates of the finger. If the electronic device 100 does not detect a finger in a frame, it may check whether the next frame (or a frame captured after a preset time interval) contains a finger, until a finger is detected in an image captured at time T10, at which point it starts determining the coordinates of the finger in the captured images.
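The per-frame scan in step S203 amounts to the loop below. This is a sketch under assumptions: `detect_fingertip` is a hypothetical stand-in for the finger-detection model, returning the fingertip coordinates when a finger is found and None otherwise.

```python
def find_point_read_start(frames, detect_fingertip):
    """Scan frames in capture order; return (index, fingertip) for the first
    frame containing a finger, i.e. the moment T10, or None if none does."""
    for i, frame in enumerate(frames):
        tip = detect_fingertip(frame)   # model call: coordinates or None
        if tip is not None:
            return i, tip
    return None
```

Running the detector on every frame finds T10 promptly; running it only every preset interval trades that latency for lower power, as the text notes below.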
For example, parts (a) to (i) of FIG. 5 show the image frames captured by the electronic device 100 from time t1 to time t9. These frames show the complete process of the user point-reading the word "cat": the user lowers a finger, points at the word "cat" in the picture book, then moves and lifts the finger off the picture book. Part (a) of FIG. 5 is the frame captured at time t1 and part (b) is the frame captured at time t2; no finger is detected in the frames captured at times t1 and t2. Part (c) is the frame captured at time t3, in which the electronic device 100 can detect a finger; the electronic device 100 starts acquiring the coordinates of the finger in the frames. Parts (d) to (h) are the frames captured at times t4 through t8, and the electronic device 100 can acquire the coordinates of the finger in each of these frames. Part (i) is the frame captured at time t9; no finger is detected in the frame captured at time t9.
As shown in FIG. 5, the electronic device 100 acquires a frame at time t1 and performs finger detection on it; no finger is detected, so the electronic device 100 does not perform the steps after step S203 and continues finger detection on the next frame. For example, the electronic device 100 may perform finger detection on the image captured at time t2; if no finger is detected, it performs finger detection on the image captured at time t3. The electronic device 100 detects a finger in the frame captured at time t3 and can determine the coordinates of the finger in that frame, specifically the coordinates of the fingertip. Time T10 may be the time t3 shown in FIG. 5.
Here, the frames captured at times t1 through t9 shown in FIG. 5 may be consecutive frames captured by the electronic device 100. The time intervals between t1 and t2, t2 and t3, t3 and t4, t4 and t5, t5 and t6, t6 and t7, t7 and t8, and t8 and t9 depend on the frame rate at which the electronic device 100 captures images.
Optionally, the frames captured at times t1 through t9 shown in FIG. 5 may instead be frames captured by the electronic device 100 at a preset time interval; that is, each of the intervals between t1 and t2, t2 and t3, t3 and t4, t4 and t5, t5 and t6, t6 and t7, t7 and t8, and t8 and t9 may be the preset time interval.
The preset time interval may be configured by the system of the electronic device 100.
It should be understood that the electronic device 100 may perform finger detection on every captured frame in sequence. In this way, a finger in a frame can be detected promptly, so the moment the user starts point reading can be determined accurately.
Optionally, the electronic device 100 may instead perform finger detection once per image captured at the preset time interval. In this way, the power consumption of the electronic device can be reduced.
Further, in a possible implementation, the electronic device 100 may start acquiring the coordinates of the finger in the captured frames when it detects that a finger appears in a frame and the vertical distance between the finger and the book B has decreased to a preset vertical distance D01.
In another possible implementation, the electronic device 100 may start acquiring the coordinates of the finger in the captured frames when it detects that a finger appears in a frame and the vertical distance between the finger and the book B is gradually decreasing.
S204. When the distance between the coordinates of the finger in the image captured at time T11 and the coordinates of the finger in the frame captured a preset duration T21 earlier is less than a preset distance D1, the electronic device 100 starts recording the coordinates of the finger in the captured images.
When the electronic device 100 determines at time T11 that the finger is stationary, it may record the coordinates of the finger in the image captured at time T11 and take those coordinates as the starting point of the user's point-reading trajectory. When the distance between the coordinates of the finger in the image captured at time T11 and the coordinates of the finger in the frame captured the preset duration T21 earlier is less than the preset distance D1, the electronic device 100 can determine that the user's finger is stationary at time T11.
The preset duration T21 may be 0.5 seconds, 1 second, or 2 seconds; it is not limited here. The preset distance D1 may be 10 pixels, 5 pixels, or 15 pixels; the specific value of D1 is not limited in the embodiments of this application.
The preset duration T21 and the preset distance D1 may be configured by the system of the electronic device 100.
Generally, when a user wants the electronic device 100 to help with a piece of text, the user points a finger at the text to be recognized, keeps it still for a moment, and then moves the finger. For example, as shown in part (d) of FIG. 5, the user points a finger at the word "cat", keeps it still for a period of time (for example, 0.5 seconds or 1 second; not limited here), and then moves it from the position shown in part (d) to the position shown in part (e). In the embodiments of this application, when the distance between the coordinates of the finger in the image captured at time T11 and the coordinates of the finger in the frame captured the preset duration T21 earlier is less than the preset distance D1, the electronic device 100 determines that the finger in the image captured at time T11 is stationary. The coordinate point of the finger in that image is the starting point of the user's point-reading trajectory: the user has placed a finger at the text to be recognized and has started selecting it. The electronic device 100 therefore records the coordinates of the finger in the frame.
It should be understood that after the electronic device 100 detects a finger in a frame and acquires its coordinates, it may keep those coordinates in memory temporarily; once the coordinates of the finger in that frame have been used to compute the distance to the coordinates of the finger in the next frame, the electronic device releases the stored coordinates for that frame.
The electronic device 100 may record the coordinates of the finger in the frame captured at time T11 in the memory used to record the point-reading trajectory. Even after the electronic device 100 computes the distance between the finger coordinates in the frame captured at time T11 and those in the frame captured the preset duration T21 later, it keeps the coordinates of the finger in the frame captured at time T11 recorded in the memory used for the point-reading trajectory.
S205. When the distance between the coordinates of the finger in the image captured at time T12 and the coordinates of the finger in the frame captured a preset duration T22 earlier is less than a preset distance D2, the electronic device 100 stops recording the coordinates of the finger in the captured images.
When the electronic device 100 again detects that the user's finger is stationary, that is, when it determines that the distance between the coordinates of the finger in the image captured at time T12 and the coordinates of the finger in the frame captured the preset duration T22 earlier is less than the preset distance D2, the electronic device 100 stops recording the coordinates of the finger in the captured images. In other words, after time T11, when the electronic device 100 detects that the user's finger has come to rest again after moving, it takes the coordinates of the finger in the image captured at time T12 as the coordinates of the end point of the user's point-reading trajectory.
T22 may be greater than, less than, or equal to T21; it is not limited here. Likewise, D2 may be greater than, less than, or equal to D1. The preset duration T22 and the preset distance D2 may be configured by the system of the electronic device 100.
For example, as shown in FIG. 5, the distance between the coordinates of the finger in the frame captured at time t7 in part (g) of FIG. 5 and the coordinates of the finger in the frame captured at time t6 in part (f) is less than the preset distance D2, so the electronic device 100 stops saving the coordinates of the finger in the captured images.
It should be understood that the electronic device 100 has saved the coordinates of the finger in the images captured between time T11 and time T12, which are the trajectory coordinates of one point-reading action of the user's finger on the picture book. As shown in FIG. 6, the coordinate trajectory of the finger in the images captured between times T11 and T12 may be the line segment P3P4 in FIG. 6, and the electronic device 100 may save the coordinates of the points along the segment P3P4. The segment P3P4 is the trajectory of the user's finger on the picture book, and this finger trajectory is used to select the text to be recognized in the picture book.
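Steps S204 and S205 together segment the fingertip stream into one trajectory: start recording at the first stationary frame (T11), and stop at the next stationary frame after the finger has moved (T12). The sketch below simplifies the timing by treating each consecutive pair of samples as separated by the preset duration, so stationarity is judged against the immediately preceding sample; the function name and this simplification are illustrative assumptions.

```python
import math

def extract_trajectory(coords, d1, d2):
    """coords: per-frame fingertip coordinates while the finger is visible.
    Returns the recorded trajectory from the T11 frame to the T12 frame."""
    start, moving = None, False
    for i in range(1, len(coords)):
        step = math.dist(coords[i], coords[i - 1])
        if start is None:
            if step < d1:
                start = i                   # T11: finger first at rest
        elif not moving:
            if step >= d2:
                moving = True               # finger has left the start point
        elif step < d2:
            return coords[start:i + 1]      # T12: at rest again after moving
    return coords[start:] if start is not None else []
```

The `moving` flag matters: without it, the frames immediately after T11 (while the finger is still resting at the start point) would be mistaken for the end of the trajectory.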
S206. When the electronic device 100 does not detect a finger in the image captured at time T13, the electronic device 100 determines the finger's point-reading gesture G from the finger coordinates saved from time T11 to time T12.
When the electronic device 100 does not detect a finger in an image captured after time T12 (that is, at time T13), the electronic device 100 can determine that the user has finished selecting the text to be recognized. The text to be recognized is the text selected by the finger between times T11 and T12. The electronic device 100 can determine the text to be recognized from the finger coordinates in the frames captured between times T11 and T12, the user's point-reading gesture, and the layout analysis result.
First, the electronic device 100 may determine the user's touch-to-read gesture G according to the finger coordinates in the multiple image frames captured from time T11 to time T12. When the distance between the finger coordinates in every two consecutive image frames captured from time T11 to time T12 is less than a preset distance D10, and the distance between the finger coordinates in the image frame captured at time T11 and those in the image frame captured at time T12 is less than a distance D11, the electronic device 100 determines that the touch-to-read gesture of the finger is a "point". D10 is less than or equal to D11.
When the distance between the finger coordinates in the image frame captured at time T11 and those in the image frame captured at time T12 is greater than a distance D12, and the finger coordinates in the image frames captured from time T11 to time T12 are linearly correlated, the electronic device 100 may determine that the touch-to-read gesture of the finger is a "line".
When the distance between the finger coordinates in consecutive image frames captured from time T11 to time T12 first gradually increases and then gradually decreases, and the distance between the finger coordinates in the image frame captured at time T11 and those in the image frame captured at time T12 is less than a distance D13, the electronic device 100 may determine that the touch-to-read gesture of the finger is a "circle".
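The three distance rules above can be sketched in code. This is a minimal illustration, not the embodiment's implementation: the threshold values for D10, D11, D12 and D13 are hypothetical, and linear correlation is checked with a Pearson-style coefficient, which the embodiment does not specify.

```python
import math

# Hypothetical thresholds; the embodiment leaves D10-D13 device-dependent.
D10, D11, D12, D13 = 5.0, 10.0, 40.0, 15.0

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def classify_gesture(points):
    """Classify a touch-to-read trajectory as 'point', 'line' or 'circle'.

    points: finger coordinates (x, y) from the frames captured between
    time T11 and time T12, in chronological order.
    """
    steps = [dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    end_to_end = dist(points[0], points[-1])

    # "point": every frame-to-frame move is small and start/end nearly coincide.
    if all(s < D10 for s in steps) and end_to_end < D11:
        return "point"

    # "circle": frame spacing grows then shrinks, with the end near the start.
    peak = steps.index(max(steps))
    if (end_to_end < D13 and peak not in (0, len(steps) - 1)
            and steps[:peak + 1] == sorted(steps[:peak + 1])
            and steps[peak:] == sorted(steps[peak:], reverse=True)):
        return "circle"

    # "line": start and end far apart and the points roughly collinear
    # (checked here via the correlation of x and y).
    if end_to_end > D12:
        n = len(points)
        mx = sum(p[0] for p in points) / n
        my = sum(p[1] for p in points) / n
        sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
        sxx = math.sqrt(sum((p[0] - mx) ** 2 for p in points))
        syy = math.sqrt(sum((p[1] - my) ** 2 for p in points))
        if sxx == 0 or syy == 0 or abs(sxy / (sxx * syy)) > 0.95:
            return "line"
    return "unknown"
```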
The size of the picture book, the size of the characters in it, and the spacing between characters can all affect the values of D10, D11, D12 and D13. To determine the user's touch-to-read gesture more accurately, in a possible implementation the electronic device 100 performs convex hull fitting on the finger coordinate points in the image frames captured from time T11 to time T12, samples the fitted trajectory uniformly, and converts the sampling points into polar coordinate points to obtain a polar coordinate diagram. The electronic device 100 inputs the polar coordinate diagram into a gesture recognition model, which recognizes the diagram and outputs the corresponding gesture type.
Exemplarily, in this embodiment of the present application, the following formula may be used to perform convex hull fitting on the finger coordinate points in Cartesian coordinates:
New(x1, y1) = f(old(x0, y0))   Formula 1
where old(x0, y0) are the finger coordinates determined from the image frames captured by the electronic device 100 from time T11 to time T12, and New(x1, y1) are those finger coordinates after convex hull fitting.
The electronic device 100 may uniformly sample the fitted convex hull trajectory to obtain sampling points (xi, yi) (i = 1, ..., N). The electronic device 100 may determine the center point M(xm, ym) of the sampling points:

xm = (x1 + x2 + ... + xN)/N, ym = (y1 + y2 + ... + yN)/N   Formula 2
The electronic device 100 may calculate the coordinates of each convex hull fitting point relative to the center point as:
New'(x2, y2) = New(x1, y1) − M(xm, ym)   Formula 3
The electronic device 100 may convert the convex hull fitting points into polar coordinates, where the origin of the polar coordinate system is the center point calculated by Formula 2 above. The electronic device 100 may determine the polar coordinates of each convex hull fitting point according to its position relative to the center point, with reference to the following formulas:

r = √(x2² + y2²)   Formula 4

θ = atan2(y2, x2), θ ∈ (−π, π)   Formula 5
According to Formula 4 and Formula 5, the electronic device 100 can convert the sampling points into polar coordinate points and then save the multiple polar coordinate points as a polar coordinate diagram. The electronic device 100 inputs the polar coordinate diagram into the gesture recognition model to obtain the corresponding gesture type.
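Formulas 2 to 5 together normalize the fitted trajectory before it is fed to the gesture recognition model. A minimal sketch of that conversion, assuming the convex-hull-fitted points are already uniformly sampled:

```python
import math

def to_polar_diagram(hull_points):
    """Normalize convex-hull-fitted finger points into polar coordinates.

    Follows Formulas 2-5: center the points on their centroid, then
    convert each point to (r, theta) about that centroid.
    """
    n = len(hull_points)
    # Formula 2: centroid of the uniformly sampled points.
    xm = sum(x for x, _ in hull_points) / n
    ym = sum(y for _, y in hull_points) / n
    polar = []
    for x, y in hull_points:
        # Formula 3: position relative to the center point M(xm, ym).
        x2, y2 = x - xm, y - ym
        # Formulas 4 and 5: polar radius and angle, theta in (-pi, pi].
        r = math.hypot(x2, y2)
        theta = math.atan2(y2, x2)
        polar.append((r, theta))
    return polar
```

The resulting (r, θ) point set is what would be rendered as the polar coordinate diagrams of FIG. 7A to FIG. 7C.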
As shown in FIG. 7A, FIG. 7A exemplarily shows a polar coordinate diagram. The finger coordinates in this diagram, connected in the chronological order in which the electronic device 100 obtained them, form a closed curve. The electronic device 100 may input the polar coordinate diagram shown in FIG. 7A into the gesture recognition model, which may output the gesture type of the corresponding finger gesture. That gesture type is "circle".
As shown in FIG. 7B, FIG. 7B exemplarily shows another polar coordinate diagram. The finger coordinates in this diagram are saved in the chronological order in which the electronic device 100 obtained them, and the coordinate points are concentrated within a certain area. The electronic device 100 may input the polar coordinate diagram shown in FIG. 7B into the gesture recognition model, which may output the gesture type of the corresponding finger gesture. That gesture type is "point".
As shown in FIG. 7C, FIG. 7C exemplarily shows yet another polar coordinate diagram. The finger coordinates in this diagram, connected in the chronological order in which the electronic device 100 obtained them, form a polyline. The electronic device 100 may input the polar coordinate diagram shown in FIG. 7C into the gesture recognition model, which may output the gesture type of the corresponding finger gesture. That gesture type is "line".
S207. The electronic device determines the text region Q to be recognized according to the finger coordinates saved from time T11 to time T12, the gesture G, and the multiple text regions and their positions in book B.
The electronic device can determine the text region Q to be recognized according to the finger coordinates recorded from time T11 to time T12 (that is, the trajectory of the user's finger on the picture book), the gesture recognition result, and the layout analysis result.
Specifically, in a possible implementation, when the gesture G is a "point", the electronic device 100 takes the text region containing the trajectory recorded from time T11 to time T12 as the text region to be recognized. The electronic device 100 takes the character in the text region Q with the smallest distance to the finger coordinates saved from time T11 to time T12 as the text to be recognized specified by the user.
In a possible implementation, when the gesture G is a "line", the electronic device 100 takes the text region intersecting the trajectory recorded from time T11 to time T12 as the text region Q to be recognized. It can be understood that the trajectory intersecting a text region may mean that the entire trajectory lies within the text region, or that at least a preset proportion of the trajectory lies within it (for example, half of the trajectory lies within text region A). The electronic device 100 may take the characters above the trajectory in the text region Q as the text to be recognized specified by the user.
In a possible implementation, when the gesture G is a "circle", the electronic device may take the text region overlapping the trajectory recorded from time T11 to time T12 as the text region Q to be recognized. The electronic device may take the characters of the text region Q that lie inside the trajectory recorded from time T11 to time T12 as the text to be recognized selected by the user.
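The three selection rules can be sketched as follows. The per-character positions and the bounding-box approximation of "inside the trajectory" are assumptions of this sketch; the embodiment derives character positions from layout analysis and does not fix a particular point-in-region test.

```python
def select_target_text(gesture, track, chars):
    """Pick the text selected by a touch-to-read gesture.

    track: list of (x, y) finger coordinates saved from T11 to T12.
    chars: list of (char, (x, y)) giving each character's position inside
    the text region Q (hypothetical representation for this sketch).
    """
    if gesture == "point":
        # "point": the character nearest to the recorded finger coordinates.
        tip = track[-1]
        return min(chars,
                   key=lambda c: (c[1][0] - tip[0]) ** 2 + (c[1][1] - tip[1]) ** 2)[0]
    if gesture == "line":
        # "line": the characters above the drawn line (smaller y = higher up,
        # assuming image coordinates with the origin at the top-left).
        line_y = min(y for _, y in track)
        return "".join(ch for ch, (_, cy) in chars if cy < line_y)
    if gesture == "circle":
        # "circle": the characters inside the circled trajectory, approximated
        # here by the trajectory's bounding box.
        xs = [x for x, _ in track]
        ys = [y for _, y in track]
        return "".join(ch for ch, (cx, cy) in chars
                       if min(xs) <= cx <= max(xs) and min(ys) <= cy <= max(ys))
    return ""
```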
In a possible implementation, the user may configure the electronic device 100 to recognize touch-to-read only in text regions. That is, only when the electronic device 100 determines that the trajectory formed by the finger coordinates saved from time T11 to time T12 lies in a text region of the picture book does it determine the text region Q to be recognized and perform step S208. When the electronic device 100 determines that this trajectory lies in a drawing region or a table region of the picture book, the electronic device 100 does not perform step S208.
Exemplarily, FIG. 8 shows a picture book 800. The picture book 800 may include a drawing region 801, a table region 802, a text region 803 and a text region 804. The trajectory formed by the finger coordinates saved by the electronic device 100 from time T11 to time T12 may be finger trajectory 807 or finger trajectory 809 in FIG. 8. The electronic device 100 may determine that finger trajectory 807 is in the drawing region 801, or that finger trajectory 809 is in the table region 802. After making that determination, the electronic device 100 may prompt the user on the display screen that the current touch-to-read area does not match the configured readable area, and does not perform step S208. When the electronic device 100 determines that the finger trajectory is in text region 803 or text region 804, the electronic device 100 may determine the text region to be recognized according to the finger trajectory and the text region, and perform step S208.
Further, it can be understood that if the user has not restricted recognition to text regions, then when the user selects text within a drawing region, the electronic device 100 can recognize and read aloud the selected text. When performing layout analysis, the electronic device 100 can obtain the position information of the text contained in the drawing region. In this way, the electronic device 100 can determine the text to be recognized selected by the user according to the user's finger trajectory and the position information of the text in the drawing region, and then recognize and read it aloud.
Further, if the electronic device 100 determines that the trajectory formed by the finger coordinates saved from time T11 to time T12 is not in a text region, the electronic device 100 may check whether there is a finger in the captured images; if a finger is detected, step S203 is performed. If the electronic device 100 does not detect a finger in the captured images within a preset time, the electronic device 100 may close the touch-to-read app, or enter a standby state. In this way, the battery charge of the electronic device 100 can be saved and power consumption reduced.
In a possible implementation, when the electronic device 100 determines that the portion of the trajectory formed by the finger coordinates saved from time T11 to time T12 that falls within a text region is greater than a preset threshold, the electronic device determines the text region Q to be recognized and performs step S208; otherwise the electronic device 100 neither determines the text region Q to be recognized nor performs step S208. The preset threshold may be 50%, 55%, 60% and so on, which is not limited here. For example, for finger trajectory 808 shown in FIG. 8, about 20% of the trajectory falls within the text region; if the preset threshold is 50%, the electronic device 100 does not determine the text region Q to be recognized or perform step S208. If the trajectory formed by the finger coordinates saved by the electronic device 100 from time T11 to time T12 is finger trajectory 805 or finger trajectory 806, the electronic device 100 can determine that the portion of that trajectory falling within the text region is greater than the preset threshold, and can determine the text region to be recognized according to the finger trajectory and the text region.
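A sketch of this threshold check, assuming the text region is available as an axis-aligned box from layout analysis; the region representation and the 50% default are illustrative:

```python
def in_region_ratio(track, region):
    """Fraction of a finger trajectory that falls inside a text region.

    region: (left, top, right, bottom) box, assumed to come from layout
    analysis. track: list of (x, y) finger coordinates.
    """
    left, top, right, bottom = region
    inside = sum(1 for x, y in track
                 if left <= x <= right and top <= y <= bottom)
    return inside / len(track)

def should_recognize(track, region, threshold=0.5):
    # The 50% default mirrors the example preset threshold above.
    return in_region_ratio(track, region) >= threshold
```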
In a possible implementation, when the finger coordinates saved by the electronic device from time T11 to time T12 form two or more finger trajectories, the electronic device 100 may take the text region determined by the trajectory with the larger range as the text region Q to be recognized. For example, as shown in FIG. 8, if the trajectories formed by the finger coordinates saved by the electronic device 100 from time T11 to time T12 are finger trajectory 805 and finger trajectory 806, and the range of finger trajectory 806 is larger than that of finger trajectory 805, then the electronic device 100 takes the text region determined by finger trajectory 806 and text region 803 as the final text region Q to be recognized.
In a possible implementation, the electronic device 100 may determine the text region Q to be recognized according to the finger coordinates saved from time T11 to time T12, the gesture G, and the multiple text regions and their positions in book B. The electronic device 100 may delineate the text to be detected in the region Q with a text detection frame. As shown in FIG. 9A, FIG. 9A may include a text region 900 to be recognized. According to the finger coordinates, the electronic device 100 may determine that the text to be recognized is the character "猫" ("cat") delineated by the text detection frame 902.
Since there is a certain angle between the text and the user's fingertip, the text to be recognized detected by the electronic device 100 may be inaccurate.
Further, in a possible implementation, the electronic device determines an offset S0 (S0 = d·cosθ) according to the angle θ between the finger and the text in the text region to be recognized and the width d of the finger detection frame. The electronic device 100 moves the text detection frame by the offset S0 and takes the text delineated by the moved text detection frame as the text to be recognized in the text region Q.
It can be understood that when the electronic device 100 performs layout analysis on the image of the picture book, it can obtain the tilt angle β of the text in the text region of the picture book, and when performing finger detection it can obtain the tilt angle σ of the finger. The electronic device can obtain the angle θ between the finger and the text in the text region to be recognized from the tilt angles β and σ.
As shown in FIG. 9B, the text detection frame 903 in FIG. 9B is the text detection frame obtained after the electronic device 100 moves the text detection frame 902 in FIG. 9A by the offset. The text detection frame 903 delineates the character "和" ("and"), which is the text to be recognized. In this way, the electronic device 100 can determine the text to be recognized specified by the user more accurately.
Optionally, the electronic device 100 may multiply the offset S0 by an offset coefficient α to obtain an offset S1, move the text detection frame by the offset S1, and take the text delineated by the text detection frame as the text to be recognized in the text region Q. The offset coefficient α may be configured by the system of the electronic device 100, and its value range may be [0.2, 2]. When the electronic device 100 determines that the user's finger has pointed at the same position multiple times, indicating that the electronic device failed to accurately recognize and read aloud the text specified by the user, the electronic device 100 may adjust the offset coefficient α.
Optionally, the electronic device 100 may record the angle between the finger and the text during touch-to-read within a preset time period, together with the offset corresponding to that angle, and establish a mapping between the angle and the offset. In this way, after the electronic device 100 determines the angle between the finger and the text, it can look up the offset corresponding to that angle in the mapping, which reduces the amount of computation on the electronic device.
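A sketch of the offset calculation together with the angle-to-offset mapping described above; the class name, the rounding used as the cache key, and the default α are assumptions of this sketch:

```python
import math

class OffsetCache:
    """Offset of the text detection frame for a finger/text angle.

    S0 = d * cos(theta) as described above; alpha is the offset
    coefficient, whose range may be [0.2, 2]. Caching per rounded angle
    stands in for the mapping the device builds from recent
    touch-to-read angles.
    """
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self._cache = {}

    def offset(self, theta, d):
        key = (round(theta, 2), d)
        if key not in self._cache:
            s0 = d * math.cos(theta)           # base offset S0
            self._cache[key] = self.alpha * s0  # adjusted offset S1
        return self._cache[key]
```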
Optionally, in a possible implementation, when the electronic device 100 determines that the vertical distance from the finger in a captured image frame to book B is greater than a preset vertical distance D11, the electronic device 100 determines the touch-to-read gesture G of the finger according to the finger coordinates saved from time T11 to time T12.
In a possible implementation, when the electronic device 100 determines that the vertical distance from the finger in the captured image frames to book B gradually increases, the electronic device 100 determines the touch-to-read gesture G of the finger according to the finger coordinates saved from time T11 to time T12.
S208. The electronic device 100 recognizes and reads aloud the text in the text region Q to be recognized.
The electronic device 100 can recognize the text detected in the text region Q to be recognized, and after recognizing the text, read it aloud. For example, as shown in FIG. 1A, the electronic device 100 reads aloud the character "猫" ("cat") specified by the user.
It can be understood that the text specified by the user in the embodiments of the present application includes, but is not limited to, characters in different scripts such as Chinese, Japanese, Korean and English.
It can be understood that step S202 may be performed after step S203 and before step S207.
With the method for recognizing touch-to-read text provided by the embodiments of the present application, the electronic device 100 can continuously capture images of a first picture book. When a finger appears in a captured image, the electronic device determines that the user has started touch-to-read. The electronic device analyzes the image of the first picture book to obtain a text analysis result. When the distance between the finger coordinates in the currently captured image frame and the finger coordinates in an image frame captured a preset duration earlier is less than a preset distance, the electronic device determines that the finger in the image frame is stationary, and the electronic device can record the trajectory coordinates between two stationary states of the finger. The electronic device can determine the text region Q to be recognized, and the text to be recognized within it, according to the trajectory coordinates between the two stationary states, and then recognize and read aloud that text. In this way, the coordinates of the two stationary states of the finger serve respectively as the start point and end point of the user's touch-to-read trajectory, so the electronic device 100 can accurately determine where a touch-to-read action starts and, accordingly, accurately determine the text to be recognized from the trajectory coordinates between the two stationary states. This improves the touch-to-read accuracy of the electronic device and thus the user experience. Moreover, the electronic device 100 can recognize text in any book; no customized book is required.
FIG. 10 exemplarily shows a flowchart of another method for recognizing touch-to-read text provided by an embodiment of the present application. As shown in FIG. 10, the method may include the following steps:
S1001. In response to a first operation of the user, the electronic device 100 starts to capture images, where the captured images include the user's finger and the content of a book, and the user's finger and the book are located in a target area of the electronic device.
For the first operation, reference may be made to the description in step S201, which is not repeated here. In response to the user's first operation, the camera of the electronic device 100 may continuously capture images.
Optionally, before performing step S1002, the electronic device may also perform step S202 above.
S1002. The electronic device 100 recognizes the user's touch-to-read gesture according to the position movement of the user's finger recognized from the captured images.
The electronic device 100 can identify the user's finger in a captured image and determine the position of the finger in that image frame. The electronic device 100 can recognize the user's touch-to-read gesture according to the finger positions in the multiple captured image frames.
In a possible implementation, the touch-to-read gesture includes one or more of the following: a point, a line, and a circle.
In a possible implementation, recognizing the user's touch-to-read gesture according to the position movement of the user's finger recognized from the captured images includes: after the electronic device detects the user's finger in the captured images, if within a first preset duration the movement of the finger position in the captured images is less than a first preset distance, the electronic device records that first position as the start point of the touch-to-read gesture; after the electronic device starts recording the start point of the touch-to-read gesture, if it detects that within a second preset duration the movement of the finger from a second position in the captured images is less than a second preset distance, the electronic device records the second position as the end point of the touch-to-read gesture; the electronic device recognizes the touch-to-read gesture according to its start point and end point.
It can be understood that the electronic device 100 may record the finger coordinates from the start point of the touch-to-read gesture until the finger coordinates at the end point of the touch-to-read gesture.
In a possible implementation, recognizing the touch-to-read gesture according to its start point and end point includes: if the distance between the recorded start point and any finger position between the start point and the end point is less than a third preset distance, the electronic device recognizes the touch-to-read gesture as a point; if the coordinates of the finger positions between the recorded start point and end point are linearly correlated, the electronic device recognizes the touch-to-read gesture as a line; if the distance between the recorded start point and end point is less than a fourth preset distance, and the distance between the start point and some finger position between the start point and the end point is greater than a fifth preset distance, the electronic device recognizes the touch-to-read gesture as a circle.
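The start-point and end-point detection described above can be sketched as follows, under assumed values for the preset distances and durations (here a run of three near-still frames stands in for a position that stays put for the first or second preset duration):

```python
import math

def record_track(frames, d_still=3.0, still_frames=3):
    """Extract one touch-to-read track from a stream of finger positions.

    frames: per-frame (x, y) finger positions in chronological order.
    A run of `still_frames` frames each moving less than d_still marks
    the start point; the next such run after the finger has moved marks
    the end point. Both thresholds are illustrative stand-ins for the
    first/second preset distances and durations.
    """
    def still_at(i):
        if i + still_frames > len(frames):
            return False
        return all(math.dist(frames[j], frames[j + 1]) < d_still
                   for j in range(i, i + still_frames - 1))

    # Start point: first still run.
    start = next((i for i in range(len(frames)) if still_at(i)), None)
    if start is None:
        return None
    # Skip past the initial still run, then look for the next still run.
    i = start + still_frames
    while i < len(frames) and not still_at(i):
        i += 1
    if i >= len(frames):
        return None
    # The recorded track runs from the start point through the end point.
    return frames[start:i + still_frames]
```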
The gesture G in steps S201 to S208 above may also be referred to as a touch-to-read gesture. For step S1002, reference may be made to the descriptions in steps S203 to S206 above, which are not repeated here.
S1003. The electronic device 100 determines the target text in the content of the book in the captured images according to the touch-to-read gesture and the position of its trajectory.
According to the touch-to-read gesture and the position of its trajectory in the images, the electronic device 100 can determine the target text in the content of the book. The target text is the text in the book that the user has selected for recognition.
In a possible implementation, determining the target text in the content of the book in the captured images according to the touch-to-read gesture and the position of its trajectory includes: the electronic device determines the positions of the text regions in the content of the book according to the captured images; the electronic device determines the target text in the content of the book according to the touch-to-read gesture, the position of its trajectory, and the positions of the text regions. Here, determining the positions of the text regions in the content of the book according to the captured images means that the electronic device performs layout analysis on the captured images and thereby derives the positions of the text regions in the book. For the specific layout analysis process, reference may be made to the description in step S202 above, which is not repeated here.
在一种可能的实现方式中,电子设备根据点读手势和点读手势的轨迹的位置,以及文字区域的位置,确定书本的内容中的目标文字,包括:电子设备根据点读手势的轨迹以及第一书本中文字区域的位置,确定第一文字区域,第一文字区域中包含第一轨迹,第一轨迹为点读手势的轨迹中的大于或等于预设比例的一部分;电子设备根据第一轨迹和点读手势以及第一文字区域,确定书本的内容中的目标文字。In a possible implementation, the electronic device determines the target text in the content of the book according to the touch-to-read gesture, the position of its trajectory, and the positions of the text areas, which includes: the electronic device determines a first text area according to the trajectory of the gesture and the positions of the text areas in the first book, where the first text area contains a first trajectory, and the first trajectory is a portion of the gesture trajectory that accounts for at least a preset proportion of it; and the electronic device determines the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text area.
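The trajectory-to-text-area matching described above can be illustrated with a short sketch. This is an illustrative assumption of how "the text area containing at least a preset proportion of the trajectory" might be computed; the rectangle representation of text areas, the `preset_ratio` value, and the function names are hypothetical and not taken from the embodiment.

```python
# Hypothetical sketch: select the first text area that contains at
# least `preset_ratio` of the trajectory points of the gesture.

def point_in_box(pt, box):
    """box = (x1, y1, x2, y2); pt = (x, y)."""
    x, y = pt
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def first_text_area(trajectory, text_areas, preset_ratio=0.6):
    """Return the first text area containing >= preset_ratio of the
    trajectory points, or None if no area qualifies."""
    for box in text_areas:
        inside = sum(point_in_box(p, box) for p in trajectory)
        if inside / len(trajectory) >= preset_ratio:
            return box
    return None
```

For example, a trajectory drawn mostly inside one paragraph's bounding box selects that paragraph's text area even if a few points stray into a neighboring area.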
在一种可能的实现方式中,电子设备根据第一轨迹和点读手势以及第一文字区域,确定书本的内容中的目标文字,包括:若点读手势为点,电子设备确定第一文字区域中与第一轨迹距离最小的文字为目标文字;若点读手势为划线,电子设备确定第一文字区域中在第一轨迹上方的文字为目标文字;若点读手势为画圈,电子设备确定第一文字区域中在第一轨迹中的文字为目标文字。In a possible implementation, the electronic device determines the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text area, which includes: if the gesture is a point, the electronic device determines the text in the first text area closest to the first trajectory as the target text; if the gesture is a line, the electronic device determines the text in the first text area above the first trajectory as the target text; and if the gesture is a circle, the electronic device determines the text in the first text area enclosed by the first trajectory as the target text.
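The three decision rules above (point, line, circle) admit a simple geometric sketch. Everything below is an illustrative assumption rather than the embodiment's implementation: words are modeled as axis-aligned boxes, "above the line" is tested against the mean y of the trajectory in image coordinates (y grows downward), and the circled region is approximated by the trajectory's bounding box.

```python
# Hypothetical sketch of the gesture-dependent target-text rules.
import math

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def target_text(gesture, trajectory, words):
    """words: list of (text, box). Image convention: y grows downward,
    so text 'above' a drawn line has a smaller y than the line."""
    if gesture == "point":
        px, py = trajectory[-1]  # fingertip position
        return min(words, key=lambda w: math.dist(center(w[1]), (px, py)))[0]
    if gesture == "line":
        line_y = sum(y for _, y in trajectory) / len(trajectory)
        # words whose boxes sit entirely above the drawn line
        return [t for t, (x1, y1, x2, y2) in words if y2 <= line_y]
    if gesture == "circle":
        xs = [x for x, _ in trajectory]
        ys = [y for _, y in trajectory]
        x1, y1, x2, y2 = min(xs), min(ys), max(xs), max(ys)
        # words whose centers fall inside the circled region
        return [t for t, box in words
                if x1 <= center(box)[0] <= x2 and y1 <= center(box)[1] <= y2]
    raise ValueError(f"unknown gesture: {gesture}")
```

A production system would replace the bounding-box approximation for circles with a point-in-polygon test on the actual trajectory.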
这里步骤S1003可以参考步骤S207中的描述,此处不再赘述。Herein, step S1003 may refer to the description in step S207, which will not be repeated here.
S1004、电子设备100播报已识别的目标文字。S1004, the electronic device 100 broadcasts the recognized target text.
电子设备100可以识别出目标文字。电子设备识别目标文字后并将文字语音播报出来。例如,图1A中示出的,电子设备100播报出用户指定的文字“猫”。The electronic device 100 can recognize the target text. After recognizing the target text, the electronic device broadcasts the text by voice. For example, as shown in FIG. 1A, the electronic device 100 broadcasts the text "cat" specified by the user.
步骤S208中的待识别文字区域Q中的文字可以称为目标文字。The text in the to-be-recognized text area Q in step S208 may be referred to as a target text.
可以理解的是,本申请实施例中电子设备100可以识别出用户指定的任意书本中的文字。用户指定的文字包括但不限于汉字、日文、韩文、英文等等不同形式的字。It can be understood that, in this embodiment of the present application, the electronic device 100 can recognize the text in any book designated by the user. The characters specified by the user include but are not limited to characters in different forms such as Chinese characters, Japanese, Korean, and English.
通过本申请实施例提供的一种识别点读文字的方法,电子设备100响应于用户的第一操作,电子设备100开始采集图像,其中,电子设备100采集的图像包括用户的手指和书本的内容,用户的手指和书本位于电子设备的目标区域内;电子设备100根据采集的图像识别出的用户的手指的位置移动来识别用户的点读手势;电子设备100根据点读手势和点读手势的轨迹的位置,确定采集的图像中书本的内容中的目标文字;电子设备100播报已识别出的目标文字。这样,可以提高电子设备的点读正确率,从而可以提升用户体验。并且,电子设备100可以识别任意书本中的文字,不需要定制书本。With the method for recognizing touch-to-read text provided by this embodiment of the present application, the electronic device 100 starts to capture images in response to a first operation of the user, where the captured images include the user's finger and the content of the book, and the user's finger and the book are located in a target area of the electronic device; the electronic device 100 recognizes the user's touch-to-read gesture according to the movement of the finger position identified from the captured images; the electronic device 100 determines the target text in the content of the book in the captured images according to the gesture and the position of its trajectory; and the electronic device 100 broadcasts the recognized target text. In this way, the touch-to-read accuracy of the electronic device can be improved, thereby improving the user experience. Moreover, the electronic device 100 can recognize text in any book; no customized book is required.
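The summarized flow (capture → gesture recognition → target-text determination → voice broadcast) can be expressed as a small orchestration sketch. The component interfaces here are placeholder assumptions; the embodiment does not specify code, and the actual detection and text-to-speech models would be separate subsystems plugged into these slots.

```python
# Hypothetical orchestration of the touch-to-read flow; the four
# component names and call signatures are illustrative assumptions.

class TouchReadPipeline:
    def __init__(self, camera, gesture_recognizer, text_locator, tts):
        self.camera = camera                        # captures image frames
        self.gesture_recognizer = gesture_recognizer  # frames -> (gesture, trajectory)
        self.text_locator = text_locator            # (image, gesture, trajectory) -> text
        self.tts = tts                              # text -> voice broadcast

    def run_once(self):
        frames = self.camera.capture()              # images with finger + book content
        gesture, trajectory = self.gesture_recognizer(frames)
        target = self.text_locator(frames[-1], gesture, trajectory)
        self.tts(target)                            # broadcast the recognized target text
        return target
```

Because each stage is injected, the same skeleton works for any book: only the layout-analysis and OCR components need to generalize, not the pipeline itself.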
下面首先介绍本申请实施例提供的示例性电子设备100。The following first introduces the exemplary electronic device 100 provided by the embodiments of the present application.
图11是本申请实施例提供的电子设备100的结构示意图。FIG. 11 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
下面以电子设备100为例对实施例进行具体说明。应该理解的是,电子设备100可以具有比图中所示的更多的或者更少的部件,可以组合两个或多个的部件,或者可以具有不同的部件配置。图中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。The embodiment will be described in detail below by taking the electronic device 100 as an example. It should be understood that the electronic device 100 may have more or fewer components than those shown in the figures, may combine two or more components, or may have different component configurations. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
电子设备100可以包括:处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线2,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,以及显示屏194等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,磁传感器180D,加速度传感器180E,触摸传感器180K等。The electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 2, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a magnetic sensor 180D, an acceleration sensor 180E, a touch sensor 180K, and the like.
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。It can be understood that the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 . In other embodiments of the present application, the electronic device 100 may include more or less components than shown, or combine some components, or separate some components, or arrange different components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. The different processing units may be independent devices or may be integrated in one or more processors.
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。The controller may be the nerve center and command center of the electronic device 100 . The controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。The charging management module 140 is used to receive charging input from the charger. The charger may be a wireless charger or a wired charger.
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。The power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 . The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
电子设备100的无线通信功能可以通过天线2,无线通信模块160,调制解调处理器以及基带处理器等实现。The wireless communication function of the electronic device 100 may be implemented by the antenna 2, the wireless communication module 160, a modem processor, a baseband processor, and the like.
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。The electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。The electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。The ISP is used to process the data fed back by the camera 193 . For example, when taking a photo, the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and skin tone. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 193 .
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。Camera 193 is used to capture still images or video. The object is projected through the lens to generate an optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats of image signals. In some embodiments, the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。A digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
视频编解码器用于对数字视频压缩或解压缩。Video codecs are used to compress or decompress digital video.
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文字理解等。The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, such as the transfer mode between neurons in the human brain, it can quickly process the input information, and can continuously learn by itself. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
内部存储器121可以包括一个或多个随机存取存储器(random access memory,RAM)和一个或多个非易失性存储器(non-volatile memory,NVM)。The internal memory 121 may include one or more random access memories (RAM) and one or more non-volatile memories (NVM).
随机存取存储器可以包括静态随机存储器(static random-access memory,SRAM)、动态随机存储器(dynamic random access memory,DRAM)、同步动态随机存储器(synchronous dynamic random access memory,SDRAM)、双倍资料率同步动态随机存取存储器(double data rate synchronous dynamic random access memory,DDR SDRAM,例如第五代DDR SDRAM一般称为DDR5 SDRAM)等;The random access memory may include static random-access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM; for example, fifth-generation DDR SDRAM is generally called DDR5 SDRAM), and the like;
非易失性存储器可以包括磁盘存储器件、快闪存储器(flash memory)。Non-volatile memory may include magnetic disk storage devices, flash memory.
快闪存储器按照运作原理划分可以包括NOR FLASH、NAND FLASH、3D NAND FLASH等,按照存储单元电位阶数划分可以包括单阶存储单元(single-level cell,SLC)、多阶存储单元(multi-level cell,MLC)、三阶储存单元(triple-level cell,TLC)、四阶储存单元(quad-level cell,QLC)等,按照存储规范划分可以包括通用闪存存储(universal flash storage,UFS)、嵌入式多媒体存储卡(embedded multi media card,eMMC)等。By operating principle, flash memory may include NOR FLASH, NAND FLASH, 3D NAND FLASH, and the like; by the number of potential levels per storage cell, it may include single-level cell (SLC), multi-level cell (MLC), triple-level cell (TLC), quad-level cell (QLC), and the like; and by storage specification, it may include universal flash storage (UFS), embedded multimedia card (eMMC), and the like.
随机存取存储器可以由处理器110直接进行读写,可以用于存储操作系统或其他正在运行中的程序的可执行程序(例如机器指令),还可以用于存储用户及应用程序的数据等。The random access memory can be directly read and written by the processor 110, and can be used to store executable programs (eg, machine instructions) of an operating system or other running programs, and can also be used to store data of users and application programs.
非易失性存储器也可以存储可执行程序和存储用户及应用程序的数据等,可以提前加载到随机存取存储器中,用于处理器110直接进行读写。The non-volatile memory can also store executable programs and store data of user and application programs, etc., and can be loaded into the random access memory in advance for the processor 110 to directly read and write.
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。The audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。Speaker 170A, also referred to as a "speaker", is used to convert audio electrical signals into sound signals. The electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。The receiver 170B, also referred to as "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or a voice message, the voice can be answered by placing the receiver 170B close to the human ear.
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。The microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input a sound signal into it. The electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement a directional recording function, and the like.
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。The earphone jack 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface.
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。The pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
磁传感器180D包括霍尔传感器。The magnetic sensor 180D includes a Hall sensor.
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
触摸传感器180K,也称“触控面板”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。Touch sensor 180K, also called "touch panel". The touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”. The touch sensor 180K is used to detect a touch operation on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to touch operations may be provided through display screen 194 . In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。The keys 190 include a power-on key, a volume key, and the like. Keys 190 may be mechanical keys. It can also be a touch key. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
马达191可以产生振动提示。指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。Motor 191 can generate vibrating cues. The indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
图12是本申请实施例的电子设备100的软件结构框图。可以理解的是,图12仅为电子设备100示例性的软件结构示意图。本申请实施例中的电子设备100的软件结构还可以是其他操作系统(例如,iOS操作系统,鸿蒙操作系统等等)提供的软件架构,此处不作限定。FIG. 12 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application. It can be understood that FIG. 12 is only a schematic diagram of an exemplary software structure of the electronic device 100. The software structure of the electronic device 100 in this embodiment of the present application may also be a software architecture provided by another operating system (for example, the iOS operating system, the Hongmeng operating system, or the like), which is not limited here.
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将系统分为四层,从上至下分别为应用程序层,应用程序框架层,运行时(Runtime)和系统库,以及内核层。The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces. In some embodiments, the system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, a runtime (Runtime) and a system library, and a kernel layer.
应用程序层可以包括一系列应用程序包。The application layer can include a series of application packages.
如图12所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,音乐,视频,短信息,点读等应用程序(也可以称为应用)。As shown in FIG. 12, the application packages may include application programs (also referred to as applications) such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Music, Videos, Messages, and Touch-to-Read.
其中,该点读应用程序即是指可以实现本申请实施例提供的识别点读文字方法的应用程序。该应用程序的名称可以叫“点读”、也可以是“辅助学习”等等,此处对该应用程序的名称不作限定。The touch-to-read application refers to an application that can implement the method for recognizing touch-to-read text provided by the embodiments of this application. The application may be named "Touch-to-Read", "Assisted Learning", or the like; the name of the application is not limited here.
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。The application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer. The application framework layer includes some predefined functions.
如图12所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。As shown in Figure 12, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。A window manager is used to manage window programs. The window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。Content providers are used to store and retrieve data and make these data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。The view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications. A display interface can consist of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。The phone manager is used to provide the communication function of the electronic device 100 . For example, the management of call status (including connecting, hanging up, etc.).
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。The resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文字形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话界面形式出现在屏幕上的通知。例如在状态栏提示文字信息,发出提示音,电子设备振动,指示灯闪烁等。The notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc. The notification manager can also display notifications in the status bar at the top of the system in the form of a graph or scroll bar text, such as notifications of applications running in the background, and can also display notifications on the screen in the form of a dialog interface. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
运行时(Runtime)包括核心库和虚拟机。Runtime负责系统的调度和管理。Runtime (Runtime) includes core libraries and virtual machines. Runtime is responsible for the scheduling and management of the system.
核心库包含两部分:一部分是编程语言(例如,Java语言)需要调用的功能函数,另一部分是系统的核心库。The core libraries include two parts: one part is the functions that the programming language (for example, the Java language) needs to call, and the other part is the core libraries of the system.
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的编程文件(例如,Java文件)执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。The application layer and the application framework layer run in the virtual machine. The virtual machine executes the programming files (for example, Java files) of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),二维图形引擎(例如:SGL)等。A system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了二维 (2-Dimensional,2D)和三维(3-Dimensional,3D)图层的融合。The Surface Manager is used to manage the display subsystem and provides a fusion of two-dimensional (2-Dimensional, 2D) and three-dimensional (3-Dimensional, 3D) layers for multiple applications.
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
三维图形处理库用于实现3D图形绘图,图像渲染,合成,和图层处理等。The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
2D图形引擎是2D绘图的绘图引擎。2D graphics engine is a drawing engine for 2D drawing.
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动,虚拟卡驱动。The kernel layer is the layer between hardware and software. The kernel layer contains at least display drivers, camera drivers, audio drivers, sensor drivers, and virtual card drivers.
下面结合捕获拍照场景,示例性说明电子设备100软件以及硬件的工作流程。In the following, the workflow of the software and hardware of the electronic device 100 is exemplarily described in conjunction with the capturing and photographing scene.
当触摸传感器180K接收到触摸操作,相应的硬件中断被发给内核层。内核层将触摸操作加工成原始输入事件(包括触摸坐标,触摸操作的时间戳等信息)。原始输入事件被存储在内核层。应用程序框架层从内核层获取原始输入事件,识别该输入事件所对应的控件。以该触摸操作是触摸单击操作,该单击操作所对应的控件为相机应用图标的控件为例,相机应用调用应用框架层的接口,启动相机应用,进而通过调用内核层启动摄像头驱动,通过摄像头193捕获静态图像或视频。When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and a timestamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking an example in which the touch operation is a tap and the corresponding control is the camera application icon: the camera application calls an interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer, and the camera 193 captures a still image or video.
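The touch-to-launch workflow above can be sketched schematically. This is not Android source code; the event dictionary, the control list, and the dispatch function are illustrative assumptions standing in for the kernel-layer raw input event and the framework-layer hit test.

```python
# Schematic sketch (assumption) of the event flow: the kernel layer
# wraps a touch into a raw input event, and the framework layer maps
# its coordinates to a control and invokes that control's callback.
import time

def make_raw_event(x, y):
    # kernel layer: raw input event with touch coordinates and timestamp
    return {"x": x, "y": y, "ts": time.time()}

def dispatch(event, controls):
    """Framework layer: find the control whose bounds contain the touch.
    controls: list of (name, (x1, y1, x2, y2), callback)."""
    for name, (x1, y1, x2, y2), callback in controls:
        if x1 <= event["x"] <= x2 and y1 <= event["y"] <= y2:
            return callback(name)
    return None
```

In the scenario above, the callback registered for the camera-icon control would be what ultimately starts the camera application and, through the kernel layer, the camera driver.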
FIG. 13 is another exemplary schematic structural diagram of the electronic device 100 provided by an embodiment of the present application.
As shown in FIG. 13, the electronic device 100 may include: a processor 1201, a camera 1202, a display screen 1203, a speaker 1204, and a sensor 1205.
It can be understood that the structure illustrated in this embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The camera 1202 is configured to capture images.
The processor 1201 is configured to detect the images captured by the camera 1202 and determine the coordinates of the finger in those images. The processor 1201 determines, from the images captured by the camera 1202, when the user starts and ends a touch-to-read operation, and determines the text to be recognized as specified by the user. The processor 1201 may also convert the recognized text into an audio electrical signal and send that signal to the speaker 1204.
The display screen 1203 can display the images captured by the camera 1202. The display screen 1203 may also display the icon of the "touch-to-read" app and display prompt text.
The speaker 1204 can receive the audio electrical signal sent by the processor 1201 and convert it into a sound signal. Through the speaker 1204, the electronic device 100 can read out the text selected by the user.
The sensor 1205 may be a touch sensor. The touch sensor may be placed on the display screen 1203, and together the touch sensor and the display screen 1203 form a touchscreen, also called a "touch screen". The touch sensor is used to detect touch operations on or near it and can pass a detected touch operation to the application processor to determine the type of touch event.
The above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements to some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
As used in the above embodiments, depending on the context, the term "when" may be interpreted as "if", "after", "in response to determining...", or "in response to detecting...". Similarly, depending on the context, the phrase "upon determining..." or "if (a stated condition or event) is detected" may be interpreted as "if it is determined...", "in response to determining...", "upon detecting (the stated condition or event)", or "in response to detecting (the stated condition or event)".
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive).
Those of ordinary skill in the art can understand that all or part of the processes of the methods in the above embodiments may be completed by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (11)

  1. A method for recognizing touch-to-read text, wherein the method comprises:
    in response to a first operation of a user, starting, by an electronic device, to capture images, wherein the images captured by the electronic device comprise a finger of the user and content of a book, and the finger of the user and the book are located within a target area of the electronic device;
    recognizing, by the electronic device, a touch-to-read gesture of the user according to position movement of the user's finger recognized from the captured images;
    determining, by the electronic device, target text in the content of the book in the captured images according to the touch-to-read gesture and a position of a trajectory of the touch-to-read gesture; and
    broadcasting, by the electronic device, the recognized target text.
  2. The method according to claim 1, wherein the touch-to-read gesture comprises one or more of the following: a point, a dash, or a circle.
  3. The method according to claim 2, wherein the recognizing, by the electronic device, the touch-to-read gesture of the user according to the position movement of the user's finger recognized from the captured images comprises:
    after the electronic device detects the user's finger in the captured images, if it is detected, within a first preset time period, that the position movement of the finger in the captured images is less than a first preset distance, recording, by the electronic device, the first position as a start point of the touch-to-read gesture;
    after starting to record the start point of the touch-to-read gesture, if it is detected that movement of a second position of the finger in the captured images within a second preset time period is less than a second preset distance, recording, by the electronic device, the second position as an end point of the touch-to-read gesture; and
    recognizing, by the electronic device, the touch-to-read gesture according to the start point of the touch-to-read gesture and the end point of the touch-to-read gesture.
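The dwell-based start/end detection in claim 3 can be sketched as follows. This is only an illustrative sketch: the preset time period and preset distance values, the sample format, and the function name are assumptions, not details taken from the application.

```python
import math

def find_dwell_points(samples, dwell_time=0.5, dwell_dist=10.0):
    """Scan timestamped finger positions (t, x, y) and return up to two
    dwell points: positions where the finger moved less than `dwell_dist`
    for at least `dwell_time` seconds.  The first dwell is treated as the
    gesture start point, the second as the end point (hypothetical
    thresholds standing in for the claim's preset time/distance)."""
    dwells = []
    i = 0
    while i < len(samples) and len(dwells) < 2:
        t0, x0, y0 = samples[i]
        j = i + 1
        # extend the window while the finger stays near (x0, y0)
        while j < len(samples) and math.hypot(samples[j][1] - x0,
                                              samples[j][2] - y0) < dwell_dist:
            j += 1
        if samples[j - 1][0] - t0 >= dwell_time:
            dwells.append((x0, y0))
            i = j  # resume scanning after the dwell window
        else:
            i += 1
    return dwells  # [start] or [start, end]
```

A finger that hovers, sweeps across the page, and hovers again yields exactly the two anchor points that claim 3 records before classifying the gesture.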
  4. The method according to claim 3, wherein the recognizing, by the electronic device, the touch-to-read gesture according to the start point of the touch-to-read gesture and the end point of the touch-to-read gesture comprises:
    if the distance between the start point recorded by the electronic device and the position of the finger at any point between the start point and the end point is less than a third preset distance, recognizing, by the electronic device, the touch-to-read gesture as a point;
    if the coordinates of the positions of the finger between the start point and the end point recorded by the electronic device are linearly correlated, recognizing, by the electronic device, the touch-to-read gesture as a dash; and
    if the distance between the start point and the end point recorded by the electronic device is less than a fourth preset distance, and the distance between the start point and a position of the finger between the start point and the end point is greater than a fifth preset distance, recognizing, by the electronic device, the touch-to-read gesture as a circle.
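The three distance tests of claim 4 can be sketched as follows. The threshold values are hypothetical, and an exact-collinearity test via cross products stands in for the claim's "linearly correlated" condition (a correlation-coefficient test would be used on noisy data):

```python
import math

def classify_gesture(points, point_dist=15.0, close_dist=20.0, far_dist=30.0):
    """Classify a finger trajectory (list of (x, y); start point first,
    end point last) as 'point', 'dash', or 'circle'."""
    start, end = points[0], points[-1]
    # Point: every sampled position stays close to the start point.
    if all(math.dist(start, p) < point_dist for p in points):
        return "point"
    # Dash: the sampled positions are (nearly) linearly correlated.
    if _is_linear(points):
        return "dash"
    # Circle: the trajectory returns near the start but wanders far from it.
    if (math.dist(start, end) < close_dist and
            any(math.dist(start, p) > far_dist for p in points)):
        return "circle"
    return "unknown"

def _is_linear(points, tol=1e-6):
    # Collinearity via the cross product of each point with the
    # start-to-end direction vector.
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    return all(abs(dx * (y - y0) - dy * (x - x0)) <= tol for x, y in points)
```

Testing the rules in this order mirrors the claim: a trajectory that never leaves the start-point neighbourhood is a point, an elongated straight sweep is a dash, and a far-ranging loop that closes on itself is a circle.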
  5. The method according to claim 4, wherein the determining, by the electronic device, the target text in the content of the book in the captured images according to the touch-to-read gesture and the position of the trajectory of the touch-to-read gesture comprises:
    determining, by the electronic device, positions of text areas in the content of the book according to the captured images; and
    determining, by the electronic device, the target text in the content of the book according to the touch-to-read gesture, the position of the trajectory of the touch-to-read gesture, and the positions of the text areas.
  6. The method according to claim 5, wherein the determining, by the electronic device, the target text in the content of the book according to the touch-to-read gesture, the position of the trajectory of the touch-to-read gesture, and the positions of the text areas comprises:
    determining, by the electronic device, a first text area according to the trajectory of the touch-to-read gesture and the positions of the text areas in the first book, wherein the first text area contains a first trajectory, and the first trajectory is a portion of the trajectory of the touch-to-read gesture that is greater than or equal to a preset proportion of that trajectory; and
    determining, by the electronic device, the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text area.
  7. The method according to claim 6, wherein the determining, by the electronic device, the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text area comprises:
    if the touch-to-read gesture is a point, determining, by the electronic device, that the text in the first text area with the smallest distance from the first trajectory is the target text;
    if the touch-to-read gesture is a dash, determining, by the electronic device, that the text above the first trajectory in the first text area is the target text; and
    if the touch-to-read gesture is a circle, determining, by the electronic device, that the text within the first trajectory in the first text area is the target text.
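Given the gesture type and word bounding boxes from text detection, the three selection rules of claim 7 might be sketched as follows. The box format, the use of box centers, and the bounding-box approximation of the circled region are all illustrative assumptions:

```python
import math

def select_target_text(gesture, trajectory, words):
    """Pick target words per claim 7's rules:
       point  -> the word nearest the trajectory,
       dash   -> words whose box sits directly above the drawn line,
       circle -> words whose box center falls inside the circled region.
    `words` is a list of (text, (x0, y0, x1, y1)) boxes, y growing downward;
    `trajectory` is a list of (x, y) finger positions."""
    def center(box):
        x0, y0, x1, y1 = box
        return ((x0 + x1) / 2, (y0 + y1) / 2)

    if gesture == "point":
        tip = trajectory[0]
        return [min(words, key=lambda w: math.dist(center(w[1]), tip))[0]]
    if gesture == "dash":
        line_y = sum(y for _, y in trajectory) / len(trajectory)
        xs = [x for x, _ in trajectory]
        lo, hi = min(xs), max(xs)
        # "above" the line: box bottom at or above line_y, with
        # horizontal overlap against the drawn span
        return [t for t, (x0, y0, x1, y1) in words
                if y1 <= line_y and x0 <= hi and x1 >= lo]
    if gesture == "circle":
        xs = [x for x, _ in trajectory]
        ys = [y for _, y in trajectory]
        # approximate the circled region by the trajectory's bounding box
        return [t for t, box in words
                if min(xs) <= center(box)[0] <= max(xs)
                and min(ys) <= center(box)[1] <= max(ys)]
    return []
```

A point picks the single nearest word, a dash sweeps up the words it underlines, and a circle collects the words it encloses.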
  8. The method according to any one of claims 1 to 7, wherein the first preset time period is equal to the second preset time period, and the first preset distance is equal to the second preset distance.
  9. An electronic device, comprising: a camera, a display screen, one or more processors, and one or more memories, wherein the one or more processors are coupled to the camera, the one or more memories, and the display screen; the one or more memories are configured to store computer program code, the computer program code comprising computer instructions; and when the one or more processors execute the computer instructions, the electronic device is caused to perform the method for recognizing touch-to-read text according to any one of claims 1 to 8.
  10. A computer-readable storage medium, comprising computer instructions, wherein when the computer instructions are run on an electronic device, the electronic device is caused to perform the method for recognizing touch-to-read text according to any one of claims 1 to 8.
  11. A computer program product, wherein when the computer program product runs on a computer, the computer is caused to perform the method for recognizing touch-to-read text according to any one of claims 1 to 8.
PCT/CN2022/081042 2021-03-19 2022-03-15 Method for recognizing touch-to-read text, and electronic device WO2022194180A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110298494.6 2021-03-19
CN202110298494.6A CN115116075A (en) 2021-03-19 2021-03-19 Method for recognizing click-to-read characters and electronic equipment

Publications (1)

Publication Number Publication Date
WO2022194180A1 true WO2022194180A1 (en) 2022-09-22

Family

ID=83321727

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081042 WO2022194180A1 (en) 2021-03-19 2022-03-15 Method for recognizing touch-to-read text, and electronic device

Country Status (2)

Country Link
CN (1) CN115116075A (en)
WO (1) WO2022194180A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909342B (en) * 2023-01-03 2023-05-23 湖北瑞云智联科技有限公司 Image mark recognition system and method based on contact movement track

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217197A (en) * 2014-08-27 2014-12-17 华南理工大学 Touch reading method and device based on visual gestures
CN104820560A (en) * 2015-05-21 2015-08-05 马兰英 Method for selecting character or image and computation equipment
WO2016113969A1 (en) * 2015-01-13 2016-07-21 三菱電機株式会社 Gesture recognition device and method, program, and recording medium
CN109255989A (en) * 2018-08-30 2019-01-22 广东小天才科技有限公司 A kind of intelligent point-reading method and point read equipment
CN111090343A (en) * 2019-06-09 2020-05-01 广东小天才科技有限公司 Method and device for identifying point-reading content in point-reading scene
CN111324201A (en) * 2020-01-20 2020-06-23 上海纸上绝知智能科技有限公司 Reading method, device and system based on somatosensory interaction
CN112016346A (en) * 2019-05-28 2020-12-01 阿里巴巴集团控股有限公司 Gesture recognition method, device and system and information processing method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217197A (en) * 2014-08-27 2014-12-17 华南理工大学 Touch reading method and device based on visual gestures
WO2016113969A1 (en) * 2015-01-13 2016-07-21 三菱電機株式会社 Gesture recognition device and method, program, and recording medium
CN104820560A (en) * 2015-05-21 2015-08-05 马兰英 Method for selecting character or image and computation equipment
CN105320437A (en) * 2015-05-21 2016-02-10 马兰英 Method of selecting character or image and computing device
CN109255989A (en) * 2018-08-30 2019-01-22 广东小天才科技有限公司 A kind of intelligent point-reading method and point read equipment
CN112016346A (en) * 2019-05-28 2020-12-01 阿里巴巴集团控股有限公司 Gesture recognition method, device and system and information processing method
CN111090343A (en) * 2019-06-09 2020-05-01 广东小天才科技有限公司 Method and device for identifying point-reading content in point-reading scene
CN111324201A (en) * 2020-01-20 2020-06-23 上海纸上绝知智能科技有限公司 Reading method, device and system based on somatosensory interaction

Also Published As

Publication number Publication date
CN115116075A (en) 2022-09-27

Similar Documents

Publication Publication Date Title
RU2766255C1 (en) Voice control method and electronic device
US9286895B2 (en) Method and apparatus for processing multiple inputs
WO2021032097A1 (en) Air gesture interaction method and electronic device
WO2021110133A1 (en) Control operation method and electronic device
CN113132526B (en) Page drawing method and related device
CN114816167B (en) Application icon display method, electronic device and readable storage medium
WO2023051511A1 (en) Icon moving method, related graphical interface, and electronic device
CN113536866A (en) Character tracking display method and electronic equipment
EP4216563A1 (en) Photographing method and electronic device
WO2022194180A1 (en) Method for recognizing touch-to-read text, and electronic device
CN114371985A (en) Automated testing method, electronic device, and storage medium
CN115113751A (en) Method and device for adjusting numerical range of recognition parameter of touch gesture
WO2023066165A1 (en) Animation effect display method and electronic device
WO2022095983A1 (en) Gesture misrecognition prevention method, and electronic device
CN114022570B (en) Method for calibrating external parameters between cameras and electronic equipment
WO2022222688A1 (en) Window control method and device
WO2022002213A1 (en) Translation result display method and apparatus, and electronic device
CN114691002B (en) Page sliding processing method and related device
US10901520B1 (en) Content capture experiences driven by multi-modal user inputs
WO2023222097A1 (en) Text recognition method and related apparatus
WO2022166550A1 (en) Data transmission method and electronic device
WO2022143891A1 (en) Focal point synchronization method and electronic device
WO2022143335A1 (en) Dynamic effect processing method and related apparatus
WO2022143094A1 (en) Window page interaction method and apparatus, electronic device, and readable storage medium
WO2023236908A1 (en) Image description method, electronic device and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22770533

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22770533

Country of ref document: EP

Kind code of ref document: A1