WO2022194180A1 - Method for recognizing touch-to-read text, and electronic device


Info

Publication number
WO2022194180A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
finger
text
gesture
user
Prior art date
Application number
PCT/CN2022/081042
Other languages
French (fr)
Chinese (zh)
Inventor
Zhang Honglei (张红蕾)
Li Lijun (李力骏)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2022194180A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Definitions

  • the present application relates to the field of terminal artificial intelligence (AI) and to the field of character recognition, and in particular to a method and electronic device for recognizing point-and-read text.
  • electronic devices with a point-reading function include reading pens, tablet computers, robots, and the like.
  • a reading pen can be used to assist users in reading picture books.
  • however, a reading pen can only recognize the text in a specific picture book.
  • some electronic devices, such as tablet computers and robots, can only recognize the text in electronic picture books. This limits the user's learning resources.
  • in addition, the accuracy of recognizing, according to the user's gesture, the text that the user wants read aloud is not high.
  • the present application provides a method and an electronic device for recognizing point-and-read characters. Through the method for recognizing point-and-read characters, the electronic device can more accurately recognize the characters specified by a user in a book.
  • the present application provides a method for recognizing point-and-read text. The method may include: the electronic device 100 starts to capture images in response to a first operation of the user, where the images captured by the electronic device 100 include the user's finger and the book, both located in the target area of the electronic device; the electronic device 100 recognizes the user's point-reading gesture according to the movement of the finger's position recognized from the captured images; the electronic device 100 determines the target text in the book content in the captured images according to the position and trajectory of the point-reading gesture; and the electronic device 100 broadcasts the recognized target text.
  • the electronic device can accurately determine the target text from the image collected by the electronic device in combination with the user's gesture. Therefore, the electronic device can accurately identify the target text. In this way, user experience can be improved.
  • the point-to-read gesture includes one or more of the following: dots, dashes, and circles.
  • the electronic device recognizes the user's point-reading gesture according to the movement of the finger's position recognized from the collected images, including: after the electronic device detects the user's finger in the collected images, if it detects that a first position of the finger in the collected images moves less than a first preset distance within a first preset duration, the electronic device records the first position as the starting point of the point-reading gesture; after the electronic device records the starting point, if it detects that a second position of the finger in the collected images moves less than a second preset distance within a second preset duration, the electronic device records the second position as the end point of the point-reading gesture; the electronic device then recognizes the point-reading gesture according to its starting point and end point.
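The stationary-finger rule above can be sketched in code. This is a minimal illustration, not the patent's implementation: the (t, x, y) sample format, the 0.5 s window, and the 10-pixel threshold are stand-ins for the preset durations and distances.

```python
import math

def find_stationary_points(samples, window_s=0.5, max_move=10.0):
    """Return indices where the finger stayed within max_move pixels
    for at least window_s seconds. samples: list of (t, x, y) tuples,
    ordered by time. The first such index corresponds to the gesture's
    starting point, a later one to its end point."""
    stationary = []
    for i, (t0, x0, y0) in enumerate(samples):
        held = False
        for t1, x1, y1 in samples[i + 1:]:
            if math.hypot(x1 - x0, y1 - y0) >= max_move:
                break  # finger moved too far before the window elapsed
            if t1 - t0 >= window_s:
                held = True  # stayed close for the whole window
                break
        if held:
            stationary.append(i)
    return stationary
```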
  • the electronic device 100 may start recording the coordinates of the finger at the starting point from the starting point of the pointing gesture, and finish recording the coordinates of the finger at the end point when the pointing gesture ends. In this way, the electronic device can more accurately determine the starting point and the ending point of the user's reading gesture trajectory.
  • the electronic device recognizes the point-reading gesture according to its starting point and end point, including: if the distance between the starting point and each finger position recorded between the starting point and the end point is less than a third preset distance, the electronic device recognizes the point-reading gesture as a dot; if the coordinates of the finger positions recorded between the starting point and the end point are linearly correlated, the electronic device recognizes the point-reading gesture as a dash; if the distance between the starting point and the end point is less than a fourth preset distance while the distance between the starting point and some finger position between them is greater than a fifth preset distance, the electronic device recognizes the point-reading gesture as a circle.
  • the electronic device can accurately determine the specific type of the point-to-read gesture.
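The dot/dash/circle rules can be sketched as follows. This is an illustrative reading of the claim, not the patent's implementation: the thresholds are arbitrary stand-ins for the third, fourth, and fifth preset distances, and Pearson correlation is one possible test for "linearly correlated" coordinates.

```python
import math

def _pearson(points):
    """Pearson correlation of the x and y coordinates of a trajectory."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # a perfectly horizontal or vertical stroke is degenerate: treat as a line
    return sxy / (sx * sy) if sx and sy else 1.0

def classify_gesture(points, dot_r=15.0, line_tol=0.95, close_r=20.0, span_r=40.0):
    """Classify a recorded trajectory (list of (x, y)) as 'dot', 'dash',
    or 'circle': all points near the start -> dot; points strongly linearly
    correlated -> dash; start and end close together while the trajectory
    sweeps far from the start -> circle."""
    x0, y0 = points[0]
    xe, ye = points[-1]
    dists = [math.hypot(x - x0, y - y0) for x, y in points]
    if max(dists) < dot_r:
        return "dot"
    if abs(_pearson(points)) >= line_tol:
        return "dash"
    if math.hypot(xe - x0, ye - y0) < close_r and max(dists) > span_r:
        return "circle"
    return "unknown"
```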
  • the electronic device determines the target text in the book content in the collected images according to the position and trajectory of the point-reading gesture, including: the electronic device determines the position of the text area in the book content according to the collected images; the electronic device then determines the target text in the book content according to the position and trajectory of the point-reading gesture and the position of the text area.
  • the electronic device can more accurately determine the target text, that is, the text that the user needs to recognize and broadcast in the selected book.
  • the electronic device determines the target text in the book content according to the position and trajectory of the point-reading gesture and the position of the text area, including: the electronic device determines a first text area according to the trajectory of the point-reading gesture and the positions of the text areas in the first book, where the first text area contains a first trajectory, and the first trajectory is a part of the point-reading gesture's trajectory that is greater than or equal to a preset ratio of the whole; the electronic device then determines the target text in the book content according to the first trajectory, the point-reading gesture, and the first text area.
  • the electronic device determines that the user needs to recognize the text in the first text region only when most of the trajectory of the user's reading gesture falls within the first text region. In this way, when a part of the user's gesture track falls in the first text area and a part falls in the second text area, the electronic device can also correctly determine the target text.
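The preset-ratio rule can be illustrated with a small sketch. The rectangle representation of text areas and the 0.5 default ratio are assumptions for illustration:

```python
def pick_text_region(trajectory, regions, min_ratio=0.5):
    """Choose the text region containing at least min_ratio of the
    trajectory points. trajectory: list of (x, y); regions: dict mapping a
    region name to an axis-aligned rectangle (x0, y0, x1, y1). Returns the
    name of the best-covered qualifying region, or None."""
    best = None
    for name, (x0, y0, x1, y1) in regions.items():
        inside = sum(1 for x, y in trajectory if x0 <= x <= x1 and y0 <= y <= y1)
        ratio = inside / len(trajectory)
        if ratio >= min_ratio and (best is None or ratio > best[1]):
            best = (name, ratio)
    return best[0] if best else None
```

A trajectory that mostly underlines text in region B but briefly strays into region A still resolves to B, matching the behavior described above.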
  • the electronic device determines the target text in the book content according to the first trajectory, the point-reading gesture, and the first text area, including: if the point-reading gesture is a dot, the electronic device determines that the text in the first text area with the smallest distance from the first trajectory is the target text; if the point-reading gesture is a dash, the electronic device determines that the text above the first trajectory in the first text area is the target text; if the point-reading gesture is a circle, the electronic device determines that the text inside the first trajectory in the first text area is the target text.
  • the electronic device adopts different strategies to determine the target text, which can improve the accuracy of the electronic device in determining the target text in the captured image. Therefore, the accuracy with which the electronic device identifies the target text can be improved.
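The three per-gesture strategies can be sketched as follows. The word-center representation, the y-axis orientation (growing downward, so "above" means smaller y), and the bounding-box approximation of the circle's interior are assumptions for illustration:

```python
import math

def select_target_text(gesture, trajectory, words):
    """Pick the target words per the per-gesture strategy: dot -> nearest
    word; dash -> words directly above the underline; circle -> words
    enclosed by the stroke. trajectory: list of (x, y); words: list of
    (text, cx, cy) word centers inside the chosen text area."""
    if gesture == "dot":
        # the word with the smallest distance to any trajectory point
        def dist(w):
            _, cx, cy = w
            return min(math.hypot(cx - x, cy - y) for x, y in trajectory)
        return [min(words, key=dist)[0]]
    if gesture == "dash":
        # words whose center lies within the line's x-span and above it
        xs = [x for x, _ in trajectory]
        y_top = min(y for _, y in trajectory)
        return [t for t, cx, cy in words
                if min(xs) <= cx <= max(xs) and cy < y_top]
    if gesture == "circle":
        # approximate the circle's interior by its bounding box
        xs = [x for x, _ in trajectory]
        ys = [y for _, y in trajectory]
        return [t for t, cx, cy in words
                if min(xs) <= cx <= max(xs) and min(ys) <= cy <= max(ys)]
    return []
```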
  • the first preset distance is equal to the second preset distance
  • the second preset duration is equal to the first preset duration
  • the present application further provides a method for recognizing point-and-read text. The method may include: in response to a first operation, the electronic device collects images of a first book; when the distance between the finger coordinates in the image frame collected at a first moment and the finger coordinates in the image frame collected a first preset duration earlier is less than a first preset distance, the electronic device starts recording the finger coordinates in the image frames; when the distance between the finger coordinates in the image frame collected at a second moment and the finger coordinates in the image frame collected a second preset duration earlier is less than a second preset distance, the electronic device stops recording the finger coordinates, where the second moment is after the first moment; the electronic device determines the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment; the electronic device recognizes and broadcasts the text to be recognized.
  • the text to be recognized may be referred to as a target text, that is, the text specified by the user in the book to be recognized.
  • the electronic device uses the finger coordinates at the two moments when the finger is stationary as the starting point and the end point of the user's point-reading trajectory, respectively.
  • in this way, the electronic device can accurately determine the position at which the user starts point-reading. Therefore, the electronic device can accurately determine the text to be recognized according to the trajectory coordinates between the two moments at which the finger was at rest. The point-reading accuracy of the electronic device can thus be improved, thereby improving user experience.
  • the electronic device can recognize the characters in any book, and there is no need to customize the book.
  • the method further includes: when the electronic device detects the finger in the image frame collected at a third moment, it starts to obtain the coordinates of the finger in the collected image frames, where the third moment is a moment before the first moment.
  • the electronic device may take the moment when the finger appears in the image as the start of the user's point-reading, and collect the coordinates of the finger in the image frames only while the user is pointing. In this way, the electronic device avoids performing the subsequent point-reading steps when no finger is detected. Therefore, the calculation load of the electronic device can be reduced, and power consumption can be saved.
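The record-only-while-pointing behavior can be sketched as a small state machine. The per-frame fingertip input (None when no finger is detected) is an assumed interface to the finger-detection step:

```python
class PointReadTracker:
    """Coordinates are recorded only while a finger is visible; when the
    finger disappears, the recorded trajectory is emitted and recording
    stops, so the point-reading steps are skipped on idle frames."""

    def __init__(self):
        self.recording = False
        self.track = []

    def on_frame(self, fingertip):
        """fingertip: (x, y) from the finger-detection model, or None.
        Returns the completed trajectory when the gesture ends, else None."""
        if fingertip is not None:
            self.recording = True
            self.track.append(fingertip)
            return None
        if self.recording:  # finger left the view: the gesture is over
            done, self.track = self.track, []
            self.recording = False
            return done
        return None  # idle: nothing to do until the user points again
```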
  • the electronic device determines the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment, which specifically includes: the electronic device determines the pointing gesture of the finger according to the finger coordinates recorded from the first moment to the second moment; the electronic device determines the positions of the text areas in the first book according to the image of the first book; the electronic device determines the text to be recognized in the first book according to the finger coordinates, the pointing gesture, and the positions of the text areas in the first book.
  • the electronic device can more accurately determine the target text, that is, the text that the user needs to recognize and broadcast in the selected book.
  • the electronic device determines the pointing gesture of the finger according to the finger coordinates recorded from the first moment to the second moment, specifically including: if the electronic device determines that the distance between any one of the recorded coordinates and each of the other recorded coordinates is less than a third preset distance, the electronic device determines that the pointing gesture is the first gesture; if the electronic device determines that the finger coordinates recorded from the first moment to the second moment are linearly correlated, the electronic device determines that the pointing gesture is the second gesture; if the electronic device determines that the distance between the finger coordinates recorded at the first moment and those recorded at the second moment is less than a fourth preset distance, while the distance between the finger coordinates recorded at a fourth moment and those recorded at the first moment is greater than a fifth preset distance, the electronic device determines that the pointing gesture is the third gesture.
  • the electronic device can accurately determine the specific type of the point-to-read gesture.
  • the electronic device determines the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment, the pointing gesture, and the positions of the text areas in the first book, which specifically includes: the electronic device connects the finger coordinates recorded from the first moment to the second moment in recording order to obtain a first finger trajectory; the electronic device determines a first text area according to the first finger trajectory and the positions of the text areas in the first book, where the first text area contains a second finger trajectory, and the second finger trajectory is a part of the first finger trajectory that is greater than or equal to a preset ratio of the whole; the electronic device determines the text to be recognized according to the second finger trajectory, the pointing gesture, and the first text area.
  • the electronic device determines that the user needs to recognize the text in the first text region only when most of the trajectory of the user's reading gesture falls within the first text region. In this way, when a part of the user's gesture track falls in the first text area and a part falls in the second text area, the electronic device can also correctly determine the target text.
  • the electronic device determines the text to be recognized according to the second finger trajectory, the pointing gesture, and the first text area, which specifically includes: if the pointing gesture is the first gesture, the electronic device determines that the text in the first text area with the smallest distance from the second finger trajectory is the text to be recognized; if the pointing gesture is the second gesture, the electronic device determines that the text above the second finger trajectory in the first text area is the text to be recognized; if the pointing gesture is the third gesture, the electronic device determines that the text inside the second finger trajectory in the first text area is the text to be recognized.
  • the first gesture is a dot
  • the second gesture is a line
  • the third gesture is a circle.
  • the electronic device adopts different strategies to determine the target text, which can improve the accuracy of the electronic device in determining the target text in the captured image. Therefore, the accuracy with which the electronic device identifies the target text can be improved.
  • the electronic device determines the pointing gesture of the finger according to the finger coordinates recorded from the first moment to the second moment, which specifically includes: when the electronic device does not detect the finger in the image collected at a fifth moment, the electronic device determines the pointing gesture of the finger according to the finger coordinates recorded from the first moment to the second moment.
  • the electronic device can determine the end time of the user's reading.
  • the electronic device may stop performing the point-reading steps (e.g., determining the coordinates of the finger in the image) until the user starts pointing again. In this way, the power consumption of the electronic device can be saved.
  • the electronic device uses the same condition to determine that the finger is stationary in images collected at different moments, which can reduce the calculation load of the electronic device.
  • an electronic device comprising: one or more processors and a memory; the memory is coupled to the one or more processors, the memory is used to store computer program code, the computer program code includes computer instructions,
  • the one or more processors invoke the computer instructions to cause the electronic device to execute the method for recognizing point-and-read text in any possible implementation manner of the first aspect or any possible implementation manner of the second aspect.
  • the embodiments of the present application provide a computer storage medium, including computer instructions, which, when run on an electronic device, cause the electronic device to perform the method for recognizing point-and-read text in any possible implementation of any of the above aspects.
  • an embodiment of the present application provides a computer program product, which, when run on an electronic device, enables the electronic device to execute the method for recognizing point-and-read text in any possible implementation of any one of the above aspects.
  • FIG. 1A is a schematic diagram of an application scenario of a robot that can be used for point reading provided by an embodiment of the present application;
  • FIG. 1B is a schematic diagram of another application scenario of the robot that can be used for point reading provided by an embodiment of the present application;
  • FIG. 1C is a schematic diagram of another application scenario of the robot that can be used for point reading provided by an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a method for recognizing point-and-read text provided by an embodiment of the present application;
  • FIGS. 3A-3D are schematic diagrams of a set of user interfaces of the electronic device 100 provided by the embodiments of the present application;
  • FIG. 3E is a schematic diagram of the electronic device 100 collecting a picture book image provided by an embodiment of the present application;
  • FIG. 3F is a schematic diagram of content area division of a picture book provided by an embodiment of the present application;
  • FIG. 4A is a schematic diagram of an image frame collected by the electronic device 100 when a user points and reads, according to an embodiment of the present application;
  • FIG. 4B is a schematic diagram of finger detection performed by the electronic device 100 on a collected image frame provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a group of image frames collected by the electronic device 100 during a user's point reading according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a trajectory when a user points and reads provided by an embodiment of the present application;
  • FIGS. 7A-7C are schematic diagrams of polar coordinate plots corresponding to different finger trajectories when a user points and reads according to an embodiment of the present application;
  • FIG. 8 is a schematic diagram combining the picture book layout analysis result and the trajectory of the user's finger provided by an embodiment of the present application;
  • FIGS. 9A-9B are schematic diagrams of a group of text detections provided by an embodiment of the present application;
  • FIG. 10 is a schematic flowchart of a method for recognizing point-and-read text provided by an embodiment of the present application;
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
  • FIG. 12 is a schematic diagram of a software architecture of an electronic device provided by an embodiment of the present application;
  • FIG. 13 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the terms “first” and “second” are used for descriptive purposes only and should not be construed as indicating relative importance or implying the number of indicated technical features. Therefore, a feature defined as “first” or “second” may explicitly or implicitly include one or more such features. In the description of the embodiments of the present application, unless otherwise specified, “multiple” means two or more.
  • "point reading” may mean that the electronic device can recognize and play the text specified by the user in the picture book by voice.
  • the user specifies the text in the picture book. For example, the user's finger points below the text to be recognized in the picture book, or the user's finger draws a line below the text to be recognized in the picture book, and the user's finger draws a circle to delineate the text to be recognized, and so on.
  • the user may use a finger to point under the text (ie, “cat”) to be recognized in the picture book 102 .
  • the robot 100 may include a camera 103 and a camera 104 .
  • the picture book 102 is placed in the photographing field of view area 101 of the camera 103 and the camera 104 .
  • the camera 103 and/or the camera 104 of the robot 100 may capture an image of the picture book 102 in which the user's finger is below the word “cat”.
  • the robot 100 can recognize the character "cat” and play the character "cat”.
  • the user can use a finger to draw a line (or cross) under the text (ie, “cat and mouse”) to be recognized in the picture book 102 .
  • the robot 100 can recognize the text “cat and mouse” and play the text "cat and mouse”.
  • the user may draw a circle with a finger in the picture book 102 to delineate the desired identification text (ie, "cat and mouse”).
  • the robot 100 can recognize the text “cat and mouse” and play the text "cat and mouse”.
  • the gestures of the user when reading can be divided into: “dot”, “line”, “circle” and so on. It can be understood that the categories and names of gestures in the embodiments of the present application are not limited.
  • the electronic device may be the robot 100 shown in FIG. 1A, FIG. 1B and FIG. 1C, or may be a terminal device with a camera, such as a tablet computer or a smart phone; the electronic device may also be a point-reading device composed of a camera and a terminal with a character recognition function. This is not limited in this embodiment of the present application.
  • An embodiment of the present application provides a method for recognizing point-and-read text. The method may include: the electronic device 100 continuously collects images of a first picture book; when a finger appears in the collected images, the electronic device determines that the user has started point-reading; the electronic device analyzes the image of the first picture book to obtain a text analysis result; the electronic device determines the user's trajectory and point-reading gesture according to the finger coordinates in the collected multi-frame images; the electronic device determines, according to the point-reading gesture and the text analysis result, the text area Q containing the text to be recognized; and the electronic device recognizes and voice-broadcasts the text to be recognized in text area Q.
  • FIG. 2 exemplarily shows a flowchart of a method for recognizing point-to-read characters provided by an embodiment of the present application.
  • a method for recognizing point-and-read characters provided by the present application may include the following steps:
  • the camera of the electronic device 100 starts to capture the image of the book B.
  • the electronic device 100 may receive the user's first operation.
  • the first operation of the user may be to turn on the electronic device 100 or to turn on the reading APP in the electronic device 100 .
  • the electronic device 100 may start capturing images by the camera of the electronic device 100 in response to the user's first operation.
  • the electronic device 100 continuously collects multiple frames of images.
  • the electronic device 100 may be the robot 100 shown in FIGS. 3A-3E
  • the book B may be the book 102 shown in FIG. 3E .
  • the embodiment of the present application does not limit the book B. That is, in the method provided by the embodiment of the present application, the electronic device 100 can recognize the text in any book that the user clicks to read.
  • the electronic device 100 may be the robot 100 shown in FIG. 3A .
  • the electronic device 100 may include a camera 103 and a camera 104 and a display screen 105 .
  • the icon 106 of the point-reading APP can be displayed on the display screen 105 .
  • the user's first operation may be to click the icon 106 of the point-reading APP.
  • the camera 103 and the camera 104 of the robot 100 start to capture images.
  • the display screen 105 of the robot 100 can display the book display area 1051 and the prompt text 1052 .
  • the book display area 1051 can display the images collected by the camera 103 and the camera 104 .
  • the prompt text 1052 may prompt the user to place the book to be learned in the shooting area of the camera 103 and the camera 104 .
  • the content of the prompt text 1052 may be "please put the book in the shooting area", and the specific content of the prompt text 1052 is not limited here.
  • the prompt text 1052 may be displayed in the book display area 1051, or may be displayed outside the book display area 1051, and the specific position of the prompt text 1052 is not limited here.
  • the display screen 105 of the robot 100 may further include a control 1053 .
  • the control 1053 is used to trigger the robot 100 to perform layout analysis and finger detection on the collected images.
  • the image 1021 of the book 102 may be displayed in the book display area 1051 of the display screen 105 .
  • the user can adjust the position of the book according to the image 1021 in the display area 1051 . For example, if the user sees that only the right half of the book 102 is displayed in the display area 1051 , the user can move the book to the right so that the book 102 moves into the shooting field 101 of the camera 103 and the camera 104 . After the user sees the complete image of the book 102 in the display area 1051, the user can click on the control 1053.
  • the electronic device may display the gestures available for point-reading text on the display screen 105 .
  • the user may click below the text to be recognized, may also draw a line below the text to be recognized, or may draw a circle to delineate the text to be recognized. In this way, the user can be prompted to use a gesture recognizable by the electronic device 100 to read.
  • the electronic device 100 performs layout analysis on the image of the book B, and determines the type of the book content in the book B and the position corresponding to the book content.
  • the electronic device 100 may perform layout analysis on a frame of image of the book B collected by the camera, and obtain the position of the text area in the current page of the book B on the book page.
  • the electronic device 100 may store a layout analysis model, the electronic device 100 inputs a frame of image into the layout analysis model, and the model can output the type of book content (text, drawing, table, etc.) contained in the image and the book content corresponding location.
  • through the layout analysis model, the electronic device 100 can determine that the image of the book B may include one or more of a text area, a drawing area, and a table area.
  • the text area may refer to an area that only contains text in a frame of image.
  • the drawing area can refer to the area of an image that contains drawing.
  • the table area may refer to an area of a frame image that contains a table. It will be appreciated that the plot area and table area may also contain text.
  • One frame of image of the book B collected by the electronic device 100 may include one or more text regions, and/or drawing regions, and/or table regions.
  • the image 1021 of the book 102 may include area A, area B, and area C.
  • Area A and Area C are drawing areas, and area B is text area.
  • the area A may be a rectangular area with vertices A1(xa1, ya1, za1), A2(xa2, ya2, za2), A3(xa3, ya3, za3), A4(xa4, ya4, za4).
  • Region B may be a rectangular region with vertices B1 (xb1, yb1, zb1), B2 (xb2, yb2, zb2), B3 (xb3, yb3, zb3), B4 (xb4, yb4, zb4).
  • Region C may be a rectangular region with vertices C1(xc1, yc1, zc1), C2(xc2, yc2, zc2), C3(xc3, yc3, zc3), C4(xc4, yc4, zc4).
  • the shape of the text area and the drawing area obtained by the electronic device 100 by performing layout analysis on the image of the book B is not limited to a rectangle, and may also be other shapes, such as polygons, circles, and the like.
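Since the regions produced by layout analysis may be rectangles or other polygons, a generic point-in-polygon test is useful for deciding which region a finger coordinate falls in. The ray-casting sketch below, which works on the in-plane (x, y) components of the vertex coordinates, is illustrative; the patent does not specify how region membership is computed:

```python
def point_in_polygon(pt, vertices):
    """Ray-casting test: is pt = (x, y) inside the polygon given by its
    vertices in order? Works for the rectangular regions A, B, C of the
    layout result and for arbitrary polygonal regions alike."""
    x, y = pt
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # edge crosses the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # crossing to the right toggles parity
    return inside
```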
  • the electronic device 100 may use the upper left vertex of the photographing field of view of the electronic device 100 as the origin to establish a coordinate system.
  • the origin O of the coordinate system XYZ is the upper left vertex of the camera field 101 of the robot.
  • the electronic device 100 may input the image 1021 of the book 102 shown in FIG. 3F into the layout analysis model for layout analysis, and obtain the type of content contained in the image 1021 and the location of the content as shown in Table 1 below.
  • the electronic device 100 performs layout analysis on the image 1021, and can determine the drawings and characters contained in the image 1021, as well as the positions of the drawings and characters.
  • the contents included in the area A and the area C in the image 1021 are drawings, and the contents included in the area B are characters.
  • the coordinates of the four vertices of the area A are (xa1, ya1, za1), (xa2, ya2, za2), (xa3, ya3, za3), (xa4, ya4, za4), respectively.
  • the area A in Table 1 may be the area A shown in FIG. 3F .
  • the coordinates of the four vertices of the region B are (xb1, yb1, zb1), (xb2, yb2, zb2), (xb3, yb3, zb3), (xb4, yb4, zb4) respectively.
  • Region B in Table 1 may be Region B shown in FIG. 3F.
  • the coordinates of the four vertices of the region C are (xc1, yc1, zc1), (xc2, yc2, zc2), (xc3, yc3, zc3), (xc4, yc4, zc4) respectively.
  • Region C in Table 1 may be Region C shown in FIG. 3F .
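The layout analysis result summarized in Table 1 can be modeled as a small data structure. A minimal Python sketch, not the patent's implementation: the class and field names are hypothetical, and the vertex coordinates are placeholders standing in for (xa1, ya1, za1) through (xc4, yc4, zc4):

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera coordinate system

@dataclass
class LayoutRegion:
    name: str              # e.g. "A", "B", "C"
    kind: str              # "text", "drawing", or "table"
    vertices: List[Point]  # four vertices of the rectangular region

# Placeholder coordinates standing in for the Table 1 vertex values
layout = [
    LayoutRegion("A", "drawing", [(0, 0, 0), (4, 0, 0), (4, 3, 0), (0, 3, 0)]),
    LayoutRegion("B", "text",    [(0, 3, 0), (4, 3, 0), (4, 6, 0), (0, 6, 0)]),
    LayoutRegion("C", "drawing", [(0, 6, 0), (4, 6, 0), (4, 9, 0), (0, 9, 0)]),
]

# Point reading usually only needs the text areas
text_regions = [r for r in layout if r.kind == "text"]
```

A later step can then intersect the finger trajectory with `text_regions` only, which is the filtering the patent describes for text-only point reading.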
  • the electronic device 100 can also detect the text and the position of the text in the drawing area when performing layout analysis.
  • the electronic device 100 may detect the inclination angle of the text in the text area.
  • the electronic device 100 detects a finger in the image collected at time T10, and starts to determine the coordinates of the finger in the collected image.
  • the image captured by the electronic device 100 may include the user's finger.
  • the electronic device 100 can detect the finger in the captured image.
  • the electronic device 100 may store a finger detection model, and the electronic device 100 inputs the collected image into the finger detection model, and the finger detection model can determine whether the input image contains a finger or does not contain a finger.
  • the electronic device 100 inputs the image 401 in FIG. 4A into the finger detection model, and the finger detection model can output the image 402 as shown in FIG. 4B .
  • the finger detection model can label the detected fingers with the finger detection box 4022 .
  • the finger detection model can also label fingertips 4021.
  • the electronic device 100 continuously collects multiple frames of images, and the electronic device 100 can sequentially input each frame of images collected into the finger detection model for finger detection.
  • If the electronic device 100 detects a finger in one frame of image, the electronic device 100 can begin to determine the coordinates of the finger in that frame of image.
  • the electronic device 100 may take the coordinates of the fingertip as the coordinates of the finger. If the electronic device 100 does not detect a finger in one frame of image, the electronic device 100 can check whether the next frame of image (or an image frame collected after a preset time interval) contains a finger, until the electronic device 100 detects a finger in the image collected at time T10 and starts to determine the coordinates of the finger in the collected images.
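The frame-by-frame detection loop described above can be sketched as follows. `detect_finger` is a hypothetical stand-in for the finger detection model; here it simply reads a precomputed fingertip from a dict so that only the control flow is illustrated:

```python
def detect_finger(frame):
    # Hypothetical stand-in for the finger detection model:
    # returns the (x, y) fingertip coordinates, or None if no finger is found.
    return frame.get("fingertip")

def collect_finger_coords(frames):
    """Skip frames until a finger first appears (time T10), then record the
    fingertip coordinates of every subsequent frame until the finger leaves."""
    coords = []
    for frame in frames:
        tip = detect_finger(frame)
        if tip is None and not coords:
            continue           # no finger detected yet: check the next frame
        if tip is None:
            break              # the finger has left the field of view
        coords.append(tip)     # fingertip coordinates stand for the finger
    return coords
```

In the patent's flow the later steps (S204/S205) decide when recording stops; stopping when the finger disappears is a simplification for this sketch.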
  • (a) to (i) of FIG. 5 are exemplary image frames captured by the electronic device 100 at times t1 to t9.
  • the image frames captured from time t1 to time t9 can show the complete process of the user pointing at the character "cat" to be recognized: first the user lowers a finger and points at the text "cat" in the picture book, and then moves the finger away from the picture book.
  • (a) of FIG. 5 is an image frame captured by the electronic device 100 at time t1.
  • (b) of FIG. 5 shows the image frame captured by the electronic device 100 at time t2. No finger is detected in the image frames captured by the electronic device 100 at time t1 and time t2.
  • (c) of FIG. 5 is the image frame captured by the electronic device 100 at time t3, and the electronic device 100 can detect a finger in the image frame at time t3.
  • the electronic device 100 starts to acquire the coordinates of the finger in the image frame.
  • (d) to (h) of FIG. 5 show the image frames captured by the electronic device 100 at times t4 to t8, respectively. The electronic device 100 may acquire the coordinates of the finger in each of the image frames captured at times t4 to t8.
  • (i) of FIG. 5 shows the image frame captured by the electronic device 100 at time t9. No finger is detected in the image frame captured by the electronic device 100 at time t9.
  • the electronic device 100 acquires an image frame at time t1.
  • the electronic device 100 performs finger detection on the image frame captured at time t1, and no finger is detected.
  • the electronic device 100 does not execute the steps after step S203.
  • the electronic device 100 continues to perform finger detection on the next frame of image.
  • the electronic device 100 may perform finger detection on the image captured at time t2. If no finger is detected, the electronic device 100 performs finger detection on the image captured at time t3.
  • the electronic device 100 detects a finger in the image frame captured at time t3.
  • the electronic device 100 can determine the coordinates of the finger in the image frame captured at time t3. Specifically, it may be the coordinates of the fingertip.
  • Time T10 may be time t3 shown in FIG. 5 .
  • the image frames captured at time t1 to the image frames captured at time t9 shown in FIG. 5 may be consecutive image frames captured by the electronic device 100 .
  • the interval is related to the frame rate at which the electronic device 100 captures images.
  • the image frames captured at time t1 to the image frames captured at time t9 shown in FIG. 5 may also be image frames captured by the electronic device 100 at preset time intervals. That is, each pair of adjacent times among t1 to t9 (t1 and t2, t2 and t3, and so on up to t8 and t9) may be separated by the preset time interval.
  • the preset time interval may be configured by the system of the electronic device 100 .
  • the electronic device 100 can sequentially perform finger detection on each frame of images collected. In this way, the finger in the image frame can be detected in time, so that the time when the user starts to read can be accurately determined.
  • the electronic device 100 may also perform finger detection on images collected at preset time intervals. In this way, the power of the electronic device can be saved.
  • When the electronic device 100 detects that a finger appears in the image frame and the vertical distance between the finger in the image frame and the book B has decreased to the preset vertical distance D01, the electronic device 100 starts to acquire the coordinates of the finger in the captured image frames.
  • When the electronic device 100 detects that a finger appears in the image frame and the vertical distance between the finger in the image frame and the book B gradually decreases, the electronic device 100 starts to acquire the coordinates of the finger in the captured image frames.
  • the electronic device 100 may record the coordinates of the finger in the image collected at time T11 and use the coordinates as the starting point of the user's reading track.
  • the electronic device 100 may determine that the user's finger is in a stationary state at time T11.
  • the preset duration T21 may be 0.5 seconds, may be 1 second, or may be 2 seconds, which is not limited here.
  • the preset distance D1 may be 10 pixels, 5 pixels, or 15 pixels, and the specific value of the preset distance D1 is not limited in this embodiment of the present application.
  • the preset duration T21 and the preset distance D1 may be configured by the system of the electronic device 100 .
  • When a user needs the electronic device 100 to assist in learning a character to be recognized, the user will generally point a finger at the character to be recognized and hold it there for a while before moving the finger. For example, as shown in (d) of FIG. 5, the user points the finger at the text "cat" for a period of time (for example, 0.5 seconds or 1 second, which is not limited here), and then moves the finger from the position in (d) to the position of the finger in (e).
  • the electronic device 100 determines that the finger is in a stationary state in the image captured at time T11.
  • the coordinate point of the finger in the image collected at time T11 is the starting point of the user's reading track. That is, the user starts to point the finger on the text to be recognized, and starts to select the text to be recognized.
  • the electronic device 100 records the coordinates of the finger in the image frame.
  • the electronic device 100 detects that there is a finger in the image frame, and then acquires the coordinates of the finger.
  • the electronic device 100 may temporarily store the coordinates in the memory, and after the coordinates of the finger in the image frame are used to calculate the distance from the coordinates of the finger in the next frame of image, the electronic device releases the stored coordinates of the finger in the image frame.
  • the electronic device 100 may record the coordinates of the finger in the image frame captured at time T11.
  • the coordinates of the finger in the image frame captured at time T11 can be recorded in the memory for recording the point reading track.
  • After the electronic device 100 calculates the distance between the coordinates of the finger in the image frame captured at time T11 and the coordinates of the finger in the image frame captured after the preset duration T21, the electronic device 100 still keeps the coordinates of the finger in the image frame captured at time T11 in the memory used for recording the point-reading track.
  • S205: When the distance between the coordinates of the finger in the image collected by the electronic device 100 at time T12 and the coordinates of the finger in the image frame collected the preset duration T22 earlier is less than the preset distance D2, the electronic device 100 stops recording the coordinates of the finger in the collected images.
  • the electronic device 100 detects that the user's finger is in a stationary state again. That is, the electronic device 100 determines that the distance between the coordinates of the finger in the image collected at time T12 and the coordinates of the finger in the image frame collected the preset duration T22 earlier is smaller than the preset distance D2.
  • the electronic device 100 stops recording the coordinates of the finger in the captured image. That is, the electronic device 100 takes the coordinates of the finger in the image collected at time T12 as the coordinates of the end point of the user's reading track.
  • T22 may be greater than T21, and may also be less than or equal to T21, which is not limited here.
  • D2 may be greater than D1, and may also be less than or equal to D1, which is not limited here.
  • the preset duration T22 and the preset distance D2 may be configured by the system of the electronic device 100 .
  • the distance between the coordinates of the finger in the image frame captured at time t7 in (g) of FIG. 5 and the coordinates of the finger in the image frame captured at time t6 in (f) is smaller than the preset distance D2, so the electronic device 100 stops saving the coordinates of the finger in the captured images.
  • the electronic device 100 saves the coordinates of the finger in the images collected between time T11 and time T12, that is, the trajectory coordinates of one point-reading action of the user's finger in the picture book.
  • the coordinate trajectory of the finger in the image collected between time T11 and time T12 may be shown as line segment P3P4 in FIG. 6 .
  • the electronic device 100 may store the coordinates of the points between the line segments P3P4.
  • the line segment P3P4 is the trajectory of the user's finger in the picture book, and the finger trajectory is used to select the text to be recognized in the picture book.
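The stationary-start (time T11) and stationary-end (time T12, step S205) conditions can be sketched as a single pass over timestamped fingertip samples. This is a simplified illustration, not the patent's implementation: `t21`, `d1`, `t22`, and `d2` correspond to the preset duration T21, preset distance D1, preset duration T22, and preset distance D2, and the comparison sample is simply the latest sample at least the preset duration older:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def point_read_track(samples, t21, d1, t22, d2):
    """samples: list of (t, (x, y)) fingertip coordinates in capture order.
    Returns the recorded track between the first stationary moment (T11)
    and the next stationary moment (T12)."""
    start_found = False
    track = []
    for i, (t, p) in enumerate(samples):
        duration = t21 if not start_found else t22
        threshold = d1 if not start_found else d2
        # latest earlier sample captured at least `duration` before this one
        earlier = [q for (tq, q) in samples[:i] if t - tq >= duration]
        if not earlier:
            if start_found:
                track.append(p)
            continue
        if dist(p, earlier[-1]) < threshold:
            if not start_found:
                start_found = True
                track = [p]        # T11: starting point of the reading track
            else:
                track.append(p)    # T12: end point, stop recording
                return track
        elif start_found:
            track.append(p)
    return track
```

Run against a synthetic sequence (still, move right, still again), the returned track corresponds to the line segment P3P4 above: it begins at the resting start position and ends at the second resting position.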
  • the electronic device 100 determines the pointing gesture G of the finger according to the coordinates of the finger saved from time T11 to time T12.
  • the electronic device 100 may determine that the user has selected the text to be recognized.
  • the text to be recognized is the text selected by the finger from time T11 to time T12.
  • the electronic device 100 may determine the text to be recognized according to the coordinates of the fingers in the image frames captured from the time T11 to the time T12, the gestures clicked by the user, and the layout analysis result.
  • the electronic device 100 may determine the gesture G when the user clicks according to the coordinates of the fingers in the multiple image frames captured from the time T11 to the time T12.
  • If the distance between the finger coordinates in any two image frames captured from time T11 to time T12 is smaller than the preset distance D10 (for example, the distance between the finger coordinates in the image frame captured at time T11 and those in the image frame captured at time T12 is smaller than the preset distance D10), the electronic device 100 determines that the pointing gesture of the finger is "point". D10 is less than or equal to D11.
  • If the coordinates of the finger in the image frames captured from time T11 to time T12 are linearly correlated, the electronic device 100 may determine that the pointing gesture of the finger is "drawing a line".
  • the electronic device 100 may determine that the pointing gesture of the finger is "drawing a circle".
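The three distance rules above can be sketched as a rule-of-thumb classifier. This is an illustrative simplification rather than the patent's method: the "line" case is taken as the fallback instead of being tested for linear correlation, and `d10` plays the role of the preset distance D10:

```python
import math

def classify_gesture(track, d10):
    """Classify a recorded fingertip track as "point", "circle", or "line"."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    # "point": every pair of recorded coordinates stays within d10
    if all(dist(p, q) < d10 for p in track for q in track):
        return "point"
    # "circle": the track is closed (start and end are close) but not tiny
    if dist(track[0], track[-1]) < d10:
        return "circle"
    # otherwise treat the stroke as "line"
    return "line"
```

The patent's model-based approach (convex hull fitting plus a gesture recognition model, below) replaces these hand-written thresholds with learned classification.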
  • the electronic device 100 performs convex hull fitting on the finger coordinate points in the image frames captured from time T11 to time T12, and converts the sampling points into polar coordinate points after uniform sampling to obtain a polar coordinate map.
  • the electronic device 100 inputs the polar coordinate graph into the gesture recognition model, and after the gesture recognition model recognizes the polar coordinate graph, it outputs the gesture type corresponding to the polar coordinate graph.
  • old(x,y) is the coordinates of the finger determined from the image frames collected by the electronic device 100 from time T11 to time T12
  • New(x,y) is the coordinates of the finger after convex hull fitting is performed on the coordinates of the finger.
  • the electronic device 100 can determine the center point M(xm, ym) in the sampling points:
  • the electronic device 100 can calculate the coordinates of the relative position of each convex hull fitting point relative to the center point as:
  • the electronic device 100 can convert the convex hull fitting points into polar coordinates, where the origin of the polar coordinates is the center point calculated by the above Formula 2. The electronic device 100 can determine the polar coordinates of each convex hull fitting point according to the relative position of the fitting point and the center point, with reference to the following formulas:
  • the electronic device 100 can convert the sampling points into polar coordinate points according to Formula 4 and Formula 5, and then save the plurality of polar coordinate points as a polar coordinate map.
  • the electronic device 100 inputs the polar coordinate graph into the gesture recognition model, and can obtain the gesture type corresponding to the polar coordinate graph.
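The conversion described above (center point, relative positions, polar coordinates) can be sketched as follows, under the assumption that the center point M(xm, ym) is the mean of the sampled convex hull fitting points; the patent's Formulas 2, 4 and 5 are not reproduced in this text, so this is only a plausible reading of the steps:

```python
import math

def to_polar_map(points):
    """Convert convex-hull fitting points to (r, theta) polar coordinates
    about their center point, as a list to be rendered as a polar map."""
    n = len(points)
    xm = sum(x for x, _ in points) / n   # assumed center point M(xm, ym)
    ym = sum(y for _, y in points) / n
    polar = []
    for x, y in points:
        dx, dy = x - xm, y - ym          # relative position to the center
        r = math.hypot(dx, dy)           # radial distance
        theta = math.atan2(dy, dx)       # polar angle
        polar.append((r, theta))
    return polar
```

A closed curve in this polar map (FIG. 7A) then corresponds to "circle", a tight cluster (FIG. 7B) to "point", and a polyline (FIG. 7C) to "drawing a line", which is what the gesture recognition model classifies.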
  • FIG. 7A exemplarily shows a polar coordinate diagram.
  • the coordinates of the finger in the polar coordinate diagram can be sequentially connected according to the time sequence in which the electronic device 100 obtains the coordinates of the finger, and a closed curve can be formed.
  • the electronic device 100 may input the polar coordinate diagram shown in FIG. 7A into the gesture recognition model, and the gesture recognition model may output the gesture type of the finger corresponding to the polar coordinate diagram.
  • the gesture type is "circle".
  • FIG. 7B exemplarily shows another polar coordinate diagram.
  • the coordinates of the finger in the polar coordinate graph may be sequentially stored in the polar coordinate graph according to the time sequence in which the electronic device 100 obtains the coordinates of the finger, and the coordinate points of the finger are concentrated in a certain area.
  • the electronic device 100 may input the polar coordinate diagram shown in FIG. 7B into the gesture recognition model, and the gesture recognition model may output the gesture type of the finger corresponding to the polar coordinate diagram.
  • the gesture type is "point".
  • FIG. 7C exemplarily shows yet another polar coordinate diagram.
  • the coordinates of the finger in the polar coordinate diagram can be sequentially connected according to the time sequence in which the electronic device 100 obtains the coordinates of the finger, and a polyline can be formed.
  • the electronic device 100 can input the polar coordinate diagram shown in FIG. 7C into the gesture recognition model, and the gesture recognition model can output the gesture type of the finger corresponding to the polar coordinate diagram.
  • the gesture type is "drawing a line".
  • the electronic device determines the text region Q to be recognized according to the coordinates of the finger saved from time T11 to time T12, the gesture G, and the multiple text regions in the book B and their positions.
  • the electronic device can determine the text area Q to be recognized according to the coordinates of the finger recorded from time T11 to time T12 (that is, the trajectory of the user's finger in the picture book), the gesture G, and the layout analysis result.
  • the electronic device 100 uses the text area where the track recorded from time T11 to time T12 is located as the text area to be recognized.
  • the electronic device 100 takes the text with the smallest distance from the coordinates of the finger stored from time T11 to time T12 in the text area to be recognized as the text to be recognized specified by the user.
  • the electronic device 100 uses the text region intersecting with the track recorded from time T11 to time T12 as the text region Q to be recognized. It can be understood that, the intersection of the track and the text area may be that all the track is within the text area, or a preset proportion of the track is within the text area (for example, half of the track is within the text area A).
  • the electronic device 100 may take the characters above the track in the character area Q as the characters to be recognized designated by the user.
  • the electronic device may use the text region that overlaps with the track recorded from time T11 to time T12 as the text region Q to be recognized.
  • the electronic device may use the text in the track recorded from time T11 to time T12 in the text area Q as the text to be recognized selected by the user.
  • the user may configure the electronic device 100 to perform point reading only on text areas. That is, when the electronic device 100 determines that the track formed by the coordinates of the finger saved from time T11 to time T12 is in a text area of the picture book, the electronic device 100 determines the text area Q to be recognized and executes step S208. When the electronic device 100 determines that the track formed by the coordinates of the finger saved from time T11 to time T12 is in a drawing area or table area of the picture book, the electronic device 100 does not execute step S208.
  • FIG. 8 exemplarily shows a picture book 800 .
  • the picture book 800 may include a drawing area 801 , a table area 802 , a text area 803 and a text area 804 .
  • the track formed by the coordinates of the finger stored by the electronic device 100 from time T11 to time T12 may be finger track 807 or finger track 809 in FIG. 8 .
  • the electronic device 100 may determine that the finger trace 807 is in the drawing area 801 , or the finger trace 809 is in the table area 802 .
  • the electronic device 100 may prompt the user on the display screen that the current point-reading area does not conform to the configured point-reading area.
  • the electronic device does not perform step S208.
  • the electronic device 100 may determine the text region to be recognized according to the finger track and the text region, and execute step S208.
  • the electronic device 100 can recognize and broadcast the text to be recognized selected by the user.
  • When the electronic device 100 performs layout analysis, the position information of the characters contained in the drawing area can be obtained. In this way, the electronic device 100 can determine the text to be recognized selected by the user according to the user's finger trajectory and the position information of the text in the drawing area, and can therefore recognize and broadcast that text.
  • the electronic device 100 can detect whether there is a finger in the captured image, and if a finger is detected, step S203 is performed. If the electronic device 100 does not detect a finger in the captured image within a preset time, the electronic device 100 may close the "point reading" APP. Alternatively, the electronic device 100 may enter a standby state. In this way, the power of the electronic device 100 can be saved and the power consumption can be reduced.
  • When the electronic device 100 determines that the part of the trajectory formed by the coordinates of the finger saved from time T11 to time T12 that falls within a text area is greater than a preset threshold, the electronic device determines the text area Q to be recognized and executes step S208. Otherwise, the electronic device 100 neither determines the text area Q to be recognized nor executes step S208.
  • the preset threshold may be 50%, or 55%, 60%, etc., which is not limited here. For example, as shown in the finger track 808 shown in FIG. 8 , about 20% of the finger track falls in the text area. If the preset threshold is 50%, the electronic device 100 does not perform determining the text area Q to be recognized and step S208 .
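The threshold check above can be sketched as follows. The text area is modeled as an axis-aligned rectangle for simplicity (the patent also allows polygons and circles), and 0.5 stands in for the 50% preset threshold:

```python
def fraction_in_region(track, region):
    """Fraction of track points inside region = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = region
    inside = sum(1 for x, y in track if xmin <= x <= xmax and ymin <= y <= ymax)
    return inside / len(track)

def should_recognize(track, region, threshold=0.5):
    """Proceed to recognition only when enough of the track is in the text area."""
    return fraction_in_region(track, region) >= threshold
```

With the finger track 808 of FIG. 8, where roughly 20% of the points fall in the text area, this check fails against a 50% threshold, matching the behavior described above.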
  • the electronic device 100 can determine that the part of the finger track 805 or the finger track 806 that falls within a text area is larger than the preset threshold. The electronic device can then determine the text region to be recognized according to the finger track and the text region.
  • If the track falls within the range of two or more text areas, the electronic device 100 may take the text area that contains the larger part of the track as the text area Q to be recognized.
  • For example, the electronic device 100 takes the text area 803, determined by the finger track 806, as the final text area Q to be recognized.
  • the electronic device 100 may determine the text region Q to be recognized according to the coordinates of the finger saved from time T11 to time T12, the gesture G, and the multiple text regions in the book B and their positions.
  • the electronic device 100 can delineate the text to be detected in the to-be-recognized area Q through the text detection frame.
  • FIG. 9A may include a text area 900 to be recognized.
  • the electronic device 100 can determine that the character to be recognized is the character “cat” delineated by the character detection frame 902 according to the coordinates of the finger.
  • the electronic device 100 moves the character detection frame according to the offset S0, and takes the characters enclosed by the moved character detection frame as the characters to be recognized in the character area Q.
  • the inclination angle of the characters in the character area of the picture book can be obtained.
  • the electronic device 100 may acquire the inclination angle of the finger when detecting the finger.
  • the electronic device can obtain the angle between the finger and the text in the text area to be recognized according to these two inclination angles.
  • the text detection frame 903 in FIG. 9B is the text detection frame obtained after the electronic device 100 moves the text detection frame 902 in FIG. 9A according to the offset.
  • the text detection frame 903 delineates the text "sum" as the text to be recognized. In this way, the electronic device 100 can more accurately determine the text to be recognized specified by the user.
  • the electronic device 100 can multiply the offset S0 by an offset coefficient to obtain the offset S1. The electronic device moves the text detection frame according to the offset S1, and the electronic device 100 takes the text enclosed by the moved text detection frame as the text to be recognized in the text area Q.
  • the offset coefficient may be configured by the system of the electronic device 100.
  • the value range of the offset coefficient may be [0.2, 2].
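The frame shift described above reduces to a small geometric operation. A sketch with assumed representations: the box as (x, y, w, h), the offset S0 as a 2-D vector, and `alpha` as the offset coefficient taken from the stated range [0.2, 2]:

```python
def move_text_box(box, s0, alpha):
    """Shift a text detection frame by the offset S0 scaled by the
    offset coefficient alpha (S1 = alpha * S0); the size is unchanged."""
    x, y, w, h = box
    dx, dy = s0
    return (x + alpha * dx, y + alpha * dy, w, h)
```

For example, shifting a frame at (10, 10) by S0 = (5, -2) with alpha = 0.5 moves it to (12.5, 9.0) while keeping its width and height.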
  • the electronic device 100 may record the included angle between the finger and the text during point reading within a preset time period, and the offset corresponding to the angle.
  • the electronic device 100 may establish a mapping relationship between the angle between the finger and the text and the offset. In this way, after the electronic device 100 determines the angle between the finger and the character, a mapping relationship can be established according to the angle between the finger and the character and the offset to find the offset corresponding to the angle. In this way, the calculation amount of the electronic device can be reduced.
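The angle-to-offset mapping can be sketched as a quantized lookup table: record the offset observed for a given finger/text angle, then reuse it for nearby angles instead of recomputing. The class and method names are hypothetical, as is the 5-degree quantization step:

```python
class OffsetCache:
    """Maps a quantized finger/text angle to a previously used offset."""

    def __init__(self, step_deg=5):
        self.step = step_deg   # bucket width in degrees (an assumption)
        self.table = {}

    def _key(self, angle_deg):
        # snap the angle to the nearest bucket center
        return round(angle_deg / self.step) * self.step

    def record(self, angle_deg, offset):
        self.table[self._key(angle_deg)] = offset

    def lookup(self, angle_deg):
        # returns the cached offset for a nearby angle, or None on a miss
        return self.table.get(self._key(angle_deg))
```

On a miss the device would compute the offset as usual and `record` it; on a hit the computation is skipped, which is the calculation-saving behavior described above.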
  • When the electronic device 100 determines that the vertical distance from the finger in the captured image frame to the book B is greater than the preset vertical distance D11, the electronic device 100 determines the pointing gesture G of the finger according to the coordinates of the finger saved from time T11 to time T12.
  • the electronic device 100 recognizes and broadcasts the text in the text region Q to be recognized.
  • the electronic device 100 can recognize the text detected in the text region Q to be recognized. After the electronic device recognizes the text, it broadcasts the text by voice. For example, as shown in FIG. 1A, the electronic device 100 broadcasts the text "cat" designated by the user.
  • characters specified by the user in the embodiments of the present application include, but are not limited to, characters in different forms such as Chinese characters, Japanese, Korean, and English.
  • step S202 can be executed after step S203 and before step S207.
  • the electronic device 100 can continuously collect images of the first picture book. When a finger appears in an image collected by the electronic device, the electronic device determines that the user starts point reading. The electronic device analyzes the image of the first picture book to obtain a text analysis result. When the distance between the coordinates of the finger in the image frame currently collected by the electronic device and the coordinates of the finger in the image frame collected the preset duration earlier is less than the preset distance, the electronic device determines that the finger in the image frame is stationary, and the electronic device can record the track coordinates between the two times the finger is stationary.
  • the electronic device may determine the to-be-recognized text area Q and the to-be-recognized text in the text area Q according to the track coordinates between the two times the finger is stationary.
  • the electronic device recognizes and broadcasts the text to be recognized.
  • the electronic device 100 takes the coordinates of the finger at the two times it is stationary as the starting point and the end point of the user's point-reading track, respectively.
  • the electronic device 100 can accurately determine the starting position when the user points to read. Therefore, the electronic device 100 can accurately determine the character to be recognized according to the track coordinates between the two times the finger is stationary. In this way, the point-reading accuracy rate of the electronic device can be improved, thereby improving user experience.
  • the electronic device 100 can recognize characters in any book, and does not need to customize the book.
  • FIG. 10 exemplarily shows a flowchart of another method for recognizing point-to-read characters provided by an embodiment of the present application.
  • a method for recognizing point-and-read characters provided by the present application may include the following steps:
  • the electronic device 100 starts to capture an image, wherein the image captured by the electronic device 100 includes the user's finger and the content of the book, and the user's finger and the book are located in the target area of the electronic device.
  • the camera of the electronic device 100 may continuously capture images.
  • Before the electronic device performs step S1002, the foregoing step S202 may also be performed.
  • the electronic device 100 recognizes the pointing gesture of the user according to the position movement of the user's finger recognized by the collected image.
  • the electronic device 100 can identify the user's finger in the captured image, and can determine the position of the user's finger in the frame of image.
  • the electronic device 100 may recognize the user's pointing gesture according to the finger positions in the collected multi-frame images.
  • the point-to-read gesture includes one or more of the following: dots, dashes, and circles.
  • the electronic device recognizes the user's pointing gesture according to the position movement of the user's finger recognized in the collected images. Specifically, after the electronic device detects the user's finger in the collected images, if it detects that the movement of the first position of the finger in the images collected within the first preset duration is less than the first preset distance, the electronic device records the first position as the starting point of the pointing gesture. After the electronic device records the starting point of the pointing gesture, if it detects that the movement of the second position of the finger in the images collected within the second preset duration is less than the second preset distance, the electronic device records the second position as the end point of the pointing gesture. The electronic device recognizes the pointing gesture according to the starting point and the end point of the pointing gesture.
  • the electronic device 100 may start recording the coordinates of the finger at the starting point from the starting point of the pointing gesture, and finish recording the coordinates of the finger at the end point when the pointing gesture ends.
  • the electronic device recognizes the pointing gesture according to its starting point and end point, including: if the distance between the starting point and the position of any finger recorded between the starting point and the end point is less than a third preset distance, the electronic device recognizes the pointing gesture as a point; if the coordinates of the finger positions recorded between the starting point and the end point are linearly related, the electronic device recognizes the pointing gesture as a line; if the distance between the starting point and the end point is less than a fourth preset distance, and the distance between the starting point and a finger position recorded between them is greater than a fifth preset distance, the electronic device recognizes the pointing gesture as a circle.
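A rough sketch of this three-way classification, assuming per-frame fingertip coordinates between the recorded start and end points. All thresholds are hypothetical stand-ins for the third to fifth preset distances, and the collinearity test uses perpendicular distance to the start-end line rather than the patent's unspecified linear-correlation measure.

```python
import math

POINT_MAX_SPREAD = 10.0   # third preset distance (assumed value)
CLOSE_LOOP_MAX = 20.0     # fourth preset distance, start-to-end (assumed)
LOOP_MIN_RADIUS = 40.0    # fifth preset distance (assumed value)

def classify_gesture(track):
    """track: list of (x, y) finger positions from start point to end point."""
    start, end = track[0], track[-1]
    d = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    # Point: every recorded position stays near the starting point.
    if all(d(start, p) < POINT_MAX_SPREAD for p in track):
        return "point"
    # Circle: the path closes on itself but strays far from the start.
    if d(start, end) < CLOSE_LOOP_MAX and max(d(start, p) for p in track) > LOOP_MIN_RADIUS:
        return "circle"
    # Line: positions are approximately collinear with the start-end segment.
    x0, y0 = start
    xe, ye = end
    def off_line(p):
        # perpendicular distance from p to the line through start and end
        num = abs((xe - x0) * (y0 - p[1]) - (x0 - p[0]) * (ye - y0))
        return num / max(d(start, end), 1e-9)
    if all(off_line(p) < POINT_MAX_SPREAD for p in track):
        return "line"
    return "unknown"
```

A nearly static track classifies as "point", a straight sweep as "line", and a closed loop of sufficient radius as "circle".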
  • the gesture G in the above steps S201 to S208 may also be referred to as a pointing gesture.
  • for step S1002, reference may be made to the descriptions in the foregoing steps S203 to S206, which are not repeated here.
  • the electronic device 100 determines the target text in the content of the book in the captured image according to the pointing gesture and the position of the trajectory of the pointing gesture.
  • according to the pointing gesture and the position of its trajectory in the image, the electronic device 100 can determine the target text in the content of the book.
  • the target text is the text selected by the user to be recognized in the book.
  • the electronic device determines the target text in the content of the book in the collected image according to the pointing gesture and the position of its trajectory, including: the electronic device determines the position of the text area in the content of the book according to the collected image; the electronic device then determines the target text in the content of the book according to the pointing gesture, the position of its trajectory, and the position of the text area.
  • the electronic device determines the position of the text area in the content of the book according to the collected image, that is, the electronic device performs layout analysis on the collected image, and then analyzes the position of the text area in the book.
  • the electronic device determines the target text in the content of the book according to the pointing gesture, the position of its trajectory, and the position of the text area, including: the electronic device determines a first text area according to the trajectory of the pointing gesture and the positions of the text areas in the book, where the first text area includes a first trajectory, and the first trajectory is a part of the trajectory of the pointing gesture accounting for a proportion greater than or equal to a preset ratio; the electronic device then determines the target text in the content of the book according to the first trajectory, the pointing gesture, and the first text area.
  • the electronic device determines the target text in the content of the book according to the first trajectory, the pointing gesture, and the first text area, including: if the pointing gesture is a point, the electronic device determines the text in the first text area closest to the first trajectory as the target text; if the pointing gesture is a dash, the electronic device determines the text above the first trajectory in the first text area as the target text; if the pointing gesture is a circle, the electronic device determines the text within the first trajectory in the first text area as the target text.
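The two steps above (choosing the text area that contains at least the preset ratio of the trajectory, then selecting words by gesture type) might be sketched like this. The rectangles and word centers are toy structures standing in for real layout-analysis and OCR output; the ratio and the circle case (approximated by the stroke's bounding box) are assumptions, not the patent's exact method.

```python
import math

AREA_RATIO = 0.5  # preset ratio of trajectory points an area must contain (assumed)

def in_rect(rect, p):
    x0, y0, x1, y1 = rect
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

def pick_text_area(text_areas, track):
    """Return the first text area containing >= AREA_RATIO of the track points."""
    for rect in text_areas:
        inside = sum(in_rect(rect, p) for p in track)
        if inside / len(track) >= AREA_RATIO:
            return rect
    return None

def select_target(words, track, gesture):
    """words: list of (text, (cx, cy)) word centers in the chosen text area.
    Image coordinates: y grows downward, so 'above the stroke' means smaller y."""
    if gesture == "point":
        tip = track[-1]
        return [min(words, key=lambda w: math.dist(tip, w[1]))[0]]
    if gesture == "line":    # underline: words just above the stroke
        top = min(p[1] for p in track)
        return [t for t, (cx, cy) in words if cy < top]
    if gesture == "circle":  # words whose center falls inside the stroke's bbox
        xs = [p[0] for p in track]
        ys = [p[1] for p in track]
        return [t for t, (cx, cy) in words
                if min(xs) <= cx <= max(xs) and min(ys) <= cy <= max(ys)]
    return []
```

With an underline drawn just below a word, the "line" branch picks the word above it; a tap picks the nearest word; a circle picks everything it encloses.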
  • step S1003 may refer to the description in step S207, which will not be repeated here.
  • the electronic device 100 broadcasts the recognized target text.
  • the electronic device 100 can recognize the target text. After the electronic device recognizes the target text, it broadcasts the text by voice. For example, as shown in FIG. 1A, the electronic device 100 broadcasts the text "cat" designated by the user.
  • the text in the to-be-recognized text area Q in step S208 may be referred to as a target text.
  • the electronic device 100 can recognize the text in any book designated by the user.
  • the characters specified by the user include but are not limited to characters in different forms such as Chinese characters, Japanese, Korean, and English.
  • the electronic device 100 starts to collect images in response to the first operation of the user, where the images collected by the electronic device 100 include the user's finger and the content of the book, and the user's finger and the book are located in the target area of the electronic device; the electronic device 100 recognizes the user's pointing gesture according to the position movement of the user's finger recognized in the collected images; the electronic device 100 determines the target text in the content of the book in the captured images according to the pointing gesture and the position of its trajectory; and the electronic device 100 broadcasts the recognized target text. In this way, the reading accuracy of the electronic device can be improved, thereby improving the user experience. In addition, the electronic device 100 can recognize characters in any book, without requiring a customized book.
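The four steps recapped above can be tied together in a small driver. Every helper here (`detect_finger`, `recognize_gesture`, `find_target`, `speak`) is a hypothetical injected callable; the patent does not prescribe concrete implementations for finger tracking, OCR, or text-to-speech.

```python
def point_and_read(frames, detect_finger, recognize_gesture, find_target, speak):
    """S1001-S1004: collect frames, track the fingertip, recognize the
    pointing gesture, locate the target text, and read it aloud."""
    # S1001: per-frame fingertip positions (None when no finger is visible)
    track = [p for p in (detect_finger(f) for f in frames) if p is not None]
    # S1002: classify the gesture (point / line / circle)
    gesture = recognize_gesture(track)
    # S1003: layout analysis + OCR on the latest frame to pick the target text
    target = find_target(frames[-1], track, gesture)
    # S1004: broadcast the recognized text via TTS
    if target:
        speak(target)
    return target
```

With stub callables, the driver simply threads the data through the four stages and returns the spoken text.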
  • FIG. 11 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may have more or fewer components than those shown in the figures, may combine two or more components, or may have different component configurations.
  • the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 2, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and so on.
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, a magnetic sensor 180D, an acceleration sensor 180E, a touch sensor 180K, and the like.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 2, the wireless communication module 160, a modem processor, a baseband processor, and the like.
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and so on.
  • the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193 .
  • when the shutter is opened, light is transmitted through the lens to the camera's photosensitive element, which converts the optical signal into an electrical signal; the photosensitive element transmits the electrical signal to the ISP for processing, and the ISP converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
  • Video codecs are used to compress or decompress digital video.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the internal memory 121 may include one or more random access memories (RAM) and one or more non-volatile memories (NVM).
  • Random access memory may include static random-access memory (SRAM), dynamic random-access memory (DRAM), synchronous dynamic random-access memory (SDRAM), double data rate synchronous dynamic random-access memory (DDR SDRAM; for example, fifth-generation DDR SDRAM is generally called DDR5 SDRAM), and so on.
  • Non-volatile memory may include magnetic disk storage devices and flash memory.
  • Flash memory can be divided by operating principle into NOR flash, NAND flash, 3D NAND flash, etc.; by storage cell potential level, it can include single-level cells (SLC), multi-level cells (MLC), triple-level cells (TLC), quad-level cells (QLC), etc.; and by storage specification, it can include universal flash storage (UFS), embedded multimedia card (eMMC), and so on.
  • the random access memory can be directly read and written by the processor 110, and can be used to store executable programs (eg, machine instructions) of an operating system or other running programs, and can also be used to store data of users and application programs.
  • the non-volatile memory can also store executable programs and store data of user and application programs, etc., and can be loaded into the random access memory in advance for the processor 110 to directly read and write.
  • the electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • the speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be heard by placing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • the user can input a sound signal into the microphone 170C by speaking close to it.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
  • the magnetic sensor 180D includes a Hall sensor.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
  • the touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
  • the keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • FIG. 12 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application. It can be understood that FIG. 12 is only a schematic diagram of an exemplary software structure of the electronic device 100.
  • the software structure of the electronic device 100 in the embodiment of the present application may also be a software structure provided by other operating systems (e.g., the iOS operating system, the Hongmeng (HarmonyOS) operating system, etc.), which is not limited here.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, a runtime (Runtime) and a system library, and a kernel layer.
  • the application layer can include a series of application packages.
  • the application package may include camera, gallery, calendar, call, map, navigation, WLAN, music, video, short message, reading and other applications (also referred to as applications).
  • the point-to-read application program refers to an application program that can implement the method for point-to-read text recognition provided by the embodiments of the present application.
  • the name of the application program may be called "Reading” or “Assisted Learning”, etc.
  • the name of the application program is not limited here.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide the communication function of the electronic device 100 .
  • for example, the management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of a graph or scroll bar text, such as notifications of applications running in the background, and can also display notifications on the screen in the form of a dialog interface. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
  • Runtime includes core libraries and virtual machines. Runtime is responsible for the scheduling and management of the system.
  • the core library consists of two parts: one part is the functions that the programming language (for example, the Java language) needs to call, and the other part is the core library of the system.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes application layer and application framework layer programming files (e.g., Java files) as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
  • a system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the Surface Manager is used to manage the display subsystem and provides a fusion of two-dimensional (2-Dimensional, 2D) and three-dimensional (3-Dimensional, 3D) layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, sensor drivers, and virtual card drivers.
  • when a touch operation is received, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into raw input events (including touch coordinates, timestamps of touch operations, etc.). Raw input events are stored at the kernel layer.
  • the application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. For example, if the touch operation is a click operation and the control corresponding to the click is the camera application icon, the camera application calls the interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer.
  • the camera 193 captures still images or video.
  • FIG. 13 is another exemplary schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may include: a processor 1201 , a camera 1202 , a display screen 1203 , a speaker 1204 and a sensor 1205 .
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • a camera 1202 is used to capture images.
  • the processor 1201 is configured to detect the image captured by the camera 1202 and determine the coordinates of the finger in the image captured by the camera 1202 .
  • the processor 1201 determines the time when the user starts reading and ends the reading according to the image captured by the camera 1202 , and determines the text to be recognized specified by the user according to the image captured by the camera 1202 .
  • the processor 1201 can also convert the recognized text into an audio electrical signal, and send the audio electrical signal to the speaker 1204 .
  • the display screen 1203 can display the image captured by the camera 1202 .
  • the display screen 1203 may also display the icon of the "click to read” APP, and display prompt text.
  • the speaker 1204 can receive the audio electrical signal sent by the processor 1201, and convert the audio electrical signal into a sound signal.
  • the electronic device 100 can broadcast the text read by the user through the speaker 1204 .
  • the sensor 1205 can be a touch sensor, and the touch sensor can be placed on the display screen 1203, and the touch sensor and the display screen 1203 form a touch screen, also called a "touch screen".
  • a touch sensor is used to detect touch operations on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the term “when” may be interpreted to mean “if” or “after” or “in response to determining" or “in response to detecting" depending on the context.
  • the phrases “in determining" or “if detecting (the stated condition or event)” can be interpreted to mean “if determining" or “in response to determining" or “on detecting (the stated condition or event)” or “in response to the detection of (the stated condition or event)”.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented by software, the embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave, etc.).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that includes an integration of one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid state drives), and the like.
  • the process can be completed by instructing the relevant hardware by a computer program, and the program can be stored in a computer-readable storage medium.
  • the program When the program is executed , which may include the processes of the foregoing method embodiments.
  • the aforementioned storage medium includes: ROM or random storage memory RAM, magnetic disk or optical disk and other mediums that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method for recognizing touch-to-read text. The method comprises: in response to a first operation of a user, an electronic device starts to capture images; the electronic device recognizes the user's touch-to-read gesture according to the position movement of the user's finger recognized in the captured images; the electronic device determines target text in the content of a book in the captured images according to the touch-to-read gesture and the position of the trajectory of the touch-to-read gesture; and the electronic device broadcasts the recognized target text. Also disclosed are an electronic device, a computer-readable storage medium, and a computer program product.

Description

Method and Electronic Device for Recognizing Touch-to-Read Text

This application claims priority to Chinese Patent Application No. 202110298494.6, filed with the Chinese Patent Office on March 19, 2021 and entitled "Method and Electronic Device for Recognizing Touch-to-Read Text", which is incorporated herein by reference in its entirety.

Technical Field

This application relates to the field of terminal artificial intelligence (AI) and the field of character recognition, and in particular to a method and an electronic device for recognizing touch-to-read text.

Background

With the development of technology, more and more electronic devices are used in education. For example, electronic devices with a touch-to-read function, such as reading pens, tablet computers, and robots, can be used to assist users in reading picture books. When users encounter unfamiliar words or sentences in a picture book, they can learn with the help of such devices. However, a reading pen can only recognize the text in specially made picture books, and some electronic devices such as tablet computers and robots can only recognize the text in electronic picture books. This limits the user's learning resources. For physical picture books, although some tablet computers and robots can recognize the text that the user points to, the accuracy of identifying, from the user's gesture, the text the user wants to read is not high.

Therefore, how an electronic device can accurately identify the text that a user wants to read in any physical book is a problem to be solved urgently.
Summary of the Invention

This application provides a method and an electronic device for recognizing touch-to-read text. With this method, the electronic device can more accurately recognize the text specified by a user in a book.

According to a first aspect, this application provides a method for recognizing touch-to-read text. The method may include: in response to a first operation of a user, an electronic device 100 starts to capture images, where the images captured by the electronic device 100 include the user's finger and the content of a book, and the user's finger and the book are located in a target area of the electronic device; the electronic device 100 recognizes the user's touch-to-read gesture according to the position movement of the user's finger recognized in the captured images; the electronic device 100 determines target text in the content of the book in the captured images according to the touch-to-read gesture and the position of the trajectory of the touch-to-read gesture; and the electronic device 100 broadcasts the recognized target text.

With the method provided in the first aspect of this application, the electronic device can accurately determine the target text from the captured images in combination with the user's gesture, and can therefore accurately recognize the target text. This improves user experience.

In a possible implementation, the touch-to-read gesture includes one or more of the following: pointing, underlining, and circling.
In a possible implementation, the electronic device recognizing the user's touch-to-read gesture according to the position movement of the user's finger recognized in the captured images includes: after detecting the user's finger in the captured images, if the electronic device detects that the movement of the finger's position in the images captured within a first preset duration is less than a first preset distance, the electronic device records the first position as the starting point of the touch-to-read gesture; after starting to record the starting point of the touch-to-read gesture, if the electronic device detects that the movement of the finger's second position in the images captured within a second preset duration is less than a second preset distance, the electronic device records the second position as the end point of the touch-to-read gesture; and the electronic device recognizes the touch-to-read gesture according to its starting point and end point.

It can be understood that the electronic device 100 may start recording the coordinates of the finger at the starting point of the touch-to-read gesture and stop recording at the end point. In this way, the electronic device can more accurately determine the starting point and the end point of the trajectory of the user's touch-to-read gesture.
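The stillness-based start/end detection described above can be sketched as follows. This is a minimal illustration, not the application's implementation; the sample representation and the threshold values (`still_duration`, `still_distance`) are assumptions.

```python
import math

def detect_trajectory(samples, still_duration=0.5, still_distance=10.0):
    """Return the slice of a touch-to-read trajectory between two stillnesses.

    `samples` is a list of (t, x, y) finger positions from successive frames.
    A sample is "still" when the finger moved less than `still_distance`
    pixels over the preceding `still_duration` seconds.  The first still
    sample is the gesture's starting point; the next still sample after
    movement resumes is its end point.
    """
    def is_still(i):
        t, x, y = samples[i]
        for j in range(i - 1, -1, -1):
            tj, xj, yj = samples[j]
            if t - tj > still_duration:
                break                 # history spans the duration: still
            if math.hypot(x - xj, y - yj) >= still_distance:
                return False          # finger moved too far within window
        else:
            return False              # not enough history yet
        return True

    start = next((i for i in range(len(samples)) if is_still(i)), None)
    if start is None:
        return None
    # Skip the initial still phase, then wait for the next stillness.
    i = start
    while i < len(samples) and is_still(i):
        i += 1
    end = next((j for j in range(i, len(samples)) if is_still(j)), None)
    if end is None:
        return None
    return samples[start:end + 1]
```

For example, a finger that rests, slides across a line, and rests again yields a trajectory whose first and last recorded positions are the two resting positions.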
In a possible implementation, the electronic device recognizing the touch-to-read gesture according to its starting point and end point includes: if the distance between the recorded starting point and the position of the finger at any point between the starting point and the end point is less than a third preset distance, the electronic device recognizes the touch-to-read gesture as pointing; if the coordinates of the finger positions recorded between the starting point and the end point are linearly correlated, the electronic device recognizes the touch-to-read gesture as underlining; if the distance between the recorded starting point and end point is less than a fourth preset distance, and the distance between the starting point and some finger position between the starting point and the end point is greater than a fifth preset distance, the electronic device recognizes the touch-to-read gesture as circling.

In this way, the electronic device can accurately determine the specific type of the touch-to-read gesture.
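The three classification rules above can be illustrated with a small sketch. The threshold values and the Pearson-correlation linearity test are assumptions for illustration; the application only names the preset distances. The circling check is placed before the underlining check here so that a closed loop is not misread as a line.

```python
import math

def classify_gesture(points, point_dist=15.0, close_dist=30.0, far_dist=60.0):
    """Classify a trajectory as 'point', 'underline', or 'circle'.

    `points` is the list of (x, y) finger coordinates from the gesture's
    starting point to its end point.  The thresholds mirror the third,
    fourth, and fifth preset distances of the application.
    """
    sx, sy = points[0]
    ex, ey = points[-1]
    # Pointing: every sample stays near the starting point.
    if all(math.hypot(x - sx, y - sy) < point_dist for x, y in points):
        return "point"
    # Circling: the trajectory returns near its start but strays far from it.
    if (math.hypot(ex - sx, ey - sy) < close_dist
            and any(math.hypot(x - sx, y - sy) > far_dist for x, y in points)):
        return "circle"
    # Underlining: the coordinates are (approximately) linearly correlated.
    if _collinear(points):
        return "underline"
    return "unknown"

def _collinear(points, tol=0.95):
    """Pearson correlation of x and y as a crude linearity test."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0 or syy == 0:          # perfectly horizontal or vertical line
        return True
    return abs(sxy / math.sqrt(sxx * syy)) >= tol
```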
In a possible implementation, the electronic device determining the target text in the content of the book in the captured images according to the touch-to-read gesture and the position of its trajectory includes: the electronic device determines the positions of text regions in the content of the book according to the captured images; and the electronic device determines the target text in the content of the book according to the touch-to-read gesture, the position of its trajectory, and the positions of the text regions.

In this way, by combining the positions of the text regions in the image with the user's touch-to-read gesture and its trajectory, the electronic device can more accurately determine the target text, that is, the text in the book that the user has selected for recognition and broadcasting.
In a possible implementation, the electronic device determining the target text in the content of the book according to the touch-to-read gesture, the position of its trajectory, and the positions of the text regions includes: the electronic device determines a first text region according to the trajectory of the touch-to-read gesture and the positions of the text regions in the first book, where the first text region contains a first trajectory, and the first trajectory is a portion of the trajectory of the touch-to-read gesture that is greater than or equal to a preset proportion of it; and the electronic device determines the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text region.

The electronic device determines that the user wants to recognize text in the first text region only when the majority of the trajectory of the user's touch-to-read gesture falls within that region. In this way, even when part of the user's gesture trajectory falls in the first text region and part falls in a second text region, the electronic device can still correctly determine the target text.
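The majority-overlap rule can be sketched as follows. The 50% threshold and the axis-aligned region boxes are assumptions for illustration; the application only states "greater than or equal to a preset proportion".

```python
def pick_text_region(trajectory, regions, min_ratio=0.5):
    """Pick the text region containing at least `min_ratio` of the trajectory.

    `trajectory` is a list of (x, y) points; `regions` maps a region id to
    an axis-aligned box (left, top, right, bottom).  Returns the id of the
    region holding the largest share of trajectory points, provided that
    share is at least `min_ratio`, else None.
    """
    best_id, best_share = None, 0.0
    for rid, (l, t, r, b) in regions.items():
        inside = sum(1 for x, y in trajectory if l <= x <= r and t <= y <= b)
        share = inside / len(trajectory)
        if share > best_share:
            best_id, best_share = rid, share
    return best_id if best_share >= min_ratio else None
```

A trajectory straddling two regions is thus attributed to the region that contains most of it, and a trajectory that mostly misses every region selects nothing.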
In a possible implementation, the electronic device determining the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text region includes: if the touch-to-read gesture is pointing, the electronic device determines the text in the first text region closest to the first trajectory as the target text; if the touch-to-read gesture is underlining, the electronic device determines the text in the first text region above the first trajectory as the target text; if the touch-to-read gesture is circling, the electronic device determines the text in the first text region enclosed by the first trajectory as the target text.

In this way, the electronic device adopts a different strategy for each gesture to determine the target text, which can improve the accuracy of determining the target text in the captured images and, in turn, the accuracy of recognizing it.
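The per-gesture selection strategy above can be sketched with word bounding boxes. The box format, the distance metric, and the use of the trajectory's bounding box as a stand-in for "enclosed by the circle" are illustrative assumptions, not the application's method.

```python
import math

def select_target_words(gesture, trajectory, words):
    """Select target words from a text region for a recognized gesture.

    `words` maps each word to its box (left, top, right, bottom) in image
    coordinates (y grows downward); `trajectory` is the gesture's list of
    (x, y) points.
    - 'point':     the single word whose box centre is nearest the trajectory;
    - 'underline': every word whose box lies above the trajectory's mean y;
    - 'circle':    every word whose box centre falls inside the trajectory's
                   bounding box (a crude proxy for "inside the circle").
    """
    def centre(box):
        l, t, r, b = box
        return ((l + r) / 2, (t + b) / 2)

    if gesture == "point":
        def dist(box):
            cx, cy = centre(box)
            return min(math.hypot(cx - x, cy - y) for x, y in trajectory)
        return [min(words, key=lambda w: dist(words[w]))]

    if gesture == "underline":
        line_y = sum(y for _, y in trajectory) / len(trajectory)
        # "Above" the underline means a smaller y in image coordinates.
        return [w for w, (l, t, r, b) in words.items() if b <= line_y]

    if gesture == "circle":
        xs = [x for x, _ in trajectory]
        ys = [y for _, y in trajectory]
        return [w for w, box in words.items()
                if min(xs) <= centre(box)[0] <= max(xs)
                and min(ys) <= centre(box)[1] <= max(ys)]
    return []
```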
In a possible implementation, the first preset distance is equal to the second preset distance, and the second preset duration is equal to the first preset duration. In this way, the conditions under which the electronic device determines that the finger is still in images captured at different times are the same, which reduces the computational load of the electronic device.
According to a second aspect, this application provides a method for recognizing touch-to-read text. The method may include: in response to a first operation, an electronic device captures images of a first book; when the distance between the finger coordinates in an image frame captured at a first moment and the finger coordinates in an image frame captured a first preset duration earlier is less than a first preset distance, the electronic device starts to record the finger coordinates in the image frames; when the distance between the finger coordinates in an image frame captured at a second moment and the finger coordinates in an image frame captured a second preset duration earlier is less than a second preset distance, the electronic device stops recording the finger coordinates in the image frames, where the second moment is later than the first moment; the electronic device determines the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment; and the electronic device recognizes and broadcasts the text to be recognized.

The text to be recognized may also be called the target text, that is, the text in the book that the user has specified for recognition.

With the method provided in the second aspect of this application, the finger coordinates at the two moments of stillness serve as the starting point and the end point of the user's touch-to-read trajectory, respectively. In this way, the electronic device can accurately determine where the touch-to-read action starts, and can therefore accurately determine the text to be recognized from the trajectory coordinates between the two moments of stillness. This improves the touch-to-read accuracy of the electronic device and thus the user experience. Moreover, the electronic device can recognize text in any book; no specially made books are required.
In a possible implementation, before the electronic device starts to record the finger coordinates in the image frames when the distance between the finger coordinates in the image frame captured at the first moment and those in the image frame captured a first preset duration earlier is less than the first preset distance, the method further includes: when the electronic device detects a finger in an image frame captured at a third moment, the electronic device starts to obtain the finger coordinates in the captured image frames, where the third moment is earlier than the first moment.

The electronic device may treat the moment when a finger appears in the image as the start of the user's touch-to-read action, and obtain the finger coordinates in the image frames only while the user is reading. This prevents the electronic device from performing the subsequent touch-to-read steps when no finger is detected, thereby reducing its computational load and saving power.
In a possible implementation, the electronic device determining the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment specifically includes: the electronic device determines the finger's touch-to-read gesture according to the finger coordinates recorded from the first moment to the second moment; the electronic device determines the positions of text regions in the first book according to the images of the first book; and the electronic device determines the text to be recognized in the first book according to the recorded finger coordinates, the touch-to-read gesture, and the positions of the text regions in the first book.

In this way, by combining the positions of the text regions in the image with the user's touch-to-read gesture and its trajectory, the electronic device can more accurately determine the target text, that is, the text in the book that the user has selected for recognition and broadcasting.

In a possible implementation, the electronic device determining the finger's touch-to-read gesture according to the finger coordinates recorded from the first moment to the second moment specifically includes: if the electronic device determines that the distance between any one of the recorded coordinates and the other coordinates is less than a third preset distance, the electronic device determines that the touch-to-read gesture is a first gesture; if the electronic device determines that the recorded finger coordinates are linearly correlated, the electronic device determines that the touch-to-read gesture is a second gesture; if the electronic device determines that the distance between the finger coordinates recorded at the first moment and those recorded at the second moment is less than a fourth preset distance, and the distance between the finger coordinates recorded at a fourth moment and those recorded at the first moment is greater than a fifth preset distance, the electronic device determines that the touch-to-read gesture is a third gesture, where the fourth moment is a moment between the first moment and the second moment.

In this way, the electronic device can accurately determine the specific type of the touch-to-read gesture.
In a possible implementation, the electronic device determining the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment, the touch-to-read gesture, and the positions of the text regions in the first book specifically includes: the electronic device connects the finger coordinates recorded from the first moment to the second moment in recording order to obtain a first finger trajectory; the electronic device determines a first text region according to the first finger trajectory and the positions of the text regions in the first book, where the first text region contains a second finger trajectory, and the second finger trajectory is a portion of the first finger trajectory that is greater than or equal to a preset proportion of it; and the electronic device determines the text to be recognized according to the second finger trajectory, the finger gesture, and the first text region.

The electronic device determines that the user wants to recognize text in the first text region only when the majority of the trajectory of the user's touch-to-read gesture falls within that region. In this way, even when part of the user's gesture trajectory falls in the first text region and part falls in a second text region, the electronic device can still correctly determine the target text.

In a possible implementation, the electronic device determining the text to be recognized according to the second finger trajectory, the finger gesture, and the first text region specifically includes: if the finger gesture is the first gesture, the electronic device determines the text in the first text region closest to the second finger trajectory as the text to be recognized; if the finger gesture is the second gesture, the electronic device determines the text in the first text region above the second finger trajectory as the text to be recognized; if the finger gesture is the third gesture, the electronic device determines the text in the first text region enclosed by the second finger trajectory as the text to be recognized.

The first gesture is pointing, the second gesture is underlining, and the third gesture is circling.

In this way, the electronic device adopts a different strategy for each gesture to determine the target text, which can improve the accuracy of determining the target text in the captured images and, in turn, the accuracy of recognizing it.
In a possible implementation, the electronic device determining the finger's touch-to-read gesture according to the finger coordinates recorded from the first moment to the second moment specifically includes: when the electronic device does not detect a finger in an image captured at a fifth moment, the electronic device determines the finger's touch-to-read gesture according to the finger coordinates recorded from the first moment to the second moment.

In this way, the electronic device can determine when the user's touch-to-read action ends. When the electronic device determines that the action has ended, it can stop performing the touch-to-read steps (for example, determining the finger coordinates in the images) until the user starts reading again. This saves power.

In a possible implementation, the first preset duration is equal to the second preset duration, and the first preset distance is equal to the second preset distance. In this way, the conditions under which the electronic device determines that the finger is still in images captured at different times are the same, which reduces the computational load of the electronic device.
According to a third aspect, an electronic device is provided. The electronic device includes one or more processors and a memory. The memory is coupled to the one or more processors and is configured to store computer program code, which includes computer instructions. The one or more processors invoke the computer instructions to cause the electronic device to perform the method for recognizing touch-to-read text in any possible implementation of the first aspect or the second aspect.

According to a fourth aspect, an embodiment of this application provides a computer storage medium, including computer instructions. When the computer instructions are run on an electronic device, the electronic device is caused to perform the method for recognizing touch-to-read text in any possible implementation of any of the above aspects.

According to a fifth aspect, an embodiment of this application provides a computer program product. When the computer program product runs on an electronic device, the electronic device is caused to perform the method for recognizing touch-to-read text in any possible implementation of any of the above aspects.
Description of Drawings
FIG. 1A is a schematic diagram of an application scenario of a touch-to-read robot according to an embodiment of this application;

FIG. 1B is a schematic diagram of another application scenario of the touch-to-read robot according to an embodiment of this application;

FIG. 1C is a schematic diagram of yet another application scenario of the touch-to-read robot according to an embodiment of this application;

FIG. 2 is a schematic flowchart of a method for recognizing touch-to-read text according to an embodiment of this application;

FIG. 3A to FIG. 3D are schematic diagrams of a set of user interfaces of the electronic device 100 according to an embodiment of this application;

FIG. 3E is a schematic diagram of the electronic device 100 capturing an image of a picture book according to an embodiment of this application;

FIG. 3F is a schematic diagram of content region division of a picture book according to an embodiment of this application;

FIG. 4A is a schematic diagram of one image frame captured by the electronic device 100 while a user is reading according to an embodiment of this application;

FIG. 4B is a schematic diagram of the electronic device 100 performing finger detection on a captured image frame according to an embodiment of this application;

FIG. 5 is a schematic diagram of a set of image frames captured by the electronic device 100 while a user is reading according to an embodiment of this application;

FIG. 6 is a schematic diagram of a trajectory of a user's touch-to-read action according to an embodiment of this application;

FIG. 7A to FIG. 7C are schematic diagrams of polar coordinate plots corresponding to different finger trajectories during touch-to-read according to an embodiment of this application;

FIG. 8 is a schematic diagram combining a picture-book layout analysis result with a user's finger trajectory according to an embodiment of this application;

FIG. 9A and FIG. 9B are schematic diagrams of a set of text detection results according to an embodiment of this application;

FIG. 10 is a schematic flowchart of a method for recognizing touch-to-read text according to an embodiment of this application;

FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of this application;

FIG. 12 is a schematic diagram of a software architecture of an electronic device according to an embodiment of this application;

FIG. 13 is another schematic structural diagram of an electronic device according to an embodiment of this application.
具体实施方式Detailed ways
本申请以下实施例中所使用的术语只是为了描述特定实施例的目的,而并非旨在作为对本申请的限制。如在本申请的说明书和所附权利要求书中所使用的那样,单数表达形式“一个”、“一种”、“所述”、“上述”、“该”和“这一”旨在也包括复数表达形式,除非其上下文中明确地有相反指示。还应当理解,本申请中使用的术语“和/或”是指并包含一个或多个所列出项目的任何或所有可能组合。The terms used in the following embodiments of the present application are only for the purpose of describing specific embodiments, and are not intended to be used as limitations of the present application. As used in the specification of this application and the appended claims, the singular expressions "a," "an," "the," "above," "the," and "the" are intended to also Plural expressions are included unless the context clearly dictates otherwise. It will also be understood that, as used in this application, the term "and/or" refers to and includes any and all possible combinations of one or more of the listed items.
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为暗示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征,在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。Hereinafter, the terms "first" and "second" are only used for descriptive purposes, and should not be construed as implying or implying relative importance or implying the number of indicated technical features. Therefore, the features defined as "first" and "second" may explicitly or implicitly include one or more of the features. In the description of the embodiments of the present application, unless otherwise specified, the "multiple" The meaning is two or more.
由于本申请实施例涉及一种识别点读文字的方法的应用,为了便于理解,下面先对本申请实施例涉及的相关术语及概念进行介绍。Since the embodiments of the present application relate to the application of a method for recognizing point-and-read characters, for ease of understanding, related terms and concepts involved in the embodiments of the present application are first introduced below.
1、点读1. Click to read
在本申请实施例中,“点读”可以指,电子设备可以识别并语音播放出用户在绘本中所指定的文字。在本申请实施例中,用户在绘本中指定文字的形式可以有多种。例如,用户手指点在绘本中所需识别文字的下方,或用户手指在绘本中所需识别文字的下方划线,以及用户手指画圈来圈定所需识别文字等等。In this embodiment of the present application, "point reading" may mean that the electronic device can recognize and play the text specified by the user in the picture book by voice. In this embodiment of the present application, there may be various forms in which the user specifies the text in the picture book. For example, the user's finger points below the text to be recognized in the picture book, or the user's finger draws a line below the text to be recognized in the picture book, and the user's finger draws a circle to delineate the text to be recognized, and so on.
For example, as shown in FIG. 1A, the user may point a finger below the text to be recognized (i.e., "cat") in the picture book 102. The robot 100 may include a camera 103 and a camera 104. The picture book 102 is placed within the photographing field of view 101 of the cameras 103 and 104. The camera 103 and/or the camera 104 of the robot 100 can capture the user's finger pointing below the word "cat" in the picture book 102. The robot 100 can then recognize the word "cat" and play it aloud.
For example, as shown in FIG. 1B, the user may use a finger to draw a line below (or across) the text to be recognized (i.e., "cat and mouse") in the picture book 102. The robot 100 can recognize the text "cat and mouse" and play it aloud.
For example, as shown in FIG. 1C, the user may draw a circle with a finger in the picture book 102 to enclose the text to be recognized (i.e., "cat and mouse"). The robot 100 can recognize the text "cat and mouse" and play it aloud.
Correspondingly, in the embodiments of this application, the user's point-reading gestures may be classified as "point", "line", "circle", and so on. It should be understood that the classification and naming of gestures are not limited in the embodiments of this application.
It should be understood that the user may use another object (for example, a pen) instead of a finger to designate the text to be recognized in the picture book; this is not limited in the embodiments of this application.
In the embodiments of this application, the electronic device may be the robot 100 shown in FIG. 1A, FIG. 1B, and FIG. 1C; it may also be a terminal device with a camera, such as a tablet computer or a smartphone; or it may be a point-reading apparatus composed of a camera and a terminal with a text-recognition function. This is not limited in the embodiments of this application.
An embodiment of this application provides a method for recognizing point-read text. The method may include: the electronic device 100 continuously captures images of a first picture book; when a finger appears in a captured image, the electronic device determines that the user has started point reading; the electronic device analyzes an image of the first picture book to obtain a text analysis result; the electronic device determines the user's point-reading trajectory and point-reading gesture from the coordinates of the finger in multiple captured frames; the electronic device determines, from the trajectory, the gesture, and the text analysis result, a text region Q containing the text to be recognized; and the electronic device recognizes the text in region Q and reads it aloud by voice.
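As a rough illustration only, the flow described above can be sketched as an orchestration function. This is a minimal sketch, not the claimed implementation; every callable passed in (`analyze_layout`, `track_finger`, and the rest) is a hypothetical stand-in for a component described in the steps below.

```python
def point_read_pipeline(frames, analyze_layout, track_finger,
                        classify_gesture, locate_text_region,
                        recognize_and_speak):
    """Orchestrate one point-reading interaction.

    Each callable is a stand-in for a component of the method:
    layout analysis (S202), finger tracking (S203-S205), gesture
    classification (S206), and text recognition / voice output.
    """
    layout = analyze_layout(frames[0])      # text/drawing/table regions
    track = track_finger(frames)            # fingertip coordinates, T11..T12
    gesture = classify_gesture(track)       # "point", "line", or "circle"
    region_q = locate_text_region(layout, track, gesture)
    return recognize_and_speak(region_q)    # OCR + voice broadcast
```

The design point is simply that the pipeline is linear: each stage consumes the previous stage's output, so each component can be developed and tested separately.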
The method for recognizing point-read text provided by this application is described in detail below with reference to the accompanying drawings. FIG. 2 exemplarily shows a flowchart of a method for recognizing point-read text provided by an embodiment of this application. As shown in FIG. 2, the method may include the following steps:
S201. In response to a first operation by the user, the camera of the electronic device 100 starts capturing images of a book B.
The electronic device 100 may receive a first operation from the user. The first operation may be turning on the electronic device 100, or opening a point-reading app on the electronic device 100. In response to the first operation, the camera of the electronic device 100 starts capturing images, and the electronic device 100 continuously captures multiple frames. The electronic device 100 may be the robot 100 shown in FIG. 3A to FIG. 3E, and the book B may be the book 102 shown in FIG. 3E. The book B is not limited in the embodiments of this application; that is, with the method provided by the embodiments of this application, the electronic device 100 can recognize text that the user point-reads in any book.
For example, as shown in FIG. 3A, the electronic device 100 may be the robot 100 shown in FIG. 3A. The electronic device 100 may include a camera 103, a camera 104, and a display screen 105, and the display screen 105 may display an icon 106 of the point-reading app.
As shown in FIG. 3B, the user's first operation may be tapping the icon 106 of the point-reading app.
As shown in FIG. 3C, in response to the user tapping the icon 106 of the point-reading app, the camera 103 and the camera 104 of the robot 100 start capturing images. In addition, the display screen 105 of the robot 100 may display a book display area 1051 and a prompt text 1052. The book display area 1051 can show the images captured by the cameras 103 and 104. The prompt text 1052 may prompt the user to place the book to be studied within the shooting area of the cameras 103 and 104. For example, the prompt text 1052 may read "Please place the book in the shooting area"; its specific content is not limited here. The prompt text 1052 may be displayed inside or outside the book display area 1051; its specific position is not limited here.
Optionally, as shown in FIG. 3D, the display screen 105 of the robot 100 may further include a control 1053. The control 1053 is used to trigger the robot 100 to perform layout analysis and finger detection on the captured images.
Optionally, as shown in FIG. 3E, when the user places the book 102 within the shooting field of view 101 of the cameras 103 and 104, an image 1021 of the book 102 may be displayed in the book display area 1051 of the display screen 105. The user can adjust the position of the book according to the image 1021 in the display area 1051. For example, if the user sees that only the right half of the book 102 is displayed in the display area 1051, the user can move the book to the right so that the book 102 falls entirely within the shooting field of view 101 of the cameras 103 and 104. When the user sees the complete image of the book 102 in the display area 1051, the user can tap the control 1053.
Optionally, the electronic device may show on the display screen 105 the gestures available for point reading: for example, the user may point below the text to be recognized, draw a line below it, or draw a circle around it. This prompts the user to point-read with gestures that the electronic device 100 can recognize.
S202. The electronic device 100 performs layout analysis on the image of the book B and determines the types of content in the book B and the positions corresponding to that content.
The electronic device 100 may perform layout analysis on a frame of the image of the book B captured by the camera to obtain the positions, on the current page of the book B, of the text regions of that page. The electronic device 100 may store a layout-analysis model: the electronic device 100 inputs a frame into the model, and the model outputs the types of content contained in the image (text, drawing, table, and so on) and the positions corresponding to that content. Through the layout-analysis model, the electronic device 100 can determine that the image of the book B may include one or more of a text region, a drawing region, and a table region.
In the embodiments of this application, a text region may refer to a region of a frame that contains only text; a drawing region may refer to a region of a frame that contains a drawing; and a table region may refer to a region of a frame that contains a table. It should be understood that drawing regions and table regions may also contain text. A frame of the image of the book B captured by the electronic device 100 may contain one or more text regions, and/or drawing regions, and/or table regions.
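To make the region bookkeeping concrete, the sketch below models each region as an axis-aligned 2D box and looks up which region a given coordinate falls in. This is an illustrative assumption, not the application's data model: the application's regions carry four 3D vertices, and the flattening to 2D boxes, the class name, and the function name are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Region:
    kind: str    # "text", "drawing", or "table", as output by layout analysis
    x0: float    # one corner of the region's bounding box
    y0: float
    x1: float    # the opposite corner
    y1: float

def region_at(regions, x, y):
    """Return the first region whose box contains the point (x, y), else None."""
    for r in regions:
        if r.x0 <= x <= r.x1 and r.y0 <= y <= r.y1:
            return r
    return None
```

Note that a drawing or table region may itself contain text, so a lookup landing in a non-text region does not rule out text recognition there.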
For example, as shown in FIG. 3F, the image 1021 of the book 102 may include a region A, a region B, and a region C. Regions A and C are drawing regions, and region B is a text region. Region A may be a rectangular region with vertices A1 (xa1, ya1, za1), A2 (xa2, ya2, za2), A3 (xa3, ya3, za3), and A4 (xa4, ya4, za4). Region B may be a rectangular region with vertices B1 (xb1, yb1, zb1), B2 (xb2, yb2, zb2), B3 (xb3, yb3, zb3), and B4 (xb4, yb4, zb4). Region C may be a rectangular region with vertices C1 (xc1, yc1, zc1), C2 (xc2, yc2, zc2), C3 (xc3, yc3, zc3), and C4 (xc4, yc4, zc4).
It should be understood that the shapes of the text regions and drawing regions obtained by the layout analysis of the image of the book B are not limited to rectangles; they may also be other shapes, such as polygons, circles, and so on.
In the embodiments of this application, the electronic device 100 may establish a coordinate system with the top-left vertex of its shooting field of view as the origin. For example, in the coordinate system XYZ shown in FIG. 3E, the origin O is the top-left vertex of the robot's shooting field of view 101.
The electronic device 100 may input the image 1021 of the book 102 shown in FIG. 3F into the layout-analysis model for layout analysis; the types of content contained in the image 1021 and the positions of that content may be as shown in Table 1 below.
Table 1

Region | Content type | Position (vertex coordinates)
A | Drawing | (xa1, ya1, za1), (xa2, ya2, za2), (xa3, ya3, za3), (xa4, ya4, za4)
B | Text | (xb1, yb1, zb1), (xb2, yb2, zb2), (xb3, yb3, zb3), (xb4, yb4, zb4)
C | Drawing | (xc1, yc1, zc1), (xc2, yc2, zc2), (xc3, yc3, zc3), (xc4, yc4, zc4)
As shown in Table 1, by performing layout analysis on the image 1021, the electronic device 100 can determine the drawings and text contained in the image 1021, and the positions of the drawings and text. The content in regions A and C of the image 1021 is drawings, and the content in region B is text. The coordinates of the four vertices of region A are (xa1, ya1, za1), (xa2, ya2, za2), (xa3, ya3, za3), and (xa4, ya4, za4); region A in Table 1 may be the region A shown in FIG. 3F. The coordinates of the four vertices of region B are (xb1, yb1, zb1), (xb2, yb2, zb2), (xb3, yb3, zb3), and (xb4, yb4, zb4); region B in Table 1 may be the region B shown in FIG. 3F. The coordinates of the four vertices of region C are (xc1, yc1, zc1), (xc2, yc2, zc2), (xc3, yc3, zc3), and (xc4, yc4, zc4); region C in Table 1 may be the region C shown in FIG. 3F.
Further, if the image of the book includes a drawing region or a table region that itself contains text, the electronic device 100 can, during layout analysis, also detect the text in that region and the position of that text.
Further, in a possible implementation, the electronic device 100 may detect the tilt angle of the text in a text region.
S203. The electronic device 100 detects a finger in an image captured at time T10 and starts determining the coordinates of the finger in the captured images.
When the user begins to point-read text in the book with a finger, the images captured by the electronic device 100 may contain the user's finger, and the electronic device 100 can detect the finger in a captured image. The electronic device 100 may store a finger-detection model: the electronic device 100 inputs a captured image into the model, and the model determines whether or not the input image contains a finger.
In a possible implementation, the electronic device 100 inputs the image 401 in FIG. 4A into the finger-detection model, and the model may output the image 402 shown in FIG. 4B. The finger-detection model can mark the detected finger with a finger detection box 4022, and can also mark the fingertip 4021.
It should be understood that the electronic device 100 continuously captures multiple frames and may input each captured frame into the finger-detection model in sequence. If a finger is detected in a frame, the electronic device can start determining the coordinates of the finger in that frame; the electronic device 100 may take the coordinates of the fingertip as the coordinates of the finger. If the electronic device 100 does not detect a finger in a frame, it may check whether the next frame (or a frame captured after a preset time interval) contains a finger, until a finger is detected in an image captured at time T10, at which point it starts determining the coordinates of the finger in the captured images.
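The per-frame scan in step S203 amounts to the loop below. This is a sketch under assumptions: `detect_fingertip` is a hypothetical stand-in for the finger-detection model, returning the fingertip coordinates when a finger is found and None otherwise.

```python
def find_point_read_start(frames, detect_fingertip):
    """Scan frames in capture order; return (index, fingertip) for the first
    frame containing a finger, i.e. the moment T10, or None if none does."""
    for i, frame in enumerate(frames):
        tip = detect_fingertip(frame)   # model call: coordinates or None
        if tip is not None:
            return i, tip
    return None
```

Running the detector on every frame finds T10 promptly; running it only every preset interval trades that latency for lower power, as the text notes below.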
For example, parts (a) to (i) of FIG. 5 show the image frames captured by the electronic device 100 from time t1 to time t9. These frames show the complete process of the user point-reading the word "cat": the user lowers a finger, points at the word "cat" in the picture book, then moves and lifts the finger off the picture book. Part (a) of FIG. 5 is the frame captured at time t1 and part (b) is the frame captured at time t2; no finger is detected in the frames captured at times t1 and t2. Part (c) is the frame captured at time t3, in which the electronic device 100 can detect a finger; the electronic device 100 starts acquiring the coordinates of the finger in the frames. Parts (d) to (h) are the frames captured at times t4 through t8, and the electronic device 100 can acquire the coordinates of the finger in each of these frames. Part (i) is the frame captured at time t9; no finger is detected in the frame captured at time t9.
As shown in FIG. 5, the electronic device 100 acquires a frame at time t1 and performs finger detection on it; no finger is detected, so the electronic device 100 does not perform the steps after step S203 and continues finger detection on the next frame. For example, the electronic device 100 may perform finger detection on the image captured at time t2; if no finger is detected, it performs finger detection on the image captured at time t3. The electronic device 100 detects a finger in the frame captured at time t3 and can determine the coordinates of the finger in that frame, specifically the coordinates of the fingertip. Time T10 may be the time t3 shown in FIG. 5.
Here, the frames captured at times t1 through t9 shown in FIG. 5 may be consecutive frames captured by the electronic device 100. The time intervals between t1 and t2, t2 and t3, t3 and t4, t4 and t5, t5 and t6, t6 and t7, t7 and t8, and t8 and t9 depend on the frame rate at which the electronic device 100 captures images.
Optionally, the frames captured at times t1 through t9 shown in FIG. 5 may instead be frames captured by the electronic device 100 at a preset time interval; that is, each of the intervals between t1 and t2, t2 and t3, t3 and t4, t4 and t5, t5 and t6, t6 and t7, t7 and t8, and t8 and t9 may be the preset time interval.
The preset time interval may be configured by the system of the electronic device 100.
It should be understood that the electronic device 100 may perform finger detection on every captured frame in sequence. In this way, a finger in a frame can be detected promptly, so the moment the user starts point reading can be determined accurately.
Optionally, the electronic device 100 may instead perform finger detection once per image captured at the preset time interval. In this way, the power consumption of the electronic device can be reduced.
Further, in a possible implementation, the electronic device 100 may start acquiring the coordinates of the finger in the captured frames when it detects that a finger appears in a frame and the vertical distance between the finger and the book B has decreased to a preset vertical distance D01.
In another possible implementation, the electronic device 100 may start acquiring the coordinates of the finger in the captured frames when it detects that a finger appears in a frame and the vertical distance between the finger and the book B is gradually decreasing.
S204. When the distance between the coordinates of the finger in the image captured at time T11 and the coordinates of the finger in the frame captured a preset duration T21 earlier is less than a preset distance D1, the electronic device 100 starts recording the coordinates of the finger in the captured images.
When the electronic device 100 determines at time T11 that the finger is stationary, it may record the coordinates of the finger in the image captured at time T11 and take those coordinates as the starting point of the user's point-reading trajectory. When the distance between the coordinates of the finger in the image captured at time T11 and the coordinates of the finger in the frame captured the preset duration T21 earlier is less than the preset distance D1, the electronic device 100 can determine that the user's finger is stationary at time T11.
The preset duration T21 may be 0.5 seconds, 1 second, or 2 seconds; it is not limited here. The preset distance D1 may be 10 pixels, 5 pixels, or 15 pixels; the specific value of D1 is not limited in the embodiments of this application.
The preset duration T21 and the preset distance D1 may be configured by the system of the electronic device 100.
Generally, when a user wants the electronic device 100 to help with a piece of text, the user points a finger at the text to be recognized, keeps it still for a moment, and then moves the finger. For example, as shown in part (d) of FIG. 5, the user points a finger at the word "cat", keeps it still for a period of time (for example, 0.5 seconds or 1 second; not limited here), and then moves it from the position shown in part (d) to the position shown in part (e). In the embodiments of this application, when the distance between the coordinates of the finger in the image captured at time T11 and the coordinates of the finger in the frame captured the preset duration T21 earlier is less than the preset distance D1, the electronic device 100 determines that the finger in the image captured at time T11 is stationary. The coordinate point of the finger in that image is the starting point of the user's point-reading trajectory: the user has placed a finger at the text to be recognized and has started selecting it. The electronic device 100 therefore records the coordinates of the finger in the frame.
It should be understood that after the electronic device 100 detects a finger in a frame and acquires its coordinates, it may keep those coordinates in memory temporarily; once the coordinates of the finger in that frame have been used to compute the distance to the coordinates of the finger in the next frame, the electronic device releases the stored coordinates for that frame.
The electronic device 100 may record the coordinates of the finger in the frame captured at time T11 in the memory used to record the point-reading trajectory. Even after the electronic device 100 computes the distance between the finger coordinates in the frame captured at time T11 and those in the frame captured the preset duration T21 later, it keeps the coordinates of the finger in the frame captured at time T11 recorded in the memory used for the point-reading trajectory.
S205. When the distance between the coordinates of the finger in the image captured at time T12 and the coordinates of the finger in the frame captured a preset duration T22 earlier is less than a preset distance D2, the electronic device 100 stops recording the coordinates of the finger in the captured images.
When the electronic device 100 again detects that the user's finger is stationary, that is, when it determines that the distance between the coordinates of the finger in the image captured at time T12 and the coordinates of the finger in the frame captured the preset duration T22 earlier is less than the preset distance D2, the electronic device 100 stops recording the coordinates of the finger in the captured images. In other words, after time T11, when the electronic device 100 detects that the user's finger has come to rest again after moving, it takes the coordinates of the finger in the image captured at time T12 as the coordinates of the end point of the user's point-reading trajectory.
T22 may be greater than, less than, or equal to T21; it is not limited here. Likewise, D2 may be greater than, less than, or equal to D1. The preset duration T22 and the preset distance D2 may be configured by the system of the electronic device 100.
For example, as shown in FIG. 5, the distance between the coordinates of the finger in the frame captured at time t7 in part (g) of FIG. 5 and the coordinates of the finger in the frame captured at time t6 in part (f) is less than the preset distance D2, so the electronic device 100 stops saving the coordinates of the finger in the captured images.
It should be understood that the electronic device 100 has saved the coordinates of the finger in the images captured between time T11 and time T12, which are the trajectory coordinates of one point-reading action of the user's finger on the picture book. As shown in FIG. 6, the coordinate trajectory of the finger in the images captured between times T11 and T12 may be the line segment P3P4 in FIG. 6, and the electronic device 100 may save the coordinates of the points along the segment P3P4. The segment P3P4 is the trajectory of the user's finger on the picture book, and this finger trajectory is used to select the text to be recognized in the picture book.
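Steps S204 and S205 together segment the fingertip stream into one trajectory: start recording at the first stationary frame (T11), and stop at the next stationary frame after the finger has moved (T12). The sketch below simplifies the timing by treating each consecutive pair of samples as separated by the preset duration, so stationarity is judged against the immediately preceding sample; the function name and this simplification are illustrative assumptions.

```python
import math

def extract_trajectory(coords, d1, d2):
    """coords: per-frame fingertip coordinates while the finger is visible.
    Returns the recorded trajectory from the T11 frame to the T12 frame."""
    start, moving = None, False
    for i in range(1, len(coords)):
        step = math.dist(coords[i], coords[i - 1])
        if start is None:
            if step < d1:
                start = i                   # T11: finger first at rest
        elif not moving:
            if step >= d2:
                moving = True               # finger has left the start point
        elif step < d2:
            return coords[start:i + 1]      # T12: at rest again after moving
    return coords[start:] if start is not None else []
```

The `moving` flag matters: without it, the frames immediately after T11 (while the finger is still resting at the start point) would be mistaken for the end of the trajectory.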
S206. When the electronic device 100 does not detect a finger in the image captured at time T13, the electronic device 100 determines the finger's point-reading gesture G from the finger coordinates saved from time T11 to time T12.
When the electronic device 100 does not detect a finger in an image captured after time T12 (that is, at time T13), the electronic device 100 can determine that the user has finished selecting the text to be recognized. The text to be recognized is the text selected by the finger between times T11 and T12. The electronic device 100 can determine the text to be recognized from the finger coordinates in the frames captured between times T11 and T12, the user's point-reading gesture, and the layout analysis result.
First, the electronic device 100 may determine the user's touch-to-read gesture G according to the finger coordinates in the multiple image frames captured from time T11 to time T12. When the distance between the finger coordinates in every two consecutive image frames captured from time T11 to time T12 is less than a preset distance D10, and the distance between the finger coordinates in the image frame captured at time T11 and those in the image frame captured at time T12 is less than a distance D11, the electronic device 100 determines that the touch-to-read gesture of the finger is a "point". D10 is less than or equal to D11.
When the distance between the finger coordinates in the image frame captured at time T11 and those in the image frame captured at time T12 is greater than a distance D12, and the finger coordinates in the image frames captured from time T11 to time T12 are linearly correlated, the electronic device 100 may determine that the touch-to-read gesture of the finger is a "line".
When the distance between the finger coordinates in consecutive image frames captured from time T11 to time T12 first gradually increases and then gradually decreases, and the distance between the finger coordinates in the image frame captured at time T11 and those in the image frame captured at time T12 is less than a distance D13, the electronic device 100 may determine that the touch-to-read gesture of the finger is a "circle".
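The three distance rules above can be sketched in code. This is a minimal illustration, not the embodiment's implementation: the threshold values for D10, D11, D12 and D13 are hypothetical, and linear correlation is checked with a Pearson-style coefficient, which the embodiment does not specify.

```python
import math

# Hypothetical thresholds; the embodiment leaves D10-D13 device-dependent.
D10, D11, D12, D13 = 5.0, 10.0, 40.0, 15.0

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def classify_gesture(points):
    """Classify a touch-to-read trajectory as 'point', 'line' or 'circle'.

    points: finger coordinates (x, y) from the frames captured between
    time T11 and time T12, in chronological order.
    """
    steps = [dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    end_to_end = dist(points[0], points[-1])

    # "point": every frame-to-frame move is small and start/end nearly coincide.
    if all(s < D10 for s in steps) and end_to_end < D11:
        return "point"

    # "circle": frame spacing grows then shrinks, with the end near the start.
    peak = steps.index(max(steps))
    if (end_to_end < D13 and peak not in (0, len(steps) - 1)
            and steps[:peak + 1] == sorted(steps[:peak + 1])
            and steps[peak:] == sorted(steps[peak:], reverse=True)):
        return "circle"

    # "line": start and end far apart and the points roughly collinear
    # (checked here via the correlation of x and y).
    if end_to_end > D12:
        n = len(points)
        mx = sum(p[0] for p in points) / n
        my = sum(p[1] for p in points) / n
        sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
        sxx = math.sqrt(sum((p[0] - mx) ** 2 for p in points))
        syy = math.sqrt(sum((p[1] - my) ** 2 for p in points))
        if sxx == 0 or syy == 0 or abs(sxy / (sxx * syy)) > 0.95:
            return "line"
    return "unknown"
```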
The size of the picture book, the size of the characters in it, and the spacing between characters can all affect the values of D10, D11, D12 and D13. To determine the user's touch-to-read gesture more accurately, in a possible implementation the electronic device 100 performs convex hull fitting on the finger coordinate points in the image frames captured from time T11 to time T12, samples the fitted trajectory uniformly, and converts the sampling points into polar coordinate points to obtain a polar coordinate diagram. The electronic device 100 inputs the polar coordinate diagram into a gesture recognition model, which recognizes the diagram and outputs the corresponding gesture type.
Exemplarily, in this embodiment of the present application, the following formula may be used to perform convex hull fitting on the finger coordinate points in Cartesian coordinates:
New(x1, y1) = f(old(x0, y0))   Formula 1
where old(x0, y0) are the finger coordinates determined from the image frames captured by the electronic device 100 from time T11 to time T12, and New(x1, y1) are those finger coordinates after convex hull fitting.
The electronic device 100 may uniformly sample the fitted convex hull trajectory to obtain sampling points (xi, yi) (i = 1, ..., N). The electronic device 100 may determine the center point M(xm, ym) of the sampling points:

xm = (x1 + x2 + ... + xN)/N, ym = (y1 + y2 + ... + yN)/N   Formula 2
The electronic device 100 may calculate the coordinates of each convex hull fitting point relative to the center point as:
New'(x2, y2) = New(x1, y1) − M(xm, ym)   Formula 3
The electronic device 100 may convert the convex hull fitting points into polar coordinates, where the origin of the polar coordinate system is the center point calculated by Formula 2 above. The electronic device 100 may determine the polar coordinates of each convex hull fitting point according to its position relative to the center point, with reference to the following formulas:

r = √(x2² + y2²)   Formula 4

θ = atan2(y2, x2), θ ∈ (−π, π)   Formula 5
According to Formula 4 and Formula 5, the electronic device 100 can convert the sampling points into polar coordinate points and then save the multiple polar coordinate points as a polar coordinate diagram. The electronic device 100 inputs the polar coordinate diagram into the gesture recognition model to obtain the corresponding gesture type.
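Formulas 2 to 5 together normalize the fitted trajectory before it is fed to the gesture recognition model. A minimal sketch of that conversion, assuming the convex-hull-fitted points are already uniformly sampled:

```python
import math

def to_polar_diagram(hull_points):
    """Normalize convex-hull-fitted finger points into polar coordinates.

    Follows Formulas 2-5: center the points on their centroid, then
    convert each point to (r, theta) about that centroid.
    """
    n = len(hull_points)
    # Formula 2: centroid of the uniformly sampled points.
    xm = sum(x for x, _ in hull_points) / n
    ym = sum(y for _, y in hull_points) / n
    polar = []
    for x, y in hull_points:
        # Formula 3: position relative to the center point M(xm, ym).
        x2, y2 = x - xm, y - ym
        # Formulas 4 and 5: polar radius and angle, theta in (-pi, pi].
        r = math.hypot(x2, y2)
        theta = math.atan2(y2, x2)
        polar.append((r, theta))
    return polar
```

The resulting (r, θ) point set is what would be rendered as the polar coordinate diagrams of FIG. 7A to FIG. 7C.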
As shown in FIG. 7A, FIG. 7A exemplarily shows a polar coordinate diagram. The finger coordinates in this diagram, connected in the chronological order in which the electronic device 100 obtained them, form a closed curve. The electronic device 100 may input the polar coordinate diagram shown in FIG. 7A into the gesture recognition model, which may output the gesture type of the corresponding finger gesture. That gesture type is "circle".
As shown in FIG. 7B, FIG. 7B exemplarily shows another polar coordinate diagram. The finger coordinates in this diagram are saved in the chronological order in which the electronic device 100 obtained them, and the coordinate points are concentrated within a certain area. The electronic device 100 may input the polar coordinate diagram shown in FIG. 7B into the gesture recognition model, which may output the gesture type of the corresponding finger gesture. That gesture type is "point".
As shown in FIG. 7C, FIG. 7C exemplarily shows yet another polar coordinate diagram. The finger coordinates in this diagram, connected in the chronological order in which the electronic device 100 obtained them, form a polyline. The electronic device 100 may input the polar coordinate diagram shown in FIG. 7C into the gesture recognition model, which may output the gesture type of the corresponding finger gesture. That gesture type is "line".
S207. The electronic device determines the text region Q to be recognized according to the finger coordinates saved from time T11 to time T12, the gesture G, and the multiple text regions and their positions in book B.
The electronic device can determine the text region Q to be recognized according to the finger coordinates recorded from time T11 to time T12 (that is, the trajectory of the user's finger on the picture book), the gesture recognition result, and the layout analysis result.
Specifically, in a possible implementation, when the gesture G is a "point", the electronic device 100 takes the text region containing the trajectory recorded from time T11 to time T12 as the text region to be recognized. The electronic device 100 takes the character in the text region Q with the smallest distance to the finger coordinates saved from time T11 to time T12 as the text to be recognized specified by the user.
In a possible implementation, when the gesture G is a "line", the electronic device 100 takes the text region intersecting the trajectory recorded from time T11 to time T12 as the text region Q to be recognized. It can be understood that the trajectory intersecting a text region may mean that the entire trajectory lies within the text region, or that at least a preset proportion of the trajectory lies within it (for example, half of the trajectory lies within text region A). The electronic device 100 may take the characters above the trajectory in the text region Q as the text to be recognized specified by the user.
In a possible implementation, when the gesture G is a "circle", the electronic device may take the text region overlapping the trajectory recorded from time T11 to time T12 as the text region Q to be recognized. The electronic device may take the characters of the text region Q that lie inside the trajectory recorded from time T11 to time T12 as the text to be recognized selected by the user.
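The three selection rules can be sketched as follows. The per-character positions and the bounding-box approximation of "inside the trajectory" are assumptions of this sketch; the embodiment derives character positions from layout analysis and does not fix a particular point-in-region test.

```python
def select_target_text(gesture, track, chars):
    """Pick the text selected by a touch-to-read gesture.

    track: list of (x, y) finger coordinates saved from T11 to T12.
    chars: list of (char, (x, y)) giving each character's position inside
    the text region Q (hypothetical representation for this sketch).
    """
    if gesture == "point":
        # "point": the character nearest to the recorded finger coordinates.
        tip = track[-1]
        return min(chars,
                   key=lambda c: (c[1][0] - tip[0]) ** 2 + (c[1][1] - tip[1]) ** 2)[0]
    if gesture == "line":
        # "line": the characters above the drawn line (smaller y = higher up,
        # assuming image coordinates with the origin at the top-left).
        line_y = min(y for _, y in track)
        return "".join(ch for ch, (_, cy) in chars if cy < line_y)
    if gesture == "circle":
        # "circle": the characters inside the circled trajectory, approximated
        # here by the trajectory's bounding box.
        xs = [x for x, _ in track]
        ys = [y for _, y in track]
        return "".join(ch for ch, (cx, cy) in chars
                       if min(xs) <= cx <= max(xs) and min(ys) <= cy <= max(ys))
    return ""
```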
In a possible implementation, the user may configure the electronic device 100 to recognize touch-to-read only in text regions. That is, only when the electronic device 100 determines that the trajectory formed by the finger coordinates saved from time T11 to time T12 lies in a text region of the picture book does it determine the text region Q to be recognized and perform step S208. When the electronic device 100 determines that this trajectory lies in a drawing region or a table region of the picture book, the electronic device 100 does not perform step S208.
Exemplarily, FIG. 8 shows a picture book 800. The picture book 800 may include a drawing region 801, a table region 802, a text region 803 and a text region 804. The trajectory formed by the finger coordinates saved by the electronic device 100 from time T11 to time T12 may be finger trajectory 807 or finger trajectory 809 in FIG. 8. The electronic device 100 may determine that finger trajectory 807 is in the drawing region 801, or that finger trajectory 809 is in the table region 802. After making that determination, the electronic device 100 may prompt the user on the display screen that the current touch-to-read area does not match the configured readable area, and does not perform step S208. When the electronic device 100 determines that the finger trajectory is in text region 803 or text region 804, the electronic device 100 may determine the text region to be recognized according to the finger trajectory and the text region, and perform step S208.
Further, it can be understood that if the user has not restricted recognition to text regions, then when the user selects text within a drawing region, the electronic device 100 can recognize and read aloud the selected text. When performing layout analysis, the electronic device 100 can obtain the position information of the text contained in the drawing region. In this way, the electronic device 100 can determine the text to be recognized selected by the user according to the user's finger trajectory and the position information of the text in the drawing region, and then recognize and read it aloud.
Further, if the electronic device 100 determines that the trajectory formed by the finger coordinates saved from time T11 to time T12 is not in a text region, the electronic device 100 may check whether there is a finger in the captured images; if a finger is detected, step S203 is performed. If the electronic device 100 does not detect a finger in the captured images within a preset time, the electronic device 100 may close the touch-to-read app, or enter a standby state. In this way, the battery charge of the electronic device 100 can be saved and power consumption reduced.
In a possible implementation, when the electronic device 100 determines that the portion of the trajectory formed by the finger coordinates saved from time T11 to time T12 that falls within a text region is greater than a preset threshold, the electronic device determines the text region Q to be recognized and performs step S208; otherwise the electronic device 100 neither determines the text region Q to be recognized nor performs step S208. The preset threshold may be 50%, 55%, 60% and so on, which is not limited here. For example, for finger trajectory 808 shown in FIG. 8, about 20% of the trajectory falls within the text region; if the preset threshold is 50%, the electronic device 100 does not determine the text region Q to be recognized or perform step S208. If the trajectory formed by the finger coordinates saved by the electronic device 100 from time T11 to time T12 is finger trajectory 805 or finger trajectory 806, the electronic device 100 can determine that the portion of that trajectory falling within the text region is greater than the preset threshold, and can determine the text region to be recognized according to the finger trajectory and the text region.
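A sketch of this threshold check, assuming the text region is available as an axis-aligned box from layout analysis; the region representation and the 50% default are illustrative:

```python
def in_region_ratio(track, region):
    """Fraction of a finger trajectory that falls inside a text region.

    region: (left, top, right, bottom) box, assumed to come from layout
    analysis. track: list of (x, y) finger coordinates.
    """
    left, top, right, bottom = region
    inside = sum(1 for x, y in track
                 if left <= x <= right and top <= y <= bottom)
    return inside / len(track)

def should_recognize(track, region, threshold=0.5):
    # The 50% default mirrors the example preset threshold above.
    return in_region_ratio(track, region) >= threshold
```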
In a possible implementation, when the finger coordinates saved by the electronic device from time T11 to time T12 form two or more finger trajectories, the electronic device 100 may take the text region determined by the trajectory with the larger range as the text region Q to be recognized. For example, as shown in FIG. 8, if the trajectories formed by the finger coordinates saved by the electronic device 100 from time T11 to time T12 are finger trajectory 805 and finger trajectory 806, and the range of finger trajectory 806 is larger than that of finger trajectory 805, then the electronic device 100 takes the text region determined by finger trajectory 806 and text region 803 as the final text region Q to be recognized.
In a possible implementation, the electronic device 100 may determine the text region Q to be recognized according to the finger coordinates saved from time T11 to time T12, the gesture G, and the multiple text regions and their positions in book B. The electronic device 100 may delineate the text to be detected in the region Q with a text detection frame. As shown in FIG. 9A, FIG. 9A may include a text region 900 to be recognized. According to the finger coordinates, the electronic device 100 may determine that the text to be recognized is the character "猫" ("cat") delineated by the text detection frame 902.
Since there is a certain angle between the text and the user's fingertip, the text to be recognized detected by the electronic device 100 may be inaccurate.
Further, in a possible implementation, the electronic device determines an offset S0 (S0 = d·cosθ) according to the angle θ between the finger and the text in the text region to be recognized and the width d of the finger detection frame. The electronic device 100 moves the text detection frame by the offset S0 and takes the text delineated by the moved text detection frame as the text to be recognized in the text region Q.
It can be understood that when the electronic device 100 performs layout analysis on the image of the picture book, it can obtain the tilt angle β of the text in the text region of the picture book, and when performing finger detection it can obtain the tilt angle σ of the finger. The electronic device can obtain the angle θ between the finger and the text in the text region to be recognized from the tilt angles β and σ.
As shown in FIG. 9B, the text detection frame 903 in FIG. 9B is the text detection frame obtained after the electronic device 100 moves the text detection frame 902 in FIG. 9A by the offset. The text detection frame 903 delineates the character "和" ("and"), which is the text to be recognized. In this way, the electronic device 100 can determine the text to be recognized specified by the user more accurately.
Optionally, the electronic device 100 may multiply the offset S0 by an offset coefficient α to obtain an offset S1, move the text detection frame by the offset S1, and take the text delineated by the text detection frame as the text to be recognized in the text region Q. The offset coefficient α may be configured by the system of the electronic device 100, and its value range may be [0.2, 2]. When the electronic device 100 determines that the user's finger has pointed at the same position multiple times, indicating that the electronic device failed to accurately recognize and read aloud the text specified by the user, the electronic device 100 may adjust the offset coefficient α.
Optionally, the electronic device 100 may record the angle between the finger and the text during touch-to-read within a preset time period, together with the offset corresponding to that angle, and establish a mapping between the angle and the offset. In this way, after the electronic device 100 determines the angle between the finger and the text, it can look up the offset corresponding to that angle in the mapping, which reduces the amount of computation on the electronic device.
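A sketch of the offset calculation together with the angle-to-offset mapping described above; the class name, the rounding used as the cache key, and the default α are assumptions of this sketch:

```python
import math

class OffsetCache:
    """Offset of the text detection frame for a finger/text angle.

    S0 = d * cos(theta) as described above; alpha is the offset
    coefficient, whose range may be [0.2, 2]. Caching per rounded angle
    stands in for the mapping the device builds from recent
    touch-to-read angles.
    """
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self._cache = {}

    def offset(self, theta, d):
        key = (round(theta, 2), d)
        if key not in self._cache:
            s0 = d * math.cos(theta)           # base offset S0
            self._cache[key] = self.alpha * s0  # adjusted offset S1
        return self._cache[key]
```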
Optionally, in a possible implementation, when the electronic device 100 determines that the vertical distance from the finger in a captured image frame to book B is greater than a preset vertical distance D11, the electronic device 100 determines the touch-to-read gesture G of the finger according to the finger coordinates saved from time T11 to time T12.
In a possible implementation, when the electronic device 100 determines that the vertical distance from the finger in the captured image frames to book B gradually increases, the electronic device 100 determines the touch-to-read gesture G of the finger according to the finger coordinates saved from time T11 to time T12.
S208. The electronic device 100 recognizes and reads aloud the text in the text region Q to be recognized.
The electronic device 100 can recognize the text detected in the text region Q to be recognized, and after recognizing the text, read it aloud. For example, as shown in FIG. 1A, the electronic device 100 reads aloud the character "猫" ("cat") specified by the user.
It can be understood that the text specified by the user in the embodiments of the present application includes, but is not limited to, characters in different scripts such as Chinese, Japanese, Korean and English.
It can be understood that step S202 may be performed after step S203 and before step S207.
With the method for recognizing touch-to-read text provided by the embodiments of the present application, the electronic device 100 can continuously capture images of a first picture book. When a finger appears in a captured image, the electronic device determines that the user has started touch-to-read. The electronic device analyzes the image of the first picture book to obtain a text analysis result. When the distance between the finger coordinates in the currently captured image frame and the finger coordinates in an image frame captured a preset duration earlier is less than a preset distance, the electronic device determines that the finger in the image frame is stationary, and the electronic device can record the trajectory coordinates between two stationary states of the finger. The electronic device can determine the text region Q to be recognized, and the text to be recognized within it, according to the trajectory coordinates between the two stationary states, and then recognize and read aloud that text. In this way, the coordinates of the two stationary states of the finger serve respectively as the start point and end point of the user's touch-to-read trajectory, so the electronic device 100 can accurately determine where a touch-to-read action starts and, accordingly, accurately determine the text to be recognized from the trajectory coordinates between the two stationary states. This improves the touch-to-read accuracy of the electronic device and thus the user experience. Moreover, the electronic device 100 can recognize text in any book; no customized book is required.
FIG. 10 exemplarily shows a flowchart of another method for recognizing touch-to-read text provided by an embodiment of the present application. As shown in FIG. 10, the method may include the following steps:
S1001. In response to a first operation of the user, the electronic device 100 starts to capture images, where the captured images include the user's finger and the content of a book, and the user's finger and the book are located in a target area of the electronic device.
For the first operation, reference may be made to the description in step S201, which is not repeated here. In response to the user's first operation, the camera of the electronic device 100 may continuously capture images.
Optionally, before performing step S1002, the electronic device may also perform step S202 above.
S1002. The electronic device 100 recognizes the user's touch-to-read gesture according to the position movement of the user's finger recognized from the captured images.
The electronic device 100 can identify the user's finger in a captured image and determine the position of the finger in that image frame. The electronic device 100 can recognize the user's touch-to-read gesture according to the finger positions in the multiple captured image frames.
In a possible implementation, the touch-to-read gesture includes one or more of the following: a point, a line, and a circle.
In a possible implementation, recognizing the user's touch-to-read gesture according to the position movement of the user's finger recognized from the captured images includes: after the electronic device detects the user's finger in the captured images, if within a first preset duration the movement of the finger position in the captured images is less than a first preset distance, the electronic device records that first position as the start point of the touch-to-read gesture; after the electronic device starts recording the start point of the touch-to-read gesture, if it detects that within a second preset duration the movement of the finger from a second position in the captured images is less than a second preset distance, the electronic device records the second position as the end point of the touch-to-read gesture; the electronic device recognizes the touch-to-read gesture according to its start point and end point.
It can be understood that the electronic device 100 may record the finger coordinates from the start point of the touch-to-read gesture until the finger coordinates at the end point of the touch-to-read gesture.
In a possible implementation, recognizing the touch-to-read gesture according to its start point and end point includes: if the distance between the recorded start point and any finger position between the start point and the end point is less than a third preset distance, the electronic device recognizes the touch-to-read gesture as a point; if the coordinates of the finger positions between the recorded start point and end point are linearly correlated, the electronic device recognizes the touch-to-read gesture as a line; if the distance between the recorded start point and end point is less than a fourth preset distance, and the distance between the start point and some finger position between the start point and the end point is greater than a fifth preset distance, the electronic device recognizes the touch-to-read gesture as a circle.
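The start-point and end-point detection described above can be sketched as follows, under assumed values for the preset distances and durations (here a run of three near-still frames stands in for a position that stays put for the first or second preset duration):

```python
import math

def record_track(frames, d_still=3.0, still_frames=3):
    """Extract one touch-to-read track from a stream of finger positions.

    frames: per-frame (x, y) finger positions in chronological order.
    A run of `still_frames` frames each moving less than d_still marks
    the start point; the next such run after the finger has moved marks
    the end point. Both thresholds are illustrative stand-ins for the
    first/second preset distances and durations.
    """
    def still_at(i):
        if i + still_frames > len(frames):
            return False
        return all(math.dist(frames[j], frames[j + 1]) < d_still
                   for j in range(i, i + still_frames - 1))

    # Start point: first still run.
    start = next((i for i in range(len(frames)) if still_at(i)), None)
    if start is None:
        return None
    # Skip past the initial still run, then look for the next still run.
    i = start + still_frames
    while i < len(frames) and not still_at(i):
        i += 1
    if i >= len(frames):
        return None
    # The recorded track runs from the start point through the end point.
    return frames[start:i + still_frames]
```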
The gesture G in steps S201 to S208 above may also be referred to as a touch-to-read gesture. For step S1002, reference may be made to the descriptions in steps S203 to S206 above, which are not repeated here.
S1003. The electronic device 100 determines the target text in the content of the book in the captured images according to the touch-to-read gesture and the position of its trajectory.
According to the touch-to-read gesture and the position of its trajectory in the images, the electronic device 100 can determine the target text in the content of the book. The target text is the text in the book that the user has selected for recognition.
In a possible implementation, determining the target text in the content of the book in the captured images according to the touch-to-read gesture and the position of its trajectory includes: the electronic device determines the positions of the text regions in the content of the book according to the captured images; the electronic device determines the target text in the content of the book according to the touch-to-read gesture, the position of its trajectory, and the positions of the text regions. Here, determining the positions of the text regions in the content of the book according to the captured images means that the electronic device performs layout analysis on the captured images and thereby derives the positions of the text regions in the book. For the specific layout analysis process, reference may be made to the description in step S202 above, which is not repeated here.
在一种可能的实现方式中,电子设备根据点读手势和点读手势的轨迹的位置,以及文字区域的位置,确定书本的内容中的目标文字,包括:电子设备根据点读手势的轨迹以及第一书本中文字区域的位置,确定第一文字区域,第一文字区域中包含第一轨迹,第一轨迹为点读手势的轨迹中的大于或等于预设比例的一部分;电子设备根据第一轨迹和点读手势以及第一文字区域,确定书本的内容中的目标文字。In a possible implementation, the electronic device determines the target text in the content of the book according to the touch-to-read gesture, the position of its trajectory, and the positions of the text areas, which includes: the electronic device determines a first text area according to the trajectory of the gesture and the positions of the text areas in the first book, where the first text area contains a first trajectory, and the first trajectory is a portion of the gesture trajectory that accounts for at least a preset proportion of it; and the electronic device determines the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text area.
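The trajectory-to-text-area matching described above can be illustrated with a short sketch. This is an illustrative assumption of how "the text area containing at least a preset proportion of the trajectory" might be computed; the rectangle representation of text areas, the `preset_ratio` value, and the function names are hypothetical and not taken from the embodiment.

```python
# Hypothetical sketch: select the first text area that contains at
# least `preset_ratio` of the trajectory points of the gesture.

def point_in_box(pt, box):
    """box = (x1, y1, x2, y2); pt = (x, y)."""
    x, y = pt
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def first_text_area(trajectory, text_areas, preset_ratio=0.6):
    """Return the first text area containing >= preset_ratio of the
    trajectory points, or None if no area qualifies."""
    for box in text_areas:
        inside = sum(point_in_box(p, box) for p in trajectory)
        if inside / len(trajectory) >= preset_ratio:
            return box
    return None
```

For example, a trajectory drawn mostly inside one paragraph's bounding box selects that paragraph's text area even if a few points stray into a neighboring area.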
在一种可能的实现方式中,电子设备根据第一轨迹和点读手势以及第一文字区域,确定书本的内容中的目标文字,包括:若点读手势为点,电子设备确定第一文字区域中与第一轨迹距离最小的文字为目标文字;若点读手势为划线,电子设备确定第一文字区域中在第一轨迹上方的文字为目标文字;若点读手势为画圈,电子设备确定第一文字区域中在第一轨迹中的文字为目标文字。In a possible implementation, the electronic device determines the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text area, which includes: if the gesture is a point, the electronic device determines the text in the first text area closest to the first trajectory as the target text; if the gesture is a line, the electronic device determines the text in the first text area above the first trajectory as the target text; and if the gesture is a circle, the electronic device determines the text in the first text area enclosed by the first trajectory as the target text.
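The three decision rules above (point, line, circle) admit a simple geometric sketch. Everything below is an illustrative assumption rather than the embodiment's implementation: words are modeled as axis-aligned boxes, "above the line" is tested against the mean y of the trajectory in image coordinates (y grows downward), and the circled region is approximated by the trajectory's bounding box.

```python
# Hypothetical sketch of the gesture-dependent target-text rules.
import math

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def target_text(gesture, trajectory, words):
    """words: list of (text, box). Image convention: y grows downward,
    so text 'above' a drawn line has a smaller y than the line."""
    if gesture == "point":
        px, py = trajectory[-1]  # fingertip position
        return min(words, key=lambda w: math.dist(center(w[1]), (px, py)))[0]
    if gesture == "line":
        line_y = sum(y for _, y in trajectory) / len(trajectory)
        # words whose boxes sit entirely above the drawn line
        return [t for t, (x1, y1, x2, y2) in words if y2 <= line_y]
    if gesture == "circle":
        xs = [x for x, _ in trajectory]
        ys = [y for _, y in trajectory]
        x1, y1, x2, y2 = min(xs), min(ys), max(xs), max(ys)
        # words whose centers fall inside the circled region
        return [t for t, box in words
                if x1 <= center(box)[0] <= x2 and y1 <= center(box)[1] <= y2]
    raise ValueError(f"unknown gesture: {gesture}")
```

A production system would replace the bounding-box approximation for circles with a point-in-polygon test on the actual trajectory.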
这里步骤S1003可以参考步骤S207中的描述,此处不再赘述。Herein, step S1003 may refer to the description in step S207, which will not be repeated here.
S1004、电子设备100播报已识别的目标文字。S1004, the electronic device 100 broadcasts the recognized target text.
电子设备100可以识别出目标文字。电子设备识别目标文字后并将文字语音播报出来。例如,图1A中示出的,电子设备100播报出用户指定的文字“猫”。The electronic device 100 can recognize the target text. After recognizing the target text, the electronic device broadcasts the text by voice. For example, as shown in FIG. 1A, the electronic device 100 broadcasts the text "cat" specified by the user.
步骤S208中的待识别文字区域Q中的文字可以称为目标文字。The text in the to-be-recognized text area Q in step S208 may be referred to as a target text.
可以理解的是,本申请实施例中电子设备100可以识别出用户指定的任意书本中的文字。用户指定的文字包括但不限于汉字、日文、韩文、英文等等不同形式的字。It can be understood that, in this embodiment of the present application, the electronic device 100 can recognize the text in any book designated by the user. The characters specified by the user include but are not limited to characters in different forms such as Chinese characters, Japanese, Korean, and English.
通过本申请实施例提供的一种识别点读文字的方法,电子设备100响应于用户的第一操作,电子设备100开始采集图像,其中,电子设备100采集的图像包括用户的手指和书本的内容,用户的手指和书本位于电子设备的目标区域内;电子设备100根据采集的图像识别出的用户的手指的位置移动来识别用户的点读手势;电子设备100根据点读手势和点读手势的轨迹的位置,确定采集的图像中书本的内容中的目标文字;电子设备100播报已识别出的目标文字。这样,可以提高电子设备的点读正确率,从而可以提升用户体验。并且,电子设备100可以识别任意书本中的文字,不需要定制书本。With the method for recognizing touch-to-read text provided by this embodiment of the present application, the electronic device 100 starts to capture images in response to a first operation of the user, where the captured images include the user's finger and the content of the book, and the user's finger and the book are located in a target area of the electronic device; the electronic device 100 recognizes the user's touch-to-read gesture according to the movement of the finger position identified from the captured images; the electronic device 100 determines the target text in the content of the book in the captured images according to the gesture and the position of its trajectory; and the electronic device 100 broadcasts the recognized target text. In this way, the touch-to-read accuracy of the electronic device can be improved, thereby improving the user experience. Moreover, the electronic device 100 can recognize text in any book; no customized book is required.
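The summarized flow (capture → gesture recognition → target-text determination → voice broadcast) can be expressed as a small orchestration sketch. The component interfaces here are placeholder assumptions; the embodiment does not specify code, and the actual detection and text-to-speech models would be separate subsystems plugged into these slots.

```python
# Hypothetical orchestration of the touch-to-read flow; the four
# component names and call signatures are illustrative assumptions.

class TouchReadPipeline:
    def __init__(self, camera, gesture_recognizer, text_locator, tts):
        self.camera = camera                        # captures image frames
        self.gesture_recognizer = gesture_recognizer  # frames -> (gesture, trajectory)
        self.text_locator = text_locator            # (image, gesture, trajectory) -> text
        self.tts = tts                              # text -> voice broadcast

    def run_once(self):
        frames = self.camera.capture()              # images with finger + book content
        gesture, trajectory = self.gesture_recognizer(frames)
        target = self.text_locator(frames[-1], gesture, trajectory)
        self.tts(target)                            # broadcast the recognized target text
        return target
```

Because each stage is injected, the same skeleton works for any book: only the layout-analysis and OCR components need to generalize, not the pipeline itself.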
下面首先介绍本申请实施例提供的示例性电子设备100。The following first introduces the exemplary electronic device 100 provided by the embodiments of the present application.
图11是本申请实施例提供的电子设备100的结构示意图。FIG. 11 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
下面以电子设备100为例对实施例进行具体说明。应该理解的是,电子设备100可以具有比图中所示的更多的或者更少的部件,可以组合两个或多个的部件,或者可以具有不同的部件配置。图中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。The embodiment will be described in detail below by taking the electronic device 100 as an example. It should be understood that the electronic device 100 may have more or fewer components than those shown in the figures, may combine two or more components, or may have different component configurations. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
电子设备100可以包括:处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线2,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,以及显示屏194等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,磁传感器180D,加速度传感器180E,触摸传感器180K等。The electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 2, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a magnetic sensor 180D, an acceleration sensor 180E, a touch sensor 180K, and the like.
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。It can be understood that the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 . In other embodiments of the present application, the electronic device 100 may include more or less components than shown, or combine some components, or separate some components, or arrange different components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. The different processing units may be independent devices or may be integrated in one or more processors.
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。The controller may be the nerve center and command center of the electronic device 100 . The controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。The charging management module 140 is used to receive charging input from the charger. The charger may be a wireless charger or a wired charger.
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。The power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 . The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
电子设备100的无线通信功能可以通过天线2,无线通信模块160,调制解调处理器以及基带处理器等实现。The wireless communication function of the electronic device 100 may be implemented by the antenna 2, the wireless communication module 160, a modem processor, a baseband processor, and the like.
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。The electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。The electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。The ISP is used to process the data fed back by the camera 193 . For example, when taking a photo, the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and skin tone. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 193 .
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。Camera 193 is used to capture still images or video. The object is projected through the lens to generate an optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats of image signals. In some embodiments, the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。A digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
视频编解码器用于对数字视频压缩或解压缩。Video codecs are used to compress or decompress digital video.
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文字理解等。The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, such as the transfer mode between neurons in the human brain, it can quickly process the input information, and can continuously learn by itself. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
内部存储器121可以包括一个或多个随机存取存储器(random access memory,RAM)和一个或多个非易失性存储器(non-volatile memory,NVM)。The internal memory 121 may include one or more random access memories (RAM) and one or more non-volatile memories (NVM).
随机存取存储器可以包括静态随机存储器(static random-access memory,SRAM)、动态随机存储器(dynamic random access memory,DRAM)、同步动态随机存储器(synchronous dynamic random access memory,SDRAM)、双倍资料率同步动态随机存取存储器(double data rate synchronous dynamic random access memory,DDR SDRAM,例如第五代DDR SDRAM一般称为DDR5 SDRAM)等;The random access memory may include static random-access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM; for example, fifth-generation DDR SDRAM is generally called DDR5 SDRAM), and the like;
非易失性存储器可以包括磁盘存储器件、快闪存储器(flash memory)。Non-volatile memory may include magnetic disk storage devices, flash memory.
快闪存储器按照运作原理划分可以包括NOR FLASH、NAND FLASH、3D NAND FLASH等,按照存储单元电位阶数划分可以包括单阶存储单元(single-level cell,SLC)、多阶存储单元(multi-level cell,MLC)、三阶储存单元(triple-level cell,TLC)、四阶储存单元(quad-level cell,QLC)等,按照存储规范划分可以包括通用闪存存储(universal flash storage,UFS)、嵌入式多媒体存储卡(embedded multi media card,eMMC)等。By operating principle, flash memory may include NOR FLASH, NAND FLASH, 3D NAND FLASH, and the like; by the number of potential levels per storage cell, it may include single-level cell (SLC), multi-level cell (MLC), triple-level cell (TLC), quad-level cell (QLC), and the like; and by storage specification, it may include universal flash storage (UFS), embedded multimedia card (eMMC), and the like.
随机存取存储器可以由处理器110直接进行读写,可以用于存储操作系统或其他正在运行中的程序的可执行程序(例如机器指令),还可以用于存储用户及应用程序的数据等。The random access memory can be directly read and written by the processor 110, and can be used to store executable programs (eg, machine instructions) of an operating system or other running programs, and can also be used to store data of users and application programs.
非易失性存储器也可以存储可执行程序和存储用户及应用程序的数据等,可以提前加载到随机存取存储器中,用于处理器110直接进行读写。The non-volatile memory can also store executable programs and store data of user and application programs, etc., and can be loaded into the random access memory in advance for the processor 110 to directly read and write.
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。The audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。Speaker 170A, also referred to as a "speaker", is used to convert audio electrical signals into sound signals. The electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。The receiver 170B, also referred to as "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or a voice message, the voice can be answered by placing the receiver 170B close to the human ear.
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。The microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input a sound signal into it. The electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement a directional recording function, and the like.
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。The earphone jack 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface.
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。The pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
磁传感器180D包括霍尔传感器。The magnetic sensor 180D includes a Hall sensor.
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
触摸传感器180K,也称“触控面板”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。Touch sensor 180K, also called "touch panel". The touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”. The touch sensor 180K is used to detect a touch operation on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to touch operations may be provided through display screen 194 . In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。The keys 190 include a power-on key, a volume key, and the like. Keys 190 may be mechanical keys. It can also be a touch key. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
马达191可以产生振动提示。指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。Motor 191 can generate vibrating cues. The indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
图12是本申请实施例的电子设备100的软件结构框图。可以理解的是,图12仅为电子设备100示例性的软件结构示意图。本申请实施例中的电子设备100的软件结构还可以是其他操作系统(例如,iOS操作系统,鸿蒙操作系统等等)提供的软件架构,此处不作限定。FIG. 12 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application. It can be understood that FIG. 12 is only a schematic diagram of an exemplary software structure of the electronic device 100. The software structure of the electronic device 100 in this embodiment of the present application may also be a software architecture provided by another operating system (for example, the iOS operating system, the Hongmeng operating system, or the like), which is not limited here.
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将系统分为四层,从上至下分别为应用程序层,应用程序框架层,运行时(Runtime)和系统库,以及内核层。The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces. In some embodiments, the system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, a runtime (Runtime) and a system library, and a kernel layer.
应用程序层可以包括一系列应用程序包。The application layer can include a series of application packages.
如图12所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,音乐,视频,短信息,点读等应用程序(也可以称为应用)。As shown in FIG. 12, the application packages may include application programs (also referred to as applications) such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Music, Videos, Messages, and Touch-to-Read.
其中,该点读应用程序即是指可以实现本申请实施例提供的识别点读文字方法的应用程序。该应用程序的名称可以叫“点读”、也可以是“辅助学习”等等,此处对该应用程序的名称不作限定。The touch-to-read application refers to an application that can implement the method for recognizing touch-to-read text provided by the embodiments of this application. The application may be named "Touch-to-Read", "Assisted Learning", or the like; the name of the application is not limited here.
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。The application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer. The application framework layer includes some predefined functions.
如图12所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。As shown in Figure 12, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。A window manager is used to manage window programs. The window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。Content providers are used to store and retrieve data and make these data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。The view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications. A display interface can consist of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。The phone manager is used to provide the communication function of the electronic device 100 . For example, the management of call status (including connecting, hanging up, etc.).
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。The resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文字形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话界面形式出现在屏幕上的通知。例如在状态栏提示文字信息,发出提示音,电子设备振动,指示灯闪烁等。The notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc. The notification manager can also display notifications in the status bar at the top of the system in the form of a graph or scroll bar text, such as notifications of applications running in the background, and can also display notifications on the screen in the form of a dialog interface. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
运行时(Runtime)包括核心库和虚拟机。Runtime负责系统的调度和管理。Runtime (Runtime) includes core libraries and virtual machines. Runtime is responsible for the scheduling and management of the system.
核心库包含两部分:一部分是编程语言(例如,Java语言)需要调用的功能函数,另一部分是系统的核心库。The core libraries include two parts: one part is the functions that the programming language (for example, the Java language) needs to call, and the other part is the core libraries of the system.
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的编程文件(例如,Java文件)执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。The application layer and the application framework layer run in the virtual machine. The virtual machine executes the programming files (for example, Java files) of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),二维图形引擎(例如:SGL)等。A system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了二维 (2-Dimensional,2D)和三维(3-Dimensional,3D)图层的融合。The Surface Manager is used to manage the display subsystem and provides a fusion of two-dimensional (2-Dimensional, 2D) and three-dimensional (3-Dimensional, 3D) layers for multiple applications.
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
三维图形处理库用于实现3D图形绘图,图像渲染,合成,和图层处理等。The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
2D图形引擎是2D绘图的绘图引擎。2D graphics engine is a drawing engine for 2D drawing.
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动,虚拟卡驱动。The kernel layer is the layer between hardware and software. The kernel layer contains at least display drivers, camera drivers, audio drivers, sensor drivers, and virtual card drivers.
下面结合捕获拍照场景,示例性说明电子设备100软件以及硬件的工作流程。In the following, the workflow of the software and hardware of the electronic device 100 is exemplarily described in conjunction with the capturing and photographing scene.
当触摸传感器180K接收到触摸操作,相应的硬件中断被发给内核层。内核层将触摸操作加工成原始输入事件(包括触摸坐标,触摸操作的时间戳等信息)。原始输入事件被存储在内核层。应用程序框架层从内核层获取原始输入事件,识别该输入事件所对应的控件。以该触摸操作是触摸单击操作,该单击操作所对应的控件为相机应用图标的控件为例,相机应用调用应用框架层的接口,启动相机应用,进而通过调用内核层启动摄像头驱动,通过摄像头193捕获静态图像或视频。When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and a timestamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking an example in which the touch operation is a tap and the corresponding control is the camera application icon: the camera application calls an interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer, and the camera 193 captures a still image or video.
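The touch-to-launch workflow above can be sketched schematically. This is not Android source code; the event dictionary, the control list, and the dispatch function are illustrative assumptions standing in for the kernel-layer raw input event and the framework-layer hit test.

```python
# Schematic sketch (assumption) of the event flow: the kernel layer
# wraps a touch into a raw input event, and the framework layer maps
# its coordinates to a control and invokes that control's callback.
import time

def make_raw_event(x, y):
    # kernel layer: raw input event with touch coordinates and timestamp
    return {"x": x, "y": y, "ts": time.time()}

def dispatch(event, controls):
    """Framework layer: find the control whose bounds contain the touch.
    controls: list of (name, (x1, y1, x2, y2), callback)."""
    for name, (x1, y1, x2, y2), callback in controls:
        if x1 <= event["x"] <= x2 and y1 <= event["y"] <= y2:
            return callback(name)
    return None
```

In the scenario above, the callback registered for the camera-icon control would be what ultimately starts the camera application and, through the kernel layer, the camera driver.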
FIG. 13 is another exemplary schematic structural diagram of the electronic device 100 provided by an embodiment of the present application.
As shown in FIG. 13, the electronic device 100 may include: a processor 1201, a camera 1202, a display screen 1203, a speaker 1204, and a sensor 1205.
It can be understood that the structure illustrated in this embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The camera 1202 is configured to capture images.
The processor 1201 is configured to detect the images captured by the camera 1202 and determine the coordinates of the finger in those images. The processor 1201 determines, from the images captured by the camera 1202, when the user starts and ends a touch-to-read operation, and determines the text to be recognized as specified by the user. The processor 1201 may also convert the recognized text into an audio electrical signal and send that signal to the speaker 1204.
The display screen 1203 can display the images captured by the camera 1202. The display screen 1203 may also display the icon of the "touch-to-read" app and display prompt text.
The speaker 1204 can receive the audio electrical signal sent by the processor 1201 and convert it into a sound signal. Through the speaker 1204, the electronic device 100 can read out the text selected by the user.
The sensor 1205 may be a touch sensor. The touch sensor may be placed on the display screen 1203, and together the touch sensor and the display screen 1203 form a touchscreen, also called a "touch screen". The touch sensor is used to detect touch operations on or near it and can pass a detected touch operation to the application processor to determine the type of touch event.
The above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements to some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
As used in the above embodiments, depending on the context, the term "when" may be interpreted as "if", "after", "in response to determining...", or "in response to detecting...". Similarly, depending on the context, the phrase "upon determining..." or "if (a stated condition or event) is detected" may be interpreted as "if it is determined...", "in response to determining...", "upon detecting (the stated condition or event)", or "in response to detecting (the stated condition or event)".
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive).
Those of ordinary skill in the art can understand that all or part of the processes of the methods in the above embodiments may be completed by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (11)

  1. A method for recognizing touch-to-read text, wherein the method comprises:
    in response to a first operation of a user, starting, by an electronic device, to capture images, wherein the images captured by the electronic device comprise a finger of the user and content of a book, and the finger of the user and the book are located within a target area of the electronic device;
    recognizing, by the electronic device, a touch-to-read gesture of the user according to position movement of the user's finger recognized from the captured images;
    determining, by the electronic device, target text in the content of the book in the captured images according to the touch-to-read gesture and a position of a trajectory of the touch-to-read gesture; and
    broadcasting, by the electronic device, the recognized target text.
  2. The method according to claim 1, wherein the touch-to-read gesture comprises one or more of the following: a point, a dash, or a circle.
  3. The method according to claim 2, wherein the recognizing, by the electronic device, the touch-to-read gesture of the user according to the position movement of the user's finger recognized from the captured images comprises:
    after the electronic device detects the user's finger in the captured images, if it is detected, within a first preset time period, that the position movement of the finger in the captured images is less than a first preset distance, recording, by the electronic device, the first position as a start point of the touch-to-read gesture;
    after starting to record the start point of the touch-to-read gesture, if it is detected that movement of a second position of the finger in the captured images within a second preset time period is less than a second preset distance, recording, by the electronic device, the second position as an end point of the touch-to-read gesture; and
    recognizing, by the electronic device, the touch-to-read gesture according to the start point of the touch-to-read gesture and the end point of the touch-to-read gesture.
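The dwell-based start/end detection in claim 3 can be sketched as follows. This is only an illustrative sketch: the preset time period and preset distance values, the sample format, and the function name are assumptions, not details taken from the application.

```python
import math

def find_dwell_points(samples, dwell_time=0.5, dwell_dist=10.0):
    """Scan timestamped finger positions (t, x, y) and return up to two
    dwell points: positions where the finger moved less than `dwell_dist`
    for at least `dwell_time` seconds.  The first dwell is treated as the
    gesture start point, the second as the end point (hypothetical
    thresholds standing in for the claim's preset time/distance)."""
    dwells = []
    i = 0
    while i < len(samples) and len(dwells) < 2:
        t0, x0, y0 = samples[i]
        j = i + 1
        # extend the window while the finger stays near (x0, y0)
        while j < len(samples) and math.hypot(samples[j][1] - x0,
                                              samples[j][2] - y0) < dwell_dist:
            j += 1
        if samples[j - 1][0] - t0 >= dwell_time:
            dwells.append((x0, y0))
            i = j  # resume scanning after the dwell window
        else:
            i += 1
    return dwells  # [start] or [start, end]
```

A finger that hovers, sweeps across the page, and hovers again yields exactly the two anchor points that claim 3 records before classifying the gesture.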
  4. The method according to claim 3, wherein the recognizing, by the electronic device, the touch-to-read gesture according to the start point of the touch-to-read gesture and the end point of the touch-to-read gesture comprises:
    if the distance between the start point recorded by the electronic device and the position of the finger at any point between the start point and the end point is less than a third preset distance, recognizing, by the electronic device, the touch-to-read gesture as a point;
    if the coordinates of the positions of the finger between the start point and the end point recorded by the electronic device are linearly correlated, recognizing, by the electronic device, the touch-to-read gesture as a dash; and
    if the distance between the start point and the end point recorded by the electronic device is less than a fourth preset distance, and the distance between the start point and a position of the finger between the start point and the end point is greater than a fifth preset distance, recognizing, by the electronic device, the touch-to-read gesture as a circle.
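The three distance tests of claim 4 can be sketched as follows. The threshold values are hypothetical, and an exact-collinearity test via cross products stands in for the claim's "linearly correlated" condition (a correlation-coefficient test would be used on noisy data):

```python
import math

def classify_gesture(points, point_dist=15.0, close_dist=20.0, far_dist=30.0):
    """Classify a finger trajectory (list of (x, y); start point first,
    end point last) as 'point', 'dash', or 'circle'."""
    start, end = points[0], points[-1]
    # Point: every sampled position stays close to the start point.
    if all(math.dist(start, p) < point_dist for p in points):
        return "point"
    # Dash: the sampled positions are (nearly) linearly correlated.
    if _is_linear(points):
        return "dash"
    # Circle: the trajectory returns near the start but wanders far from it.
    if (math.dist(start, end) < close_dist and
            any(math.dist(start, p) > far_dist for p in points)):
        return "circle"
    return "unknown"

def _is_linear(points, tol=1e-6):
    # Collinearity via the cross product of each point with the
    # start-to-end direction vector.
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    return all(abs(dx * (y - y0) - dy * (x - x0)) <= tol for x, y in points)
```

Testing the rules in this order mirrors the claim: a trajectory that never leaves the start-point neighbourhood is a point, an elongated straight sweep is a dash, and a far-ranging loop that closes on itself is a circle.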
  5. The method according to claim 4, wherein the determining, by the electronic device, the target text in the content of the book in the captured images according to the touch-to-read gesture and the position of the trajectory of the touch-to-read gesture comprises:
    determining, by the electronic device, positions of text areas in the content of the book according to the captured images; and
    determining, by the electronic device, the target text in the content of the book according to the touch-to-read gesture, the position of the trajectory of the touch-to-read gesture, and the positions of the text areas.
  6. The method according to claim 5, wherein the determining, by the electronic device, the target text in the content of the book according to the touch-to-read gesture, the position of the trajectory of the touch-to-read gesture, and the positions of the text areas comprises:
    determining, by the electronic device, a first text area according to the trajectory of the touch-to-read gesture and the positions of the text areas in the first book, wherein the first text area contains a first trajectory, and the first trajectory is a portion of the trajectory of the touch-to-read gesture that is greater than or equal to a preset proportion of that trajectory; and
    determining, by the electronic device, the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text area.
  7. The method according to claim 6, wherein the determining, by the electronic device, the target text in the content of the book according to the first trajectory, the touch-to-read gesture, and the first text area comprises:
    if the touch-to-read gesture is a point, determining, by the electronic device, that the text in the first text area with the smallest distance from the first trajectory is the target text;
    if the touch-to-read gesture is a dash, determining, by the electronic device, that the text above the first trajectory in the first text area is the target text; and
    if the touch-to-read gesture is a circle, determining, by the electronic device, that the text within the first trajectory in the first text area is the target text.
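Given the gesture type and word bounding boxes from text detection, the three selection rules of claim 7 might be sketched as follows. The box format, the use of box centers, and the bounding-box approximation of the circled region are all illustrative assumptions:

```python
import math

def select_target_text(gesture, trajectory, words):
    """Pick target words per claim 7's rules:
       point  -> the word nearest the trajectory,
       dash   -> words whose box sits directly above the drawn line,
       circle -> words whose box center falls inside the circled region.
    `words` is a list of (text, (x0, y0, x1, y1)) boxes, y growing downward;
    `trajectory` is a list of (x, y) finger positions."""
    def center(box):
        x0, y0, x1, y1 = box
        return ((x0 + x1) / 2, (y0 + y1) / 2)

    if gesture == "point":
        tip = trajectory[0]
        return [min(words, key=lambda w: math.dist(center(w[1]), tip))[0]]
    if gesture == "dash":
        line_y = sum(y for _, y in trajectory) / len(trajectory)
        xs = [x for x, _ in trajectory]
        lo, hi = min(xs), max(xs)
        # "above" the line: box bottom at or above line_y, with
        # horizontal overlap against the drawn span
        return [t for t, (x0, y0, x1, y1) in words
                if y1 <= line_y and x0 <= hi and x1 >= lo]
    if gesture == "circle":
        xs = [x for x, _ in trajectory]
        ys = [y for _, y in trajectory]
        # approximate the circled region by the trajectory's bounding box
        return [t for t, box in words
                if min(xs) <= center(box)[0] <= max(xs)
                and min(ys) <= center(box)[1] <= max(ys)]
    return []
```

A point picks the single nearest word, a dash sweeps up the words it underlines, and a circle collects the words it encloses.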
  8. The method according to any one of claims 1 to 7, wherein the first preset time period is equal to the second preset time period, and the first preset distance is equal to the second preset distance.
  9. An electronic device, comprising: a camera, a display screen, one or more processors, and one or more memories, wherein the one or more processors are coupled to the camera, the one or more memories, and the display screen; the one or more memories are configured to store computer program code, the computer program code comprising computer instructions; and when the one or more processors execute the computer instructions, the electronic device is caused to perform the method for recognizing touch-to-read text according to any one of claims 1 to 8.
  10. A computer-readable storage medium, comprising computer instructions, wherein when the computer instructions are run on an electronic device, the electronic device is caused to perform the method for recognizing touch-to-read text according to any one of claims 1 to 8.
  11. A computer program product, wherein when the computer program product runs on a computer, the computer is caused to perform the method for recognizing touch-to-read text according to any one of claims 1 to 8.
PCT/CN2022/081042 2021-03-19 2022-03-15 Method for recognizing touch-to-read text, and electronic device WO2022194180A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110298494.6 2021-03-19
CN202110298494.6A CN115116075A (en) 2021-03-19 2021-03-19 Method for recognizing click-to-read characters and electronic equipment

Publications (1)

Publication Number Publication Date
WO2022194180A1 true WO2022194180A1 (en) 2022-09-22

Family

ID=83321727

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081042 WO2022194180A1 (en) 2021-03-19 2022-03-15 Method for recognizing touch-to-read text, and electronic device

Country Status (2)

Country Link
CN (1) CN115116075A (en)
WO (1) WO2022194180A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909342B (en) * 2023-01-03 2023-05-23 湖北瑞云智联科技有限公司 Image mark recognition system and method based on contact movement track

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217197A (en) * 2014-08-27 2014-12-17 华南理工大学 Touch reading method and device based on visual gestures
CN104820560A (en) * 2015-05-21 2015-08-05 马兰英 Method for selecting character or image and computation equipment
WO2016113969A1 (en) * 2015-01-13 2016-07-21 三菱電機株式会社 Gesture recognition device and method, program, and recording medium
CN109255989A (en) * 2018-08-30 2019-01-22 广东小天才科技有限公司 A kind of intelligent point-reading method and point read equipment
CN111090343A (en) * 2019-06-09 2020-05-01 广东小天才科技有限公司 Method and device for identifying point-reading content in point-reading scene
CN111324201A (en) * 2020-01-20 2020-06-23 上海纸上绝知智能科技有限公司 Reading method, device and system based on somatosensory interaction
CN112016346A (en) * 2019-05-28 2020-12-01 阿里巴巴集团控股有限公司 Gesture recognition method, device and system and information processing method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217197A (en) * 2014-08-27 2014-12-17 华南理工大学 Touch reading method and device based on visual gestures
WO2016113969A1 (en) * 2015-01-13 2016-07-21 三菱電機株式会社 Gesture recognition device and method, program, and recording medium
CN104820560A (en) * 2015-05-21 2015-08-05 马兰英 Method for selecting character or image and computation equipment
CN105320437A (en) * 2015-05-21 2016-02-10 马兰英 Method of selecting character or image and computing device
CN109255989A (en) * 2018-08-30 2019-01-22 广东小天才科技有限公司 A kind of intelligent point-reading method and point read equipment
CN112016346A (en) * 2019-05-28 2020-12-01 阿里巴巴集团控股有限公司 Gesture recognition method, device and system and information processing method
CN111090343A (en) * 2019-06-09 2020-05-01 广东小天才科技有限公司 Method and device for identifying point-reading content in point-reading scene
CN111324201A (en) * 2020-01-20 2020-06-23 上海纸上绝知智能科技有限公司 Reading method, device and system based on somatosensory interaction

Also Published As

Publication number Publication date
CN115116075A (en) 2022-09-27

Similar Documents

Publication Publication Date Title
RU2766255C1 (en) Voice control method and electronic device
US9286895B2 (en) Method and apparatus for processing multiple inputs
WO2021032097A1 (en) Air gesture interaction method and electronic device
WO2021110133A1 (en) Control operation method and electronic device
CN113132526B (en) Page drawing method and related device
CN114816167B (en) Application icon display method, electronic device and readable storage medium
WO2023051511A1 (en) Icon moving method, related graphical interface, and electronic device
CN113536866A (en) Character tracking display method and electronic equipment
EP4216563A1 (en) Photographing method and electronic device
WO2022194180A1 (en) Method for recognizing touch-to-read text, and electronic device
CN114371985A (en) Automated testing method, electronic device, and storage medium
CN115113751A (en) Method and device for adjusting numerical range of recognition parameter of touch gesture
WO2023066165A1 (en) Animation effect display method and electronic device
WO2022095983A1 (en) Gesture misrecognition prevention method, and electronic device
CN114022570B (en) Method for calibrating external parameters between cameras and electronic equipment
WO2022222688A1 (en) Window control method and device
WO2022002213A1 (en) Translation result display method and apparatus, and electronic device
CN114691002B (en) Page sliding processing method and related device
US10901520B1 (en) Content capture experiences driven by multi-modal user inputs
WO2023222097A1 (en) Text recognition method and related apparatus
WO2022166550A1 (en) Data transmission method and electronic device
WO2022143891A1 (en) Focal point synchronization method and electronic device
WO2022143335A1 (en) Dynamic effect processing method and related apparatus
WO2022143094A1 (en) Window page interaction method and apparatus, electronic device, and readable storage medium
WO2023236908A1 (en) Image description method, electronic device and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22770533

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22770533

Country of ref document: EP

Kind code of ref document: A1