WO2022194180A1 - Touch-reading text recognition method, and electronic device - Google Patents
Touch-reading text recognition method, and electronic device
- Publication number
- WO2022194180A1 (application PCT/CN2022/081042, CN2022081042W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- electronic device
- finger
- text
- gesture
- user
- Prior art date
Classifications
- G06V30/32 — Digital ink (G—Physics; G06—Computing, calculating or counting; G06V—Image or video recognition or understanding; G06V30/00—Character recognition, recognising digital ink, document-oriented image-based pattern recognition; G06V30/10—Character recognition)
- G06V10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition (G06V10/00—Arrangements for image or video recognition or understanding; G06V10/20—Image preprocessing)
- G06V40/20 — Movements or behaviour, e.g. gesture recognition (G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data)
- G06V40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language (under G06V40/20)
Definitions
- the present application relates to the field of terminal artificial intelligence (AI) and to the field of character recognition, and in particular to a method and an electronic device for recognizing point-and-read characters.
- electronic devices with a point-reading function, such as reading pens, tablet computers, and robots, can assist users in reading. For example, a reading pen can help users read picture books.
- however, a reading pen can only recognize the text in specific, purpose-made picture books, and electronic devices such as tablet computers and robots can only recognize the text in electronic picture books. This limits the user's learning resources.
- in addition, recognizing the text that the user wants read aloud from the user's gesture is not very accurate.
- the present application provides a method and an electronic device for recognizing point-and-read characters. With this method, the electronic device can more accurately recognize the characters that a user designates in a book.
- the present application provides a method for recognizing point-and-read characters. The method may include: the electronic device 100 starts to capture images in response to a first operation of the user, where the captured images include the user's finger and a book located in the target area of the electronic device; the electronic device 100 recognizes the user's point-reading gesture according to the position movement of the user's finger across the collected images; the electronic device 100 determines the target text in the content of the book in the captured images according to the position of the point-reading gesture and its trajectory; and the electronic device 100 broadcasts the recognized target text.
- the electronic device can accurately determine the target text from the image collected by the electronic device in combination with the user's gesture. Therefore, the electronic device can accurately identify the target character. In this way, user experience can be improved.
- the point-to-read gesture includes one or more of the following: dots, dashes, and circles.
- the electronic device recognizes the user's point-reading gesture according to the position movement of the user's finger across the collected images as follows: after the electronic device detects the user's finger in the collected images, if it detects that the position of the finger in the collected images moves less than a first preset distance within a first preset duration, the electronic device records this first position as the starting point of the point-reading gesture; after the electronic device starts recording from the starting point, if it detects that the position of the finger in the images collected within a second preset duration moves less than a second preset distance, the electronic device records this second position as the end point of the point-reading gesture; the electronic device then recognizes the point-reading gesture from its starting point and end point.
- the electronic device 100 may start recording the coordinates of the finger at the starting point of the point-reading gesture and stop recording at its end point. In this way, the electronic device can more accurately determine the starting point and the end point of the trajectory of the user's point-reading gesture.
- the electronic device recognizes the point-reading gesture from its starting point and end point as follows: if the distance between the starting point and every position of the finger recorded between the starting point and the end point is less than a third preset distance, the electronic device recognizes the gesture as a dot; if the coordinates of the finger positions recorded between the starting point and the end point are linearly correlated, the electronic device recognizes the gesture as a dash; if the distance between the starting point and the end point is less than a fourth preset distance while the distance between the starting point and some finger position between them is greater than a fifth preset distance, the electronic device recognizes the gesture as a circle.
- the electronic device can accurately determine the specific type of the point-to-read gesture.
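- as a rough illustration of these classification rules, the following sketch labels a recorded trajectory as a dot, dash, or circle (the function name, the threshold values, and the Pearson-correlation linearity test are illustrative assumptions, not the patent's exact procedure):

```python
import math

def classify_gesture(points, d3=10, d4=15, d5=40):
    """Classify a recorded finger trajectory as 'dot', 'dash', or 'circle'.

    points: list of (x, y) finger coordinates from the recorded starting
    point to the end point. d3/d4/d5 stand in for the third/fourth/fifth
    preset distances described above (the pixel values are assumptions).
    """
    start, end = points[0], points[-1]
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])

    # Dot: every recorded position stays close to the starting point.
    if all(dist(start, p) < d3 for p in points):
        return "dot"

    # Circle: the trajectory returns near the start but wanders far in between.
    if dist(start, end) < d4 and any(dist(start, p) > d5 for p in points):
        return "circle"

    # Dash: positions are (approximately) linearly correlated; here the
    # Pearson correlation of x and y serves as a stand-in linearity test.
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in points)
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if vx == 0 or vy == 0 or abs(cov / (vx * vy)) > 0.95:
        return "dash"
    return "unknown"
```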
- the electronic device determines the target text in the content of the book in the collected image according to the position and trajectory of the point-reading gesture as follows: the electronic device determines the position of the text area in the content of the book from the collected image; the electronic device then determines the target text in the content of the book according to the position and trajectory of the point-reading gesture together with the position of the text area.
- the electronic device can more accurately determine the target text, that is, the text that the user needs to recognize and broadcast in the selected book.
- the electronic device determines the target text in the content of the book according to the position and trajectory of the point-reading gesture and the position of the text area as follows: the electronic device determines a first text area according to the trajectory of the point-reading gesture and the positions of the text areas in the first book, where the first text area contains a first trajectory, and the first trajectory is the part of the gesture trajectory, amounting to a proportion greater than or equal to a preset ratio, that falls within that area; the electronic device then determines the target text in the content of the book according to the first trajectory, the point-reading gesture, and the first text area.
- the electronic device determines that the user needs to recognize the text in the first text region only when most of the trajectory of the user's reading gesture falls within the first text region. In this way, when a part of the user's gesture track falls in the first text area and a part falls in the second text area, the electronic device can also correctly determine the target text.
- the electronic device determines the target text in the content of the book according to the first trajectory, the point-reading gesture, and the first text area as follows: if the point-reading gesture is a dot, the electronic device determines that the text in the first text area with the smallest distance to the first trajectory is the target text; if the point-reading gesture is a dash, the electronic device determines that the text above the first trajectory in the first text area is the target text; if the point-reading gesture is a circle, the electronic device determines that the text inside the first trajectory in the first text area is the target text.
- the electronic device adopts different strategies to determine the target text, which can improve the accuracy of the electronic device in determining the target text in the captured image. Therefore, the accuracy of identifying the target character by the electronic device can be improved.
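- a minimal sketch of these per-gesture selection strategies, assuming axis-aligned word boxes and image coordinates with y growing downward (all names and the box format are illustrative assumptions):

```python
def select_target_text(gesture, trajectory, words):
    """Pick the target text inside the matched text area, per gesture type.

    gesture: 'dot', 'dash', or 'circle'. trajectory: list of (x, y) points.
    words: list of (text, (x0, y0, x1, y1)) word boxes inside the first
    text area.
    """
    def center(box):
        x0, y0, x1, y1 = box
        return ((x0 + x1) / 2, (y0 + y1) / 2)

    if gesture == "dot":
        # Closest word to the (single) pointed position.
        px, py = trajectory[0]
        return min(words, key=lambda w: (center(w[1])[0] - px) ** 2
                                        + (center(w[1])[1] - py) ** 2)[0]

    if gesture == "dash":
        # Words just above the underline, within its horizontal span.
        xs = [p[0] for p in trajectory]
        line_y = sum(p[1] for p in trajectory) / len(trajectory)
        picked = [w for w, box in words
                  if min(xs) <= center(box)[0] <= max(xs)
                  and center(box)[1] < line_y]
        return " ".join(picked)

    if gesture == "circle":
        # Words whose centers fall inside the circle's bounding box
        # (a crude stand-in for a point-in-polygon test).
        xs = [p[0] for p in trajectory]
        ys = [p[1] for p in trajectory]
        picked = [w for w, box in words
                  if min(xs) <= center(box)[0] <= max(xs)
                  and min(ys) <= center(box)[1] <= max(ys)]
        return " ".join(picked)
    return ""
```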
- the first preset distance is equal to the second preset distance
- the second preset duration is equal to the first preset duration
- the present application provides a method for recognizing point-to-read characters. The method may include: in response to a first operation, the electronic device collects images of a first book; when the distance between the coordinates of the finger in the image frame collected at a first moment and the coordinates of the finger in the image frame collected a first preset duration earlier is less than a first preset distance, the electronic device starts to record the coordinates of the finger in the image frames; when the distance between the coordinates of the finger in the image frame collected at a second moment and the coordinates of the finger in the image frame collected a second preset duration earlier is less than a second preset distance, the electronic device stops recording the coordinates of the finger, the second moment being after the first moment; the electronic device determines the text to be recognized in the first book according to the finger coordinates recorded from the first moment to the second moment; and the electronic device recognizes and broadcasts the text to be recognized.
- the text to be recognized may be referred to as a target text, that is, the text specified by the user in the book to be recognized.
- that is, the coordinates at the two moments when the user's finger is stationary are used as the starting point and the end point of the user's point-reading trajectory, respectively.
- in this way, the electronic device can accurately determine where the user's point-reading starts, and can therefore accurately determine the characters to be recognized from the trajectory coordinates between the two moments at which the finger is at rest. This improves the point-reading accuracy of the electronic device and thus the user experience.
- the electronic device can recognize the characters in any book, and there is no need to customize the book.
- the method further includes: when the electronic device detects the finger in the image frame collected at a third moment, it starts to obtain the coordinates of the finger in the collected image frames, where the third moment is before the first moment.
- in this way, the electronic device may take the moment at which the finger appears in the image as the start of point reading, and obtains finger coordinates only while the user is actually pointing. This avoids executing the subsequent point-reading steps when no finger is detected, which reduces the amount of calculation and saves power.
- the electronic device determines the text to be recognized in the first book according to the coordinates of the finger recorded from the first moment to the second moment, which specifically includes: the electronic device determines the pointing gesture of the finger according to the recorded coordinates; the electronic device determines the positions of the text areas in the first book according to the image of the first book; the electronic device then determines the text to be recognized in the first book according to the coordinates of the finger, the pointing gesture, and the positions of the text areas in the first book.
- the electronic device can more accurately determine the target text, that is, the text that the user needs to recognize and broadcast in the selected book.
- the electronic device determines the pointing gesture of the finger according to the coordinates of the finger recorded from the first moment to the second moment, specifically including: if the electronic device determines that the distance between any one of the recorded coordinates and each of the other recorded coordinates is less than a third preset distance, the electronic device determines that the pointing gesture is the first gesture; if the electronic device determines that the coordinates recorded from the first moment to the second moment are linearly correlated, the electronic device determines that the pointing gesture is the second gesture; if the electronic device determines that the distance between the coordinates recorded at the first moment and the coordinates recorded at the second moment is less than a fourth preset distance while the distance between the coordinates recorded at a fourth moment (between the first moment and the second moment) and the coordinates recorded at the first moment is greater than a fifth preset distance, the electronic device determines that the pointing gesture is the third gesture.
- the electronic device can accurately determine the specific type of the point-to-read gesture.
- the electronic device determines the text to be recognized in the first book according to the coordinates of the finger recorded from the first moment to the second moment, the pointing gesture, and the position of the text area in the first book, which specifically includes:
- the electronic device connects the coordinates of the finger recorded from the first moment to the second moment, in recording order, to obtain a first finger trajectory;
- the electronic device determines a first text area according to the first finger trajectory and the positions of the text areas in the first book, where the first text area contains a second finger trajectory, and the second finger trajectory is the part of the first finger trajectory, amounting to a proportion greater than or equal to a preset ratio, that falls within that area; the electronic device then determines the text to be recognized according to the second finger trajectory, the pointing gesture, and the first text area.
- the electronic device determines that the user needs to recognize the text in the first text region only when most of the trajectory of the user's reading gesture falls within the first text region. In this way, when a part of the user's gesture track falls in the first text area and a part falls in the second text area, the electronic device can also correctly determine the target text.
- the electronic device determines the text to be recognized according to the second finger trajectory, the pointing gesture, and the first text area, which specifically includes: if the pointing gesture is the first gesture, the electronic device determines that the text in the first text area with the smallest distance to the second finger trajectory is the text to be recognized; if the pointing gesture is the second gesture, the electronic device determines that the text above the second finger trajectory in the first text area is the text to be recognized; if the pointing gesture is the third gesture, the electronic device determines that the text inside the second finger trajectory in the first text area is the text to be recognized.
- the first gesture is a dot
- the second gesture is a line
- the third gesture is a circle.
- the electronic device adopts different strategies to determine the target text, which can improve the accuracy of the electronic device in determining the target text in the captured image. Therefore, the accuracy of identifying the target character by the electronic device can be improved.
- when the electronic device does not detect the finger in the image collected at a fifth moment, the electronic device determines the pointing gesture of the finger according to the coordinates of the finger recorded from the first moment to the second moment.
- the electronic device can determine the end time of the user's reading.
- the electronic device may stop performing the point-reading steps (e.g., determining the coordinates of the finger in the image) until the user starts pointing again. In this way, the power consumption of the electronic device can be saved.
- the electronic device may use the same stillness condition for the images collected at different moments, which reduces its amount of calculation.
- the present application further provides an electronic device, comprising one or more processors and a memory; the memory is coupled to the one or more processors and is used to store computer program code, the computer program code comprising computer instructions;
- the one or more processors invoke the computer instructions to cause the electronic device to execute the method for recognizing point-and-read text in any possible implementation of the first aspect or of the second aspect.
- the embodiments of the present application provide a computer storage medium, including computer instructions that, when run on an electronic device, cause the electronic device to perform the method for recognizing point-and-read text in any possible implementation of any of the above aspects.
- an embodiment of the present application provides a computer program product that, when run on an electronic device, enables the electronic device to execute the method for recognizing point-and-read text in any possible implementation of any of the above aspects.
- FIG. 1A is a schematic diagram of an application scenario of a robot that can be used for point reading provided by an embodiment of the present application;
- FIG. 1B is a schematic diagram of another application scenario of the robot that can be used for point reading provided by an embodiment of the present application;
- FIG. 1C is a schematic diagram of another application scenario of the robot that can be used for point reading provided by an embodiment of the present application;
- FIG. 2 is a schematic flowchart of a method for recognizing point-and-read characters provided by an embodiment of the present application;
- FIGS. 3A-3D are schematic diagrams of a set of user interfaces of the electronic device 100 provided by embodiments of the present application;
- FIG. 3E is a schematic diagram of the electronic device 100 collecting a picture-book image provided by an embodiment of the present application;
- FIG. 3F is a schematic diagram of the content-area division of a picture book provided by an embodiment of the present application;
- FIG. 4A is a schematic diagram of an image frame collected by the electronic device 100 while a user is point-reading provided by an embodiment of the present application;
- FIG. 4B is a schematic diagram of finger detection performed by the electronic device 100 on a collected image frame provided by an embodiment of the present application;
- FIG. 5 is a schematic diagram of a group of image frames collected by the electronic device 100 while a user is point-reading provided by an embodiment of the present application;
- FIG. 6 is a schematic diagram of a trajectory of a user's point reading provided by an embodiment of the present application;
- FIGS. 7A-7C are schematic diagrams of polar coordinate plots corresponding to different finger trajectories during point reading provided by an embodiment of the present application;
- FIG. 8 is a schematic diagram combining the picture-book layout analysis result with the trajectory of the user's finger provided by an embodiment of the present application;
- FIGS. 9A-9B are schematic diagrams of a group of text detections provided by an embodiment of the present application;
- FIG. 10 is a schematic flowchart of a method for recognizing point-and-read characters provided by an embodiment of the present application;
- FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
- FIG. 12 is a schematic diagram of a software architecture of an electronic device provided by an embodiment of the present application;
- FIG. 13 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
- the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or the number of indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.
- "point reading” may mean that the electronic device can recognize and play the text specified by the user in the picture book by voice.
- the user specifies the text in the picture book. For example, the user's finger points below the text to be recognized in the picture book, or the user's finger draws a line below the text to be recognized in the picture book, and the user's finger draws a circle to delineate the text to be recognized, and so on.
- the user may use a finger to point under the text (ie, “cat”) to be recognized in the picture book 102 .
- the robot 100 may include a camera 103 and a camera 104 .
- the picture book 102 is placed in the photographing field of view area 101 of the camera 103 and the camera 104 .
- the camera 103 and/or the camera 104 of the robot 100 may capture an image in which the user's finger is below the word "cat" in the picture book 102.
- the robot 100 can recognize the character "cat” and play the character "cat”.
- the user can use a finger to draw a line (dash) under the text to be recognized (i.e., "cat and mouse") in the picture book 102.
- the robot 100 can recognize the text “cat and mouse” and play the text "cat and mouse”.
- the user may draw a circle with a finger in the picture book 102 to delineate the desired identification text (ie, "cat and mouse”).
- the robot 100 can recognize the text “cat and mouse” and play the text "cat and mouse”.
- the gestures of the user when point reading can be divided into "dot", "line", "circle", and so on. It can be understood that the categories and names of gestures are not limited in the embodiments of the present application.
- the electronic device may be the robot 100 shown in FIG. 1A, FIG. 1B, and FIG. 1C, or a terminal device with a camera such as a tablet computer or a smartphone, or a point-reading device composed of a camera and a terminal with a character recognition function; this is not limited in the embodiments of the present application.
- An embodiment of the present application provides a method for recognizing point-and-read text. The method may include: the electronic device 100 continuously collects images of a first picture book; when a finger appears in a collected image, the electronic device determines that the user starts point reading; the electronic device analyzes the image of the first picture book to obtain a layout analysis result; the electronic device determines the user's trajectory and gesture according to the coordinates of the finger in the collected image frames; the electronic device determines, according to the point-reading gesture and the layout analysis result, the text area Q containing the text to be recognized; and the electronic device recognizes the text to be recognized in the text area Q and broadcasts it by voice.
- FIG. 2 exemplarily shows a flowchart of a method for recognizing point-to-read characters provided by an embodiment of the present application.
- a method for recognizing point-and-read characters provided by the present application may include the following steps:
- the camera of the electronic device 100 starts to capture the image of the book B.
- the electronic device 100 may receive the user's first operation.
- the first operation of the user may be to turn on the electronic device 100 or to turn on the reading APP in the electronic device 100 .
- the electronic device 100 may start capturing images by the camera of the electronic device 100 in response to the user's first operation.
- the electronic device 100 continuously collects multiple frames of images.
- the electronic device 100 may be the robot 100 shown in FIGS. 3A-3E
- the book B may be the book 102 shown in FIG. 3E .
- the embodiment of the present application does not limit the book B. That is, in the method provided by the embodiment of the present application, the electronic device 100 can recognize the text in any book that the user clicks to read.
- the electronic device 100 may be the robot 100 shown in FIG. 3A .
- the electronic device 100 may include a camera 103 and a camera 104 and a display screen 105 .
- the icon 106 of the Dot-Read APP can be displayed on the display screen 105 .
- the user's first operation may be to click on the icon 106 of the "click to read" APP.
- the camera 103 and the camera 104 of the robot 100 start to capture images.
- the display screen 105 of the robot 100 can display the book display area 1051 and the prompt text 1052 .
- the book display area 1051 can display the images collected by the camera 103 and the camera 104 .
- the prompt text 1052 may prompt the user to place the book to be learned in the shooting area of the camera 103 and the camera 104 .
- the content of the prompt text 1052 may be "please put the book in the shooting area", and the specific content of the prompt text 1052 is not limited here.
- the prompt text 1052 may be displayed in the book display area 1051, or may be displayed outside the book display area 1051, and the specific position of the prompt text 1052 is not limited here.
- the display screen 105 of the robot 100 may further include a control 1053 .
- the control 1053 is used to trigger the robot 100 to perform layout analysis and finger detection on the collected images.
- the image 1021 of the book 102 may be displayed in the book display area 1051 of the display screen 105 .
- the user can adjust the position of the book according to the image 1021 in the display area 1051 . For example, if the user sees that only the right half of the book 102 is displayed in the display area 1051 , the user can move the book to the right so that the book 102 moves into the shooting field 101 of the camera 103 and the camera 104 . After the user sees the complete image of the book 102 in the display area 1051, the user can click on the control 1053.
- the electronic device may display a gesture available for reading text on the display screen 105 .
- the user may click below the text to be recognized, may also draw a line below the text to be recognized, or may draw a circle to delineate the text to be recognized. In this way, the user can be prompted to use a gesture recognizable by the electronic device 100 to read.
- the electronic device 100 performs layout analysis on the image of the book B, and determines the type of the book content in the book B and the position corresponding to the book content.
- the electronic device 100 may perform layout analysis on a frame of image of the book B collected by the camera, and obtain the position of the text area in the current page of the book B on the book page.
- the electronic device 100 may store a layout analysis model, the electronic device 100 inputs a frame of image into the layout analysis model, and the model can output the type of book content (text, drawing, table, etc.) contained in the image and the book content corresponding location.
- through the layout analysis model, the electronic device 100 can determine that the image of the book B may include one or more of a text area, a drawing area, and a table area.
- the text area may refer to an area that only contains text in a frame of image.
- the drawing area may refer to an area of a frame of image that contains a drawing.
- the table area may refer to an area of a frame of image that contains a table. It will be appreciated that the drawing area and the table area may also contain text.
- One frame of image of the book B collected by the electronic device 100 may include one or more text regions, and/or drawing regions, and/or table regions.
- the image 1021 of the book 102 may include area A, area B, and area C.
- Area A and Area C are drawing areas, and area B is text area.
- the area A may be a rectangular area with vertices A1(xa1, ya1, za1), A2(xa2, ya2, za2), A3(xa3, ya3, za3), A4(xa4, ya4, za4).
- Region B may be a rectangular region with vertices B1 (xb1, yb1, zb1), B2 (xb2, yb2, zb2), B3 (xb3, yb3, zb3), B4 (xb4, yb4, zb4).
- Region C may be a rectangular region with vertices C1(xc1, yc1, zc1), C2(xc2, yc2, zc2), C3(xc3, yc3, zc3), C4(xc4, yc4, zc4).
- the shape of the text area and the drawing area obtained by the electronic device 100 by performing layout analysis on the image of the book B is not limited to a rectangle, and may also be other shapes, such as polygons, circles, and the like.
- the electronic device 100 may use the upper left vertex of the photographing field of view of the electronic device 100 as the origin to establish a coordinate system.
- the origin O of the coordinate system XYZ is the upper left vertex of the camera field 101 of the robot.
- the electronic device 100 may input the image 1021 of the book 102 shown in FIG. 3F into the layout analysis model for layout analysis, and obtain the type of content contained in the image 1021 and the location of the content as shown in Table 1 below.
- the electronic device 100 performs layout analysis on the image 1021 and can determine the drawings and characters contained in the image 1021, as well as their positions, as in Table 1:

Table 1

| Area | Content type | Vertex coordinates |
| --- | --- | --- |
| Area A | Drawing | (xa1, ya1, za1), (xa2, ya2, za2), (xa3, ya3, za3), (xa4, ya4, za4) |
| Area B | Text | (xb1, yb1, zb1), (xb2, yb2, zb2), (xb3, yb3, zb3), (xb4, yb4, zb4) |
| Area C | Drawing | (xc1, yc1, zc1), (xc2, yc2, zc2), (xc3, yc3, zc3), (xc4, yc4, zc4) |

- the area A in Table 1 may be the area A shown in FIG. 3F, the area B may be the area B shown in FIG. 3F, and the area C may be the area C shown in FIG. 3F.
- the electronic device 100 can also detect the text and the position of the text in the drawing area when performing layout analysis.
- the electronic device 100 may detect the inclination angle of the text in the text area.
- the electronic device 100 detects a finger in the image collected at time T10, and starts to determine the coordinates of the finger in the collected image.
- the image captured by the electronic device 100 may include the user's finger.
- the electronic device 100 can detect the finger in the captured image.
- the electronic device 100 may store a finger detection model, and the electronic device 100 inputs the collected image into the finger detection model, and the finger detection model can determine whether the input image contains a finger or does not contain a finger.
- the electronic device 100 inputs the image 401 in FIG. 4A into the finger detection model, and the finger detection model can output the image 402 as shown in FIG. 4B .
- the finger detection model can label the detected fingers with the finger detection box 4022 .
- the finger detection model can also label fingertips 4021.
- the electronic device 100 continuously collects multiple frames of images, and the electronic device 100 can sequentially input each frame of images collected into the finger detection model for finger detection.
- if the finger detection model detects a finger in a frame of image, the electronic device 100 can begin to determine the coordinates of the finger in that frame of image.
- the electronic device 100 may take the coordinates of the fingertip as the coordinates of the finger. If the electronic device 100 does not detect a finger in a frame of image, it can detect whether the next frame of image (or an image frame collected after a preset time interval) contains a finger, until it detects the finger in the image collected at time T10 and starts to determine the coordinates of the finger in the collected images.
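- a minimal sketch of this per-frame watch loop (frames, detect_finger, and the return convention are assumptions; the patent only requires some finger detection model that can locate a fingertip):

```python
def watch_for_finger(frames, detect_finger):
    """Scan successive camera frames until a finger appears.

    frames: an iterable of image frames. detect_finger: a model stub that
    returns the (x, y) fingertip coordinate, or None if no finger is found.
    """
    for t, frame in enumerate(frames):
        tip = detect_finger(frame)
        if tip is None:
            continue                 # no finger: skip to the next frame
        # Time T10: a finger appeared; from here on, fingertip coordinates
        # are taken as the finger coordinates.
        return t, tip
    return None, None
```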
- (a) to (i) of FIG. 5 are exemplary image frames captured by the electronic device 100 at time t1-t9.
- the image frames captured from the time t1 to the time t9 can show the complete process of the user clicking and reading the character "cat" to be recognized. First, the user drops his finger and clicks on the text "cat" in the picture book, then moves his finger away from the picture book.
- (a) of FIG. 5 is the image frame captured by the electronic device 100 at time t1, and (b) is the image frame captured at time t2. No finger is detected in the image frames captured at time t1 and time t2.
- (c) of FIG. 5 is the image frame captured at time t3; the electronic device 100 can detect a finger in this frame and starts to acquire the coordinates of the finger in the image frames.
- (d) to (h) of FIG. 5 are the image frames captured at times t4 to t8; the electronic device 100 may acquire the coordinates of the finger in each of these frames.
- (i) of FIG. 5 is the image frame captured at time t9; no finger is detected in this frame.
- the electronic device 100 acquires an image frame at time t1.
- the electronic device 100 performs finger detection on the image frame captured at time t1, and no finger is detected.
- the electronic device 100 does not execute the steps after step S203.
- the electronic device 100 continues to perform finger detection on the next frame of image.
- the electronic device 100 may perform finger detection on the image captured at time t2. If no finger is detected, the electronic device 100 performs finger detection on the image captured at time t3.
- the electronic device 100 detects a finger in the image frame captured at time t3.
- the electronic device 100 can determine the coordinates of the finger in the image frame captured at time t3. Specifically, it may be the coordinates of the fingertip.
- Time T10 may be time t3 shown in FIG. 5 .
- the image frames captured at time t1 to the image frames captured at time t9 shown in FIG. 5 may be consecutive image frames captured by the electronic device 100 .
- the interval is related to the frame rate at which the electronic device 100 captures images.
- the image frames captured at times t1 to t9 shown in FIG. 5 may also be image frames captured by the electronic device 100 at a preset time interval; that is, each pair of adjacent times among t1 through t9 may be separated by the preset time interval.
- the preset time interval may be configured by the system of the electronic device 100 .
- the electronic device 100 can sequentially perform finger detection on each frame of images collected. In this way, the finger in the image frame can be detected in time, so that the time when the user starts to read can be accurately determined.
- the electronic device 100 may also perform finger detection on images collected at preset time intervals. In this way, the power of the electronic device can be saved.
- in a possible implementation, when the electronic device 100 detects that a finger appears in the image frame and the vertical distance between the finger in the image frame and the book B has decreased to a preset vertical distance D01, the electronic device 100 starts to acquire the coordinates of the finger in the captured image frames.
- in another possible implementation, when the electronic device 100 detects that a finger appears in the image frame and the vertical distance between the finger in the image frame and the book B gradually decreases, the electronic device 100 starts to acquire the coordinates of the finger in the captured image frames.
- the electronic device 100 may record the coordinates of the finger in the image collected at time T11 and use the coordinates as the starting point of the user's reading track.
- the electronic device 100 may determine that the user's finger is in a stationary state at time T11.
- the preset duration T21 may be 0.5 seconds, may be 1 second, or may be 2 seconds, which is not limited here.
- the preset distance D1 may be 10 pixels, 5 pixels, or 15 pixels, and the specific value of the preset distance D1 is not limited in this embodiment of the present application.
- the preset duration T21 and the preset distance D1 may be configured by the system of the electronic device 100 .
- generally, when a user needs the electronic device 100 to assist in learning a character, the user points a finger at the character to be recognized and holds it there for a while before moving the finger. For example, as shown in (d) of FIG. 5, the user points at the text "cat" for a period of time (for example, 0.5 seconds or 1 second, not limited here), and then moves the finger from the position in (d) to the position in (e).
- therefore, if the finger moves less than the preset distance D1 within the preset duration T21, the electronic device 100 determines that the finger is in a stationary state in the image captured at time T11.
- the coordinate point of the finger in the image collected at time T11 is the starting point of the user's point-reading trajectory; that is, the user has placed the finger on the text to be recognized and starts to select it.
- the electronic device 100 records the coordinates of the finger in the image frame.
- the electronic device 100 detects that there is a finger in the image frame, and then acquires the coordinates of the finger.
- the electronic device 100 may temporarily store the coordinates in the memory, and after the coordinates of the finger in the image frame are used to calculate the distance from the coordinates of the finger in the next frame of image, the electronic device releases the stored coordinates of the finger in the image frame.
- the electronic device 100 may record the coordinates of the finger in the image frame captured at time T11.
- the coordinates of the finger in the image frame captured at time T11 can be recorded in the memory for recording the point reading track.
- unlike these temporarily stored coordinates, after the electronic device 100 calculates the distance between the coordinates of the finger in the image frame captured at time T11 and the coordinates of the finger in the image frame separated from it by the preset duration T21, the electronic device 100 still keeps the coordinates of the finger in the image frame captured at time T11 in the memory used for recording the point-reading trajectory.
- S205: when the distance between the coordinates of the finger in the image collected by the electronic device 100 at time T12 and the coordinates of the finger in the image frame collected a preset duration T22 earlier is less than a preset distance D2, the electronic device 100 stops recording the coordinates of the finger in the collected images.
- that is, the electronic device 100 detects that the user's finger is in a stationary state again: the distance between the coordinates of the finger in the image collected at time T12 and the coordinates of the finger in the image frame collected the preset duration T22 earlier is less than the preset distance D2.
- the electronic device 100 therefore stops recording the coordinates of the finger in the captured images; that is, the electronic device 100 takes the coordinates of the finger in the image collected at time T12 as the end point of the user's point-reading trajectory.
- T22 may be greater than T21, and may also be less than or equal to T21, which is not limited here.
- D2 may be greater than D1, and may also be less than or equal to D1, which is not limited here.
- the preset duration T22 and the preset distance D2 may be configured by the system of the electronic device 100 .
- for example, if the distance between the coordinates of the finger in the image frame captured at time t7 in (g) of FIG. 5 and the coordinates of the finger in the image frame captured at time t6 in (f) is smaller than the preset distance D2, the electronic device 100 stops saving the coordinates of the finger in the captured images.
- in this way, the electronic device 100 saves the coordinates of the finger in the images collected between time T11 and time T12, that is, the trajectory coordinates of one point-reading action of the user's finger in the picture book.
- the coordinate trajectory of the finger in the image collected between time T11 and time T12 may be shown as line segment P3P4 in FIG. 6 .
- the electronic device 100 may store the coordinates of the points between the line segments P3P4.
- the line segment P3P4 is the trajectory of the user's finger in the picture book, and this finger trajectory is used to select the text to be recognized in the picture book.
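- putting the two stillness tests together, the recording step can be sketched as follows (the sampling format, helper names, and default thresholds are assumptions; the patent only fixes the preset durations T21/T22 and distances D1/D2):

```python
import math

def record_point_read_track(samples, t21=0.5, d1=10.0, t22=0.5, d2=10.0):
    """Record one point-reading trajectory between two stationary moments.

    samples: list of (timestamp, (x, y)) fingertip positions from
    successive frames. t21/d1 detect the stationary start (time T11),
    t22/d2 the stationary end (time T12).
    """
    def still(i, window, radius):
        # "Still": the finger moved less than `radius` relative to the
        # sample taken at least `window` seconds earlier.
        t, (x, y) = samples[i]
        for tj, (xj, yj) in reversed(samples[:i]):
            if t - tj >= window:
                return math.hypot(x - xj, y - yj) < radius
        return False

    track, recording = [], False
    for i, (_, pos) in enumerate(samples):
        if not recording:
            if still(i, t21, d1):
                recording = True    # time T11: starting point of the track
                track.append(pos)
            continue
        track.append(pos)
        # End (time T12): the finger has moved away from the starting
        # point and has come to rest again.
        moved = any(math.hypot(px - track[0][0], py - track[0][1]) > d1
                    for px, py in track)
        if moved and still(i, t22, d2):
            break
    return track    # finger coordinates recorded from T11 to T12
```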
- the electronic device 100 determines the point-reading gesture G of the finger according to the coordinates of the finger saved from time T11 to time T12.
- the electronic device 100 may determine that the user has selected the text to be recognized.
- the text to be recognized is the text selected by the finger from time T11 to time T12.
- the electronic device 100 may determine the text to be recognized according to the coordinates of the fingers in the image frames captured from the time T11 to the time T12, the gestures clicked by the user, and the layout analysis result.
- the electronic device 100 may determine the gesture G when the user clicks according to the coordinates of the fingers in the multiple image frames captured from the time T11 to the time T12.
- if the distance between the finger coordinates in every two successive image frames captured from time T11 to time T12 is smaller than a preset distance D10, and the distance between the finger coordinates in the image frame captured at time T11 and the finger coordinates in the image frame captured at time T12 is smaller than a preset distance D11, the electronic device 100 determines that the pointing gesture of the finger is "point"; here D10 is less than or equal to D11.
- if the finger coordinates recorded from time T11 to time T12 are linearly correlated, the electronic device 100 may determine that the pointing gesture of the finger is "drawing a line".
- if the trajectory of the recorded finger coordinates returns close to its starting point while passing through positions far from it, the electronic device 100 may determine that the pointing gesture of the finger is "drawing a circle".
- the electronic device 100 performs convex hull fitting on the finger coordinate points in the image frames captured from time T11 to time T12, and converts the sampling points into polar coordinate points after uniform sampling to obtain a polar coordinate map.
- the electronic device 100 inputs the polar coordinate graph into the gesture recognition model, and after the gesture recognition model recognizes the polar coordinate graph, it outputs the gesture type corresponding to the polar coordinate graph.
- old(x,y) is the coordinates of the finger determined from the image frames collected by the electronic device 100 from time T11 to time T12
- New(x,y) is the coordinates of the finger after convex hull fitting is performed on the coordinates of the finger.
- the electronic device 100 can determine the center point M(xm, ym) of the N convex-hull fitting points, for example as their mean (Formula 2):
  xm = (1/N) · Σᵢ xᵢ, ym = (1/N) · Σᵢ yᵢ
- the electronic device 100 can calculate the coordinates of the relative position of each convex-hull fitting point with respect to the center point as (Formula 3):
  (dxᵢ, dyᵢ) = (xᵢ − xm, yᵢ − ym)
- the electronic device 100 can convert the convex-hull fitting points into polar coordinates, where the origin (pole) of the polar coordinate system is the center point obtained from Formula 2; according to the relative position of each convex-hull fitting point and the center point, the polar coordinates of each fitting point can be obtained as (Formulas 4 and 5):
  rᵢ = √(dxᵢ² + dyᵢ²), θᵢ = atan2(dyᵢ, dxᵢ)
- the electronic device 100 can convert the sampling points into polar coordinate points according to Formula 4 and Formula 5, and then save the resulting polar coordinate points as a polar coordinate map.
- the electronic device 100 inputs the polar coordinate graph into the gesture recognition model, and can obtain the gesture type corresponding to the polar coordinate graph.
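- a minimal sketch of this conversion, transcribing Formulas 2-5 as reconstructed above (the function name and input format are assumptions):

```python
import math

def to_polar_map(points):
    """Convert convex-hull-fitted finger points into polar coordinates.

    points: list of (x, y) coordinates after convex hull fitting and
    uniform sampling. Returns a list of (r, theta) pairs about the
    points' center.
    """
    n = len(points)
    xm = sum(x for x, _ in points) / n          # Formula 2: center point M
    ym = sum(y for _, y in points) / n
    polar = []
    for x, y in points:
        dx, dy = x - xm, y - ym                 # Formula 3: relative position
        r = math.hypot(dx, dy)                  # Formula 4: radius
        theta = math.atan2(dy, dx)              # Formula 5: angle
        polar.append((r, theta))
    return polar
```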
- FIG. 7A exemplarily shows a polar diagram.
- the coordinates of the finger in the polar coordinate diagram can be sequentially connected according to the time sequence in which the electronic device 100 obtains the coordinates of the finger, and a closed curve can be formed.
- the electronic device 100 may input the polar coordinate diagram shown in FIG. 7A into the gesture recognition model, and the gesture recognition model may output the gesture type of the finger corresponding to the polar coordinate diagram.
- the gesture type is "circle".
- FIG. 7B exemplarily shows another polar diagram.
- the coordinates of the finger in the polar coordinate graph may be sequentially stored in the polar coordinate graph according to the time sequence in which the electronic device 100 obtains the coordinates of the finger, and the coordinate points of a plurality of fingers are concentrated in a certain area.
- the electronic device 100 may input the polar coordinate diagram shown in FIG. 7B into the gesture recognition model, and the gesture recognition model may output the gesture type of the finger corresponding to the polar coordinate diagram.
- the gesture type is "point".
- FIG. 7C exemplarily shows yet another polar plot.
- the coordinates of the finger in the polar coordinate diagram can be sequentially connected according to the time sequence in which the electronic device 100 obtains the coordinates of the finger, and a polyline can be formed.
- the electronic device 100 can input the polar coordinate diagram shown in FIG. 7C into the gesture recognition model, and the gesture recognition model can output the gesture type of the finger corresponding to the polar coordinate diagram.
- the gesture type is "Draw".
- the electronic device determines the text area Q to be recognized according to the coordinates of the finger saved from time T11 to time T12, the gesture G, and the positions of the text areas in the book B.
- that is, the electronic device can determine the text area Q to be recognized according to the coordinates of the finger recorded from time T11 to time T12 (the trajectory of the user's finger in the picture book), the gesture recognition result, and the layout analysis result.
- the electronic device 100 uses the text area where the track recorded from time T11 to time T12 is located as the text area to be recognized.
- the electronic device 100 takes the text with the smallest distance from the coordinates of the finger stored from time T11 to time T12 in the text area to be recognized as the text to be recognized specified by the user.
- the electronic device 100 uses the text region intersecting with the track recorded from time T11 to time T12 as the text region Q to be recognized. It can be understood that, the intersection of the track and the text area may be that all the track is within the text area, or a preset proportion of the track is within the text area (for example, half of the track is within the text area A).
- the electronic device 100 may take the characters above the track in the character area Q as the characters to be recognized designated by the user.
- the electronic device may use the text region that overlaps with the track recorded from time T11 to time T12 as the text region Q to be recognized.
- the electronic device may use the text in the track recorded from time T11 to time T12 in the text area Q as the text to be recognized selected by the user.
- in a possible implementation, the user may configure the electronic device 100 to perform point reading only in text areas. That is, when the electronic device 100 determines that the trajectory formed by the finger coordinates saved from time T11 to time T12 lies in a text area of the picture book, it determines the text area Q to be recognized and executes step S208; when the electronic device 100 determines that the trajectory lies in a drawing area or a table area of the picture book, the electronic device 100 does not execute step S208.
- FIG. 8 exemplarily shows a picture book 800 .
- the picture book 800 may include a drawing area 801 , a table area 802 , a text area 803 and a text area 804 .
- the track formed by the coordinates of the finger stored by the electronic device 100 from time T11 to time T12 may be finger track 807 or finger track 809 in FIG. 8 .
- the electronic device 100 may determine that the finger trace 807 is in the drawing area 801 , or the finger trace 809 is in the table area 802 .
- the electronic device 100 may prompt the user on the display screen that the current point-reading area does not conform to the configured point-reading area.
- the electronic device does not perform step S208.
- the electronic device 100 may determine the text region to be recognized according to the finger track and the text region, and execute step S208.
- the electronic device 100 can recognize and broadcast the text to be recognized selected by the user.
- when the electronic device 100 performs layout analysis, it can also obtain the position information of the characters contained in the drawing areas. In this way, the electronic device 100 can determine the text to be recognized selected by the user according to the user's finger trajectory and the position information of the text in the drawing area. Therefore, the electronic device 100 can recognize and broadcast the text to be recognized selected by the user.
- the electronic device 100 can detect whether there is a finger in the captured image, and if a finger is detected, step S203 is performed. If the electronic device 100 does not detect a finger in the captured images within a preset time, the electronic device 100 may close the "point reading" APP, or the electronic device 100 may enter a standby state. In this way, the power consumption of the electronic device 100 can be reduced (a sketch of this behavior follows below).
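- As a rough illustration of this idle-timeout behavior (not taken from the patent: the callbacks and the 30-second timeout are assumptions), a Python sketch:

```python
import time

NO_FINGER_TIMEOUT_S = 30.0  # assumed preset time; the patent does not give a value

def monitor_finger(capture_frame, finger_detected, on_finger, on_timeout):
    """Run point reading while a finger is seen; call on_timeout after a quiet period."""
    last_seen = time.monotonic()
    while True:
        frame = capture_frame()
        if finger_detected(frame):
            last_seen = time.monotonic()
            on_finger(frame)  # e.g. continue with step S203
        elif time.monotonic() - last_seen > NO_FINGER_TIMEOUT_S:
            on_timeout()  # e.g. close the "point reading" APP or enter a standby state
            return
```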
- when the electronic device 100 determines that the part of the trajectory formed by the coordinates of the finger saved from time T11 to time T12 that falls within a text area is greater than a preset threshold, the electronic device determines the text area Q to be recognized and executes step S208. Otherwise, the electronic device 100 neither determines the text area Q to be recognized nor executes step S208.
- the preset threshold may be 50%, 55%, 60%, etc., which is not limited here. For example, for the finger track 808 shown in FIG. 8, about 20% of the track falls in the text area; if the preset threshold is 50%, the electronic device 100 does not determine the text area Q to be recognized or execute step S208 (a sketch of this proportion test follows below).
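- A minimal Python sketch of this proportion test, assuming text areas are axis-aligned boxes (x0, y0, x1, y1) and the track is the list of finger coordinates saved from time T11 to time T12 (the box format and the threshold value are illustrative assumptions):

```python
import numpy as np

def fraction_in_region(track, box):
    """Share of the track's points that fall inside one text area box."""
    if not track:
        return 0.0
    x0, y0, x1, y1 = box
    pts = np.asarray(track, dtype=float)
    inside = (pts[:, 0] >= x0) & (pts[:, 0] <= x1) & (pts[:, 1] >= y0) & (pts[:, 1] <= y1)
    return float(inside.mean())

def select_region_q(track, text_boxes, threshold=0.5):
    """Return the text area whose in-area share of the track is largest,
    provided that share exceeds the preset threshold; otherwise None
    (in which case step S208 is not executed)."""
    if not text_boxes:
        return None
    best_share, best_box = max(
        ((fraction_in_region(track, b), b) for b in text_boxes), key=lambda s: s[0]
    )
    return best_box if best_share > threshold else None
```

This also covers the case of two or more tracks: computing the in-area share for each track and keeping the larger one implements the selection described below.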
- the electronic device 100 can determine that the part of the finger track 805 or the finger track 806 that falls within a text area is larger than the preset threshold. The electronic device can then determine the text region to be recognized according to the finger track and the text region.
- when two or more tracks each fall within a text area, the electronic device 100 may take the track whose portion within a text area is larger, together with the text area determined by that track, as the text area Q to be recognized.
- for example, the electronic device 100 takes the text area 803, determined by the finger track 806, as the final text area Q to be recognized.
- the electronic device 100 may determine the text region Q to be recognized according to the coordinates of the finger saved from time T11 to time T12, the gesture G, and the positions of the multiple text regions in the book B.
- the electronic device 100 can delineate the text to be detected in the text area Q to be recognized through text detection frames.
- FIG. 9A may include a text area 900 to be recognized.
- the electronic device 100 can determine that the character to be recognized is the character “cat” delineated by the character detection frame 902 according to the coordinates of the finger.
- the text detection frame may be moved according to the offset S0, and the electronic device 100 takes the text enclosed by the moved text detection frame as the text to be recognized in the text area Q to be recognized.
- the inclination angle α of the characters in the text area of the picture book can be obtained.
- the electronic device 100 may acquire the inclination angle β of the finger when detecting the finger.
- the electronic device can obtain the included angle θ between the finger and the text in the text area to be recognized according to the inclination angle α and the inclination angle β.
- the text detection frame 903 in FIG. 9B is the text detection frame obtained after the electronic device 100 moves the text detection frame 902 in FIG. 9A according to the offset.
- the text "sum" enclosed by the text detection frame 903 is determined as the text to be recognized. In this way, the electronic device 100 can more accurately determine the text to be recognized specified by the user.
- the electronic device 100 can multiply the offset S0 by an offset coefficient λ to obtain the offset S1; the electronic device moves the text detection frame according to the offset S1, and the electronic device 100 takes the text enclosed by the moved text detection frame as the text to be recognized in the text area Q to be recognized.
- the offset coefficient λ may be configured by the system of the electronic device 100.
- the value range of the offset coefficient λ may be [0.2, 2].
- the electronic device 100 may record, within a preset time period, the included angle between the finger and the text during point reading and the offset corresponding to that angle.
- the electronic device 100 may establish a mapping relationship between the angle between the finger and the text and the offset. In this way, after the electronic device 100 determines the angle between the finger and the text, it can look up the offset corresponding to that angle in the established mapping relationship, which reduces the calculation amount of the electronic device.
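- The offset mechanism above can be sketched as follows. The patent only states that θ is obtained from α and β and that a recorded angle-to-offset mapping can be reused, so the combination θ = β − α, the degree-rounded lookup table, and the shift direction along the text baseline are all hypothetical choices:

```python
import math

ANGLE_TO_OFFSET = {}  # recorded mapping: included angle (degrees, rounded) -> offset
LAMBDA = 1.0          # offset coefficient λ, assumed to lie in [0.2, 2]

def detection_frame_offset(alpha_deg, beta_deg, s0):
    """Offset to apply to the text detection frame.

    alpha_deg: inclination angle of the text (from layout analysis)
    beta_deg:  inclination angle of the finger
    s0:        base offset S0
    """
    theta = round(beta_deg - alpha_deg)  # assumed way of combining the two angles
    if theta in ANGLE_TO_OFFSET:         # reuse a recorded offset, saving computation
        return ANGLE_TO_OFFSET[theta]
    return LAMBDA * s0                   # otherwise S1 = λ * S0

def shift_frame(frame, offset, alpha_deg):
    """Move the frame by the offset along the text direction (assumed convention)."""
    x0, y0, x1, y1 = frame
    dx = offset * math.cos(math.radians(alpha_deg))
    dy = offset * math.sin(math.radians(alpha_deg))
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
```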
- when the electronic device 100 determines that the vertical distance from the finger in the captured image frame to the book B is greater than the preset vertical distance D11, the electronic device 100 determines the point-reading gesture G according to the coordinates of the finger saved from time T11 to time T12.
- the electronic device 100 recognizes and broadcasts the text in the text region Q to be recognized.
- the electronic device 100 can recognize the text detected in the text region Q to be recognized. After the electronic device recognizes the text, it broadcasts the text by voice. For example, as shown in FIG. 1A, the electronic device 100 broadcasts the text "cat" designated by the user (see the sketch below).
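- As a concrete illustration of this recognition-and-broadcast step, the sketch below crops the text area Q out of the captured frame, runs OCR, and speaks the result; pytesseract and pyttsx3 are common third-party libraries chosen for illustration, not components named by the patent:

```python
import pytesseract
import pyttsx3
from PIL import Image

def recognize_and_broadcast(image_path, box):
    """OCR the text delineated in the text area Q and read the result aloud."""
    x0, y0, x1, y1 = (int(v) for v in box)
    region = Image.open(image_path).crop((x0, y0, x1, y1))
    text = pytesseract.image_to_string(region, lang="chi_sim+eng").strip()
    if text:
        engine = pyttsx3.init()
        engine.say(text)       # e.g. broadcasts "cat" as in FIG. 1A
        engine.runAndWait()
    return text
```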
- characters specified by the user in the embodiments of the present application include, but are not limited to, characters in different forms such as Chinese characters, Japanese, Korean, and English.
- step S202 can be executed after step S203 and before step S207.
- the electronic device 100 can continuously collect images of the first picture book. When a finger appears in an image collected by the electronic device, the electronic device determines that the user starts point reading, and the electronic device analyzes the image of the first picture book to obtain a text analysis result. When the distance between the coordinates of the finger in the image frame currently collected by the electronic device and the coordinates of the finger in the image frame collected a preset duration earlier is less than a preset distance, the electronic device determines that the finger in the image frame is stationary, and the electronic device can record the track coordinates between two stationary states of the finger.
- the electronic device may determine the text area Q to be recognized and the text to be recognized in the text area Q according to the track coordinates between the two stationary states of the finger.
- the electronic device recognizes and broadcasts the text to be recognized.
- the electronic device 100 takes the coordinates of the finger at the two stationary moments as the starting point and the ending point of the user's reading track, respectively.
- the electronic device 100 can accurately determine the starting position of the user's point reading. Therefore, the electronic device 100 can accurately determine the text to be recognized according to the track coordinates between the two stationary moments of the finger. In this way, the point-reading accuracy of the electronic device can be improved, thereby improving user experience (a sketch of the stationary-finger detection follows below).
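- A sketch of this stationary-finger logic, with per-frame finger coordinates assumed as input and illustrative values for the preset duration (10 frames of lag) and the preset distance (5 pixels):

```python
import numpy as np

STATIONARY_DIST = 5.0  # preset distance in pixels (assumed value)
LAG_FRAMES = 10        # frames corresponding to the preset duration (assumed value)

def is_stationary(coords, i):
    """True if the finger moved less than the preset distance since the lagged frame."""
    if i < LAG_FRAMES:
        return False
    step = np.asarray(coords[i], dtype=float) - np.asarray(coords[i - LAG_FRAMES], dtype=float)
    return float(np.linalg.norm(step)) < STATIONARY_DIST

def track_between_stops(coords):
    """Coordinates between the first two stationary episodes of the finger
    (the starting point and the ending point of the reading track)."""
    episodes, in_stop = [], False
    for i in range(len(coords)):
        s = is_stationary(coords, i)
        if s and not in_stop:
            episodes.append(i)
            if len(episodes) == 2:
                return coords[episodes[0]:episodes[1] + 1]
        in_stop = s
    return []
```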
- the electronic device 100 can recognize characters in any book, and does not need to customize the book.
- FIG. 10 exemplarily shows a flowchart of another method for recognizing point-to-read characters provided by an embodiment of the present application.
- a method for recognizing point-and-read characters provided by the present application may include the following steps:
- the electronic device 100 starts to capture an image, wherein the image captured by the electronic device 100 includes the user's finger and the content of the book, and the user's finger and the book are located in the target area of the electronic device.
- the camera of the electronic device 100 may continuously capture images.
- before the electronic device performs step S1002, the foregoing step S202 may also be performed.
- the electronic device 100 recognizes the pointing gesture of the user according to the position movement of the user's finger recognized by the collected image.
- the electronic device 100 can identify the user's finger in the captured image, and can determine the position of the user's finger in the frame of image.
- the electronic device 100 may recognize the user's pointing gesture according to the finger positions in the collected multi-frame images.
- the point-to-read gesture includes one or more of the following: dots, dashes, and circles.
- the electronic device recognizes the user's pointing gesture according to the position movement of the user's finger recognized in the collected images. Specifically: after the electronic device detects the user's finger in the collected images, if it detects that the movement of the finger about a first position in the images collected within a first preset duration is less than a first preset distance, the electronic device records the first position as the starting point of the pointing gesture; after the electronic device starts recording the starting point of the pointing gesture, if it detects that the movement of the finger about a second position in the images collected within a second preset duration is less than a second preset distance, the electronic device records the second position as the end point of the pointing gesture; the electronic device then recognizes the pointing gesture according to the starting point and the end point of the pointing gesture.
- the electronic device 100 may start recording the coordinates of the finger at the starting point from the starting point of the pointing gesture, and finish recording the coordinates of the finger at the end point when the pointing gesture ends.
- the electronic device recognizes the pointing gesture according to the starting point and the end point of the pointing gesture. Specifically: if the distance between the starting point recorded by the electronic device and the position of any finger coordinate between the starting point and the end point is less than a third preset distance, the electronic device recognizes the pointing gesture as a point; if the coordinates of the finger positions between the starting point and the end point are linearly correlated, the electronic device recognizes the pointing gesture as a line; if the distance between the starting point and the end point is less than a fourth preset distance, and the distance between the starting point and positions of the finger between the starting point and the end point is greater than a fifth preset distance, the electronic device recognizes the pointing gesture as a circle (see the sketch below).
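- The classification just described can be sketched as follows; the three preset distances and the collinearity test (a principal-axis fit rather than a literal correlation coefficient, so vertical lines are handled too) are assumptions, not values from the patent:

```python
import numpy as np

# Third/fourth/fifth preset distances in pixels; illustrative values only.
D3, D4, D5 = 10.0, 15.0, 30.0

def classify_gesture(track):
    """Classify the recorded track (starting point .. end point) as a point,
    a line, or a circle, mirroring the three rules above."""
    pts = np.asarray(track, dtype=float)
    start, end = pts[0], pts[-1]
    dists_from_start = np.linalg.norm(pts - start, axis=1)
    # Point: every recorded position stays within D3 of the starting point.
    if dists_from_start.max() < D3:
        return "point"
    # Circle: the track returns near its start but wanders far from it in between.
    if np.linalg.norm(end - start) < D4 and dists_from_start.max() > D5:
        return "circle"
    # Line: the positions are (nearly) collinear; the smallest singular value of
    # the centered points measures the deviation from a straight line.
    centered = pts - pts.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    if singular_values[-1] < 0.05 * (singular_values[0] + 1e-9):
        return "line"
    return None  # no point-reading gesture recognized
```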
- the gesture G in the above steps S201 to S208 may also be referred to as a pointing gesture.
- for step S1002, reference may be made to the descriptions in the foregoing steps S203 to S206, which are not repeated here.
- the electronic device 100 determines the target text in the content of the book in the captured image according to the pointing gesture and the position of the trajectory of the pointing gesture.
- according to the pointing gesture and the position of the trajectory of the pointing gesture in the image, the electronic device 100 can determine the target text in the content of the book.
- the target text is the text selected by the user to be recognized in the book.
- the electronic device determines the target text in the content of the book in the collected image according to the pointing gesture and the position of the trajectory of the pointing gesture. This includes: the electronic device determines the position of the text areas in the content of the book according to the collected image; the electronic device then determines the target text in the content of the book according to the pointing gesture, the position of its trajectory, and the positions of the text areas.
- the electronic device determines the position of the text area in the content of the book according to the collected image, that is, the electronic device performs layout analysis on the collected image, and then analyzes the position of the text area in the book.
- the electronic device determines the target text in the content of the book according to the pointing gesture, the position of its trajectory, and the positions of the text areas. This includes: the electronic device determines a first text area according to the trajectory of the pointing gesture and the positions of the text areas in the first book, where the first text area includes a first trajectory, and the first trajectory is the part of the trajectory of the pointing gesture, with a proportion greater than or equal to a preset ratio, that falls within the first text area; the electronic device then determines the target text in the content of the book according to the first trajectory, the pointing gesture, and the first text area.
- the electronic device determines the target text in the content of the book according to the first trajectory, the pointing gesture, and the first text area. Specifically: if the pointing gesture is a point, the electronic device determines that the text in the first text area with the smallest distance from the first trajectory is the target text; if the pointing gesture is a dash, the electronic device determines that the text above the first trajectory in the first text area is the target text; if the pointing gesture is a circle, the electronic device determines that the text within the first trajectory in the first text area is the target text.
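- A sketch of this selection rule, assuming layout analysis yields word strings with center coordinates in image coordinates (an assumed data format; smaller y means higher up) and that "within the first trajectory" is approximated by the trajectory's bounding box:

```python
import numpy as np

def target_text(gesture, track, words):
    """Pick the target text in the first text area.

    words: list of (text, (cx, cy)) word centers from layout analysis.
    """
    if not words:
        return None
    pts = np.asarray(track, dtype=float)

    def dist_to_track(center):
        return float(np.linalg.norm(pts - np.asarray(center, dtype=float), axis=1).min())

    if gesture == "point":   # the text closest to the track
        return min(words, key=lambda w: dist_to_track(w[1]))[0]
    if gesture == "line":    # the text just above the underline-style track
        above = [w for w in words if w[1][1] < pts[:, 1].min()]
        return min(above, key=lambda w: dist_to_track(w[1]))[0] if above else None
    if gesture == "circle":  # the text inside the circled track's bounding box
        x0, y0 = pts.min(axis=0)
        x1, y1 = pts.max(axis=0)
        inside = [w for w in words if x0 <= w[1][0] <= x1 and y0 <= w[1][1] <= y1]
        inside.sort(key=lambda w: (w[1][1], w[1][0]))  # rough reading order
        return " ".join(w[0] for w in inside) if inside else None
    return None
```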
- for step S1003, reference may be made to the description in step S207, which is not repeated here.
- the electronic device 100 broadcasts the recognized target text.
- the electronic device 100 can recognize the target text. After the electronic device recognizes the target text, it broadcasts the text by voice. For example, as shown in FIG. 1A, the electronic device 100 broadcasts the text "cat" designated by the user.
- the text in the to-be-recognized text area Q in step S208 may be referred to as a target text.
- the electronic device 100 can recognize the text in any book designated by the user.
- the characters specified by the user include but are not limited to characters in different forms such as Chinese characters, Japanese, Korean, and English.
- the electronic device 100 starts to collect images in response to the first operation of the user, wherein the images collected by the electronic device 100 include the user's finger and the content of the book, and the user's finger and the book are located in the target area of the electronic device. The electronic device 100 recognizes the user's pointing gesture according to the position movement of the user's finger recognized in the collected images, determines the target text in the content of the book in the collected image according to the pointing gesture and the position of the trajectory of the pointing gesture, and broadcasts the recognized target text. In this way, the point-reading accuracy of the electronic device can be improved, thereby improving user experience. In addition, the electronic device 100 can recognize characters in any book and does not require customized books.
- FIG. 11 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
- the electronic device 100 may have more or fewer components than those shown in the figures, may combine two or more components, or may have different component configurations.
- the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
- the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 2, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and so on.
- the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, a magnetic sensor 180D, an acceleration sensor 180E, a touch sensor 180K, and the like.
- the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
- the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently.
- the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
- the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
- the controller may be the nerve center and command center of the electronic device 100 .
- the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
- a memory may also be provided in the processor 110 for storing instructions and data.
- the memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instruction or data again, it can be called directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thereby increases system efficiency.
- the processor 110 may include one or more interfaces.
- the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
- the charging management module 140 is used to receive charging input from the charger.
- the charger may be a wireless charger or a wired charger.
- the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
- the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
- the wireless communication function of the electronic device 100 may be implemented by the antenna 2, the wireless communication module 160, a modem processor, a baseband processor, and the like.
- the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
- the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
- the GPU is used to perform mathematical and geometric calculations for graphics rendering.
- Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
- Display screen 194 is used to display images, videos, and the like.
- Display screen 194 includes a display panel.
- the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
- the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
- the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
- the ISP is used to process the data fed back by the camera 193 .
- when the shutter is opened, light is transmitted to the camera photosensitive element through the lens, and the light signal is converted into an electrical signal; the camera photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
- ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
- ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
- the ISP may be provided in the camera 193 .
- Camera 193 is used to capture still images or video.
- the object is projected through the lens to generate an optical image onto the photosensitive element.
- the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
- the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
- the ISP outputs the digital image signal to the DSP for processing.
- DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
- the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
- a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
- Video codecs are used to compress or decompress digital video.
- the NPU is a neural-network (NN) computing processor.
- Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
- the internal memory 121 may include one or more random access memories (RAM) and one or more non-volatile memories (NVM).
- random access memory may include static random-access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM; for example, fifth-generation DDR SDRAM is generally called DDR5 SDRAM), etc.
- Non-volatile memory may include magnetic disk storage devices, flash memory.
- flash memory can be divided into NOR flash, NAND flash, 3D NAND flash, etc. according to the operating principle; into single-level cell (SLC), multi-level cell (MLC), triple-level cell (TLC), quad-level cell (QLC), etc. according to the number of potential levels per storage cell; and into universal flash storage (UFS), embedded multimedia card (eMMC), etc. according to the storage specification.
- the random access memory can be directly read and written by the processor 110, and can be used to store executable programs (eg, machine instructions) of an operating system or other running programs, and can also be used to store data of users and application programs.
- the non-volatile memory can also store executable programs and store data of user and application programs, etc., and can be loaded into the random access memory in advance for the processor 110 to directly read and write.
- the electronic device 100 may implement audio functions, such as music playback and recording, through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like.
- the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
- the speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
- the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
- the receiver 170B also referred to as "earpiece" is used to convert audio electrical signals into sound signals.
- a call can be answered by placing the receiver 170B close to the human ear.
- the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
- the user can speak with the mouth close to the microphone 170C, thereby inputting the sound signal into the microphone 170C.
- the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
- the earphone jack 170D is used to connect wired earphones.
- the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface.
- the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
- the gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
- the magnetic sensor 180D includes a Hall sensor.
- the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
- the touch sensor 180K is also called a "touch panel".
- the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
- the touch sensor 180K is used to detect a touch operation on or near it.
- the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
- Visual output related to touch operations may be provided through display screen 194 .
- the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
- the keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
- the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
- the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
- FIG. 12 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application. It can be understood that FIG. 12 is only a schematic diagram of an exemplary software structure of the electronic device 100.
- the software structure of the electronic device 100 in the embodiment of the present application may also be a software structure provided by other operating systems (e.g., the iOS operating system, the Hongmeng operating system, etc.), which is not limited here.
- the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
- the system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, a runtime (Runtime) and a system library, and a kernel layer.
- the application layer can include a series of application packages.
- the application package may include applications (also referred to as apps) such as camera, gallery, calendar, call, map, navigation, WLAN, music, video, short message, and point reading.
- the point-to-read application program refers to an application program that can implement the method for point-to-read text recognition provided by the embodiments of the present application.
- the name of the application program may be called "Reading” or “Assisted Learning”, etc.
- the name of the application program is not limited here.
- the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
- the application framework layer includes some predefined functions.
- the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
- a window manager is used to manage window programs.
- the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
- Content providers are used to store and retrieve data and make these data accessible to applications.
- the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
- the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
- a display interface can consist of one or more views.
- the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
- the phone manager is used to provide the communication function of the electronic device 100 .
- for example, the management of call status (connecting, hanging up, etc.).
- the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
- the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
- the notification manager can also display notifications in the status bar at the top of the system in the form of a graph or scroll bar text, such as notifications of applications running in the background, and can also display notifications on the screen in the form of a dialog interface. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
- Runtime includes core libraries and virtual machines. Runtime is responsible for the scheduling and management of the system.
- the core library consists of two parts: one part is the functions that a programming language (for example, the Java language) needs to call, and the other part is the core library of the system.
- the application layer and the application framework layer run in virtual machines.
- the virtual machine executes the programming files (for example, Java files) of the application layer and the application framework layer as binary files.
- the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
- a system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
- the Surface Manager is used to manage the display subsystem and provides a fusion of two-dimensional (2-Dimensional, 2D) and three-dimensional (3-Dimensional, 3D) layers for multiple applications.
- the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
- the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
- the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
- 2D graphics engine is a drawing engine for 2D drawing.
- the kernel layer is the layer between hardware and software.
- the kernel layer contains at least display drivers, camera drivers, audio drivers, sensor drivers, and virtual card drivers.
- when the touch sensor receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer.
- the kernel layer processes touch operations into raw input events (including touch coordinates, timestamps of touch operations, etc.). Raw input events are stored at the kernel layer.
- the application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. Taking an example where the touch operation is a click operation and the control corresponding to the click operation is the camera application icon: the camera application calls the interface of the application framework layer to start the camera application, which then starts the camera driver by calling the kernel layer.
- the camera 193 captures still images or video.
- FIG. 13 is another exemplary schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
- the electronic device 100 may include: a processor 1201 , a camera 1202 , a display screen 1203 , a speaker 1204 and a sensor 1205 .
- the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
- the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently.
- the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
- a camera 1202 is used to capture images.
- the processor 1201 is configured to detect the image captured by the camera 1202 and determine the coordinates of the finger in the image captured by the camera 1202 .
- the processor 1201 determines the time when the user starts reading and ends the reading according to the image captured by the camera 1202 , and determines the text to be recognized specified by the user according to the image captured by the camera 1202 .
- the processor 1201 can also convert the recognized text into an audio electrical signal, and send the audio electrical signal to the speaker 1204 .
- the display screen 1203 can display the image captured by the camera 1202 .
- the display screen 1203 may also display the icon of the "point reading" APP, and display prompt text.
- the speaker 1204 can receive the audio electrical signal sent by the processor 1201, and convert the audio electrical signal into a sound signal.
- the electronic device 100 can broadcast the text read by the user through the speaker 1204 .
- the sensor 1205 can be a touch sensor, and the touch sensor can be placed on the display screen 1203, and the touch sensor and the display screen 1203 form a touch screen, also called a "touch screen".
- a touch sensor is used to detect touch operations on or near it.
- the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
- the term “when” may be interpreted to mean “if” or “after” or “in response to determining" or “in response to detecting" depending on the context.
- the phrases “in determining" or “if detecting (the stated condition or event)” can be interpreted to mean “if determining" or “in response to determining" or “on detecting (the stated condition or event)” or “in response to the detection of (the stated condition or event)”.
- the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
- when implemented in software, they may be realized in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
- the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave).
- the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that includes an integration of one or more available media.
- the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid state drives), and the like.
- all or part of the processes of the foregoing method embodiments can be completed by a computer program instructing relevant hardware, and the program can be stored in a computer-readable storage medium.
- when the program is executed, the processes of the foregoing method embodiments may be included.
- the aforementioned storage medium includes various media that can store program code, such as read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
Abstract
A touch-reading text recognition method is disclosed. The method comprises: in response to a first operation by a user, an electronic device starts collecting images; the electronic device recognizes a touch-reading gesture of the user according to the position movement of the user's finger recognized in the collected images; the electronic device determines a target text in the content of a book in the collected image according to the touch-reading gesture and the position of a trajectory of the touch-reading gesture; and the electronic device broadcasts the recognized target text. An electronic device, a computer-readable storage medium, and a computer program product are also disclosed.
Applications Claiming Priority (2)
- CN202110298494.6A (priority 2021-03-19, filed 2021-03-19): CN115116075A, 一种识别点读文字的方法及电子设备
- CN202110298494.6
Publications (1)
- WO2022194180A1, published 2022-09-22

Family ID: 83321727

Family Applications (1)
- PCT/CN2022/081042 (priority 2021-03-19, filed 2022-03-15): WO2022194180A1, Procédé de reconnaissance de texte à lecture tactile, et dispositif électronique

Country Status (2)
- CN (1): CN115116075A
- WO (1): WO2022194180A1
Families Citing this family (1)
- CN115909342B (priority 2023-01-03, published 2023-05-23), 湖北瑞云智联科技有限公司: 基于触点运动轨迹的图像标记识别系统及方法
Family Cites Families (2)
- US8566044B2 (priority 2009-03-16, published 2013-10-22), Apple Inc.: Event recognition
- CN104978010A (priority 2014-04-03, published 2015-10-14), 冠捷投资有限公司: 三维空间手写轨迹取得方法
Patent Citations (8)
- CN104217197A (priority 2014-08-27, published 2014-12-17), 华南理工大学: 一种基于视觉手势的点读方法和装置
- WO2016113969A1 (priority 2015-01-13, published 2016-07-21), 三菱電機株式会社: Dispositif et procédé de reconnaissance de geste ainsi que programme et support d'enregistrement
- CN104820560A (priority 2015-05-21, published 2015-08-05), 马兰英: 选择字符或图像的方法及计算设备
- CN105320437A (priority 2015-05-21, published 2016-02-10), 马兰英: 选择字符或图像的方法及计算设备
- CN109255989A (priority 2018-08-30, published 2019-01-22), 广东小天才科技有限公司: 一种智能点读方法及点读设备
- CN112016346A (priority 2019-05-28, published 2020-12-01), 阿里巴巴集团控股有限公司: 手势的识别方法、装置、系统以及信息的处理方法
- CN111090343A (priority 2019-06-09, published 2020-05-01), 广东小天才科技有限公司: 在点读场景下识别点读内容的方法及装置
- CN111324201A (priority 2020-01-20, published 2020-06-23), 上海纸上绝知智能科技有限公司: 基于体感交互的阅读方法以及装置、系统
Also Published As
- CN115116075A, published 2022-09-27
Legal Events
- 121: the EPO has been informed by WIPO that EP was designated in this application (ref document 22770533, country EP, kind code A1)
- NENP: non-entry into the national phase (ref country code: DE)
- 122: PCT application non-entry in European phase (ref document 22770533, country EP, kind code A1)