WO2021018110A1 - Translation pen and control method thereof - Google Patents

Translation pen and control method thereof

Info

Publication number
WO2021018110A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
translated
image
pen
processor
Prior art date
Application number
PCT/CN2020/105026
Other languages
English (en)
French (fr)
Inventor
赵骥伯
姜幸群
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority to US 17/423,413 (published as US20220076042A1)
Publication of WO2021018110A1

Classifications

    • G06V10/17 Image acquisition using hand-held instruments
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/47 Machine-assisted translation, e.g. using translation memory
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V10/235 Image preprocessing by selection of a specific region based on user input or interaction
    • G06V30/10 Character recognition
    • G06V30/142 Image acquisition using hand-held instruments; constructional details of the instruments
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V30/1456 Selective acquisition of specific regions based on user interactions
    • G06V30/148 Segmentation of character regions
    • G06V30/158 Segmentation of character regions using character size, text spacings or pitch estimation
    • G06V30/242 Division of the character sequences into groups prior to recognition; selection of dictionaries

Definitions

  • the present disclosure relates to the technical field of translation devices, in particular to a translation pen and a control method thereof.
  • A scanning translation pen uses scanning technology to feed printed text into the text recognition system inside the pen, so that the text recognition system can recognize the scanned text; the recognized text is then translated by the translation software in the pen.
  • the scanning translation pen has a pen tip with a brush-like appearance; the pen tip is attached to the page and brushed along the line direction of the printed text on the page to scan the printed text for translation.
  • the above scanning translation pen scans line by line, and the text being scanned on the same line is partially occluded, which affects the reader's perception.
  • moreover, such a scanning method cannot control the amount of text scanned well; it is easy to scan too much, which leads to inaccurate translation results.
  • a translation pen which includes a pen body, a pointing component, an image collector, and a first processor.
  • the pen body includes a pen tip end.
  • the indicating component is arranged at the tip end of the pen body.
  • the image collector is arranged on the pen body; the image collector is configured to collect an image of the text to be translated according to the position indicated by the indicating component, and send the collected image of the text to be translated.
  • the first processor is arranged in the pen body, and the first processor is electrically connected to the image collector; the first processor is configured to receive the image of the text to be translated sent by the image collector, and to recognize the text to be translated in the image of the text to be translated.
  • the pen body further has a pen tail end opposite to the pen tip end.
  • the image collector is arranged on an outer side of the pen body; along the direction from the pen tip end to the pen tail end, the image collector is farther from the pen tip end than the indicating component.
  • the indicating component is arranged within the viewing angle range of the image collector.
  • the indicating member can be used to paint colors on external objects.
  • the indicating member has a tip.
  • the first processor includes: a detection component electrically connected to the image collector, and a locking component electrically connected to the detection component.
  • the detection component is configured to detect the image of the text to be translated to form at least one text box, and the text box contains part of the text in the image of the text to be translated.
  • the locking component is configured to lock a text box that meets the set requirements in at least one of the formed text boxes, and use the text in the locked text box as the text to be translated.
  • the setting requirement is that, along the column direction of the text, the row where the text box is located is the row closest to the indicating component; and, along the row direction of the text, the center line of the text box is closest to the center line of the image of the text to be translated.
  • the setting requirement is that, along the column direction of the text, the row where the text box is located is the row closest to the indicating component; and at least part of the area in the text box is painted.
  • the first processor further includes: an adaptive word segmentation component electrically connected to the locking component.
  • the adaptive word segmentation component is configured to calculate a threshold value of the separation distance between adjacent letters in the text to be translated according to the size of the locked text to be translated, and to segment the locked text to be translated according to the threshold.
  • the translation pen further includes: a second processor disposed in the pen body, and the second processor is electrically connected to the first processor.
  • the second processor is configured to receive the text to be translated recognized by the first processor, translate the text to be translated, and generate and send a translation result.
  • an observation window is provided on the pen body.
  • the translation pen further includes: a display screen arranged in the observation window of the pen body, and the display screen is electrically connected with the second processor.
  • the display screen is configured to receive the translation result sent by the second processor, and display the translation result.
  • a control method applied to the above translation pen, including: the image collector of the translation pen collects an image of the text to be translated according to the position indicated by the indicating component of the translation pen; the first processor of the translation pen recognizes the text to be translated in the image of the text to be translated.
  • the first processor recognizing the text to be translated in the image of the text to be translated includes: detecting the image of the text to be translated to form at least one text box.
  • the text box contains part of the text in the image of the text to be translated; the text box that meets the set requirements is locked in at least one text box formed, and the text in the locked text box is used as the text to be translated.
  • the locking of a text box that meets a set requirement in the formed at least one text box includes: selecting the text line that is closest to the indicating component along the column direction of the text; determining the center line of each text box in the selected line and the center line of the image of the text to be translated; and locking the text box whose center line is closest to the center line of the image of the text to be translated.
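  • The locking rule above can be sketched in a few lines of code. The box encoding, function names, and the row tolerance below are illustrative assumptions for this sketch, not the patent's actual implementation:

```python
def lock_text_box(boxes, image_width, pointer_y, row_tol=5):
    """Lock one detected text box.

    boxes: (x_min, y_min, x_max, y_max) tuples in image coordinates.
    pointer_y: image-space y coordinate of the indicating component's tip.
    Returns the locked box, or None if nothing was detected.
    """
    if not boxes:
        return None
    y_center = lambda b: (b[1] + b[3]) / 2.0
    x_center = lambda b: (b[0] + b[2]) / 2.0
    # Step 1: keep only the text row closest to the indicating component.
    best = min(abs(y_center(b) - pointer_y) for b in boxes)
    row = [b for b in boxes if abs(y_center(b) - pointer_y) - best <= row_tol]
    # Step 2: among those, lock the box whose center line (along the row
    # direction) is nearest the center line of the captured image.
    mid = image_width / 2.0
    return min(row, key=lambda b: abs(x_center(b) - mid))
```

For example, with boxes on two lines and the pointer tip under the lower line, the box on that line that straddles the image's vertical center line is the one locked.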
  • locking a text box that meets a set requirement in the formed at least one text box includes: selecting the text line that is closest to the indicating component along the column direction of the text; and locking the text box in which at least part of the area is colored.
  • the method further includes: adaptively segmenting the text to be translated.
  • the adaptive segmentation of the text to be translated includes: obtaining the sub-text boxes circumscribing each letter in the text to be translated; determining a reference text box according to the area of each of the sub-text boxes; calculating the separation distance threshold between two adjacent letters according to the width of the reference text box; obtaining the actual separation distance value between every two adjacent letters; and segmenting the text to be translated according to the actual separation distance value between every two adjacent letters and the separation distance threshold.
  • the determining of the reference text box according to the area of each of the sub-text boxes includes: selecting, according to the area of each of the sub-text boxes, a sub-text box whose area value is in the middle range, and taking it as the reference text box.
  • the difference between the lower limit of the middle range and the minimum area value among the sub-text boxes is equal or approximately equal to the difference between the upper limit of the middle range and the maximum area value among the sub-text boxes.
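  • As a rough illustration of this selection (the text only constrains the middle range by the symmetry condition above), one can pick the sub-text box whose area is nearest the midpoint of the observed area range; the function name and (width, height) box encoding are assumptions of this sketch:

```python
def pick_reference_box(sub_boxes):
    """sub_boxes: (width, height) of each letter's circumscribed box.

    Returns the box whose area lies nearest the middle of the area range,
    approximating the 'middle range' selection described above.
    """
    areas = [w * h for (w, h) in sub_boxes]
    # Midpoint equidistant from the minimum and maximum area values.
    mid = (min(areas) + max(areas)) / 2.0
    best = min(range(len(sub_boxes)), key=lambda i: abs(areas[i] - mid))
    return sub_boxes[best]
```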
  • the calculating of the separation distance threshold between two adjacent letters according to the width of the reference text box includes: calculating the separation distance threshold according to a formula in which N is the separation distance threshold between two adjacent letters, and W is the ratio of the width of the reference text box to the width of a unit pixel in the image.
  • the segmenting of the text to be translated according to the actual separation distance value between every two adjacent letters and the separation distance threshold includes: comparing the actual separation distance value between two adjacent letters with the separation distance threshold; if the actual separation distance value is greater than the separation distance threshold, the two adjacent letters belong to two adjacent words; if the actual separation distance value is less than or equal to the separation distance threshold, the two adjacent letters belong to the same word.
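  • The comparison rule can be sketched as follows. The formula that derives the threshold from the reference box width is not reproduced in this text, so the sketch takes the threshold as a parameter; the (x_min, x_max) letter-box encoding is also an illustrative assumption:

```python
def segment_words(letter_boxes, gap_threshold):
    """letter_boxes: (x_min, x_max) of each letter, sorted left to right.

    A gap wider than gap_threshold separates two words; a gap less than
    or equal to the threshold keeps the two letters in the same word.
    """
    words, current = [], [letter_boxes[0]]
    for prev, box in zip(letter_boxes, letter_boxes[1:]):
        actual_gap = box[0] - prev[1]   # actual separation distance value
        if actual_gap > gap_threshold:
            words.append(current)       # adjacent letters, different words
            current = [box]
        else:
            current.append(box)         # adjacent letters, same word
    words.append(current)
    return words
```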
  • in the case where the translation pen further includes a second processor and a display screen, after the first processor recognizes the text to be translated in the image of the text to be translated, the method further includes: the second processor translates the text to be translated recognized by the first processor to generate a translation result; and the display screen displays the translation result.
  • the first processor re-recognizes the text to be translated in the image of the text to be translated, and the re-recognized text to be translated cannot be the previously recognized text to be translated.
  • Figure 1 is a structural diagram of a translation pen according to some embodiments of the present disclosure.
  • Fig. 2 is a schematic diagram of the connection of various components of the translation pen according to some embodiments of the present disclosure.
  • Fig. 3 is another schematic diagram of connection of various components of the translation pen according to some embodiments of the present disclosure.
  • Figure 4 is a schematic diagram of the use of a translation pen according to some embodiments of the present disclosure.
  • FIG. 5 is a schematic diagram of a detection image of a translation pen according to some embodiments of the present disclosure.
  • FIG. 6 is a schematic diagram of the translation pen locking a text to be translated according to some embodiments of the present disclosure.
  • FIG. 7 is another schematic diagram of locking the text to be translated by the translation pen according to some embodiments of the present disclosure.
  • FIG. 8 is a schematic diagram of invalid text collected by a translation pen according to some embodiments of the present disclosure.
  • Fig. 9 is a flowchart of a control method applied to a translation pen according to some embodiments of the present disclosure.
  • FIG. 10 is a flowchart of adaptive word segmentation in a control method according to some embodiments of the present disclosure.
  • FIG. 11 is a flowchart of acquiring sub-text boxes in a control method according to some embodiments of the present disclosure.
  • FIGS. 12 to 15 are schematic diagrams of various steps of adaptive word segmentation according to some embodiments of the present disclosure.
  • the terms “first” and “second” are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, a feature defined with “first” or “second” may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present disclosure, unless otherwise specified, “plurality” means two or more.
  • the expressions “coupled” and “connected” and their extensions may be used.
  • the term “connected” may be used when describing some embodiments to indicate that two or more components are in direct physical or electrical contact with each other.
  • the term “coupled” may be used when describing some embodiments to indicate that two or more components have direct physical or electrical contact.
  • the term “coupled” or “communicatively coupled” may also mean that two or more components are not in direct contact with each other, but still cooperate or interact with each other.
  • the embodiments disclosed herein are not necessarily limited to the content herein.
  • a and/or B includes the following three combinations: A only, B only, and the combination of A and B.
  • the translation pen 100 includes a pen body 1 for packaging, an indicating component 2 provided on the pen body 1, an image collector 3 provided on the pen body 1, and a first processor 4 (see Fig. 2) arranged in the pen body.
  • the pen body 1 of the translation pen 100 has a pen tip 11 and a pen tail 12 opposite to the pen tip 11.
  • the indicating member 2 is disposed on the pen tip 11 of the pen body 1.
  • the pen tip 11 has a mounting hole, and the indicating member 2 is disposed on the pen tip 11 through the mounting hole.
  • the indicating component 2 is configured to indicate the position of the image of the text to be translated.
  • the image collector 3 is configured to collect an image of the text to be translated according to the position indicated by the indicating component 2 and send the collected image of the text to be translated.
  • the image collector 3 may use an electronic device with an image acquisition function such as a camera.
  • the first processor 4 is electrically connected to the image collector 3.
  • the first processor 4 is configured to receive the image of the text to be translated sent by the image collector 3, and to recognize the text to be translated in the image of the text to be translated.
  • the first processor 4 may also be arranged outside the pen body 1.
  • for example, the first processor 4 is a computer terminal or a computer server.
  • the “text to be translated” can be the text printed on a book page or the text on an electronic display device. Moreover, the “text to be translated” may be English letters and/or words, and the embodiments of the present disclosure are not limited thereto.
  • the indicating member 2 is arranged on the tip end 11 of the pen body 1
  • the image collector 3 is arranged on the pen body 1
  • the indicating component 2 and the image collector 3 are arranged separately, which prevents the pen tip end 11 of the pen body 1 from blocking the image of the text to be translated collected by the image collector 3.
  • the position of the image of the text to be translated is indicated by the indicating component 2
  • the image collector 3 collects the image of the text to be translated according to the position indicated by the indicating component 2
  • the first processor 4 recognizes the text to be translated in the collected image of the text to be translated.
  • This arrangement can improve the accuracy of the image collected by the image collector 3 and avoid collecting redundant images, thereby helping the first processor 4 to accurately recognize the text to be translated and improving the translation accuracy of the translation pen 100.
  • the image collector 3 is arranged on a side outside the pen body 1, for example, the image collector 3 is fixed on the side wall of the pen body 1.
  • the image collector 3 is farther from the pen tip end 11 than the indicating component 2, and it is ensured that the indicating component 2 is within the viewing angle range of the image collector 3. In this way, it can be ensured that the image collector 3 collects the image of the text to be translated indicated by the indicating component 2.
  • the pen tip end 11 of the pen body 1 is a quadrangular pyramid or approximately a quadrangular pyramid; one end of the quadrangular pyramid is integrally formed with the pen body 1, and the other end is a pointed end.
  • the indicating component 2 is fixed at the tip of the quadrangular pyramid, and the lens of the image collector 3 is located at or near the junction of the quadrangular pyramid and the main part of the pen body 1, ensuring that the indicating component 2 is within the viewing angle range of the image collector 3, so that the image collector 3 can collect an image of the text to be translated according to the position indicated by the indicating component 2.
  • there is a distance between the image collector 3 and the indicating component 2, which ensures that the size of the image collected by the image collector 3 meets the size requirement.
  • the pen body 1 of the translation pen 100 is also provided with a switch button or a controller electrically connected to the first processor 4, which can be used to control the image collector 3 to perform image collection work.
  • the indicating member 2 can be used to paint colors on external objects.
  • the indicating component 2 can be used to color the text to be translated, so that the image collector 3 can accurately collect the image of the text to be translated.
  • the indicating member 2 can be the tip of a highlighter pen or a colored pen, and can perform the same operations of drawing, coloring and marking as a conventional pen.
  • the color painted by the indicating component 2 (the color painted in the area where “recurrent” is located in the figure) is different from the printing color of the text to be translated.
  • the indicating component 2 has a tip. Compared with the brush-like pen tip of the scanning translation pen in the related art, the indicating component 2 mounted on the translation pen 100 in the embodiments of the present disclosure has a sharp tip of relatively small size, so when the indicating component 2 indicates the location of the image of the text to be translated, it will not block the image of the text to be translated.
  • the first processor 4 includes a detection component 41 and a locking component 42.
  • the detection component 41 is electrically connected to the image collector 3, and the image collector 3 sends the collected image of the text to be translated to the detection component 41.
  • the detection component 41 is configured to detect the image of the text to be translated, and form at least one text box in the image of the text to be translated, and each text box contains part of the text in the image of the text to be translated .
  • Fig. 5 shows a situation where multiple text boxes are formed.
  • the locking component 42 is electrically connected to the detecting component 41.
  • the locking component 42 is configured to lock, according to the setting requirements, a text box that meets the setting requirements among the formed at least one text box, and to use the text in the locked text box as the text to be translated.
  • the above setting requirements are that, along the column direction Y of the text, the row where the text box T is located is the row closest to the indicating component 2; and, along the row direction X of the text, the center line L2 of the text box T is closest to the center line L1 of the image P of the text to be translated.
  • the text box T where “recurrent” is located in Figure 6 meets the set requirements, so “recurrent” is the text to be translated.
  • the setting requirement is that, along the column direction Y of the text to be translated, the row where the text box T is located is the row closest to the indicating component 2; and at least part of the area in the text box T is painted.
  • the text box T where “recurrent” is located in Figure 7 meets the set requirements, so “recurrent” is the text to be translated.
  • the center line L2 of the text box T and “the center line L1 of the image P of the text to be translated” are the center lines of the text box T and the image P of the text to be translated along the column direction Y, respectively.
  • the size of “at least part of the area” in “at least part of the area in the text box T is painted” can be set according to actual requirements, which is not specifically limited in the embodiments of the present disclosure.
  • the area of the colored area in the text box T may account for 70% to 100% of the total area of the text box T, for example, 70%, 80%, 90%, or 100%. That is to say, along the column direction Y of the text, the locked text box T is in the row closest to the indicating component 2, and the colored area in the text box T may account for at least 70% of the total area of the text box T.
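  • A minimal check of the "70% to 100% colored" criterion might look like this, assuming the colored pixels inside a candidate box have already been segmented into a boolean mask (the encoding and names are illustrative, not the patent's implementation):

```python
def colored_fraction(mask):
    """mask: 2D list of booleans, True where a pixel inside the text box
    carries the marking color."""
    total = sum(len(row) for row in mask)
    colored = sum(1 for row in mask for p in row if p)
    return colored / total

def meets_coloring_requirement(mask, min_fraction=0.7):
    """True when the colored area reaches the example lower bound of 70%
    of the text box's total area."""
    return colored_fraction(mask) >= min_fraction
```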
  • the first processor 4 further includes an adaptive word segmentation component 43 electrically connected to the locking component 42.
  • the adaptive word segmentation component 43 is configured to calculate a threshold value of the separation distance between adjacent letters in the text to be translated according to the size of the locked text to be translated, and to segment the locked text to be translated according to the threshold.
  • CTPN (full English name: Connectionist Text Proposal Network).
  • YOLO (full English name: You Only Look Once).
  • the adaptive word segmentation component 43 in the translation pen 100 provided in the above-mentioned embodiment of the present disclosure can calculate the threshold of the separation distance between adjacent letters in the text to be translated according to the font size of the text to be translated in the image of the text to be translated.
  • the function of adaptively adjusting the threshold of the separation distance is realized, and the locked text to be translated is segmented according to the threshold, so that the translation pen 100 is suitable for the translation of texts of various font sizes.
  • the translation pen 100 further includes a second processor 5 arranged in the pen body, and the second processor 5 is electrically connected to the first processor 4.
  • the second processor 5 is configured to receive the text to be translated recognized by the first processor 4, translate the text to be translated, and generate and send the translation result.
  • the second processor 5 may be arranged in the pen body 1.
  • the second processor 5 and the first processor 4 may be combined into one processor, or may be arranged separately.
  • an observation window 7 is provided on the pen body 1.
  • the translation pen 100 further includes a display screen 6 arranged in the observation window 7 of the pen body 1, and the display screen 6 is electrically connected to the second processor 5.
  • the display screen 6 is configured to receive the translation result sent by the second processor 5 and display the translation result.
  • the observation window 7 is an opening that penetrates the side wall on one side of the pen body 1, and the display screen 6 is embedded in the observation window 7 to ensure the stable installation of the display screen 6.
  • One side of the display screen 6 for displaying is exposed outside the pen body 1 to facilitate reading the translation results displayed on the display screen 6.
  • Some embodiments of the present disclosure also provide a control method applied to the aforementioned translation pen 100, as shown in FIG. 9, including the following S1 to S2:
  • S1: The image collector 3 of the translation pen 100 collects an image P of the text to be translated according to the position indicated by the indicating component 2 of the translation pen 100.
  • S2: The first processor 4 of the translation pen 100 recognizes the text to be translated in the image P of the text to be translated.
  • the indicating component 2 indicates the position of the image P of the text to be translated
  • the image collector 3 collects the text to be translated according to the position indicated by the indicating component 2
  • the first processor 4 recognizes the text to be translated in the collected image P of the text to be translated, which can improve the accuracy of the image collected by the image collector 3, thereby helping the first processor 4 to accurately recognize the text to be translated and improving the translation accuracy of the translation pen 100.
  • the first processor 4 recognizes the text to be translated in the image of the text to be translated, including the following S21 to S23:
  • S21: Detect the image of the text to be translated to form at least one text box; the text box contains part of the text in the image of the text to be translated.
  • the text to be translated in the image P of the text to be translated contains multiple letters and has multiple text lines.
  • the image P of the text to be translated is detected by the detection component 41.
  • a plurality of text boxes T are formed in the image P of the text to be translated.
  • each text box T contains part of the text in the image P of the text to be translated, thereby facilitating subsequent locking of the text box T where the text to be translated is located, and realizing the locking of the text to be translated.
  • S22: Lock a text box that meets the setting requirements in at least one of the formed text boxes, and use the text in the locked text box as the text to be translated.
  • the locking component 42 of the first processor 4 locks the text box T that meets the setting requirements, and uses the text in the locked text box T as the text to be translated.
  • the above setting requirements are that, along the column direction Y of the text, the row where the text box T is located is the row closest to the indicating component 2; and, along the row direction X of the text, the center line L2 of the text box T is closest to the center line L1 of the image P of the text to be translated.
  • the setting requirement is that, along the column direction Y of the text, the row where the text box T is located is the row closest to the indicating component 2, and at least part of the area in the text box T is painted.
  • in the case where the setting requirement is that the row where the text box T is located is the row closest to the indicating component 2 and, along the row direction X of the text, the center line L2 of the text box T is closest to the center line L1 of the image P of the text to be translated, locking a text box T that meets the set requirement in the formed at least one text box T (S22) includes the following steps:
  • the text line that is closest to the indicator 2 along the column direction Y of the text is selected, that is, the text line where "s recurrent la" is located in the figure.
  • the setting requirement is that along the column direction Y of the text, the row of the text box T is the row closest to the indicating component 2, and the text At least part of the area in the frame T is painted, then the text frame T meeting the set requirements is locked in the formed at least one text frame T (S22), including the following steps:
  • the text line that is closest to the indicator 2 along the column direction Y of the text is selected, that is, the text line where "s recurrent la" is located in the figure.
  • the control method applied to the above-mentioned translation pen 100 further includes:
  • S23 includes the following steps:
  • the sub-text box circumscribing each letter in the text to be translated can be obtained through the following S2311 to S2315:
  • the OpenCV function cv::threshold() can be used to binarize the collected image of the text to be translated, obtaining the binarized image of the text to be translated shown in FIG. 12.
  • the erosion operation cv::erode() can be used to remove, from the binarized image of the text to be translated, the tiny connections between letters of the text to be translated caused by ink printing problems.
  • FIG. 11 shows a tiny connection between the two letters "N" in "CNN"; by applying the erosion operation cv::erode(), the tiny connection between the two letters "N" is removed.
  • the dilation operation cv::dilate() can be used to dilate the letters in the text to be translated, restoring them to their original thickness, that is, the thickness before the letters were eroded.
  • FIG. 13 shows the removal of the tiny connection between the two letters "N", and the effect of the morphological transformation of the letters in the text to be translated (restoring the original thickness).
  • the function cv::findContours() can be used to find the contour of each letter in the text to be translated, and the function cv::convexHull() can be used to obtain the convex hull 21 of the contour; FIG. 14 shows the convex hulls 21 of the contours of the two letters "N".
  • the minimum circumscribed rectangle of the convex hull 21 of the contour of each letter is calculated to obtain the sub-text box 22 of the letter; the minimum circumscribed rectangle is the sub-text box 22 of the letter.
  • the sub-text boxes 22 are sorted by area, a sub-text box 22 whose area value is in the middle range is selected, and the selected sub-text box 22 serves as the reference text box.
  • the difference between the lower limit of the "middle range" and the minimum area value of the sub-text boxes 22 is equal to, or approximately equal to, the difference between the upper limit of the middle range and the maximum area value of the sub-text boxes 22.
  • the "middle range" can also be a fixed intermediate value; in this case, the difference between the intermediate value and the minimum area value of the sub-text boxes 22 is equal to the difference between the intermediate value and the maximum area value of the sub-text boxes 22.
  • S233 Calculate the threshold of the separation distance between two adjacent letters according to the width of the reference text box.
  • the threshold of the separation distance between two adjacent letters is calculated according to the following formula:
  • N is the distance threshold between two adjacent letters
  • W is the ratio of the width of the reference text box (the size of the reference text box along the row direction of the text) to the width of a unit pixel in the image
  • 0.6 is an empirical coefficient obtained by the inventors of the present application through many experiments; 16 is the width, in unit pixels, of the proposals set along the row direction X by the commonly used text detection networks CTPN or YOLO.
  • S235 Perform word segmentation on the text to be translated according to the actual separation distance value and the separation distance threshold between every two adjacent letters.
  • if the actual separation distance value is greater than the separation distance threshold, it is determined that the two adjacent letters belong to two adjacent words respectively. If the actual separation distance value is less than or equal to the separation distance threshold, it is determined that the two adjacent letters belong to the same word. In this way, each letter of the text to be translated is assigned to at least one word, and word segmentation is realized.
  • in the case where the translation pen 100 further includes the second processor 5 and the display screen 6, after the first processor 4 recognizes the text to be translated in the image of the text to be translated, that is, after S2, the above control method applied to the translation pen 100 further includes the following steps:
  • the second processor 5 translates the text to be translated recognized by the first processor 4 to generate a translation result, and the display screen 6 displays the translation result.
  • the first processor 4 re-recognizes the text to be translated in the image of the text to be translated, and the re-recognized text to be translated cannot be the text to be translated that was recognized last time.
  • when the image collector 3 collects an image, it may be affected by external factors, such as the trembling of the operator's hand, so that only part of the text to be translated is collected in the locked text box; the text to be translated in the locked text box is thus invalid text ("recurrent" in the locked text box in FIG. 8 is the invalid text), and the translation pen 100 cannot generate a translation result.
  • the first processor 4 re-recognizes the text to be translated in the image of the text to be translated. For example, along the column direction Y of the text, excluding the row where "recurrent" is located, lock the row closest to the indicating component 2, that is, the row where the text boxes of "Network not" and "o" are located; and, along the row direction X of the text, according to the principle that the center line of the text box is closest to the center line of the image of the text to be translated, lock the "Network not" text box; "Network not" is the re-recognized text to be translated.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)
  • Character Input (AREA)
  • Document Processing Apparatus (AREA)

Abstract

A translation pen includes a pen body, an indicating component, an image collector and a first processor. The pen body includes a tip end, and the indicating component is disposed at the tip end of the pen body. The image collector is disposed on the pen body; the image collector collects an image including text to be translated according to the position indicated by the indicating component, and sends the collected image including the text to be translated. The first processor is disposed inside the pen body and is electrically connected to the image collector; the first processor receives the image including the text to be translated sent by the image collector and recognizes the text to be translated.

Description

Translation pen and control method therefor
This application claims priority to Chinese Patent Application No. 201910690400.2, filed on July 29, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of translation devices, and in particular to a translation pen and a control method therefor.
Background
A scanning translation pen scans printed text into a character recognition system inside the pen via scanning technology, so that the character recognition system recognizes the scanned text, after which translation software inside the pen translates the recognized text.
Specifically, a scanning translation pen has a brush-like tip. The tip is pressed against the page and swept along the row direction of the printed text to scan the text, after which translation can be performed.
However, the above scanning translation pen scans text line by line; the text on the same line that is about to be scanned is occluded, which degrades the reader's experience. Moreover, such a scanning approach makes it difficult to control the amount of text scanned, and scanning too much text easily leads to inaccurate translation results.
Summary
In one aspect, a translation pen is provided, including a pen body, an indicating component, an image collector and a first processor. The pen body has a tip end. The indicating component is disposed at the tip end of the pen body. The image collector is disposed on the pen body; the image collector is configured to collect an image of text to be translated according to the position indicated by the indicating component, and to send the collected image of the text to be translated. The first processor is disposed inside the pen body and is electrically connected to the image collector; the first processor is configured to receive the image of the text to be translated sent by the image collector, and to recognize the text to be translated in the image of the text to be translated.
In some embodiments, the pen body further has a tail end opposite the tip end. The image collector is disposed on one side of the exterior of the pen body; in the direction from the tip end to the tail end of the pen body, the image collector is farther from the tip end than the indicating component. The indicating component is disposed within the viewing angle range of the image collector.
In some embodiments, there is a spacing between the image collector and the indicating component.
In some embodiments, the indicating component can be used to color external objects.
In some embodiments, the indicating component has a tip.
In some embodiments, the first processor includes: a detection component electrically connected to the image collector, and a locking component electrically connected to the detection component. The detection component is configured to detect the image of the text to be translated and form at least one text box, each text box containing part of the text in the image of the text to be translated. The locking component is configured to lock, among the formed at least one text box, a text box that meets a setting requirement, and to use the text in the locked text box as the text to be translated. The setting requirement is that, along the column direction of the text, the row in which the text box is located is the row closest to the indicating component, and, along the row direction of the text, the center line of the text box is closest to the center line of the image of the text to be translated. Alternatively, in the case where the indicating component can be used to color external objects, the setting requirement is that, along the column direction of the text, the row in which the text box is located is the row closest to the indicating component, and at least part of the area within the text box is colored.
In some embodiments, the first processor further includes an adaptive word-segmentation component electrically connected to the locking component. The adaptive word-segmentation component is configured to calculate, according to the size of the locked text to be translated, a threshold of the separation distance between adjacent letters in the text to be translated, and to segment the locked text to be translated into words according to the threshold.
In some embodiments, the translation pen further includes a second processor disposed inside the pen body, the second processor being electrically connected to the first processor. The second processor is configured to receive the text to be translated recognized by the first processor, translate the text to be translated, and generate and send a translation result.
In some embodiments, the pen body is provided with a viewing window. The translation pen further includes a display screen disposed in the viewing window of the pen body, the display screen being electrically connected to the second processor. The display screen is configured to receive the translation result sent by the second processor and display the translation result.
In another aspect, a control method applied to the above translation pen is provided, including: the image collector of the translation pen collects an image of text to be translated according to the position indicated by the indicating component of the translation pen; the first processor of the translation pen recognizes the text to be translated in the image of the text to be translated.
In some embodiments, the first processor recognizing the text to be translated in the image of the text to be translated includes: detecting the image of the text to be translated to form at least one text box, each text box containing part of the text in the image of the text to be translated; and locking, among the formed at least one text box, a text box that meets a setting requirement, and using the text in the locked text box as the text to be translated.
In some embodiments, locking, among the formed at least one text box, a text box that meets the setting requirement includes: selecting the text row closest to the indicating component along the column direction of the text; determining the center line of each text box in the selected text row and the center line of the image of the text to be translated, and locking the text box whose center line is closest to the center line of the image of the text to be translated.
In some embodiments, in the case where the indicating component can be used to color external objects, locking, among the formed at least one text box, a text box that meets the setting requirement includes: selecting the text row closest to the indicating component along the column direction of the text; and locking a text box at least part of whose area is colored.
In some embodiments, after using the text in the locked text box as the text to be translated, the method further includes: performing adaptive word segmentation on the text to be translated.
In some embodiments, performing adaptive word segmentation on the text to be translated includes: obtaining a sub-text box circumscribing each letter in the text to be translated; determining a reference text box according to the areas of the sub-text boxes; calculating a threshold of the separation distance between two adjacent letters according to the width of the reference text box; obtaining the actual separation distance value between every two adjacent letters; and segmenting the text to be translated into words according to the actual separation distance value between every two adjacent letters and the separation distance threshold.
In some embodiments, determining a reference text box according to the areas of the sub-text boxes includes: selecting, according to the areas of the sub-text boxes, a sub-text box whose area value is in an intermediate range, and using the selected sub-text box as the reference text box. The difference between the lower limit of the intermediate range and the minimum area value of the sub-text boxes is equal to, or approximately equal to, the difference between the upper limit of the intermediate range and the maximum area value of the sub-text boxes.
In some embodiments, calculating the threshold of the separation distance between two adjacent letters according to the width of the reference text box includes calculating the threshold according to the following formula:
Figure PCTCN2020105026-appb-000001
where N is the threshold of the separation distance between two adjacent letters, and W is the ratio of the width of the reference text box to the width of a unit pixel in the image.
In some embodiments, segmenting the text to be translated according to the actual separation distance value between every two adjacent letters and the separation distance threshold includes: comparing the actual separation distance value between two adjacent letters with the separation distance threshold; if the actual separation distance value is greater than the separation distance threshold, the two adjacent letters belong to two adjacent words respectively; if the actual separation distance value is less than or equal to the separation distance threshold, the two adjacent letters belong to the same word.
In some embodiments, in the case where the translation pen further includes a second processor and a display screen, after the first processor recognizes the text to be translated in the image of the text to be translated, the method further includes: the second processor translates the text to be translated recognized by the first processor to generate a translation result; and the display screen displays the translation result.
In some embodiments, if the second processor cannot generate a translation result, the first processor re-recognizes the text to be translated in the image of the text to be translated, and the re-recognized text to be translated cannot be the text to be translated that was recognized last time.
Brief Description of the Drawings
To describe the technical solutions of the present disclosure more clearly, the accompanying drawings used in some embodiments of the present disclosure are briefly introduced below. Obviously, the drawings described below are only drawings of some embodiments of the present disclosure; those of ordinary skill in the art may derive other drawings from these drawings. In addition, the drawings in the following description may be regarded as schematic diagrams and are not limitations on the actual dimensions of the products, the actual flow of the methods, the actual timing of the signals, and the like involved in the embodiments of the present disclosure.
FIG. 1 is a structural diagram of a translation pen according to some embodiments of the present disclosure;
FIG. 2 is a schematic diagram of one connection of the components of a translation pen according to some embodiments of the present disclosure;
FIG. 3 is a schematic diagram of another connection of the components of a translation pen according to some embodiments of the present disclosure;
FIG. 4 is a schematic diagram of the use of a translation pen according to some embodiments of the present disclosure;
FIG. 5 is a schematic diagram of an image detected by a translation pen according to some embodiments of the present disclosure;
FIG. 6 is a schematic diagram of locking text to be translated by a translation pen according to some embodiments of the present disclosure;
FIG. 7 is a schematic diagram of another way of locking text to be translated by a translation pen according to some embodiments of the present disclosure;
FIG. 8 is a schematic diagram of invalid text collected by a translation pen according to some embodiments of the present disclosure;
FIG. 9 is a flowchart of a control method applied to a translation pen according to some embodiments of the present disclosure;
FIG. 10 is a flowchart of adaptive word segmentation in a control method according to some embodiments of the present disclosure;
FIG. 11 is a flowchart of obtaining sub-text boxes in a control method according to some embodiments of the present disclosure;
FIGS. 12 to 15 are schematic diagrams of the steps of adaptive word segmentation according to some embodiments of the present disclosure.
Detailed Description
The technical solutions in some embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments provided by the present disclosure fall within the protection scope of the present disclosure.
Unless the context requires otherwise, throughout the specification and claims, the term "comprise" and its other forms, such as the third-person singular "comprises" and the present participle "comprising", are interpreted in an open, inclusive sense, that is, as "including, but not limited to". In the description of the specification, terms such as "one embodiment", "some embodiments", "exemplary embodiments", "example", "specific example" or "some examples" are intended to indicate that a particular feature, structure, material or characteristic related to the embodiment or example is included in at least one embodiment or example of the present disclosure. The schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials or characteristics described may be included in any one or more embodiments or examples in any suitable manner.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, features defined by "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present disclosure, unless otherwise specified, "a plurality of" means two or more.
In describing some embodiments, the expressions "coupled" and "connected" and their derivatives may be used. For example, the term "connected" may be used in describing some embodiments to indicate that two or more components are in direct physical or electrical contact with each other. As another example, the term "coupled" may be used in describing some embodiments to indicate that two or more components are in direct physical or electrical contact. However, the term "coupled" or "communicatively coupled" may also mean that two or more components are not in direct contact with each other, but still cooperate or interact with each other. The embodiments disclosed herein are not necessarily limited to the content herein.
"A and/or B" includes the following three combinations: A only, B only, and a combination of A and B.
The use of "adapted to" or "configured to" herein implies open and inclusive language that does not exclude devices adapted to or configured to perform additional tasks or steps.
As used herein, "about" or "approximately" includes the stated value as well as an average value within an acceptable range of deviation from the particular value, as determined by one of ordinary skill in the art in view of the measurement in question and the error associated with measurement of the particular quantity (i.e., the limitations of the measurement system).
Some embodiments of the present disclosure provide a translation pen. As shown in FIGS. 1 and 2, the translation pen 100 includes a pen body 1 for packaging, an indicating component 2 and an image collector 3 disposed on the pen body 1, and a first processor 4 disposed inside the pen body (see FIG. 2).
The pen body 1 of the translation pen 100 has a tip end 11 and a tail end 12 opposite the tip end 11. The indicating component 2 is disposed at the tip end 11 of the pen body 1. Exemplarily, the tip end 11 has a mounting hole, and the indicating component 2 is disposed at the tip end 11 through the mounting hole. The indicating component 2 is configured to indicate the position of the image of the text to be translated.
The image collector 3 is configured to collect an image of the text to be translated according to the position indicated by the indicating component 2, and to send the collected image of the text to be translated. For example, the image collector 3 may be an electronic device with an image collection function, such as a camera.
The first processor 4 is electrically connected to the image collector 3. The first processor 4 is configured to receive the image of the text to be translated sent by the image collector 3, and to recognize the text to be translated in the image of the text to be translated.
It should be noted that the first processor 4 may also be disposed outside the pen body 1; in this case, the first processor 4 is a computer terminal or a computer server.
The "text to be translated" may be text printed on a page, or text on an electronic display device. Moreover, the "text to be translated" may be English letters and/or words; the embodiments of the present disclosure are not limited thereto.
In the translation pen 100 of the above embodiments of the present disclosure, the indicating component 2 is disposed at the tip end 11 of the pen body 1, the image collector 3 is disposed on the pen body 1, and the indicating component 2 and the image collector 3 are disposed separately, which prevents the tip end 11 of the pen body 1 from occluding the image of the text to be translated when the image collector 3 collects it. Furthermore, the indicating component 2 indicates the position of the image of the text to be translated, the image collector 3 collects the image according to the position indicated by the indicating component 2, and the first processor 4 recognizes the text to be translated in the collected image; this improves the accuracy of the image collected by the image collector 3 and avoids collecting redundant images, which helps the first processor 4 accurately recognize the text to be translated and improves the translation accuracy of the translation pen 100.
In some embodiments, as shown in FIG. 1, the image collector 3 is disposed on one side of the exterior of the pen body 1; exemplarily, the image collector 3 is fixed on a side wall of the pen body 1. Moreover, in the direction A from the tip end 11 to the tail end 12 of the pen body 1, the image collector 3 is farther from the tip end 11 than the indicating component 2, and the indicating component 2 is kept within the viewing angle range of the image collector 3; this ensures that the image collector 3 can collect the image of the text to be translated indicated by the indicating component 2.
Exemplarily, the tip end 11 of the pen body 1 is a quadrangular pyramid or approximately a quadrangular pyramid; one end of the quadrangular pyramid is integrally formed with the pen body 1, and the other end is pointed. The indicating component 2 is fixed at the point of the quadrangular pyramid, and the lens of the image collector 3 is located at or near the junction of the quadrangular pyramid and the main part of the pen body 1, which ensures that the indicating component 2 is within the viewing angle range of the image collector 3, so that the image collector 3 collects the image of the text to be translated according to the position indicated by the indicating component 2.
Exemplarily, there is a spacing between the image collector 3 and the indicating component 2. When the viewing angle range of the image collector 3 is fixed, providing a spacing between the image collector 3 and the indicating component 2 ensures that the size of the image collected by the image collector 3 meets the size requirement.
In some embodiments, the pen body 1 of the translation pen 100 is further provided with a switch button or a controller electrically connected to the first processor 4, which can be used to control the image collection work of the image collector 3.
In some embodiments, as shown in FIGS. 1 and 4, the indicating component 2 can be used to color external objects. The indicating component 2 may be used to color the text to be translated, so that the image collector 3 can accurately collect the image of the text to be translated. In this case, the indicating component 2 may be the tip of a highlighter or a colored pen, and can perform the same outlining, coloring and marking operations as an ordinary pen.
It should be noted that, as shown in FIG. 7, the color applied by the indicating component 2 (the color applied to the area where "recurrent" is located in the figure) is distinguished from the printed color of the text to be translated.
Exemplarily, the indicating component 2 has a tip. Compared with a scanning translation pen in the related art, which has a brush-like tip, the indicating component 2 of the translation pen 100 in the embodiments of the present disclosure has a tip of small size; therefore, when the indicating component 2 indicates the position of the image of the text to be translated, it does not occlude the image of the text to be translated.
In some embodiments, as shown in FIG. 3, the first processor 4 includes a detection component 41 and a locking component 42.
The detection component 41 is electrically connected to the image collector 3, and the image collector 3 sends the collected image of the text to be translated to the detection component 41. As shown in FIG. 5, the detection component 41 is configured to detect the image of the text to be translated and to form at least one text box in the image, each text box containing part of the text in the image of the text to be translated. FIG. 5 shows a case where a plurality of text boxes are formed.
The locking component 42 is electrically connected to the detection component 41. The locking component 42 is configured to lock, according to the setting requirement, a text box that meets the setting requirement among the formed at least one text box, and to use the text in the locked text box as the text to be translated.
As shown in FIGS. 4 and 6, the above setting requirement is that, along the column direction Y of the text, the row in which the text box T is located is the row closest to the indicating component 2; and, along the row direction X of the text, the center line L2 of the text box T is closest to the center line L1 of the image P of the text to be translated. In FIG. 6, the text box T where "recurrent" is located meets the setting requirement, so "recurrent" is the text to be translated.
Alternatively, as shown in FIGS. 4 and 7, in the case where the indicating component 2 can be used to color external objects, the setting requirement is that, along the column direction Y of the text to be translated, the row in which the text box T is located is the row closest to the indicating component 2, and at least part of the area within the text box T is colored. In FIG. 7, the text box T where "recurrent" is located meets the setting requirement, so "recurrent" is the text to be translated.
It should be noted that, herein, the "center line L2 of the text box T" and the "center line L1 of the image P of the text to be translated" are the center lines, along the column direction Y, of the text box T and of the image P of the text to be translated, respectively.
The size of the "at least part of the area" in "at least part of the area within the text box T is colored" may be set according to actual needs, and is not specifically limited in the embodiments of the present disclosure. Exemplarily, the colored area within the text box T may account for 70% to 100% of the total area of the text box T, for example 70%, 80%, 90% or 100%. That is, when, along the column direction Y of the text, the row in which the text box T is located is the row closest to the indicating component 2, the colored area within the text box T may account for 70% of the total area of the text box T.
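The colored-area criterion above can be sketched in a few lines. This is a hypothetical illustration, not the patent's implementation: `mask` is assumed to be a binary grid marking pixels whose color matches the highlighter, and the 70% floor is one of the example ratios given in the text.

```python
# Hypothetical check of the coloring criterion: a text box qualifies when
# the highlighted fraction of its area reaches a configurable ratio.
# `mask` is a binary grid (1 = highlighter-colored pixel), an assumption
# made for illustration; box format is (x, y, w, h).

def is_colored(mask, box, min_ratio=0.7):
    x, y, w, h = box
    colored = sum(mask[i][j] for i in range(y, y + h) for j in range(x, x + w))
    return colored / (w * h) >= min_ratio

mask = [[1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 0, 0]]
print(is_colored(mask, (0, 0, 3, 3)))       # -> True  (8/9 colored)
print(is_colored(mask, (0, 0, 4, 3), 0.7))  # -> False (8/12 colored)
```

A real device would first build `mask` by thresholding the image in a color space that separates the highlighter color from the printed ink.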
In some embodiments, as shown in FIG. 3, the first processor 4 further includes an adaptive word-segmentation component 43 electrically connected to the locking component 42. The adaptive word-segmentation component 43 is configured to calculate, according to the size of the locked text to be translated, a threshold of the separation distance between adjacent letters in the text to be translated, and to segment the locked text to be translated into words according to the threshold.
In the related art, commonly used text detection networks such as CTPN (Connectionist Text Proposal Network) or YOLO (You Only Look Once) share the basic principle of training a text detection network so that it traverses the image in fixed-width proposals (usually 16 unit pixels) along the row direction of the letters, finds letter regions that meet the settings, and combines them into the complete text to be translated. In such methods, the way to distinguish whether a letter region is one word or multiple words is to preset a threshold of the separation distance between two adjacent letters: if the actual separation distance between two adjacent letters is less than or equal to the threshold, the two adjacent letters are judged to belong to the same word; otherwise, they belong to two adjacent words respectively. However, in some scenarios, such as business cards and product labels, text of different font sizes may exist in the same text region; presetting only one separation distance threshold is not suitable for these scenarios, i.e., letters whose font size is larger or smaller than the font size assumed by the preset separation distance threshold cannot be segmented accurately.
The adaptive word-segmentation component 43 in the translation pen 100 provided by the above embodiments of the present disclosure can calculate the threshold of the separation distance between adjacent letters in the text to be translated according to the font size of the text in the image, thereby adaptively adjusting the separation distance threshold, and can segment the locked text to be translated into words according to the threshold, so that the translation pen 100 is suitable for translating text of various font sizes.
In some embodiments, as shown in FIG. 3, the translation pen 100 further includes a second processor 5 disposed inside the pen body, the second processor 5 being electrically connected to the first processor 4. The second processor 5 is configured to receive the text to be translated recognized by the first processor 4, translate the text to be translated, and generate and send a translation result.
Exemplarily, the second processor 5 may be disposed inside the pen body 1. In the case where the first processor 4 is disposed inside the pen body 1, the second processor 5 may be combined with the first processor 4 into one processor, or the two may be disposed separately.
In some embodiments, as shown in FIGS. 1 and 2, the pen body 1 is provided with a viewing window 7. The translation pen 100 further includes a display screen 6 disposed in the viewing window 7 of the pen body 1, the display screen 6 being electrically connected to the second processor 5. The display screen 6 is configured to receive the translation result sent by the second processor 5 and to display the translation result.
Exemplarily, the viewing window 7 is an opening through a side wall of the pen body 1, and the display screen 6 is embedded in the viewing window 7, ensuring stable mounting of the display screen 6. The display side of the display screen 6 is exposed outside the pen body 1, so that the translation result displayed on the display screen 6 can be read.
Some embodiments of the present disclosure further provide a control method applied to the above translation pen 100, as shown in FIG. 9, including the following S1 to S2:
S1: As shown in FIG. 4, the image collector 3 of the translation pen 100 collects an image P of the text to be translated according to the position indicated by the indicating component 2 of the translation pen 100.
S2: The first processor 4 of the translation pen 100 recognizes the text to be translated in the image P of the text to be translated.
In the control method provided by the above embodiments of the present disclosure, by operating the translation pen 100, the indicating component 2 indicates the position of the image P of the text to be translated, the image collector 3 collects the image P according to the position indicated by the indicating component 2, and the first processor 4 recognizes the text to be translated in the collected image P; this improves the accuracy of the image collected by the image collector 3, which helps the first processor 4 accurately recognize the text to be translated and improves the translation accuracy of the translation pen 100.
In some embodiments, as shown in FIG. 9, the first processor 4 recognizing the text to be translated in the image of the text to be translated includes the following S21 to S23:
S21: Detect the image of the text to be translated to form at least one text box; each text box contains part of the text in the image of the text to be translated.
Exemplarily, as shown in FIG. 5, the text to be translated in the image P contains multiple letters and has multiple text rows; the image P of the text to be translated is detected by the detection component 41, a plurality of text boxes T are formed in the image P, and each text box T contains part of the text in the image P, which facilitates the subsequent locking of the text box T where the text to be translated is located and realizes the locking of the text to be translated.
S22: Among the formed at least one text box, lock a text box that meets the setting requirement, and use the text in the locked text box as the text to be translated.
Exemplarily, as shown in FIGS. 6 and 7, among the formed text boxes T, the locking component 42 of the first processor 4 locks the text box T that meets the setting requirement, and uses the text in the locked text box T as the text to be translated.
It should be noted that, as shown in FIG. 6, the above setting requirement is that, along the column direction Y of the text, the row in which the text box T is located is the row closest to the indicating component 2, and, along the row direction X of the text, the center line L2 of the text box T is closest to the center line L1 of the image P of the text to be translated. Alternatively, as shown in FIG. 7, in the case where the indicating component 2 can be used to color external objects, the setting requirement is that, along the column direction Y of the text, the row in which the text box T is located is the row closest to the indicating component 2, and at least part of the area within the text box T is colored.
In some embodiments, if the above setting requirement is that, along the column direction Y of the text, the row in which the text box T is located is the row closest to the indicating component 2, and, along the row direction X of the text, the center line L2 of the text box T is closest to the center line L1 of the image P of the text to be translated, then locking a text box T that meets the setting requirement among the formed at least one text box T (S22) includes the following steps:
(1) In the image P of the text to be translated, select the text row closest to the indicating component 2 along the column direction Y of the text.
Exemplarily, as shown in FIGS. 4 and 5, the text row closest to the indicating component 2 along the column direction Y is selected, that is, the text row where "s recurrent la" is located in the figures.
(2) Determine the center line of each text box T in the selected text row and the center line L1 of the image P of the text to be translated, and lock the text box T whose center line is closest to the center line L1 of the image P.
Exemplarily, as shown in FIG. 6, by determining the center lines of the text boxes T where "s", "recurrent" and "la" are located in the text row of "s recurrent la", as well as the center line L1 of the image P of the text to be translated, it can be found that the center line L2 of the text box T where "recurrent" is located is closest to the center line L1 of the image P; that is, the text box T where "recurrent" is located is locked.
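The locking rule above (nearest row to the indicating component, then nearest center line) can be sketched as a small function. Everything here is an illustrative assumption rather than the patent's implementation: boxes are (x, y, w, h) tuples with y increasing toward the indicating component at the bottom edge of the image, and `row_tol` groups boxes into one text row.

```python
# Hypothetical sketch of the locking rule: keep the text row nearest the
# indicating component, then pick the box whose vertical center line is
# nearest the image's center line. Box format (x, y, w, h) is assumed.

def lock_text_box(boxes, image_width, row_tol=5):
    """Return the box meeting the setting requirement, or None."""
    if not boxes:
        return None
    # Row closest to the indicating component: largest bottom edge (y + h).
    nearest_bottom = max(y + h for (x, y, w, h) in boxes)
    row = [b for b in boxes if abs((b[1] + b[3]) - nearest_bottom) <= row_tol]
    # Within that row, minimize |box center line - image center line|.
    mid = image_width / 2
    return min(row, key=lambda b: abs((b[0] + b[2] / 2) - mid))

boxes = [(0, 90, 10, 12), (20, 90, 80, 12), (110, 90, 18, 12),  # "s recurrent la"
         (0, 60, 120, 12)]                                      # a farther row
print(lock_text_box(boxes, image_width=128))  # -> (20, 90, 80, 12)
```

In the toy data, the middle box of the nearest row ("recurrent" in the figures) wins, matching the example in the text.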
In some embodiments, if the indicating component 2 can be used to color external objects and the setting requirement is that, along the column direction Y of the text, the row in which the text box T is located is the row closest to the indicating component 2, and at least part of the area within the text box T is colored, then locking a text box T that meets the setting requirement among the formed at least one text box T (S22) includes the following steps:
(1) Select the text row closest to the indicating component 2 along the column direction Y of the text;
Exemplarily, as shown in FIGS. 4 and 7, the text row closest to the indicating component 2 along the column direction Y is selected, that is, the text row where "s recurrent la" is located in the figures.
(2) Lock a text box T at least part of whose area is colored.
Exemplarily, as shown in FIG. 7, among the text boxes T where "s", "recurrent" and "la" are located in the text row of "s recurrent la", the area within the text box T where "recurrent" is located is colored; that is, the text box T where "recurrent" is located is locked.
In some embodiments, as shown in FIG. 9, after the text in the locked text box is used as the text to be translated, i.e., after S22, the control method applied to the above translation pen 100 further includes:
S23: Perform adaptive word segmentation on the text to be translated.
Exemplarily, as shown in FIG. 10, S23 includes the following steps:
S231: Obtain a sub-text box circumscribing each letter in the text to be translated.
Optionally, as shown in FIG. 11, the sub-text box circumscribing each letter in the text to be translated can be obtained through the following S2311 to S2315:
S2311: Binarize the image of the text to be translated.
For example, the OpenCV function cv::threshold() may be used to binarize the collected image of the text to be translated, obtaining the binarized image of the text to be translated shown in FIG. 12.
S2312: Erode the tiny connections between letters.
For example, the erosion operation cv::erode() may be used to remove, from the binarized image of the text to be translated, the tiny connections between letters of the text to be translated caused by ink printing problems. For example, FIG. 11 shows a tiny connection between the two letters "N" in "CNN"; by applying the erosion operation cv::erode(), the tiny connection between the two letters "N" is removed.
In the above process of removing the tiny connections between letters, the text to be translated itself is also eroded by cv::erode(), causing the letters to become thinner. This problem can be solved by the following S2313.
S2313: Dilate the letters to restore their original thickness.
For example, the dilation operation cv::dilate() may be used to dilate the letters in the text to be translated, restoring them to their original thickness, i.e., the thickness before the letters were eroded. FIG. 13 shows the effect after removing the tiny connection between the two letters "N" and performing the morphological transformation on the letters of the text to be translated (restoring the original thickness).
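The erode-then-dilate sequence of S2312 and S2313 (a morphological opening) can be illustrated without OpenCV on a toy binary grid with a 3x3 square structuring element: erosion removes the one-pixel-wide bridge between two thick strokes, and dilation restores the strokes' original thickness. A real implementation would call cv::erode()/cv::dilate(); this pure-Python version is only a sketch.

```python
# Pure-Python stand-in for cv::erode()/cv::dilate() with a 3x3 kernel,
# operating on a binary grid (nested lists of 0/1).

def erode(img):
    """A pixel survives only if its whole 3x3 neighbourhood is foreground."""
    h, w = len(img), len(img[0])
    return [[1 if all(0 <= i + di < h and 0 <= j + dj < w and img[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)) else 0
             for j in range(w)] for i in range(h)]

def dilate(img):
    """A pixel becomes foreground if any pixel in its 3x3 neighbourhood is."""
    h, w = len(img), len(img[0])
    return [[1 if any(0 <= i + di < h and 0 <= j + dj < w and img[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)) else 0
             for j in range(w)] for i in range(h)]

# Two 3-pixel-wide strokes joined by a 1-pixel-wide "ink bridge" in row 2.
bridge = [[1 if (j < 3 or j > 7 or (i == 2 and 3 <= j <= 7)) else 0
           for j in range(11)] for i in range(5)]
opened = dilate(erode(bridge))
print(opened[2][4])  # -> 0 (the bridge pixel is gone; the strokes remain)
```

After the opening, the two strokes are separate shapes again at their original width, which is exactly what S2314 needs before contour extraction.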
S2314: Obtain the convex hull of the contour of each letter.
For example, as shown in FIG. 14, the function cv::findContours() may be used to find the contour of each letter in the text to be translated, and the function cv::convexHull() may be used to obtain the convex hull 21 of the contour. FIG. 14 shows the convex hulls 21 of the contours of the two letters "N".
S2315: Obtain the sub-text box of each letter.
For example, as shown in FIG. 15, the minimum circumscribed rectangle of the convex hull 21 of the contour of each letter is calculated to obtain the sub-text box 22 of the letter; the minimum circumscribed rectangle is the sub-text box 22 of the letter.
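As a simplified stand-in for the convex-hull and minimum-circumscribed-rectangle step above, the sketch below computes an axis-aligned bounding box directly from a letter's foreground pixel coordinates; for upright printed text this largely coincides with the minimum circumscribed rectangle of the hull. The pixel-set input format is an assumption made for illustration.

```python
# Simplified, axis-aligned stand-in for the sub-text box of S2315: the
# tight bounding box (x, y, w, h) of a letter's pixel coordinates. A real
# implementation would use cv::convexHull() plus a minimum bounding
# rectangle on the contour.

def sub_text_box(points):
    """Axis-aligned bounding box of a letter's (x, y) pixel coordinates."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    return (x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1)

letter = [(4, 2), (4, 3), (5, 2), (5, 3), (6, 2)]  # toy pixel set
print(sub_text_box(letter))  # -> (4, 2, 3, 2)
```

The bounding box of a point set equals the bounding box of its convex hull, which is why the hull can be skipped in this axis-aligned sketch.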
S232: Determine a reference text box according to the areas of the sub-text boxes 22.
Exemplarily, according to the areas of the sub-text boxes 22 of the individual letters, the sub-text boxes 22 are sorted by area, a sub-text box 22 whose area value is in an intermediate range is selected, and the selected sub-text box 22 is used as the reference text box.
It should be noted that the difference between the lower limit of the "intermediate range" and the minimum area value of the sub-text boxes 22 is equal to, or approximately equal to, the difference between the upper limit of the intermediate range and the maximum area value of the sub-text boxes 22. The "intermediate range" may also be a fixed intermediate value; in this case, the difference between the intermediate value and the minimum area value of the sub-text boxes 22 is equal to the difference between the intermediate value and the maximum area value of the sub-text boxes 22.
By the above method, a sub-text box 22 whose area value is in the intermediate range is selected and used as the reference text box, which excludes the smaller letters in the image of the text to be translated (e.g., the letters "i" and "t" shown in FIG. 14) as well as two letters with a tiny connection (e.g., the two letters "N" in FIG. 11 above), avoiding interference with the subsequent calculation of the separation distance threshold.
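The reference-box selection above can be sketched using the fixed intermediate-value variant: the target area sits midway between the smallest and largest sub-text-box areas, so it is equidistant from both, and the box whose area is closest to that midpoint becomes the reference box. Function names and the box format are assumptions.

```python
# Hypothetical sketch of S232: pick the sub-text box whose area is closest
# to the midpoint of the minimum and maximum areas, excluding outliers such
# as small letters ("i", "t") and joined letter pairs.

def reference_box(sub_boxes):
    """sub_boxes: list of (x, y, w, h); returns the reference box."""
    areas = [w * h for (_, _, w, h) in sub_boxes]
    mid = (min(areas) + max(areas)) / 2  # equidistant from min and max
    return min(sub_boxes, key=lambda b: abs(b[2] * b[3] - mid))

boxes = [(0, 0, 2, 5),     # small letter such as "i": area 10
         (3, 0, 8, 10),    # normal letter: area 80
         (12, 0, 17, 10)]  # two joined letters: area 170
print(reference_box(boxes))  # -> (3, 0, 8, 10)
```

With the toy areas 10, 80 and 170, the midpoint is 90 and the normal-sized letter wins, which is the behavior the text describes.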
S233: Calculate the threshold of the separation distance between two adjacent letters according to the width of the reference text box.
Exemplarily, the threshold of the separation distance between two adjacent letters is calculated according to the following formula:
Figure PCTCN2020105026-appb-000002
where N is the threshold of the separation distance between two adjacent letters; W is the ratio of the width of the reference text box (the size of the reference text box along the row direction of the text) to the width of a unit pixel in the image; 0.6 is an empirical coefficient obtained by the inventors of the present application through many experiments; and 16 is the width, in unit pixels, of the proposals set along the row direction X by the commonly used text detection networks CTPN or YOLO.
S234: Obtain the actual separation distance value between every two adjacent letters.
S235: Segment the text to be translated into words according to the actual separation distance value between every two adjacent letters and the separation distance threshold.
Exemplarily, if the actual separation distance value is greater than the separation distance threshold, the two adjacent letters are judged to belong to two adjacent words respectively. If the actual separation distance value is less than or equal to the separation distance threshold, the two adjacent letters are judged to belong to the same word. In this way, each letter of the text to be translated is assigned to at least one word, and word segmentation is realized.
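The decision rule of S234 and S235 can be sketched in a few lines of Python. This is a hypothetical illustration rather than the patent's implementation: the (x, y, w, h) box format in reading order, the function name, and the toy data are all assumptions, and the adaptive threshold N is taken as an input since the exact formula is given only in the figure.

```python
# Hypothetical sketch of S234/S235: a horizontal gap wider than the
# threshold starts a new word; otherwise the letter joins the current word.

def segment_words(letter_boxes, letters, threshold):
    words, current = [], letters[0]
    for prev, box, ch in zip(letter_boxes, letter_boxes[1:], letters[1:]):
        gap = box[0] - (prev[0] + prev[2])  # actual separation distance
        if gap > threshold:
            words.append(current)   # adjacent letters in different words
            current = ch
        else:
            current += ch           # adjacent letters in the same word
    words.append(current)
    return words

boxes = [(0, 0, 8, 10), (10, 0, 8, 10), (30, 0, 8, 10), (40, 0, 8, 10)]
print(segment_words(boxes, "abcd", threshold=5))  # -> ['ab', 'cd']
```

Because the threshold is derived from the reference box of S232, larger print (wider boxes) automatically tolerates wider inter-letter gaps.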
In some embodiments, as shown in FIG. 9, in the case where the translation pen 100 further includes the second processor 5 and the display screen 6, after the first processor 4 recognizes the text to be translated in the image of the text to be translated, i.e., after S2, the above control method applied to the translation pen 100 further includes the following step:
S3: The second processor 5 translates the text to be translated recognized by the first processor 4 to generate a translation result, and the display screen 6 displays the translation result.
If the second processor 5 cannot generate a translation result, the first processor 4 re-recognizes the text to be translated in the image of the text to be translated, and the re-recognized text to be translated cannot be the text to be translated that was recognized last time.
Exemplarily, as shown in FIG. 8, when the image collector 3 collects an image, it may be affected by external factors, such as the trembling of the operator's hand, so that only part of the text to be translated is collected in the locked text box; the text to be translated in the locked text box is thus invalid text ("recurrent" in the locked text box in FIG. 8 is the invalid text), and the translation pen 100 cannot generate a translation result.
In this case, the first processor 4 re-recognizes the text to be translated in the image of the text to be translated. For example, along the column direction Y of the text, excluding the row where "recurrent" is located, lock the row closest to the indicating component 2, that is, the row where the text boxes of "Network not" and "o" are located; and, along the row direction X of the text, according to the principle that the center line of the text box is closest to the center line of the image of the text to be translated, lock the "Network not" text box; "Network not" is the re-recognized text to be translated.
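The fallback above can be sketched by re-running the locking criteria while excluding the previously locked box, so the re-recognized text can never be the text recognized last time. This simplified version excludes only the box itself rather than its whole row; names, the box format (x, y, w, h with y growing toward the indicating component), and the toy data are assumptions.

```python
# Hypothetical sketch of the re-recognition fallback: drop the previously
# locked box, then rank the remaining boxes by nearest row (largest bottom
# edge) and nearest center line. Exact-equality row grouping is a toy
# simplification; a real version would use a row tolerance.

def relock_text_box(boxes, image_width, previous):
    candidates = [b for b in boxes if b != previous]
    if not candidates:
        return None
    mid = image_width / 2
    return min(candidates,
               key=lambda b: (-(b[1] + b[3]), abs((b[0] + b[2] / 2) - mid)))

boxes = [(20, 90, 80, 12),   # "recurrent" (previously locked, invalid)
         (10, 60, 70, 12),   # "Network not"
         (100, 60, 10, 12)]  # "o"
print(relock_text_box(boxes, 128, previous=(20, 90, 80, 12)))
# -> (10, 60, 70, 12)
```

In the toy data, "Network not" wins over "o" because its center line is nearer the image's center line, matching the FIG. 8 example.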
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any changes or substitutions that would readily occur to those skilled in the art within the technical scope disclosed by the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (20)

  1. A translation pen, comprising:
    a pen body having a tip end;
    an indicating component disposed at the tip end of the pen body;
    an image collector disposed on the pen body, the image collector being configured to collect an image of text to be translated according to a position indicated by the indicating component, and to send the collected image of the text to be translated; and
    a first processor disposed inside the pen body, the first processor being electrically connected to the image collector; the first processor being configured to receive the image of the text to be translated sent by the image collector, and to recognize the text to be translated in the image of the text to be translated.
  2. The translation pen according to claim 1, wherein the pen body further has a tail end opposite the tip end;
    the image collector is disposed on one side of the exterior of the pen body, and, in a direction from the tip end to the tail end of the pen body, the image collector is farther from the tip end than the indicating component;
    the indicating component is disposed within a viewing angle range of the image collector.
  3. The translation pen according to claim 2, wherein there is a spacing between the image collector and the indicating component.
  4. The translation pen according to any one of claims 1 to 3, wherein the indicating component is capable of coloring external objects.
  5. The translation pen according to any one of claims 1 to 4, wherein the indicating component has a tip.
  6. The translation pen according to any one of claims 1 to 5, wherein the first processor comprises:
    a detection component electrically connected to the image collector, the detection component being configured to detect the image of the text to be translated and form at least one text box, each text box containing part of the text in the image of the text to be translated;
    a locking component electrically connected to the detection component, the locking component being configured to lock, among the formed at least one text box, a text box that meets a setting requirement, and to use the text in the locked text box as the text to be translated;
    wherein the setting requirement is that, along a column direction of the text, the row in which the text box is located is the row closest to the indicating component, and, along a row direction of the text, a center line of the text box is closest to a center line of the image of the text to be translated; or,
    in a case where the indicating component is capable of coloring external objects, the setting requirement is that, along the column direction of the text, the row in which the text box is located is the row closest to the indicating component, and at least part of an area within the text box is colored.
  7. The translation pen according to claim 6, wherein the first processor further comprises:
    an adaptive word-segmentation component electrically connected to the locking component;
    the adaptive word-segmentation component being configured to calculate, according to a size of the locked text to be translated, a threshold of the separation distance between adjacent letters in the text to be translated, and to segment the locked text to be translated into words according to the threshold.
  8. The translation pen according to any one of claims 1 to 7, wherein the translation pen further comprises:
    a second processor disposed inside the pen body, the second processor being electrically connected to the first processor;
    the second processor being configured to receive the text to be translated recognized by the first processor, translate the text to be translated, and generate and send a translation result.
  9. The translation pen according to claim 8, wherein the pen body is provided with a viewing window;
    the translation pen further comprises:
    a display screen disposed in the viewing window of the pen body, the display screen being electrically connected to the second processor;
    the display screen being configured to receive the translation result sent by the second processor and display the translation result.
  10. A control method applied to the translation pen according to any one of claims 1 to 9, comprising:
    collecting, by the image collector of the translation pen, an image of text to be translated according to a position indicated by the indicating component of the translation pen;
    recognizing, by the first processor of the translation pen, the text to be translated in the image of the text to be translated.
  11. The control method according to claim 10, wherein the first processor recognizing the text to be translated in the image of the text to be translated comprises:
    detecting the image of the text to be translated to form at least one text box, each text box containing part of the text in the image of the text to be translated;
    locking, among the formed at least one text box, a text box that meets a setting requirement, and using the text in the locked text box as the text to be translated.
  12. The control method according to claim 11, wherein locking, among the formed at least one text box, a text box that meets the setting requirement comprises:
    selecting the text row closest to the indicating component along a column direction of the text;
    determining a center line of each text box in the selected text row and a center line of the image of the text to be translated, and locking the text box whose center line is closest to the center line of the image of the text to be translated.
  13. The control method according to claim 11, wherein, in a case where the indicating component is capable of coloring external objects,
    locking, among the formed at least one text box, a text box that meets the setting requirement comprises:
    selecting the text row closest to the indicating component along the column direction of the text;
    locking a text box at least part of whose area is colored.
  14. The control method according to claim 11, wherein, after using the text in the locked text box as the text to be translated, the method further comprises:
    performing adaptive word segmentation on the text to be translated.
  15. The control method according to claim 14, wherein performing adaptive word segmentation on the text to be translated comprises:
    obtaining a sub-text box circumscribing each letter in the text to be translated;
    determining a reference text box according to areas of the sub-text boxes;
    calculating a threshold of the separation distance between two adjacent letters according to a width of the reference text box;
    obtaining an actual separation distance value between every two adjacent letters;
    segmenting the text to be translated into words according to the actual separation distance value between every two adjacent letters and the separation distance threshold.
  16. The control method according to claim 15, wherein determining a reference text box according to the areas of the sub-text boxes comprises:
    selecting, according to the areas of the sub-text boxes, a sub-text box whose area value is in an intermediate range, and using the selected sub-text box as the reference text box;
    wherein a difference between a lower limit of the intermediate range and a minimum area value of the sub-text boxes is equal to, or approximately equal to, a difference between an upper limit of the intermediate range and a maximum area value of the sub-text boxes.
  17. The control method according to claim 15 or 16, wherein calculating the threshold of the separation distance between two adjacent letters according to the width of the reference text box comprises:
    calculating the threshold of the separation distance between two adjacent letters according to the following formula;
    Figure PCTCN2020105026-appb-100001
    wherein N is the threshold of the separation distance between two adjacent letters, and W is the ratio of the width of the reference text box to the width of a unit pixel in the image.
  18. The control method according to any one of claims 15 to 17, wherein segmenting the text to be translated according to the actual separation distance value between every two adjacent letters and the separation distance threshold comprises:
    comparing the actual separation distance value between two adjacent letters with the separation distance threshold;
    if the actual separation distance value is greater than the separation distance threshold, the two adjacent letters belong to two adjacent words respectively;
    if the actual separation distance value is less than or equal to the separation distance threshold, the two adjacent letters belong to the same word.
  19. The control method according to any one of claims 10 to 18, wherein, in a case where the translation pen further comprises a second processor and a display screen,
    after the first processor recognizes the text to be translated in the image of the text to be translated, the method further comprises:
    translating, by the second processor, the text to be translated recognized by the first processor, to generate a translation result;
    displaying, by the display screen, the translation result.
  20. The control method according to claim 19, wherein, if the second processor cannot generate a translation result, the first processor re-recognizes the text to be translated in the image of the text to be translated, and the re-recognized text to be translated cannot be the text to be translated that was recognized last time.
PCT/CN2020/105026 2019-07-29 2020-07-28 Translation pen and control method therefor WO2021018110A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/423,413 US20220076042A1 (en) 2019-07-29 2020-07-28 Translation pen and control method therefor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910690400.2 2019-07-29
CN201910690400.2A CN112308063B (zh) 2019-07-29 2019-07-29 Text recognition device, translation pen, image translation method, and image translation device

Publications (1)

Publication Number Publication Date
WO2021018110A1 true WO2021018110A1 (zh) 2021-02-04

Family

ID=74230176

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105026 WO2021018110A1 (zh) 2019-07-29 2020-07-28 Translation pen and control method therefor

Country Status (3)

Country Link
US (1) US20220076042A1 (zh)
CN (1) CN112308063B (zh)
WO (1) WO2021018110A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627417A (zh) * 2021-08-16 2021-11-09 广州番禺职业技术学院 Text review device for foreign language translation and implementation method thereof

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
US11978267B2 (en) 2022-04-22 2024-05-07 Verkada Inc. Automatic multi-plate recognition
US11557133B1 (en) * 2022-04-22 2023-01-17 Verkada Inc. Automatic license plate recognition

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102169541A (zh) * 2011-04-02 2011-08-31 郝震龙 Character recognition input system using optical positioning and method therefor
CN107220242A (zh) * 2017-04-19 2017-09-29 广东小天才科技有限公司 Translation method, apparatus and system based on a translation pen
CN109263362A (zh) * 2018-10-29 2019-01-25 广东小天才科技有限公司 Smart pen and control method therefor
US20190182402A1 (en) * 2017-12-07 2019-06-13 Nedal Shriesher Print scanner and translator
CN110874957A (zh) * 2018-08-30 2020-03-10 朱笑笑 Translation pen

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US6704699B2 (en) * 2000-09-05 2004-03-09 Einat H. Nir Language acquisition aide
IT1390595B1 (it) * 2008-07-10 2011-09-09 Universita' Degli Studi Di Brescia Dispositivo di ausilio nella lettura di un testo stampato
US9251144B2 (en) * 2011-10-19 2016-02-02 Microsoft Technology Licensing, Llc Translating language characters in media content
US9519641B2 (en) * 2012-09-18 2016-12-13 Abbyy Development Llc Photography recognition translation
US9836456B2 (en) * 2015-01-12 2017-12-05 Google Llc Techniques for providing user image capture feedback for improved machine language translation
US20170177189A1 (en) * 2015-12-19 2017-06-22 Radean T. Anvari Method and System for Capturing Data on Display Using Scanner Pen
CN105718930A (zh) * 2016-01-26 2016-06-29 北京纽思曼教育科技有限公司 Multifunctional translation pen and translation method therefor
CN107992867A (zh) * 2016-10-26 2018-05-04 深圳超多维科技有限公司 Method, apparatus and electronic device for gesture-pointing translation
US10127673B1 (en) * 2016-12-16 2018-11-13 Workday, Inc. Word bounding box detection

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN102169541A (zh) * 2011-04-02 2011-08-31 郝震龙 Character recognition input system using optical positioning and method therefor
CN107220242A (zh) * 2017-04-19 2017-09-29 广东小天才科技有限公司 Translation method, apparatus and system based on a translation pen
US20190182402A1 (en) * 2017-12-07 2019-06-13 Nedal Shriesher Print scanner and translator
CN110874957A (zh) * 2018-08-30 2020-03-10 朱笑笑 Translation pen
CN109263362A (zh) * 2018-10-29 2019-01-25 广东小天才科技有限公司 Smart pen and control method therefor

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN113627417A (zh) * 2021-08-16 2021-11-09 广州番禺职业技术学院 Text review device for foreign language translation and implementation method thereof
CN113627417B (zh) * 2021-08-16 2023-11-24 广州番禺职业技术学院 Text review device for foreign language translation and implementation method thereof

Also Published As

Publication number Publication date
CN112308063B (zh) 2022-07-29
CN112308063A (zh) 2021-02-02
US20220076042A1 (en) 2022-03-10

Similar Documents

Publication Publication Date Title
WO2021018110A1 (zh) Translation pen and control method therefor
USRE47889E1 (en) System and method for segmenting text lines in documents
JP5379085B2 (ja) スキャンされた文書画像内の前景画素群の連結グループをマーキング種類に基づき分類する方法及びシステム
Dongre et al. Devnagari document segmentation using histogram approach
US6014450A (en) Method and apparatus for address block location
US8027539B2 (en) Method and apparatus for determining an orientation of a document including Korean characters
CN103310211B (zh) 一种基于图像处理的填注标记识别方法
CN111695555B (zh) 一种基于题号的精准框题方法、装置、设备和介质
US11823497B2 (en) Image processing system and an image processing method
CN112419260A (zh) 一种pcb文字区域缺陷检测方法
Mullick et al. An efficient line segmentation approach for handwritten Bangla document image
CN115588208A (zh) 一种基于数字图像处理技术的全线表结构识别方法
CN108062548B (zh) 一种盲文方自适应定位方法及系统
Naz et al. Challenges in baseline detection of cursive script languages
Munir et al. Automatic character extraction from handwritten scanned documents to build large scale database
Dongre et al. Segmentation of printed Devnagari documents
JP2006107534A (ja) 文字認識方法および文字認識装置
JP4492258B2 (ja) 文字・図形の認識方法および検査方法
Kleber et al. Document reconstruction by layout analysis of snippets
JP3914119B2 (ja) 文字認識方法および文字認識装置
Tikader et al. Edge based directional features for English-Bengali script recognition
Mandal et al. Slant Estimation and Correction for Online Handwritten Bengali Words
Saroui et al. Recognition of handwritten mathematical characters on whiteboards using colour images
CN106408021A (zh) 一种基于笔画粗细的手写体与印刷体的鉴别算法
CN117711004A (zh) 一种基于图像识别的表格文档信息抽取方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20846771

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20846771

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.02.2023)
