WO2012016505A1 - File processing method and file processing device - Google Patents

File processing method and file processing device

Info

Publication number
WO2012016505A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
file
display
annotation content
font
Prior art date
Application number
PCT/CN2011/077865
Other languages
English (en)
French (fr)
Inventor
武亚强
张建忠
王哲鹏
徐超
王巍
Original Assignee
联想(北京)有限公司
北京联想软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 联想(北京)有限公司 and 北京联想软件有限公司
Priority to US13/813,720 (US10210148B2)
Publication of WO2012016505A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes

Definitions

  • The present invention relates to the field of word processing technology, and in particular to a file processing method and a file processing device.
  • the technical problem to be solved by the embodiments of the present invention is to provide a file processing method and a file processing device for implementing automatic annotation of specific words in a file to improve the reading experience of the user.
  • a file processing method including:
  • The displaying the first word and the annotation content comprises: displaying the first word according to a first display scheme having a first display effect, and displaying the annotation content according to a second display scheme having a second display effect, wherein the first display effect and the second display effect are different.
  • the displaying the first word and the annotation content including:
  • the display position has a space for accommodating the annotation content, and displays the first word according to the new layout, and displays the annotation content at the display position.
  • The predetermined condition is that the first word does not belong to the matching font library, or that the first word belongs to the matching font library.
  • The annotation content includes at least one of: a phonetic symbol for annotating the pronunciation and intonation of the first word; interpretation information for explaining the meaning of the first word; a playback control menu for controlling playback of an audio file of the first word's pronunciation; and translation content that translates the first word into a language different from the language of the first word.
  • The matching font library includes a preset common font library and a preset error-prone font library, and the predetermined condition is that the first word does not belong to the common font library or the first word belongs to the error-prone font library, wherein the common font library includes preset common words and the error-prone font library includes preset easily misread words.
  • The method further includes: performing word segmentation on the first word according to the context of the first word to obtain a word segmentation result;
  • the obtaining the annotation content corresponding to the first word comprises: querying a preset word library according to the word segmentation result to obtain a phonetic symbol of the first word.
  • The words contained in each of the predetermined font libraries are not completely identical; before the obtaining the file, the method further includes:
  • setting one of the two or more predetermined font libraries as the matching font library.
  • the embodiment of the invention further provides a file processing device, including:
  • a first obtaining unit configured to obtain a file
  • a parsing unit configured to parse the file to obtain a first word included in the file
  • a matching unit configured to match the first word with a preset matching font library
  • An annotation unit configured to obtain, when the first word satisfies a predetermined condition, an annotation content corresponding to the first word
  • a display unit configured to display the first word and the annotation content.
  • the display unit includes:
  • An effect determining unit configured to determine a first display scheme for the first word and a second display scheme for the annotation content, where the first display effect of the first display scheme and the second display effect of the second display scheme are different;
  • a display processing unit configured to display the first word according to the first display scheme, and to display the annotation content according to the second display scheme.
  • the file processing device further includes:
  • a second obtaining unit configured to obtain an original layout of the file
  • a location determining unit configured to determine a display position of the annotation content relative to the first word
  • a determining unit configured to determine, at the display location in the original layout, whether there is a space to accommodate the annotation content
  • a typesetting unit configured to re-format the file to obtain a new typesetting when there is no space for accommodating the annotation content, so that the display location in the new layout has a space for accommodating the annotation content;
  • the display unit is further configured to display the first word according to the new layout obtained by the typesetting unit, and display the annotation content at the display position.
  • The annotation unit is further configured to obtain the annotation content corresponding to the first word when the first word does not belong to the matching font library, or to obtain the annotation content corresponding to the first word when the first word belongs to the matching font library.
  • the file processing device further includes:
  • a storage unit configured to store the annotation content, where the annotation content includes at least one of: a phonetic symbol for annotating the pronunciation and intonation of the first word; definition information for explaining the meaning of the first word; a playback control menu for controlling playback of an audio file of the first word's pronunciation; and translation content that translates the first word into a language different from the language of the first word.
  • The storage unit is further configured to store a preset word library.
  • The file processing apparatus further includes: a word segmentation unit configured to, after the parsing unit obtains the first word, perform word segmentation on the first word according to the context of the first word to obtain a word segmentation result;
  • a query unit configured to query, according to the word segmentation result, the word library stored by the storage unit to obtain a phonetic symbol of the first word.
  • the file processing device further includes:
  • a storage unit configured to store at least two predetermined fonts, each of the predetermined fonts containing words that are not completely identical;
  • a receiving unit configured to receive matching font setting information
  • a setting unit configured to set, according to the matching font setting information, a predetermined font stored by the storage unit as the matching font.
  • The file processing method and the file processing apparatus provided by the embodiments of the present invention automatically annotate the words in a file that satisfy the predetermined condition, so the user does not have to interrupt reading to look those words up and reading continuity is preserved. The embodiments also give the user an opportunity to learn during reading, which improves the user's reading experience.
  • FIG. 1 is a schematic flowchart of a file processing method according to Embodiment 1 of the present invention.
  • FIG. 2 is a flowchart of the steps of displaying the word and the annotation content according to Embodiment 1 of the present invention.
  • FIG. 3 is a schematic flowchart of a file processing method according to Embodiment 2 of the present invention.
  • FIG. 4 is a schematic flowchart of a file processing method according to Embodiment 3 of the present invention.
  • FIG. 5 is a schematic flowchart diagram of a file processing method according to Embodiment 4 of the present invention.
  • FIG. 6 is a schematic flowchart of a file processing method according to Embodiment 5 of the present invention.
  • FIG. 7 is a schematic flowchart of a file processing method according to Embodiment 6 of the present invention.
  • FIG. 8 is a schematic flowchart of a file processing method according to Embodiment 7 of the present invention.
  • FIG. 9 is a schematic flowchart of a file processing method according to Embodiment 8 of the present invention.
  • FIG. 10 is a schematic structural diagram of a file processing apparatus according to an embodiment of the present invention.
  • The words obtained by parsing the file are matched against the matching font library to determine which words need to be automatically annotated and what their annotation content is; these words are then annotated automatically when displayed, improving the user's reading experience.
  • The file processing method in this embodiment can be applied to various electronic devices, such as a computer, a PDA, a mobile phone, an MP4 player, and an electronic paper book, and specifically includes the following steps:
  • The file may be obtained by the electronic device reading a locally saved file, by downloading the file from a network or another device, or by reading the file online over a network.
  • The file in this embodiment is not limited to a specific file format, as long as words can be obtained from it after parsing; such files fall into the following three categories:
  • Files that include both textual and non-textual content such as video files and streaming media files containing subtitle information.
  • the words in the embodiment include various language characters, and may specifically be Chinese characters, English words, French words, and the like.
  • Step 12: Parse the file to obtain a first word included in the file.
  • The file is parsed according to its file format to obtain the words it contains. The three categories of files above are described separately:
  • A file containing only text content: after the file is read, the text content in it is obtained, which yields the words contained in the file, for example, the words contained in a Word file.
  • A file containing only non-text content: after the file is read, text recognition is performed on it to convert its non-text content into text content, which yields the words contained in the file; for example, text recognition is performed on an image to obtain the words the image represents.
  • A file that includes both text and non-text content: after the file is read, the non-text content is ignored and the text content is extracted to obtain the words it contains. For example, for a video file, the video images are ignored and the words in the subtitle content are extracted; for an e-book containing images, the images are ignored and the words in the text content are extracted. If an image contains content that needs to be recognized, it can be processed further in the manner described for the second category.
  • In the following, the first word obtained by parsing is taken as an example for description.
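The three parsing cases above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `SourceFile`, `split_words`, and the `recognize_text` callback (standing in for any text-recognition engine) are hypothetical names, and treating each CJK character as one word is an assumption made for brevity.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SourceFile:
    """Hypothetical container: a file may carry text, non-text content, or both."""
    text: Optional[str] = None
    image: Optional[bytes] = None


def split_words(text: str) -> List[str]:
    # Assumption: Latin-script words stay whole; each CJK character is one word.
    return re.findall(r"[A-Za-z]+|[\u4e00-\u9fff]", text)


def extract_words(f: SourceFile, recognize_text: Callable[[bytes], str]) -> List[str]:
    if f.text is not None:
        # Categories 1 and 3: use the text content and ignore non-text content.
        return split_words(f.text)
    if f.image is not None:
        # Category 2: convert non-text content to text first, then extract words.
        return split_words(recognize_text(f.image))
    return []
```

In practice `recognize_text` would wrap an OCR engine; here any function from bytes to text will do.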
  • Step 13: Match the first word with a preset matching font library.
  • There may be one, two, or more matching font libraries.
  • Step 14: When the first word satisfies a predetermined condition, obtain the annotation content corresponding to the first word.
  • In step 13, the first word is matched with the preset matching font library to obtain a matching result for the first word; in step 14, when the matching result satisfies the predetermined condition, the annotation content corresponding to the first word is obtained.
  • The predetermined condition may be that the first word does not belong to the matching font library, in which case first words that do not belong to the matching font library are annotated; as another preferred embodiment, the predetermined condition may be that the first word belongs to the matching font library, in which case first words that belong to the matching font library are annotated.
  • The annotation content may be pre-stored in a database, and the database may be stored in a storage unit local to the electronic device or in a storage unit on a network connected to the electronic device. In step 14, when the first word satisfies the predetermined condition, the database is searched using the first word as an index to determine the annotation content corresponding to the first word.
  • The annotation content includes at least one of the following four types:
  • A phonetic symbol for annotating the pronunciation and intonation of the first word: for example, pinyin with tone marks for a Chinese character, or phonetic symbols with stress marks on the stressed syllables for an English word.
  • Interpretation information for explaining the meaning of the first word: this can be taken from the definitions in a standard dictionary of the respective language; for example, for a Chinese character, the definitions in the "Xinhua Dictionary" or the "Ancient Chinese Dictionary" can be used.
  • A playback control menu for controlling playback of an audio file of the first word's pronunciation: through this menu, the audio file can be played to demonstrate how the first word is pronounced.
  • Translation content that translates the first word into a language different from the language of the first word: for example, a first word that is a Chinese character may be translated into English, French, or another language, and a first word that is an English word may be translated into Chinese.
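The four annotation types and the database lookup keyed on the first word could be represented as in the sketch below. The `Annotation` class, its field names, and the in-memory `ANNOTATION_DB` dictionary are illustrative assumptions; the source only says the content is stored in a local or networked database indexed by the word.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Annotation:
    """Hypothetical record holding the four annotation types; any may be absent."""
    phonetic: Optional[str] = None      # e.g. pinyin with tone marks
    definition: Optional[str] = None    # interpretation from a dictionary
    audio_path: Optional[str] = None    # audio file behind the playback menu
    translation: Optional[str] = None   # rendering in another language

    def is_valid(self) -> bool:
        # "At least one of the four types" must be present.
        return any(v is not None for v in
                   (self.phonetic, self.definition, self.audio_path, self.translation))


# Illustrative stand-in for the annotation database.
ANNOTATION_DB: Dict[str, Annotation] = {
    "好": Annotation(phonetic="hǎo", translation="good"),
}


def lookup_annotation(word: str, db: Dict[str, Annotation] = ANNOTATION_DB) -> Optional[Annotation]:
    # The first word itself serves as the index into the database.
    return db.get(word)
```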
  • Step 15: Display the first word and the annotation content.
  • In steps 13 and 14 above, the first word is matched against the matching font library and it is determined whether the matching result satisfies the predetermined condition; if it does, the first word needs annotation content, and the annotation content corresponding to the first word is determined. When the file is displayed in step 15, the first word and its annotation content are displayed together.
  • If the matching result of the first word obtained in step 13 does not satisfy the predetermined condition, no annotation content needs to be added, and the first word can be displayed directly.
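Steps 13 to 15 can be summarised in a short sketch. The function name `annotate_words` and the `annotate_if_member` flag are assumptions introduced here to cover both variants of the predetermined condition (word in the library, or word not in the library); the annotation store is reduced to a plain dictionary.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def annotate_words(
    words: Iterable[str],
    matching_library: Set[str],
    annotations: Dict[str, str],
    annotate_if_member: bool,
) -> List[Tuple[str, Optional[str]]]:
    """Return (word, annotation or None) pairs in display order."""
    result = []
    for word in words:
        in_library = word in matching_library          # step 13: matching result
        if in_library == annotate_if_member:           # step 14: predetermined condition
            result.append((word, annotations.get(word)))
        else:
            result.append((word, None))                # displayed without annotation
    return result
```

With `annotate_if_member=False` the library acts as a list of common words to skip; with `True` it acts as a list of words that must always be annotated.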
  • In this embodiment, the first words in the file content that satisfy the predetermined condition are annotated automatically, so the user obtains the relevant annotation information during reading without having to look the words up. This supplies necessary knowledge during reading, helps the user fully understand the file content, simplifies the reading operation, and improves the reading experience.
  • The above step 12 specifically includes the following steps. First, parse the file to obtain the first content to be displayed in the file.
  • For a document file, the content to be displayed may be just one page of the document; for a streaming file, the content to be displayed may be just a certain frame of data.
  • Then, extract the first word included in the first content.
  • The first word is a word included in the first content.
  • In step 14, if the first word satisfies the predetermined condition, the annotation content corresponding to the first word is obtained; in step 15, the first content including the first word is displayed together with the annotation content corresponding to the first word.
  • step 15 specifically includes:
  • Step 151: Obtain the original layout of the file.
  • Step 152: Determine a display position of the annotation content relative to the first word.
  • The display position of the annotation content can be determined according to reading habits. For example, when the annotation content is the pinyin of a Chinese character, the display position is usually directly above that character; when the annotation content is the phonetic symbol of an English word, the display position is usually immediately after the word on the same line (continuing on the next line if it does not fit).
  • Step 153: Determine whether there is space at the display position in the original layout to accommodate the annotation content.
  • For example, in the original layout the line spacing may be too small, so that annotation content displayed between lines would occlude the original text; or the word spacing may be too small, so that annotation content displayed between words would occlude the original text, and so on.
  • Step 154: When there is no space to accommodate the annotation content, re-typeset the file to obtain a new layout in which the display position has space for the annotation content, display the first word according to the new layout, and display the annotation content at the display position.
  • When re-typesetting, the line spacing may be adjusted as needed (for example, increased), a new line may be added for displaying the annotation content, or the word spacing between the first word and the next word may be increased, so that there is enough space to accommodate the annotation content.
  • Alternatively, the layout may be left unchanged: the annotation content is first made semi-transparent by raising its transparency to a predetermined value, and is then superimposed at the position of the first word, so that the annotation content is visible without affecting the display of the first word.
  • Step 155: When there is space to accommodate the annotation content, display the first word according to the original layout, and display the annotation content at the display position.
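A minimal sketch of the space check in steps 153 to 155, for the case where the annotation sits above the word and only the line spacing matters. The `Layout` fields and the choice to grow the line gap to exactly the annotation height are assumptions; the source also allows adding a line, widening word spacing, or overlaying semi-transparent text instead.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Layout:
    """Hypothetical layout description, in arbitrary display units."""
    font_size: float
    line_gap: float  # vertical space between adjacent lines


def layout_for_annotation(original: Layout, annotation_height: float) -> Layout:
    # Step 153: is there room between lines for the annotation?
    if original.line_gap >= annotation_height:
        return original                                   # step 155: keep original layout
    # Step 154: re-typeset by enlarging the line gap just enough.
    return replace(original, line_gap=annotation_height)
```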
  • This embodiment may further display the first word and the annotation content with different display effects in step 15: the first word is displayed according to a first display scheme having a first display effect, and the annotation content is displayed according to a second display scheme having a second display effect, wherein the first display effect and the second display effect are different.
  • The user may preset different display schemes for the first word and the annotation content according to personal preference, or default display schemes set in advance in the electronic device may be used.
  • A display scheme covers: font type, size, color, transparency, and whether the display is static or dynamic (such as flashing or gradient display).
  • The display schemes for the first word and for the annotation content are determined, and each is then displayed according to its own scheme to achieve different display effects.
  • The display schemes may also be combined with steps 151-155: when the first word is displayed according to the new layout in step 154 and the annotation content is displayed at the display position, the first word may further be displayed according to the first display scheme and the annotation content according to the second display scheme; likewise, when the first word is displayed according to the original layout in step 155 and the annotation content is displayed at the display position, the first word may be displayed according to the first display scheme and the annotation content according to the second display scheme.
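The display-scheme attributes listed above map naturally onto a small configuration record. The sketch below is illustrative only: `DisplayScheme`, its defaults, and the `render` helper are hypothetical, and the check that the two schemes differ is one simple way to enforce the requirement that the two display effects are different.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DisplayScheme:
    """Hypothetical display scheme: the attributes named in the text."""
    font: str
    size: int
    color: str
    transparency: float = 0.0   # 0.0 = opaque
    dynamic: bool = False       # e.g. flashing or gradient display


def render(word: str, annotation: str,
           word_scheme: DisplayScheme,
           annotation_scheme: DisplayScheme) -> List[Tuple[str, DisplayScheme]]:
    # The first and second display effects must differ.
    if word_scheme == annotation_scheme:
        raise ValueError("word and annotation must use different display schemes")
    return [(word, word_scheme), (annotation, annotation_scheme)]
```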
  • the embodiment further provides a file processing device.
  • the file processing device 80 specifically includes:
  • a first obtaining unit configured to obtain a file
  • a parsing unit configured to parse the file to obtain a first word included in the file
  • a matching unit configured to match the first word with a preset matching font library
  • An annotation unit configured to obtain the annotation content corresponding to the first word when the first word satisfies a predetermined condition
  • the display unit configured to display the first word and the annotation content.
  • the display unit includes:
  • An effect determining unit configured to determine a first display scheme for the first word and a second display scheme for the annotation content, wherein the first display effect of the first display scheme and the second display effect of the second display scheme are different;
  • a display processing unit configured to display the first word according to the first display scheme, and to display the annotation content according to the second display scheme.
  • the file processing apparatus further includes:
  • a second obtaining unit configured to obtain an original layout of the file
  • a location determining unit configured to determine a display position of the annotation content relative to the first word
  • a determining unit configured to determine, at the display location in the original layout, whether there is a space to accommodate the annotation content
  • a typesetting unit configured to re-format the file to obtain a new typesetting when there is no space for accommodating the annotation content, so that the display location in the new layout has a space for accommodating the annotation content;
  • the display unit is further configured to display the first word according to the new layout obtained by the typesetting unit, and display the annotation content at the display position.
  • The annotation unit is further configured to obtain the annotation content corresponding to the first word when the first word does not belong to the matching font library, or to obtain the annotation content corresponding to the first word when the first word belongs to the matching font library.
  • the file processing apparatus further includes:
  • a storage unit configured to store the annotation content, where the annotation content includes a phonetic symbol for annotating a pronunciation manner and a tone of the first word, and a definition information for explaining the meaning of the first word, And a playback control menu for controlling an audio file for playing the first word pronunciation, and at least one of translation contents for translating the first word with a different language than the language of the first word.
  • FIG. 10 further shows the display effect after the file processing apparatus of this embodiment processes the Chinese characters in a file: the left side shows the file displayed according to the original layout, and the right side shows the display effect after processing by this embodiment, in which a pinyin annotation is added for the uncommon character.
  • <Embodiment 2>
  • This embodiment may preprocess the entire content of the file to obtain all the words it contains, and thereby determine the first type of words that need annotation and their corresponding annotation content; then, at display time, the first type of words contained in the content to be displayed are determined, so that their annotation content is displayed with them.
  • Specifically, the method includes the following steps:
  • Step 21: Obtain the file.
  • Step 22: Parse the file, obtain the entire content of the file, and extract all the words included in the entire content.
  • Step 23: Match all the words one by one with a preset matching font library, and select the first type of words, which satisfy a predetermined condition.
  • The predetermined condition and the matching font library can be set in the same manner as in Embodiment 1.
  • Step 24: Obtain the annotation content corresponding to the first type of words.
  • Step 25: Determine the first content to be displayed in the file, and select from the first type of words the second type of words, which belong to the first content.
  • Step 26: Display the first content including the second type of words, and display the annotation content corresponding to the second type of words.
  • In this embodiment, all the words in the file are processed: the first type of words satisfying the predetermined condition are selected from all the words by matching against the matching font library, and the corresponding annotation content is obtained for them; then, when a given portion of content is displayed, the annotation content for the first type of words in that content is displayed at the same time, which also achieves automatic annotation of specific words in the file.
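Steps 22 to 26 separate a one-time pass over the whole file from a cheap per-page lookup. The sketch below assumes, for illustration, that the predetermined condition is "not in the matching font library"; `preprocess` and `annotations_for_page` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def preprocess(all_words: Iterable[str],
               matching_library: Set[str],
               annotations: Dict[str, str]) -> Dict[str, str]:
    """Steps 22-24: pick the first type of words once and fetch their annotations."""
    return {
        w: annotations[w]
        for w in set(all_words)
        if w not in matching_library and w in annotations
    }


def annotations_for_page(page_words: Iterable[str],
                         first_type: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Steps 25-26: the second type of words are those first-type words on this page."""
    return [(w, first_type.get(w)) for w in page_words]
```

The per-page step is then a dictionary lookup per word, which keeps page turns fast at the cost of an up-front pass over the file.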
  • This embodiment is further described taking annotation content that includes a phonetic symbol as an example.
  • The matching font library in this embodiment includes a preset common font library, and the predetermined condition is that the first word does not belong to the common font library.
  • The common font library contains preset common words. For example, for Chinese characters, the level-1 characters of the Chinese national standard GB2312 can be used as common words; for English words, the vocabulary of the College English Test Band 6 (CET-6) can be used as common words, and so on.
  • The file processing method in this embodiment can be applied to various electronic devices, such as a computer, a PDA, a mobile phone, an MP4 player, and an electronic paper book, and specifically includes the following steps:
  • Step 31: Obtain the file.
  • Step 32: Parse the file to obtain a first word included in the file.
  • Step 33: Match the first word with a preset matching font library, where the matching font library includes a preset common font library.
  • Step 34: When the first word does not belong to the common font library, obtain the annotation content corresponding to the first word; the annotation content includes a phonetic symbol and may further include content such as definition information.
  • Step 35: Display the first word and the annotation content.
  • This embodiment automatically annotates uncommon words, so that the user can learn them while reading, which improves the efficiency of reading and learning and improves the user's reading experience.
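One way to realise the GB2312 level-1 example without storing an explicit character list is to inspect the character's GB2312 encoding: level-1 characters occupy rows 16 to 55, which corresponds to a first byte between 0xB0 and 0xD7 in the EUC-CN form produced by Python's built-in `gb2312` codec. This is a sketch of that idea, not the method stated in the source; the function names are hypothetical, and the check only makes sense for Chinese characters.

```python
def is_gb2312_level1(ch: str) -> bool:
    """True if ch is a GB2312 level-1 (common) Chinese character."""
    try:
        encoded = ch.encode("gb2312")
    except UnicodeEncodeError:
        return False  # not in GB2312 at all, so certainly not a level-1 character
    # Level-1 characters: two bytes, first byte in 0xB0..0xD7 (rows 16-55).
    return len(encoded) == 2 and 0xB0 <= encoded[0] <= 0xD7


def needs_annotation(ch: str) -> bool:
    # Step 34: annotate when the character is not in the common font library.
    return not is_gb2312_level1(ch)
```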
  • This embodiment is further described taking annotation content that includes a phonetic symbol as an example.
  • For an uncommon word, the user can look it up in a dictionary during reading to obtain its pronunciation, definition, and other information. For an easily misread word, however, a user who takes a wrong pronunciation to be correct will usually not check the pronunciation while reading, so the error goes uncorrected and the correct pronunciation is never learned.
  • In this embodiment, the pronunciation of the words in the error-prone font library is annotated automatically during reading, giving the user an opportunity to learn the correct pronunciation and improving the user's reading experience.
  • The matching font library in this embodiment includes a preset error-prone font library, and the predetermined condition is that the first word belongs to the error-prone font library.
  • The error-prone font library contains preset words that are easily misread. One example is polyphonic Chinese characters such as "行", which is pronounced differently in "银行" (bank) and "行人" (pedestrian); another is the English place name "San Jose", which derives from Spanish and is often misread. To determine the pronunciation of such a word, word segmentation must be performed according to the context, and a word library carrying pronunciation information must be looked up using the segmentation result.
  • The file processing method in this embodiment can be applied to various electronic devices, such as a computer, a PDA, a mobile phone, an MP4 player, and an electronic paper book, and specifically includes the following steps:
  • Step 41: Obtain the file.
  • Step 42: Parse the file to obtain the first word contained in the file.
  • Step 43: Match the first word with a preset matching font library, where the matching font library includes a preset error-prone font library.
  • Step 44: When the first word belongs to the error-prone font library, obtain the annotation content corresponding to the first word, where the annotation content includes a phonetic symbol and may further include content such as definition information.
  • Step 45: Display the first word and the annotation content.
  • The method further includes: performing word segmentation on the first word according to its context to obtain a word segmentation result; in step 44, a preset word library is then queried according to the word segmentation result to obtain the phonetic symbol of the first word.
  • This embodiment automatically annotates easily misread words, so that the user can learn their correct pronunciation while reading, which improves the efficiency of reading and learning and improves the user's reading experience.
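The context-dependent lookup for a polyphonic character can be sketched with forward maximum matching, a simple dictionary-based segmentation technique chosen here for illustration; the source does not say which segmentation algorithm is used. The tiny `WORD_LIBRARY` and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Illustrative word library: each multi-character word maps to per-character pinyin.
WORD_LIBRARY: Dict[str, List[str]] = {
    "银行": ["yín", "háng"],
    "行人": ["xíng", "rén"],
}


def segment(text: str, library: Dict[str, List[str]], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest library word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in library:   # single characters always match
                tokens.append(candidate)
                i += n
                break
    return tokens


def phonetic_for(text: str, index: int, library: Dict[str, List[str]]) -> Optional[str]:
    """Pinyin of the character at text[index], decided by the word it falls in."""
    pos = 0
    for token in segment(text, library):
        if pos <= index < pos + len(token):
            readings = library.get(token)
            return readings[index - pos] if readings else None
        pos += len(token)
    return None
```

The same character "行" thus receives a different reading depending on which segmented word contains it.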
  • the embodiment also provides a file processing apparatus, which specifically includes:
  • a first obtaining unit configured to obtain a file
  • a parsing unit configured to parse the file to obtain a first word included in the file
  • a matching unit configured to match the first word with a preset matching font library
  • an annotation unit configured to obtain, when the first word satisfies a predetermined condition, the annotation content corresponding to the first word
  • a display unit configured to display the first word and the annotation content
  • a storage unit configured to store the annotation content, where the annotation content includes at least one of: a phonetic symbol for annotating the pronunciation and intonation of the first word; definition information for explaining the meaning of the first word; a playback control menu for controlling playback of an audio file of the first word's pronunciation; and translation content that translates the first word into a language different from the language of the first word.
  • The storage unit is further configured to store a preset word library; when the annotation content includes the phonetic symbol, the file processing apparatus further includes: a word segmentation unit configured to, after the parsing unit obtains the first word, perform word segmentation on the first word according to the context of the first word to obtain a word segmentation result;
  • a query unit configured to query, according to the word segmentation result, the word library stored by the storage unit to obtain a phonetic symbol of the first word.
  • the matching font library includes a common font library and an error-prone font library.
  • the predetermined condition is that the first word does not belong to the common font library or the first word belongs to the error-prone font library.
  • the file processing method described in this embodiment is as shown in FIG. 6, and specifically includes the following steps: Step 51: Obtain a file.
  • Step 52 Parse the file to obtain a first word included in the file.
  • Step 53: Match the first word against a preset common font library: when the first word belongs to the common font library, go to step 54; when the first word does not belong to the common font library, go to step 55.
  • Step 54: Match the first word against a preset error-prone font library: when the first word belongs to the error-prone font library, go to step 55; when the first word does not belong to the error-prone font library, go to step 57.
  • Step 55: Obtain the annotation content corresponding to the first word, and then go to step 56.
  • Step 56: Display the first word and the annotation content, where the annotation content includes a phonetic symbol.
  • Step 57: Display the first word.
  • the above steps first match the first word against the common font library; if the first word belongs to the common font library, the first word is further matched against the error-prone font library to determine whether it is an error-prone word; if so, the annotation content of the first word is determined, and the first word is displayed together with its annotation content.
  • the embodiment may also change the order of the matching: first match the first word against the error-prone font library and, if the first word does not belong to it, further match the first word against the common font library, and finally determine whether the first word is an error-prone word or an uncommon word.
  • when the first word is an error-prone word or an uncommon word, the corresponding annotation content is added to the first word when it is displayed, thereby improving the user's reading experience.
  • the embodiment presets at least two predetermined font libraries, the words contained in each of the predetermined font libraries being not completely identical.
  • the file processing method in this embodiment can be applied to various electronic devices such as a computer, a PDA, a mobile phone, an MP4 player, and an electronic paper book. As shown in FIG. 7, the method includes the following steps: Step 61: Receive matching font library setting information input by the user;
  • Step 62: Set, according to the matching font library setting information, one of the at least two predetermined font libraries as the matching font library.
  • Step 63 obtain the file.
  • Step 64 Parse the file to obtain a first word included in the file.
  • Step 65: Match the first word against the matching font library.
  • Step 66: When the matching result of the first word satisfies a predetermined condition, obtain the annotation content corresponding to the first word.
  • Step 67: Display the first word and the annotation content.
  • if the matching result of the first word in step 65 does not satisfy the predetermined condition, the annotation content of the first word does not need to be displayed.
  • the embodiment realizes the function of automatically annotating uncommon words, so that the user can learn uncommon words during reading, which improves the efficiency of reading and learning and improves the user's reading experience.
  • while reading the file, the user can learn the first word whose annotation content is displayed. After reading the file a certain number of times, the user may already have mastered the annotation content of the first word, and the need to display it is then greatly reduced. Therefore, in this embodiment, after the matching font library is set, the number of times the file is displayed may further be counted.
  • before the first word and the annotation content are displayed in step 67, it is determined whether the number of times the file has been displayed has reached a preset number of times corresponding to the matching font library: if it has, the annotation content is not displayed when the first word is displayed; if it has not, the first word and the annotation content are displayed together.
  • the embodiment also provides a file processing apparatus, which specifically includes:
  • a first obtaining unit configured to obtain a file
  • a parsing unit configured to parse the file to obtain a first word included in the file
  • a matching unit configured to match the first word with a preset matching font library
  • An annotation unit configured to obtain, when the first word satisfies a predetermined condition, an annotation content corresponding to the first word
  • a display unit configured to display the first word and the annotation content
  • a storage unit configured to store at least two predetermined font libraries, the words contained in each of the predetermined font libraries being not completely identical;
  • a receiving unit configured to receive matching font library setting information
  • a setting unit configured to set, according to the matching font library setting information, one predetermined font library stored by the storage unit as the matching font library.
  • the present embodiment sets the current matching font library according to the counted number of times the user has read the file, so that the matching font library is adapted to the current user's cognitive level, as follows:
  • a count threshold corresponding to each of the predetermined font libraries is set in advance, where the count thresholds of the predetermined font libraries differ from one another.
  • the file processing method in this embodiment can be applied to various electronic devices, such as a computer, a PDA, a mobile phone, an MP4, and an electronic paper book. As shown in FIG. 8, the following steps are specifically included:
  • Step 71 Count the number of times the file is displayed.
  • Step 72: Select, according to the number of times of display, a first predetermined font library from the at least two predetermined font libraries, thereby obtaining matching font library setting information that identifies the first predetermined font library, where the first predetermined font library is the one with the smallest count threshold among the predetermined font libraries whose count threshold is greater than the number of times of display.
  • Step 73: Set the first predetermined font library as the current matching font library according to the matching font library setting information.
  • Step 74 obtaining a file.
  • Step 75 Parse the file to obtain a first word included in the file.
  • Step 76: Match the first word against the matching font library.
  • Step 77: When the matching result of the first word satisfies a predetermined condition, obtain the annotation content corresponding to the first word.
  • Step 78: Display the first word and the annotation content.
  • the above step 73 of this embodiment differs from step 62 of the sixth embodiment. In step 73, the matching font library setting information is generated automatically by the electronic device according to a predetermined policy, and the corresponding predetermined font library is then set as the matching font library according to that information; in steps 61 and 62 of the sixth embodiment, by contrast, the matching font library setting information is input by the user and the matching font library is set according to that input.
  • the above steps automatically set the current matching font library according to the number of times the file has been read (displayed), so that the matching font library is adapted to the user's current cognitive level.
  • An example is as follows: assume that the predetermined font libraries are common font libraries at three different levels, where the number of common words in the level-1 common font library < the number of common words in the level-2 common font library < the number of common words in the level-3 common font library, and the count threshold of the level-1 common font library < the count threshold of the level-2 common font library < the count threshold of the level-3 common font library.
  • one possible example: the level-1, level-2, and level-3 common font libraries contain 3600, 6000, and 9200 common words and have count thresholds of 3, 10, and 30, respectively.
  • the meaning of the count threshold is as follows: if the number of times of display reaches the count threshold of the current matching font library, a predetermined font library with a higher count threshold should be selected as the matching font library. For example, when the current matching font library is the level-1 common font library and the file has been displayed 3 times, a library with a count threshold higher than 3 should be selected: of the level-2 and level-3 common font libraries, whose thresholds are both higher than 3, the level-2 library, which has the smaller threshold of 10, is selected as the matching font library. If the file has been displayed 30 or more times, no common font library has a count threshold higher than 30, so no matching font library is set; since the file has already been displayed many times, the user has thoroughly learned the uncommon words in it, and it is no longer necessary to display the annotation content.
  • the embodiment presets a predetermined font library in which each word has at least two pronunciations, where the first pronunciation corresponds to a first geographic location and the second pronunciation corresponds to a second geographic location.
  • the first geographic location is different from the second geographic location.
  • a phonetic symbol database is also provided, which stores the phonetic symbols of the different pronunciations that the words in the predetermined font library have at different geographic locations.
  • the file processing method in this embodiment is applied to an electronic device, as shown in FIG. 9, specifically including the following steps:
  • Step 81 obtain the file.
  • Step 82 Parse the file to obtain a first word included in the file.
  • Step 83: Match the first word against the predetermined font library.
  • Step 84: When the first word belongs to the predetermined font library, obtain the current geographic location of the electronic device.
  • the current geographic location of the electronic device may be obtained by querying, according to the IP address of the electronic device, a database that stores the correspondence between geographic locations and IP addresses;
  • the electronic device may also be located using the Global Positioning System (GPS) to obtain its current geographic location.
  • Step 85: Search the phonetic symbol database according to the current geographic location of the electronic device, and determine a first phonetic symbol, which represents the pronunciation of the first word at the current geographic location.
  • Step 86 Display the first word and the first phonetic symbol.
  • the present embodiment can display the pronunciation of the word at the current geographic location according to the current geographic location of the user, so that the user can follow the local customs and facilitate communication between the user and the local resident.
  • the file processing method and the file processing apparatus provided by the embodiments of the present invention can automatically annotate words in a file that meet a predetermined condition, so the user does not have to interrupt reading to look these words up and the continuity of reading is preserved; the embodiments also give the user an opportunity to learn more during reading. Both improve the user's reading experience.

Abstract

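Embodiments of the present invention provide a file processing method and a file processing apparatus. The file processing method includes: obtaining a file; parsing the file to obtain a first word contained in the file; matching the first word against a preset matching font library; obtaining annotation content corresponding to the first word when the first word satisfies a predetermined condition; and displaying the first word and the annotation content. Embodiments of the present invention can automatically annotate specific words in a file and improve the user's reading experience.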

Description

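File processing method and file processing apparatus
Technical Field
The present invention relates to the field of word processing technology, and in particular to a file processing method and a file processing apparatus.
Background
When reading a file on an electronic device (such as a computer, a personal digital assistant (PDA), a mobile phone, or an e-paper book), a user often encounters words that are unfamiliar, or whose meaning or pronunciation is uncertain, such as rare words and/or polyphonic characters. Such content prevents the user from fully understanding the file.
To fully understand the file, a user in the prior art who encounters, for example, a rare word has to interrupt reading and look the word up, for example in a dictionary, to determine its pronunciation and meaning. Such a lookup forces the user to stop in the middle of reading, which breaks the continuity of reading and seriously degrades the reading experience.
Summary of the Invention
The technical problem to be solved by embodiments of the present invention is to provide a file processing method and a file processing apparatus that automatically annotate specific words in a file and thereby improve the user's reading experience.
To solve the above technical problem, embodiments of the present invention provide the following solutions: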
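A file processing method, including:
obtaining a file;
parsing the file to obtain a first word contained in the file;
matching the first word against a preset matching font library;
obtaining annotation content corresponding to the first word when the first word satisfies a predetermined condition; and
displaying the first word and the annotation content.
Preferably, in the above file processing method, displaying the first word and the annotation content includes: displaying the first word according to a first display scheme having a first display effect, and displaying the annotation content according to a second display scheme having a second display effect, where the first display effect and the second display effect are different.
Preferably, in the above file processing method, displaying the first word and the annotation content includes:
obtaining the original layout of the file;
determining a display position of the annotation content relative to the first word;
determining whether there is space at the display position in the original layout to accommodate the annotation content; and
when there is no space to accommodate the annotation content, re-laying out the file to obtain a new layout in which the display position has space to accommodate the annotation content, displaying the first word according to the new layout, and displaying the annotation content at the display position.
Preferably, in the above file processing method, the predetermined condition is that the first word does not belong to the matching font library, or that the first word belongs to the matching font library.
Preferably, in the above file processing method, the annotation content includes at least one of: a phonetic symbol marking the pronunciation and tone of the first word; definition information explaining the meaning of the first word; a playback control menu for controlling playback of an audio file of the pronunciation of the first word; and translated content in which the first word is translated into a language other than the language to which the first word belongs.
Preferably, in the above file processing method, when the annotation content includes the phonetic symbol, the matching font library includes a preset common font library and a preset error-prone font library, and the predetermined condition is that the first word does not belong to the common font library or that the first word belongs to the error-prone font library, where the common font library contains preset common words and the error-prone font library contains preset words that are easily misread.
Preferably, in the above file processing method, when the annotation content includes the phonetic symbol, after parsing the file to obtain the first word contained in the file, the method further includes: performing word segmentation on the first word according to the context of the first word to obtain a word segmentation result;
and obtaining the annotation content corresponding to the first word includes: querying a preset lexicon according to the word segmentation result to obtain the phonetic symbol of the first word.
Preferably, in the above file processing method, there are at least two predetermined font libraries, the words contained in each predetermined font library being not completely identical; and before obtaining the file, the method further includes:
receiving matching font library setting information; and
setting, according to the matching font library setting information, one of the two or more predetermined font libraries as the matching font library.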
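Embodiments of the present invention further provide a file processing apparatus, including:
a first obtaining unit, configured to obtain a file;
a parsing unit, configured to parse the file to obtain a first word contained in the file;
a matching unit, configured to match the first word against a preset matching font library;
an annotation unit, configured to obtain annotation content corresponding to the first word when the first word satisfies a predetermined condition; and
a display unit, configured to display the first word and the annotation content.
Preferably, in the above file processing apparatus, the display unit includes:
an effect determining unit, configured to determine a first display scheme for the first word and a second display scheme for the annotation content, where the first display effect of the first display scheme differs from the second display effect of the second display scheme; and
a display processing unit, configured to display the first word according to the first display scheme and to display the annotation content according to the second display scheme.
Preferably, the above file processing apparatus further includes:
a second obtaining unit, configured to obtain the original layout of the file;
a position determining unit, configured to determine a display position of the annotation content relative to the first word;
a determining unit, configured to determine whether there is space at the display position in the original layout to accommodate the annotation content; and
a layout unit, configured to re-lay out the file when there is no space to accommodate the annotation content, obtaining a new layout in which the display position has space to accommodate the annotation content;
where the display unit is further configured to display the first word according to the new layout obtained by the layout unit and to display the annotation content at the display position.
Preferably, in the above file processing apparatus, the annotation unit is further configured to obtain the annotation content corresponding to the first word when the first word does not belong to the matching font library, or to obtain the annotation content corresponding to the first word when the first word belongs to the matching font library.
Preferably, the above file processing apparatus further includes:
a storage unit, configured to store the annotation content, where the annotation content includes at least one of: a phonetic symbol marking the pronunciation and tone of the first word; definition information explaining the meaning of the first word; a playback control menu for controlling playback of an audio file of the pronunciation of the first word; and translated content in which the first word is translated into a language other than the language to which the first word belongs.
Preferably, in the above file processing apparatus, the storage unit is further configured to store a preset lexicon;
and when the annotation content includes the phonetic symbol, the file processing apparatus further includes:
a word segmentation unit, configured to perform, after the parsing unit obtains the first word, word segmentation on the first word according to the context of the first word to obtain a word segmentation result; and
a query unit, configured to query, according to the word segmentation result, the lexicon stored by the storage unit to obtain the phonetic symbol of the first word.
Preferably, the above file processing apparatus further includes:
a storage unit, configured to store at least two predetermined font libraries, the words contained in each predetermined font library being not completely identical;
a receiving unit, configured to receive matching font library setting information; and
a setting unit, configured to set, according to the matching font library setting information, one predetermined font library stored by the storage unit as the matching font library.
As can be seen from the above, the file processing method and file processing apparatus provided by embodiments of the present invention can automatically annotate words in a file that meet a predetermined condition, so the user does not have to interrupt reading to look these words up and the continuity of reading is preserved. The embodiments also give the user an opportunity to learn more during reading. Both improve the user's reading experience.
Brief Description of the Drawings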
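FIG. 1 is a schematic flowchart of the file processing method according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of the step of displaying the word and the annotation content in Embodiment 1 of the present invention;
FIG. 3 is a schematic flowchart of the file processing method according to Embodiment 2 of the present invention;
FIG. 4 is a schematic flowchart of the file processing method according to Embodiment 3 of the present invention;
FIG. 5 is a schematic flowchart of the file processing method according to Embodiment 4 of the present invention;
FIG. 6 is a schematic flowchart of the file processing method according to Embodiment 5 of the present invention;
FIG. 7 is a schematic flowchart of the file processing method according to Embodiment 6 of the present invention;
FIG. 8 is a schematic flowchart of the file processing method according to Embodiment 7 of the present invention;
FIG. 9 is a schematic flowchart of the file processing method according to Embodiment 8 of the present invention;
FIG. 10 is a schematic structural diagram of the file processing apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention match the words obtained by parsing a file against a preset matching font library to determine which words need automatic annotation and what their annotation content is, and then annotate them automatically at display time to improve the user's reading experience. The present invention is further described below through specific embodiments with reference to the accompanying drawings.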
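<Embodiment 1>
As shown in FIG. 1, the file processing method of this embodiment can be applied to various electronic devices such as computers, PDAs, mobile phones, MP4 players, and e-paper books, and specifically includes the following steps:
Step 11: obtain a file.
Here, obtaining the file may mean that the electronic device reads a locally stored file, downloads the file from a network or another device, or reads the file online over a network. The file in this embodiment is not limited to a specific file format; any file from which words can be obtained by parsing is acceptable. Specifically, files fall into the following three categories:
(1) Files that contain only text content, such as Word document files and WPS document files.
(2) Files that contain only non-text content, such as PDF files and image files.
(3) Files that contain both text content and non-text content, such as video files and streaming media files that carry subtitle information.
The words in this embodiment include text in various languages, such as Chinese characters, English words, and French words.
Step 12: parse the file to obtain a first word contained in the file.
Here, the file is parsed according to its file format to obtain the words it contains. The three categories of files are handled as follows:
(1) Files that contain only text content: after the file is read, the text content it contains is extracted, which yields the words in the file, for example the words contained in a Word file.
(2) Files that contain only non-text content: after the file is read, text recognition is performed on it to convert the non-text content into text content, which yields the words in the file; for example, text recognition is performed on the image in a picture to obtain the words the image represents.
(3) Files that contain both text content and non-text content: after the file is read, the non-text content is ignored and the text content is extracted, which yields the words in the text content. For example, for a video file, the video images are ignored and the words in the subtitle content are extracted. For an e-book containing images, the images are ignored and the words in the text content are extracted. Of course, if an image also contains content that needs to be recognized, it can be further processed in the manner of category (2) above.
The following description takes a first word obtained by parsing as an example.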
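Step 13: match the first word against a preset matching font library. Here, there may be one, two, or more matching font libraries.
Step 14: when the first word satisfies a predetermined condition, obtain the annotation content corresponding to the first word.
Here, step 13 matches the first word against the preset matching font library to obtain a matching result for the first word; in step 14, the annotation content corresponding to the first word is obtained when the matching result satisfies the predetermined condition. In one preferred embodiment, the predetermined condition is that the first word does not belong to the matching font library, in which case first words that are not in the matching font library are annotated. In another preferred embodiment, the predetermined condition is that the first word belongs to the matching font library, in which case first words that are in the matching font library are annotated.
In this embodiment, the annotation content may be stored in advance in a database, which may be kept in a storage unit local to the electronic device or in a storage unit on a network connected to the electronic device. In step 14, when the first word satisfies the predetermined condition, the database is searched using the first word as the index to determine the annotation content corresponding to the first word. The annotation content includes at least one of the following four kinds:
A) A phonetic symbol marking the pronunciation and tone of the first word. For a Chinese character this may be its Hanyu Pinyin and tone; for an English word it is its phonetic transcription together with the stress mark indicating the stressed syllable, and so on.
B) Definition information explaining the meaning of the first word, which may be taken from a standard dictionary of the respective language. For example, for a Chinese character, the definitions in the 《新华字典》 (Xinhua Dictionary) or the 《古汉语字典》 (Dictionary of Classical Chinese) may be used.
C) A playback control menu for controlling playback of an audio file of the pronunciation of the first word. Through this menu, playback of the audio file can be controlled so that the pronunciation of the first word is demonstrated by sound.
D) Translated content in which the first word is translated into a language other than the language to which it belongs. For example, for a first word that is a Chinese character, this may be a translation into English, French, or another language; for a first word that is an English word, it may be a translation into Chinese.
Step 15: display the first word and the annotation content.
Here, steps 13 and 14 match the first word against the matching font library and determine whether the matching result satisfies the predetermined condition. If it does, annotation content is to be added for the first word; the annotation content corresponding to the first word is determined, and when the file is displayed in step 15, the first word and its annotation content are displayed together.
If the matching result obtained in step 13 does not satisfy the predetermined condition, no annotation content needs to be added and the first word is simply displayed.
Through the above steps, this embodiment automatically annotates the first words in the file content that satisfy the predetermined condition when the file is displayed. The user obtains the relevant annotation information without looking the first word up, receives the necessary knowledge while reading, and understands the file content more fully; the reading operation is simplified and the reading experience is improved.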
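Steps 13 to 15 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the library contents, the annotation database, and the names `annotate` and `render` are hypothetical, and the annotation is shown in parentheses after the word rather than above it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical sample data, not taken from the patent.
MATCHING_LIBRARY: Set[str] = set("山中有")               # stands in for a matching font library
ANNOTATION_DB: Dict[str, str] = {"窫": "yà", "窳": "yǔ"}  # word -> phonetic symbol


@dataclass
class DisplayItem:
    word: str
    annotation: Optional[str] = None


def annotate(words: Iterable[str], library: Set[str], db: Dict[str, str],
             annotate_members: bool = False) -> List[DisplayItem]:
    """Steps 13-14: match each word and fetch its annotation when the condition holds."""
    items = []
    for word in words:
        in_library = word in library
        # The predetermined condition is either "not in the library" (default here)
        # or "in the library", depending on the embodiment.
        condition_met = in_library if annotate_members else not in_library
        items.append(DisplayItem(word, db.get(word) if condition_met else None))
    return items


def render(items: List[DisplayItem]) -> str:
    """Step 15: show each word, followed by its annotation in parentheses if it has one."""
    return "".join(f"{i.word}({i.annotation})" if i.annotation else i.word for i in items)


print(render(annotate("山中有窫窳", MATCHING_LIBRARY, ANNOTATION_DB)))
```

The `annotate_members` flag switches between the two preferred forms of the predetermined condition, so the same routine covers both a common-word library (annotate what is absent) and an error-prone library (annotate what is present).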
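In a preferred implementation, this embodiment processes the words in the content about to be displayed in real time while the file is being displayed. In this case, step 12 specifically includes:
A step of parsing the file to obtain the first content of the file that is about to be displayed. For example, for a document file, the content about to be displayed may be just one page of the document; for a streaming media file, it may be just one frame of data.
A step of extracting the first word contained in the first content. Here, the first word is a word contained in the first content.
If the first word satisfies the predetermined condition, the annotation content corresponding to the first word is obtained, and in step 15 the first content, including the first word, is displayed together with the annotation content corresponding to the first word.
A preferred implementation of step 15 is described below. As shown in FIG. 2, step 15 specifically includes:
Step 151: obtain the original layout of the file.
Step 152: determine the display position of the annotation content relative to the first word.
In step 152, the display position of the annotation content can be determined according to reading habits. For example, when the annotation content is the pinyin of a Chinese character, the display position is usually directly above that character; when the annotation content is the phonetic transcription of an English word, the display position usually immediately follows the word on the same line (if it does not fit on that line, it may continue on the next line).
Step 153: determine whether there is space at the display position in the original layout to accommodate the annotation content.
A lack of space may arise because the line spacing of the text is too small, so that annotation content shown between lines would cover the original text, or because the word spacing is too small, so that annotation content shown between words would cover the original text, and so on.
Step 154: when there is no space to accommodate the annotation content, re-lay out the file to obtain a new layout in which the display position has space to accommodate the annotation content, display the first word according to the new layout, and display the annotation content at the display position.
In step 154, the re-layout may adjust the line spacing as needed (for example, increase it), add a new line for the annotation content, or increase the spacing between the first word and the word that follows it, so that the annotation content has enough room.
As an alternative to step 154, the layout may be left unchanged: the annotation content is first made translucent by raising its transparency to a predetermined value and is then overlaid at the position of the first word, so that the annotation content is visible without interfering with the display of the first word.
Step 155: when there is space to accommodate the annotation content, display the first word according to the original layout and display the annotation content at the display position.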
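The space check and re-layout of steps 153 to 155 might look like the sketch below. It assumes the annotation sits directly above the word (as with pinyin) and that the free space is simply the line spacing minus the font size; the `Layout` type and both function names are hypothetical, and a real renderer would measure glyph boxes instead.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Layout:
    font_size: float     # height of the base text, in points
    line_spacing: float  # distance between consecutive baselines, in points


def free_space_above(layout: Layout) -> float:
    """Vertical gap between lines that an annotation above the word could occupy."""
    return layout.line_spacing - layout.font_size


def layout_for_annotation(original: Layout, annotation_height: float) -> Layout:
    """Steps 153-155: keep the original layout if the annotation fits, otherwise widen it."""
    if free_space_above(original) >= annotation_height:
        return original  # step 155: enough room, display with the original layout
    # Step 154: re-lay out by increasing the line spacing just enough.
    return replace(original, line_spacing=original.font_size + annotation_height)


tight = Layout(font_size=12.0, line_spacing=14.0)
print(layout_for_annotation(tight, annotation_height=6.0))
```

Only the line-spacing adjustment is modelled here; adding a dedicated annotation line, widening word spacing, or the translucent overlay alternative would each need a different layout change.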
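In a preferred implementation, this embodiment may also use different display effects for the first word and the annotation content in step 15: the first word is displayed according to a first display scheme having a first display effect, and the annotation content is displayed according to a second display scheme having a second display effect, where the first display effect and the second display effect are different.
Here, the user may preset different display schemes for the first word and the annotation content according to personal preference, or the electronic device may provide default display schemes for them. A display scheme includes parameters such as font type, size, color, transparency, and whether the display is static or dynamic (for example, blinking or gradual-change display). Before display in step 15, the display schemes corresponding to the first word and to the annotation content are determined, and each is then displayed according to its own scheme to achieve a different display effect.
In another preferred implementation, the display schemes may be combined with steps 151 to 155. When step 154 displays the first word according to the new layout and the annotation content at the display position, the first word may further be displayed according to the first display scheme and the annotation content according to the second display scheme; the same applies when step 155 displays the first word according to the original layout and the annotation content at the display position.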
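Based on the above file processing method, this embodiment further provides a file processing apparatus. As shown in FIG. 10, the file processing apparatus 80 specifically includes:
a first obtaining unit, configured to obtain a file;
a parsing unit, configured to parse the file to obtain a first word contained in the file;
a matching unit, configured to match the first word against a preset matching font library;
an annotation unit, configured to obtain annotation content corresponding to the first word when the first word satisfies a predetermined condition; and
a display unit, configured to display the first word and the annotation content.
In a preferred implementation, the display unit includes:
an effect determining unit, configured to determine a first display scheme for the first word and a second display scheme for the annotation content, where the first display effect of the first display scheme differs from the second display effect of the second display scheme; and
a display processing unit, configured to display the first word according to the first display scheme and to display the annotation content according to the second display scheme.
In a preferred implementation, the file processing apparatus further includes:
a second obtaining unit, configured to obtain the original layout of the file;
a position determining unit, configured to determine a display position of the annotation content relative to the first word;
a determining unit, configured to determine whether there is space at the display position in the original layout to accommodate the annotation content; and
a layout unit, configured to re-lay out the file when there is no space to accommodate the annotation content, obtaining a new layout in which the display position has space to accommodate the annotation content;
where the display unit is further configured to display the first word according to the new layout obtained by the layout unit and to display the annotation content at the display position.
In a preferred implementation, the annotation unit is further configured to obtain the annotation content corresponding to the first word when the first word does not belong to the matching font library, or to obtain the annotation content corresponding to the first word when the first word belongs to the matching font library.
In a preferred implementation, the file processing apparatus further includes:
a storage unit, configured to store the annotation content, where the annotation content includes at least one of: a phonetic symbol marking the pronunciation and tone of the first word; definition information explaining the meaning of the first word; a playback control menu for controlling playback of an audio file of the pronunciation of the first word; and translated content in which the first word is translated into a language other than the language to which the first word belongs.
FIG. 10 further shows the display effect after the file processing apparatus of this embodiment has processed the Chinese characters in a file: the left side shows the file displayed in its original layout, and the right side shows the result after processing by this embodiment, in which the pinyin annotation "yà yǔ" has been added for the uncommon word "窫窳".
<Embodiment 2>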
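In a preferred embodiment, the entire content of the file may be preprocessed in advance to obtain all the words the file contains and then to determine the first-class words, for which annotation content needs to be displayed, together with their annotation content. Then, when the file is displayed, the first-class words included in the content about to be displayed are determined, and their annotation content is shown at display time. The method specifically includes the following steps:
Step 21: obtain a file.
Step 22: parse the file to obtain its entire content, and extract all the words contained in that content.
Step 23: match all the words one by one against a preset matching font library, and select from them the first-class words that satisfy a predetermined condition.
Here, the predetermined condition and the matching font library can be set in the same way as in Embodiment 1.
Step 24: obtain the annotation content corresponding to the first-class words.
Step 25: determine the first content of the file that is about to be displayed, and select from the first-class words the second-class words that belong to the first content.
Step 26: display the first content, including the second-class words, together with the annotation content corresponding to the second-class words.
This embodiment takes all the words in the file as its example: when the file is parsed, the first-class words that satisfy the predetermined condition are selected from all the words in the file by matching against the matching font library, and their annotation content is obtained; then, when a particular piece of content is displayed, the annotation content of the first-class words in that content is displayed with it. This likewise achieves automatic annotation of specific words in the file.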
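The two phases of Embodiment 2 (preprocess once, then select per displayed page) can be sketched as below. The sketch assumes the condition "not in the matching font library", treats each Chinese character as one word, and uses hypothetical names and sample data throughout.

```python
from typing import Dict, Set


def preprocess(full_text: str, library: Set[str], db: Dict[str, str]) -> Dict[str, str]:
    """Steps 22-24: find every first-class word in the whole file and its annotation."""
    return {ch: db[ch] for ch in set(full_text) if ch not in library and ch in db}


def annotations_for_page(page_text: str, first_class: Dict[str, str]) -> Dict[str, str]:
    """Steps 25-26: keep only the first-class words that occur on the page being shown."""
    return {ch: first_class[ch] for ch in page_text if ch in first_class}


# Hypothetical two-page file, matching library, and annotation database.
pages = ["山中有窫窳", "山中有人"]
library = set("山中有人")
annotation_db = {"窫": "yà", "窳": "yǔ"}

first_class = preprocess("".join(pages), library, annotation_db)
for page in pages:
    print(page, annotations_for_page(page, first_class))
```

Compared with the real-time variant of Embodiment 1, the library matching and database lookup happen once per file, and each page display only needs a dictionary membership test.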
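Building on Embodiment 1, the present invention is further described below through additional embodiments.
<Embodiment 3>
This embodiment is described taking as an example the case where the annotation content includes a phonetic symbol.
When the annotation content includes a phonetic symbol, the matching font library in this embodiment includes a preset common font library, and the predetermined condition is that the first word does not belong to the common font library. The common font library contains preset common words. For example, for Chinese characters, the characters in the level-1 set of the Chinese national standard GB2312 may be taken as common words; for English words, the English words of College English Test level 6 (CET-6) may be taken as common words, and so on.
As shown in FIG. 4, the file processing method of this embodiment can be applied to various electronic devices such as computers, PDAs, mobile phones, MP4 players, and e-paper books, and specifically includes the following steps:
Step 31: obtain a file.
Step 32: parse the file to obtain a first word contained in the file.
Step 33: match the first word against a preset matching font library, where the matching font library includes a preset common font library.
Step 34: when the first word does not belong to the common font library, obtain the annotation content corresponding to the first word; the annotation content includes a phonetic symbol and may also include definition information and other content.
Step 35: display the first word and the annotation content.
Through the above steps, this embodiment automatically annotates uncommon words, so that the user can learn uncommon words while reading, which improves the efficiency of reading and learning and improves the user's reading experience.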
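<Embodiment 4>
This embodiment is described taking as an example the case where the annotation content includes a phonetic symbol.
In the prior art, a user who meets an uncommon word while reading can actively look it up in a dictionary to obtain its pronunciation, definition, and other information. For some error-prone words, however, a user who takes a wrong pronunciation to be the correct one will usually not bother to confirm the pronunciation while reading, and so cannot correct the mistake or learn the correct pronunciation. This embodiment sets up an error-prone font library and automatically marks the pronunciation of the words in it during reading, giving the user an opportunity to learn the correct pronunciation and improving the reading experience.
When the annotation content includes a phonetic symbol, the matching font library in this embodiment includes a preset error-prone font library, and the predetermined condition is that the first word belongs to the error-prone font library. The error-prone font library contains preset words that are easily misread. Examples are polyphonic Chinese characters, such as "行", which is pronounced differently in "银行" (bank) and "行人" (pedestrian), and the English place name "San Jose", a phrase that comes from Spanish and is often mispronounced. To determine the pronunciation of such a word, word segmentation has to be performed according to the context, and a lexicon that stores pronunciation information has to be searched according to the segmentation result; only then can the exact pronunciation be determined.
As shown in FIG. 5, the file processing method of this embodiment can be applied to various electronic devices such as computers, PDAs, mobile phones, MP4 players, and e-paper books, and specifically includes the following steps:
Step 41: obtain a file.
Step 42: parse the file to obtain a first word contained in the file.
Step 43: match the first word against a preset matching font library, where the matching font library includes a preset error-prone font library.
Step 44: when the first word belongs to the error-prone font library, obtain the annotation content corresponding to the first word; the annotation content includes a phonetic symbol and may also include definition information and other content.
Step 45: display the first word and the annotation content.
In a preferred implementation, when the first word is a Chinese character, step 42 further includes, after parsing the file to obtain the first word contained in the file: performing word segmentation on the first word according to the context of the first word to obtain a word segmentation result. In step 44, the preset lexicon is then queried according to the word segmentation result to obtain the phonetic symbol of the first word.
Through the above steps, this embodiment automatically annotates words that are easily misread, so that the user can learn the correct pronunciation of error-prone words while reading, which improves the efficiency of reading and learning and improves the user's reading experience.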
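The context-dependent lookup for a polyphonic character such as "行" can be illustrated with a small sketch. The patent does not name a segmentation algorithm; forward maximum matching is used here only as a simple stand-in, and the lexicon contents and function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical lexicon: multi-character word -> one reading per character.
LEXICON: Dict[str, List[str]] = {
    "银行": ["yín", "háng"],
    "行人": ["xíng", "rén"],
}
MAX_WORD_LEN = 2


def segment(text: str, lexicon: Dict[str, List[str]], max_len: int = MAX_WORD_LEN) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon entry."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # a single character is the fallback
                tokens.append(piece)
                i += size
                break
    return tokens


def reading_of(char: str, text: str, lexicon: Dict[str, List[str]]) -> Optional[str]:
    """Return the reading of `char` in the first lexicon word that contains it, else None."""
    for token in segment(text, lexicon):
        if char in token and token in lexicon:
            return lexicon[token][token.index(char)]
    return None


print(reading_of("行", "去银行", LEXICON))
print(reading_of("行", "行人过街", LEXICON))
```

When the character does not fall inside any lexicon word, `reading_of` returns `None`, at which point a caller could fall back to a default reading from the annotation database.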
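Similarly, this embodiment also provides a file processing apparatus, which specifically includes:
a first obtaining unit, configured to obtain a file;
a parsing unit, configured to parse the file to obtain a first word contained in the file;
a matching unit, configured to match the first word against a preset matching font library;
an annotation unit, configured to obtain annotation content corresponding to the first word when the first word satisfies a predetermined condition;
a display unit, configured to display the first word and the annotation content; and
a storage unit, configured to store the annotation content, where the annotation content includes at least one of: a phonetic symbol marking the pronunciation and tone of the first word; definition information explaining the meaning of the first word; a playback control menu for controlling playback of an audio file of the pronunciation of the first word; and translated content in which the first word is translated into a language other than the language to which the first word belongs.
In a preferred implementation, the storage unit is further configured to store a preset lexicon, and when the annotation content includes the phonetic symbol, the file processing apparatus further includes:
a word segmentation unit, configured to perform, after the parsing unit obtains the first word, word segmentation on the first word according to the context of the first word to obtain a word segmentation result; and
a query unit, configured to query, according to the word segmentation result, the lexicon stored by the storage unit to obtain the phonetic symbol of the first word.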
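<Embodiment 5>
In the file processing method of this embodiment, the matching font library includes a common font library and an error-prone font library, and the predetermined condition is that the first word does not belong to the common font library or that the first word belongs to the error-prone font library. As shown in FIG. 6, the file processing method of this embodiment specifically includes the following steps:
Step 51: obtain a file.
Step 52: parse the file to obtain a first word contained in the file.
Step 53: match the first word against the preset common font library: when the first word belongs to the common font library, go to step 54; when it does not, go to step 55.
Step 54: match the first word against the preset error-prone font library: when the first word belongs to the error-prone font library, go to step 55; when it does not, go to step 57.
Step 55: obtain the annotation content corresponding to the first word, then go to step 56.
Step 56: display the first word and the annotation content, where the annotation content includes a phonetic symbol.
Step 57: display the first word.
The above steps first match the first word against the common font library; if the first word belongs to the common font library, it is further matched against the error-prone font library to determine whether it is an error-prone word. If it is, the annotation content of the first word is determined, and the first word is displayed together with its annotation content.
Of course, this embodiment may also change the order of the matching: first match the first word against the error-prone font library and, if the first word does not belong to it, further match the first word against the common font library, and finally determine whether the first word is an error-prone word or an uncommon word.
Through the above steps, when the first word is an error-prone word or an uncommon word, this embodiment adds the corresponding annotation content when the first word is displayed, improving the user's reading experience.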
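The branching of steps 53, 54, 55, and 57 reduces to a short predicate. The sketch below uses hypothetical library contents and a hypothetical function name; it also shows why the matching order can be swapped, since the result is a plain logical OR of the two tests.

```python
from typing import Set


def needs_annotation(word: str, common: Set[str], error_prone: Set[str]) -> bool:
    """Return True when the word should be displayed with annotation content."""
    if word not in common:       # step 53: not a common word -> step 55
        return True
    return word in error_prone   # step 54: common but error-prone -> step 55, else step 57


# Hypothetical libraries: "行" is common but polyphonic, "窳" is uncommon.
common_library = set("银行山人")
error_prone_library = {"行"}

for w in "行山窳":
    print(w, needs_annotation(w, common_library, error_prone_library))
```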
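<Embodiment 6>
Different users reading a file may have different levels of knowledge. For example, primary school pupils usually know fewer Chinese characters and fewer English words than university students. Multiple font libraries can therefore be preset. For English words, libraries at various levels may be set up, such as a College English Test level 4 (CET-4) word library and a College English Test level 6 (CET-6) word library, each containing English words of the corresponding level. For Chinese characters, a grade library may be set up for pupils of each grade, for example a first-grade library containing the characters first-grade pupils should master, a second-grade library containing the characters second-grade pupils should master, and so on.
To this end, this embodiment presets at least two predetermined font libraries, the words contained in each predetermined font library being not completely identical. The file processing method of this embodiment can be applied to various electronic devices such as computers, PDAs, mobile phones, MP4 players, and e-paper books. As shown in FIG. 7, it specifically includes the following steps:
Step 61: receive matching font library setting information input by the user.
Step 62: set, according to the matching font library setting information, one of the at least two predetermined font libraries as the matching font library.
Step 63: obtain a file.
Step 64: parse the file to obtain a first word contained in the file.
Step 65: match the first word against the matching font library.
Step 66: when the matching result of the first word satisfies a predetermined condition, obtain the annotation content corresponding to the first word.
Step 67: display the first word and the annotation content.
Here, if the matching result of the first word in step 65 does not satisfy the predetermined condition, the annotation content of the first word does not need to be displayed.
Through the above steps, this embodiment automatically annotates uncommon words, so that the user can learn uncommon words while reading, which improves the efficiency of reading and learning and improves the user's reading experience.
While reading the file, the user can learn the first words whose annotation content is displayed. After reading the file a certain number of times, the user may already have mastered the annotation content of the first word, and the need to display it is then greatly reduced. Therefore, after the matching font library is set, this embodiment may further count the number of times the file is displayed and, before displaying the first word and the annotation content in step 67, determine whether the number of times the file has been displayed has reached a preset number of times corresponding to the matching font library: if it has, the annotation content is not displayed when the first word is displayed; if it has not, the first word and the annotation content are displayed together.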
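The display-count check described above can be sketched with a small counter. The class and method names are hypothetical, and the sketch assumes the count covers displays that have already finished, so the current display is not included when the check is made.

```python
class DisplayCounter:
    """Tracks how often a file has been shown and whether annotations are still needed."""

    def __init__(self, max_annotated_displays: int) -> None:
        # Preset number of times corresponding to the current matching font library.
        self.max_annotated_displays = max_annotated_displays
        self.count = 0

    def should_show_annotations(self) -> bool:
        # Annotations are hidden once the display count has reached the preset number.
        return self.count < self.max_annotated_displays

    def record_display(self) -> None:
        self.count += 1


counter = DisplayCounter(max_annotated_displays=2)
for attempt in range(1, 4):
    print(attempt, counter.should_show_annotations())
    counter.record_display()
```

With a preset number of 2, the first two displays carry annotations and the third does not.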
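Similarly, this embodiment also provides a file processing apparatus, which specifically includes:
a first obtaining unit, configured to obtain a file;
a parsing unit, configured to parse the file to obtain a first word contained in the file;
a matching unit, configured to match the first word against a preset matching font library;
an annotation unit, configured to obtain annotation content corresponding to the first word when the first word satisfies a predetermined condition;
a display unit, configured to display the first word and the annotation content;
a storage unit, configured to store at least two predetermined font libraries, the words contained in each predetermined font library being not completely identical;
a receiving unit, configured to receive matching font library setting information; and
a setting unit, configured to set, according to the matching font library setting information, one predetermined font library stored by the storage unit as the matching font library.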
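<Embodiment 7>
The cognitive level of a single user also changes over time: as the number of times the user reads a file increases, the user learns more words and the cognitive level rises. This embodiment therefore sets the current matching font library according to the counted number of times the user has read the file, so that the matching font library suits the user's current cognitive level, as follows.
This embodiment presets at least two predetermined font libraries, the words contained in each predetermined font library being not completely identical. It also presets a count threshold for each predetermined font library, where the count thresholds of the predetermined font libraries differ from one another. The file processing method of this embodiment can be applied to various electronic devices such as computers, PDAs, mobile phones, MP4 players, and e-paper books. As shown in FIG. 8, it specifically includes the following steps:
Step 71: count the number of times the file has been displayed (the display count).
Step 72: select, according to the display count, a first predetermined font library from the at least two predetermined font libraries, thereby obtaining matching font library setting information that identifies the first predetermined font library, where the first predetermined font library is the one with the smallest count threshold among the predetermined font libraries whose count threshold is greater than the display count.
Step 73: set the first predetermined font library as the current matching font library according to the matching font library setting information.
Step 74: obtain a file.
Step 75: parse the file to obtain a first word contained in the file.
Step 76: match the first word against the matching font library.
Step 77: when the matching result of the first word satisfies a predetermined condition, obtain the annotation content corresponding to the first word.
Step 78: display the first word and the annotation content.
Step 73 of this embodiment differs from step 62 of Embodiment 6. In step 73, the matching font library setting information is generated automatically by the electronic device according to a predetermined policy, and the corresponding predetermined font library is then set as the matching font library according to that information; in steps 61 and 62 of Embodiment 6, by contrast, the matching font library setting information is input by the user and the matching font library is set according to that input.
The above steps automatically set the current matching font library according to the number of times the file has been read (displayed), so that the matching font library suits the user's current cognitive level. An example follows.
Assume that the predetermined font libraries are common font libraries at three different levels, where the number of common words in the level-1 common font library < the number of common words in the level-2 common font library < the number of common words in the level-3 common font library, and the count threshold of the level-1 common font library < the count threshold of the level-2 common font library < the count threshold of the level-3 common font library. The following table lists one possible example:

                          Level-1 common    Level-2 common    Level-3 common
                          font library      font library      font library
Number of common words    3600              6000              9200
Count threshold           3                 10                30

The meaning of the count threshold is as follows: if the display count reaches the count threshold of the current matching font library, a predetermined font library with a higher count threshold should be selected as the matching font library. For example, when the current matching font library is the level-1 common font library and the file has been displayed 3 times, a library with a count threshold higher than 3 should be selected: of the level-2 and level-3 common font libraries, whose thresholds are both higher than 3, the level-2 library, which has the smaller threshold of 10, is selected as the matching font library. If the file has been displayed 30 or more times, no common font library has a count threshold higher than 30, so no matching font library is set; since the file has already been displayed many times, the user has thoroughly learned the uncommon words in it, and it is no longer necessary to display the annotation content.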
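The selection rule of step 72 can be checked against the example thresholds (3, 10, 30) with a short sketch. The library names and the function name are hypothetical; the rule itself, the smallest threshold strictly greater than the display count, follows the text.

```python
from typing import List, Optional, Tuple

# (library name, count threshold), using the thresholds from the example table.
LIBRARIES: List[Tuple[str, int]] = [("level-1", 3), ("level-2", 10), ("level-3", 30)]


def select_library(display_count: int,
                   libraries: List[Tuple[str, int]] = LIBRARIES) -> Optional[str]:
    """Step 72: pick the library with the smallest threshold above the display count."""
    candidates = [(threshold, name) for name, threshold in libraries
                  if threshold > display_count]
    if not candidates:
        return None  # no threshold is high enough: stop annotating
    return min(candidates)[1]


for count in (0, 3, 10, 30):
    print(count, select_library(count))
```

A return value of `None` corresponds to the case in which the file has been displayed 30 or more times and no matching font library is set.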
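<Embodiment 8>
Some words are pronounced differently in different countries. For example, some English words have an American pronunciation in the United States and a British pronunciation in the United Kingdom, and some words have different dialect pronunciations in different regions; that is, the pronunciation of these words depends on geographic location. This embodiment therefore presets a predetermined font library in which each word has at least two pronunciations, where the first pronunciation corresponds to a first geographic location, the second pronunciation corresponds to a second geographic location, and the first geographic location differs from the second. In addition, a phonetic symbol database is provided, which stores the phonetic symbols of the different pronunciations that the words in the predetermined font library have at different geographic locations.
The file processing method of this embodiment is applied to an electronic device. As shown in FIG. 9, it specifically includes the following steps:
Step 81: obtain a file.
Step 82: parse the file to obtain a first word contained in the file.
Step 83: match the first word against the predetermined font library.
Step 84: when the first word belongs to the predetermined font library, obtain the current geographic location of the electronic device.
Here, the current geographic location of the electronic device may be obtained by querying, according to the IP address of the electronic device, a database that stores the correspondence between geographic locations and IP addresses; alternatively, the electronic device may be located using the Global Positioning System (GPS).
Step 85: search the phonetic symbol database according to the current geographic location of the electronic device, and determine a first phonetic symbol, which represents the pronunciation of the first word at the current geographic location.
Step 86: display the first word and the first phonetic symbol.
In this way, this embodiment can show the user how a word is pronounced at the user's current geographic location, so that the user can follow local custom, which helps the user communicate with local residents.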
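Step 85 amounts to a two-level lookup keyed by word and then by location. In the sketch below the region codes, the database contents, and the function name are hypothetical; how the region is obtained (IP lookup or GPS, step 84) is left outside the sketch and the region is passed in directly.

```python
from typing import Dict, Optional

# Hypothetical phonetic symbol database: word -> region code -> phonetic symbol.
PHONETIC_DB: Dict[str, Dict[str, str]] = {
    "tomato": {"US": "/təˈmeɪtoʊ/", "GB": "/təˈmɑːtəʊ/"},
}


def phonetic_for(word: str, region: str,
                 db: Dict[str, Dict[str, str]] = PHONETIC_DB) -> Optional[str]:
    """Step 85: return the phonetic symbol of `word` for `region`, or None if unknown."""
    readings = db.get(word)
    if readings is None:
        return None  # the word is not in the predetermined font library
    return readings.get(region)


print(phonetic_for("tomato", "GB"))
print(phonetic_for("tomato", "US"))
```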
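In summary, the file processing method and file processing apparatus provided by the embodiments of the present invention can automatically annotate words in a file that meet a predetermined condition, so the user does not have to interrupt reading to look these words up and the continuity of reading is preserved. The embodiments also give the user an opportunity to learn more during reading. Both improve the user's reading experience.
The above are merely implementations of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.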

Claims

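1. A file processing method, characterized by including:
obtaining a file;
parsing the file to obtain a first word contained in the file;
matching the first word against a preset matching font library;
obtaining annotation content corresponding to the first word when the first word satisfies a predetermined condition; and
displaying the first word and the annotation content.
2. The file processing method according to claim 1, characterized in that displaying the first word and the annotation content includes: displaying the first word according to a first display scheme having a first display effect, and displaying the annotation content according to a second display scheme having a second display effect, where the first display effect and the second display effect are different.
3. The file processing method according to claim 1, characterized in that displaying the first word and the annotation content includes:
obtaining the original layout of the file;
determining a display position of the annotation content relative to the first word;
determining whether there is space at the display position in the original layout to accommodate the annotation content; and
when there is no space to accommodate the annotation content, re-laying out the file to obtain a new layout in which the display position has space to accommodate the annotation content, displaying the first word according to the new layout, and displaying the annotation content at the display position.
4. The file processing method according to claim 1, characterized in that the predetermined condition is that the first word does not belong to the matching font library, or that the first word belongs to the matching font library.
5. The file processing method according to claim 1, characterized in that the annotation content includes at least one of: a phonetic symbol marking the pronunciation and tone of the first word; definition information explaining the meaning of the first word; a playback control menu for controlling playback of an audio file of the pronunciation of the first word; and translated content in which the first word is translated into a language other than the language to which the first word belongs.
6. The file processing method according to claim 5, characterized in that, when the annotation content includes the phonetic symbol, the matching font library includes a preset common font library and a preset error-prone font library, and the predetermined condition is that the first word does not belong to the common font library or that the first word belongs to the error-prone font library, where the common font library contains preset common words and the error-prone font library contains preset words that are easily misread.
7. The file processing method according to claim 5, characterized in that, when the annotation content includes the phonetic symbol, after parsing the file to obtain the first word contained in the file, the method further includes: performing word segmentation on the first word according to the context of the first word to obtain a word segmentation result;
and obtaining the annotation content corresponding to the first word includes: querying a preset lexicon according to the word segmentation result to obtain the phonetic symbol of the first word.
8. The file processing method according to claim 1, characterized in that there are at least two predetermined font libraries, the words contained in each predetermined font library being not completely identical; and before obtaining the file, the method further includes:
receiving matching font library setting information; and
setting, according to the matching font library setting information, one of the two or more predetermined font libraries as the matching font library.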
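9. A file processing apparatus, characterized by including:
a first obtaining unit, configured to obtain a file;
a parsing unit, configured to parse the file to obtain a first word contained in the file;
a matching unit, configured to match the first word against a preset matching font library;
an annotation unit, configured to obtain annotation content corresponding to the first word when the first word satisfies a predetermined condition; and
a display unit, configured to display the first word and the annotation content.
10. The file processing apparatus according to claim 9, characterized in that the display unit includes:
an effect determining unit, configured to determine a first display scheme for the first word and a second display scheme for the annotation content, where the first display effect of the first display scheme differs from the second display effect of the second display scheme; and
a display processing unit, configured to display the first word according to the first display scheme and to display the annotation content according to the second display scheme.
11. The file processing apparatus according to claim 9, characterized by further including:
a second obtaining unit, configured to obtain the original layout of the file;
a position determining unit, configured to determine a display position of the annotation content relative to the first word;
a determining unit, configured to determine whether there is space at the display position in the original layout to accommodate the annotation content; and
a layout unit, configured to re-lay out the file when there is no space to accommodate the annotation content, obtaining a new layout in which the display position has space to accommodate the annotation content;
where the display unit is further configured to display the first word according to the new layout obtained by the layout unit and to display the annotation content at the display position.
12. The file processing apparatus according to claim 9, characterized in that the annotation unit is further configured to obtain the annotation content corresponding to the first word when the first word does not belong to the matching font library, or to obtain the annotation content corresponding to the first word when the first word belongs to the matching font library.
13. The file processing apparatus according to claim 8, characterized by further including:
a storage unit, configured to store the annotation content, where the annotation content includes at least one of: a phonetic symbol marking the pronunciation and tone of the first word; definition information explaining the meaning of the first word; a playback control menu for controlling playback of an audio file of the pronunciation of the first word; and translated content in which the first word is translated into a language other than the language to which the first word belongs.
14. The file processing apparatus according to claim 13, characterized in that the storage unit is further configured to store a preset lexicon;
and when the annotation content includes the phonetic symbol, the file processing apparatus further includes:
a word segmentation unit, configured to perform, after the parsing unit obtains the first word, word segmentation on the first word according to the context of the first word to obtain a word segmentation result; and
a query unit, configured to query, according to the word segmentation result, the lexicon stored by the storage unit to obtain the phonetic symbol of the first word.
15. The file processing apparatus according to claim 9, characterized by further including:
a storage unit, configured to store at least two predetermined font libraries, the words contained in each predetermined font library being not completely identical;
a receiving unit, configured to receive matching font library setting information; and
a setting unit, configured to set, according to the matching font library setting information, one predetermined font library stored by the storage unit as the matching font library.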
PCT/CN2011/077865 2010-08-02 2011-08-01 一种文件处理方法及文件处理装置 WO2012016505A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/813,720 US10210148B2 (en) 2010-08-02 2011-08-01 Method and apparatus for file processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010243566.9 2010-08-02
CN201010243566.9A CN102346731B (zh) 2010-08-02 2010-08-02 一种文件处理方法及文件处理装置

Publications (1)

Publication Number Publication Date
WO2012016505A1 true WO2012016505A1 (zh) 2012-02-09

Family

ID=45545419

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/077865 WO2012016505A1 (zh) 2010-08-02 2011-08-01 一种文件处理方法及文件处理装置

Country Status (3)

Country Link
US (1) US10210148B2 (zh)
CN (1) CN102346731B (zh)
WO (1) WO2012016505A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5753769B2 (ja) * 2011-11-18 2015-07-22 株式会社日立製作所 音声データ検索システムおよびそのためのプログラム
CN104346375B (zh) * 2013-07-31 2017-10-13 北大方正集团有限公司 一种制作中间字库的方法以及装置
CN103941981B (zh) * 2014-04-24 2017-09-19 江西迈思科技有限公司 一种信息处理的方法及装置
CN105989099A (zh) * 2015-02-13 2016-10-05 晨星半导体股份有限公司 相关资讯显示方法以及可自动显示相关资讯的电子装置
CN104933033A (zh) * 2015-07-08 2015-09-23 邱行中 中文汉字自动标注拼音的系统及其标注方法
CN107239441B (zh) * 2017-04-26 2020-09-01 广东小天才科技有限公司 一种词典释义方法及装置
CN110321535B (zh) * 2018-03-30 2023-08-18 富士胶片实业发展(上海)有限公司 儿童读物处理方法及装置
CN108804002B (zh) * 2018-04-25 2022-03-08 广州视源电子科技股份有限公司 交互智能设备的文本注释方法和装置
CN110874527A (zh) * 2018-08-28 2020-03-10 游险峰 一种基于云端的智能释义注音系统
CN111274352B (zh) * 2020-01-14 2023-05-26 北大方正集团有限公司 工具书中特征字的标注方法和设备
WO2023121681A1 (en) * 2021-12-20 2023-06-29 Google Llc Automated text-to-speech pronunciation editing for long form text documents
CN116484052B (zh) * 2023-06-26 2023-12-01 广州宏途数字科技有限公司 一种基于大数据的教育资源共享系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1393107A (zh) * 2000-07-27 2003-01-22 皇家菲利浦电子有限公司 充实视频的屏幕文字触发字
WO2006029259A2 (en) * 2004-09-08 2006-03-16 Sharedbook Ltd Creating an annotated web page
CN101645190A (zh) * 2009-07-22 2010-02-10 合肥讯飞数码科技有限公司 一种单词查询系统及其查询方法
CN101765840A (zh) * 2006-09-15 2010-06-30 埃克斯比布里奥公司 纸质与电子文档中的注释的捕获及显示

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2020748A1 (en) * 1989-08-22 1991-02-23 Thomas F. Look Method and apparatus for machine reading of retroreflective vehicle identification articles
DE69232493T2 (de) * 1991-10-21 2003-01-09 Canon Kk Method and apparatus for character recognition
US5369704A (en) * 1993-03-24 1994-11-29 Engate Incorporated Down-line transcription system for manipulating real-time testimony
US6128632A (en) * 1997-03-06 2000-10-03 Apple Computer, Inc. Methods for applying rubi annotation characters over base text characters
US6262728B1 (en) * 1998-11-03 2001-07-17 Agilent Technologies, Inc. System and method for annotating a graphical user interface display in a computer-based system
US6551357B1 (en) * 1999-02-12 2003-04-22 International Business Machines Corporation Method, system, and program for storing and retrieving markings for display to an electronic media file
JP2000330902A (ja) * 1999-05-25 2000-11-30 Sony Corp Information processing apparatus and method, and medium
US20020086269A1 (en) * 2000-12-18 2002-07-04 Zeev Shpiro Spoken language teaching system based on language unit segmentation
US20030214528A1 (en) * 2002-03-15 2003-11-20 Pitney Bowes Incorporated Method for managing the annotation of documents
US20040267798A1 (en) * 2003-06-20 2004-12-30 International Business Machines Corporation Federated annotation browser
US7418656B1 (en) * 2003-10-03 2008-08-26 Adobe Systems Incorporated Dynamic annotations for electronics documents
CN1993692A (zh) * 2004-05-24 2007-07-04 紫熊猫有限公司 Character display system
US7779347B2 (en) * 2005-09-02 2010-08-17 Fourteen40, Inc. Systems and methods for collaboratively annotating electronic documents
CN100483416C (zh) 2007-05-22 2009-04-29 北京搜狗科技发展有限公司 Character input method, input method system, and lexicon update method
CN101408874A (zh) * 2007-10-09 2009-04-15 深圳富泰宏精密工业有限公司 Image text translation device and method
CN101420313B (zh) 2007-10-22 2011-01-12 北京搜狗科技发展有限公司 Method and system for clustering client user groups
CN101196874B (zh) * 2007-12-28 2010-06-23 宇龙计算机通信科技(深圳)有限公司 Method and device for machine-assisted reading
CN201259670Y (zh) * 2008-07-22 2009-06-17 青岛海信移动通信技术股份有限公司 Text information processing apparatus and device
CN101645065B (zh) 2008-08-05 2016-02-24 北京搜狗科技发展有限公司 Method and device for determining auxiliary lexicons to be loaded, and input method system

Also Published As

Publication number Publication date
CN102346731A (zh) 2012-02-08
US10210148B2 (en) 2019-02-19
US20130132816A1 (en) 2013-05-23
CN102346731B (zh) 2014-09-03

Similar Documents

Publication Publication Date Title
WO2012016505A1 (zh) File processing method and file processing device
JP5997217B2 (ja) Method for removing ambiguity of multiple readings in language conversion
US8762358B2 (en) Query language determination using query terms and interface language
US8255376B2 (en) Augmenting queries with synonyms from synonyms map
CN1801139B (zh) Sentence display method and information processing system
JP4960461B2 (ja) Web-based collocation error proofing
US7475063B2 (en) Augmenting queries with synonyms selected using language statistics
US7835903B2 (en) Simplifying query terms with transliteration
US20180144747A1 (en) Real-time caption correction by moderator
US20100180199A1 (en) Detecting name entities and new words
JP2006190006A5 (zh)
US9977766B2 (en) Keyboard input corresponding to multiple languages
TW201316187A (zh) System and method for detecting and correcting Chinese typos
JP2009258293A (ja) Speech recognition vocabulary dictionary creation device
EP2016486A2 (en) Processing of query terms
WO2015162464A1 (en) Method and system for generating a definition of a word from multiple sources
CN111046627A (zh) Chinese character display method and system
US8438005B1 (en) Generating modified phonetic representations of indic words
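JPWO2009041661A1 (ja) Information processing apparatus and program
JP2020064428A (ja) Content display method and device
JPH11272671A (ja) Machine translation device and machine translation method
KR20080020011A (ko) Mobile web content service system and method thereof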
Musgrave et al. Language description and hypertext: Nunggubuyu as a case study
KR101100848B1 (ko) Method for generating a vocabulary database and computer-readable medium storing the vocabulary database
US20240119851A1 (en) Method and system for providing language learning services

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 11814088; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 13813720; Country of ref document: US)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 11814088; Country of ref document: EP; Kind code of ref document: A1)