WO2019205917A1 - Method and electronic device for determining a spelling partition of a target word - Google Patents

Method and electronic device for determining a spelling partition of a target word

Info

Publication number
WO2019205917A1
Authority
WO
WIPO (PCT)
Prior art keywords
partition
word
combination
spelling
determining
Prior art date
Application number
PCT/CN2019/081628
Other languages
English (en)
French (fr)
Inventor
陈逸天
Original Assignee
Chen Yitian
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chen Yitian
Publication of WO2019205917A1

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/06 Foreign languages

Definitions

  • Embodiments of the present invention relate to the field of electronic assisted teaching technologies, and in particular, to a method and an electronic device for determining a spelling partition of a target word.
  • Dividing a word's letter string into partition areas associates the words in the vocabulary that contain the same partial letter string, so that the shared partial letter strings can be conveniently classified according to the language's letter-to-sound rules and spelled out with the corresponding pronunciation type.
  • This spelling method can effectively reduce the varied pronunciations of associated words to a limited set of partial pronunciation types, which is a good way to address pronunciation in foreign languages written with "letter spelling words".
  • A partition letter string shared between associated words often has more than one pronunciation, and many of these differing correspondences follow certain rules.
  • Embodiments of the present invention provide a method and an electronic device for determining a spelling partition of a target word, which can provide a spelling partition that helps to improve the structural correct pronunciation probability for the target word.
  • the embodiment of the present invention provides the following technical solutions:
  • an embodiment of the present invention provides a method for determining a spelling partition of a target word, including:
  • the spelling total library includes a word optimal partition combination database
  • the word optimal partition combination database records candidate words and the optimal partition combination corresponding to each of the candidate words, the target word being one of the candidate words;
  • the spelling total library further includes a word likelihood partition combination database, which records the candidate words, several possible partition combinations corresponding to each of the candidate words, the partition units included in each possible partition combination, and the structural correct pronunciation probability corresponding to each partition unit, wherein each partition unit represents a correspondence between a letter string and a pronunciation code;
  • the optimal combination of partitions corresponding to each of the candidate words in the word optimal partition combination database can be filtered from the word likelihood partition combination database.
  • the method further includes:
  • the optimal partition combination corresponding to the candidate word is selected according to the structural correct pronunciation probability of each partition unit in each possible partition combination corresponding to the candidate word;
  • the candidate words and their corresponding optimal partition combinations are recorded in the word optimal partition combination database.
  • the optimal partition combination corresponding to the candidate word is selected based on the comprehensive structural correct pronunciation probability of each possible partition combination corresponding to the candidate word, including:
  • the spelling total library further includes a partition unit database, which records all the partition units corresponding to each word in a target vocabulary, together with the letter string and pronunciation code corresponding to each partition unit and the words that include that partition unit, wherein the candidate words are included in the target vocabulary;
  • the structural correct pronunciation probability of each of the partition units in the word likelihood partition combination database may be calculated based on the partition unit database.
  • the method further includes:
  • the structural correct pronunciation probability corresponding to each of the partition units is correspondingly recorded in the word likelihood partition combination database.
  • the first type of word, the second type of word, and the candidate word belong to the same vocabulary category.
  • determining, according to the first type of word and the second type of word, a structural correct pronunciation probability corresponding to the partitioning unit including:
  • the spelling total library further includes a correspondence database of basic letter strings and basic pronunciation codes, which records all the basic letter strings and their corresponding basic pronunciation codes;
  • the partition unit database may be calculated based on the target vocabulary and the correspondence database of basic letter strings and basic pronunciation codes.
  • an electronic device including:
  • At least one processor and,
  • the apparatus is capable of performing the method of determining the spelling partition of the target word as described above.
  • an embodiment of the present invention further provides a non-transitory computer readable storage medium storing computer executable instructions for causing an electronic device to perform the method of determining a spelling partition of a target word as described above.
  • the beneficial effects of the embodiments of the present invention are as follows: unlike the prior art, the method and the electronic device provided by the embodiments of the present invention determine the optimal partition combination corresponding to a received target word based on the word optimal partition combination database in a preset spelling total library, and then determine the spelling partition of the target word according to that optimal partition combination. Because the partition units in the determined optimal partition combination have a high structural correct pronunciation probability, the correct pronunciation probability of the target word can be substantially improved at the level of the spelling structure.
  • FIG. 1 is a schematic flowchart of a method for determining a spelling partition of a target word according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a method for creating a word optimal partition combination database based on a word likelihood partition combination database according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a method for creating a word likelihood partition combination database according to an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of an apparatus for determining a spelling partition of a target word according to an embodiment of the present invention
  • FIG. 5 is a schematic structural diagram of hardware of an electronic device according to an embodiment of the present invention.
  • In dictionaries (for example, English dictionaries) and foreign language assisted teaching equipment, in addition to the complete letter string writing sequence of a word, its complete pronunciation code (for example, the complete phonetic symbol sequence in English), and its meaning, syllable marks are inserted into the complete letter string writing sequence according to the syllable classification method of the foreign language's linguistics, to display the spelling partitions of the word.
  • an embodiment of the present invention provides a method for determining a spelling partition of a target word, an apparatus for determining a spelling partition of a target word, an electronic device, a non-transitory computer readable storage medium, and a computer program product.
  • the method for determining the spelling partition of the target word finds the optimal partition combination corresponding to the target word so as to comprehensively improve the structural correct pronunciation probability of each spelling partition of the target word. Specifically, the optimal partition combination of the received target word is determined based on the preset spelling total library, and the spelling partition of the target word is then determined according to that optimal partition combination. The spelling total library includes the word optimal partition combination database, which records candidate words and the optimal partition combination corresponding to each of the candidate words, the target word being one of the candidate words.
  • the partition units in the optimal partition combination determined based on the spelling total library have a relatively high structural correct pronunciation probability, which helps to reduce structural loopholes in the spelling partitions, that is, the possibility that a foreign language learner mispronounces a spelling partition of the target word by confidently borrowing the pronunciation of another known associated word (i.e., a word in the vocabulary containing the same partial letter string).
  • the device for determining the spelling partition of the target word provided by the embodiment of the present invention is a virtual device composed of a software program, which can implement the method for determining the spelling partition of the target word provided by the embodiment of the present invention. It is based on the same inventive concept as that method and has the same technical features and beneficial effects.
  • the electronic device provided by the embodiment of the present invention may be any type of electronic device, such as a learning machine, a smart phone, a personal computer, a tablet computer, a robot, a cloud server, and the like.
  • the electronic device is capable of performing the method for determining the spelling partition of the target word provided by the embodiment of the present invention, or the device for determining the spelling partition of the target word provided by the embodiment of the present invention.
  • the method, apparatus, electronic device, non-transitory computer readable storage medium, and computer program product for determining the spelling partition of the target word can be applied to any "letter spelling text" language to which the natural spelling method is applicable, such as English, German, French, Greek, Italian, and Portuguese.
  • the "target word” may be a word of any one of the above languages.
  • the description below mainly takes an English word as an example of the target word.
  • FIG. 1 is a schematic flowchart of a method for determining a spelling partition of a target word according to an embodiment of the present invention, and the method may be performed by any type of electronic device.
  • the method may include, but is not limited to, the following steps:
  • Step 100: Determine the optimal partition combination corresponding to the received target word based on the preset spelling total library.
  • the "spelling total library" is used to determine the optimal partition combination corresponding to the received target word; it may be a master database set up in advance, or one that is updated step by step.
  • the spelling total library may include one or more databases that record multiple correspondences.
  • the spelling total library includes a "word best partition combination database". As shown in Table 1, at least the candidate words and the optimal partition combination corresponding to each of the candidate words are recorded in the "word best partition combination database".
  • the "candidate word" (also rendered "alternative word") may be any word of a "letter spelling text" language, such as an English word, a German word, a French word, or a Greek word. In this embodiment, English words are used as examples of candidate words, but this is not intended to limit the invention.
  • the "candidate word" may be any word whose optimal partition combination has been determined. The "optimal partition combination" of each word is composed of several partition units, each of which represents a correspondence between a letter string and a pronunciation code (the "pronunciation code" is a code capable of characterizing a certain pronunciation; specifically, it may be a phonetic string or a specific code corresponding to the relevant pronunciation).
  • the structural correct pronunciation probability of the partition units in the optimal partition combination is relatively high (that is, the probability that the letter string represented by a partition unit in the optimal partition combination is pronounced according to the corresponding pronunciation code in each associated word is relatively high).
  • For example, the optimal partition combination for the word "different" is "dif-ferent", which consists of two partition units, one for the letter string "dif" and one for the letter string "ferent". Both partition units have a relatively high structural correct pronunciation probability: the letter string "dif" is pronounced according to its partition unit's pronunciation code in every associated word containing "dif", and the letter string "ferent" is pronounced according to its partition unit's pronunciation code in most of the associated words containing "ferent". Determining the spelling partition of "different" for natural spelling based on the optimal partition combination "dif-ferent" therefore improves, at the source of the spelling structure, the probability that each spelling partition of the word is pronounced correctly, and thus the probability that students read "different" correctly.
  • the “target word” is a word received by the electronic device to determine its spelling partition, which is one of the above-mentioned alternative words. Therefore, when the electronic device receives a certain target word, the optimal partition combination corresponding to the target word can be matched by querying the word optimal partition combination database in the total library.
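  • As a minimal sketch of the lookup described for Step 100, the word optimal partition combination database can be modeled as an in-memory mapping. The names `OPTIMAL_PARTITIONS` and `find_optimal_partition`, and the placeholder pronunciation codes, are assumptions made for illustration; the source does not prescribe a storage format.

```python
from typing import Optional

# A partition unit pairs a letter string with a pronunciation code.
PartitionUnit = tuple[str, str]

# Hypothetical stand-in for the "word optimal partition combination database".
# The pronunciation codes are opaque placeholders, not real phonetic strings.
OPTIMAL_PARTITIONS: dict[str, list[PartitionUnit]] = {
    "different": [("dif", "code_dif"), ("ferent", "code_ferent")],
}


def find_optimal_partition(target_word: str) -> Optional[list[PartitionUnit]]:
    """Return the optimal partition combination of a target word.

    Returns None when the word is not one of the recorded candidate words.
    """
    return OPTIMAL_PARTITIONS.get(target_word.lower())
```

  • Returning `None` for an unknown word reflects the requirement that the target word be one of the candidate words; how a real device handles a miss is not stated in the source.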
  • the "spelling total library" may further include a "word likelihood partition combination database", from which the optimal partition combination corresponding to each of the candidate words in the "word optimal partition combination database" can be filtered.
  • the "word likelihood partition combination database" records at least all the candidate words recorded in the "word optimal partition combination database", several possible partition combinations corresponding to each of the candidate words, the partition units included in each possible partition combination, and the structural correct pronunciation probability corresponding to each partition unit (that is, the "word likelihood partition combination database" records a hierarchical correspondence of candidate word, possible partition combination, partition unit, and structural correct pronunciation probability).
  • Each possible partition combination represents one way of dividing the word into spelling partitions, and the "optimal partition combination" corresponding to a candidate word is one of its several "possible partition combinations".
  • each partition unit in each possible partition combination represents a correspondence between a letter string and a pronunciation code, and the "structural correct pronunciation probability" of a partition unit is the probability that the letter string represented by the partition unit is pronounced according to the corresponding pronunciation code in the associated words (that is, the probability of pronouncing correctly when the pronunciation of the partition unit's letter string in any one of the associated words is borrowed to pronounce the corresponding letter string in the target word).
  • Table 2 lists some of the possible partition combinations corresponding to the word "different" in the "word likelihood partition combination database", the partition units included in each of these possible partition combinations, and the structural correct pronunciation probability corresponding to each partition unit.
  • For example, the structural correct pronunciation probability of the partition unit for the letter string "dif" is 100%, indicating that the letter string is pronounced according to that unit's pronunciation code in every associated word containing "dif". The structural correct pronunciation probability of the partition unit for the letter string "ferent" is 99.88%, meaning that the probability that "ferent" is pronounced according to that unit's pronunciation code in an associated word containing "ferent" is 99.88%.
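  • One way to read the definition above is as a simple ratio: among the associated words that contain a letter string, the share in which the string is pronounced according to the partition unit's pronunciation code. The sketch below implements that reading. The function name, the input layout, and the toy data are assumptions; this section of the source does not give the exact formula.

```python
def structural_probability(code: str, occurrences: list[tuple[str, str]]) -> float:
    """Share of associated words in which a letter string takes a given code.

    `occurrences` holds one (word, observed_code) pair per associated word
    containing the letter string. This ratio is an assumed interpretation of
    the "structural correct pronunciation probability".
    """
    if not occurrences:
        return 0.0
    matches = sum(1 for _, observed in occurrences if observed == code)
    return matches / len(occurrences)


# Toy data with placeholder words and codes (not taken from the source).
toy_occurrences = [("w1", "P1"), ("w2", "P1"), ("w3", "P2"), ("w4", "P1")]
probability = structural_probability("P1", toy_occurrences)
```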
  • the "word likelihood partition combination database" may be set up in advance, in which case the words recorded in it may include words other than the candidate words (for example, if the candidate words recorded in the "word optimal partition combination database" are A, B, and C, the words recorded in the "word likelihood partition combination database" may include A, B, C, D, and E). Alternatively, the "word likelihood partition combination database" may be built at the same time as the "word optimal partition combination database" (for example, after the possible partition combinations of word A, their partition units, and the structural correct pronunciation probabilities are determined in the "word likelihood partition combination database", the optimal partition combination corresponding to word A is extracted and updated in the "word optimal partition combination database"), so that the "word likelihood partition combination database" includes only the candidate words. The embodiments of the present invention do not specifically limit this.
  • FIG. 2 shows a method for creating a word optimal partition combination database based on a word likelihood partition combination database according to an embodiment of the present invention.
  • the method may include, but is not limited to, the following steps:
  • Step 110: Determine, according to the word likelihood partition combination database, the possible partition combinations corresponding to each of the candidate words, the partition units included in each possible partition combination, and the structural correct pronunciation probability corresponding to each partition unit.
  • the "word likelihood partition combination database" has already been set up; therefore, the possible partition combinations corresponding to each candidate word can be obtained by querying the "word likelihood partition combination database".
  • Step 120: For each of the candidate words, select the optimal partition combination corresponding to the candidate word according to the structural correct pronunciation probability of each partition unit in each possible partition combination corresponding to the candidate word.
  • the structural correct pronunciation probability of a partition unit represents the probability that the letter string it represents is pronounced according to the corresponding pronunciation code in each associated word containing that letter string; the higher the structural correct pronunciation probability, the higher the probability that the letter string is pronounced according to the pronunciation code in each of the associated words. Therefore, in this embodiment, the best possible partition combination can be selected as the optimal partition combination according to the structural correct pronunciation probability of each partition unit in each possible partition combination corresponding to the candidate word.
  • the specific implementation of "selecting the optimal partition combination according to the structural correct pronunciation probability of each partition unit in each possible partition combination corresponding to the candidate word" may be determined according to the actual application scenario or language characteristics.
  • For example, among the possible partition combinations corresponding to a candidate word, the one containing the largest number of partition units whose structural correct pronunciation probability satisfies a preset requirement may be selected as the optimal partition combination for that candidate word.
  • the "preset requirement" may be that the structural correct pronunciation probability is 100%, or that the structural correct pronunciation probability exceeds a certain preset threshold (for example, 98%).
  • Using the number of partition units whose structural correct pronunciation probability satisfies the preset requirement as the screening criterion ensures that most of the partition units in the selected optimal partition combination have a relatively high structural correct pronunciation probability. This improves, at the source of the partition structure, the rate at which a foreign language learner correctly spells out the candidate word by natural spelling, and also makes it easier for the learner to apply the correspondence between the letter string and the pronunciation code represented by each partition unit to other unfamiliar words containing the same letter string, achieving the highest possible benefit and applicability (that is, applying the correspondence to more associated words containing the letter string).
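  • The count-based screening criterion can be sketched as follows, using the per-unit probabilities quoted in this document for two partition combinations of "different". The function names and the 99% threshold are assumptions chosen for illustration (the text itself gives 100% and 98% as example requirements).

```python
# Structural correct pronunciation probabilities (in percent) of the partition
# units in two possible partition combinations of "different".
COMBINATIONS = {
    "dif-ferent": [100.0, 99.88],
    "dif-fe-rent": [100.0, 2.41, 98.66],
}


def count_qualifying(unit_probs: list[float], threshold: float) -> int:
    """Count the partition units whose probability meets the preset requirement."""
    return sum(1 for p in unit_probs if p >= threshold)


def select_by_count(combinations: dict[str, list[float]], threshold: float = 99.0) -> str:
    """Pick the combination with the most qualifying partition units.

    The 99.0 default is an assumed threshold, not a value from the source.
    """
    return max(combinations, key=lambda name: count_qualifying(combinations[name], threshold))


chosen = select_by_count(COMBINATIONS)
```

  • With a 98% threshold both combinations have two qualifying units, and `max` simply returns the first one it encounters, so a practical implementation would need an explicit tie-break such as the comprehensive probability described next.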
  • the "comprehensive structural correct pronunciation probability" characterizes the overall performance of the structural correct pronunciation probabilities of the partition units in a given possible partition combination, and may be calculated from the structural correct pronunciation probability of each partition unit in that combination.
  • For example, a mathematical average algorithm can be applied to the structural correct pronunciation probabilities of the partition units in each possible partition combination to obtain the comprehensive structural correct pronunciation probability of that combination.
  • the mathematical average algorithm may include, but is not limited to, the arithmetic mean, the median, the mode, the root mean square (RMS), and the like.
  • the possible partition combination with the highest comprehensive structural correct pronunciation probability can be directly selected as the optimal partition combination of the candidate word.
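  • A small sketch of this selection, assuming the arithmetic mean is the averaging algorithm in use (it reproduces the 99.94% and 67.02% figures quoted in this document for "different"). The function names are hypothetical.

```python
from statistics import mean

# Per-unit structural correct pronunciation probabilities, in percent.
COMBINATIONS = {
    "dif-fe-rent": [100.0, 2.41, 98.66],
    "dif-ferent": [100.0, 99.88],
}


def comprehensive_probability(unit_probs: list[float]) -> float:
    """Arithmetic mean of the unit probabilities (one of several allowed averages)."""
    return mean(unit_probs)


def select_by_comprehensive(combinations: dict[str, list[float]]) -> str:
    """Pick the combination with the highest comprehensive probability."""
    return max(combinations, key=lambda name: comprehensive_probability(combinations[name]))


best = select_by_comprehensive(COMBINATIONS)
```

  • Swapping `mean` for `statistics.median` or a root-mean-square helper would give the other averaging variants the text mentions.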
  • Alternatively, the possible partition combinations whose comprehensive structural correct pronunciation probability satisfies a first preset condition may first be selected to form the preferred possible partition combination group corresponding to the candidate word; the optimal partition combination corresponding to the candidate word is then screened from that group according to a second preset condition.
  • the "first preset condition" is used to select, from the several possible partition combinations, one or more high-quality combinations with a high comprehensive structural correct pronunciation probability, which constitute the preferred possible partition combination group.
  • For example, the first preset condition may be set to "the comprehensive structural correct pronunciation probability is greater than a certain threshold", or "the comprehensive structural correct pronunciation probability falls within the numerical interval formed by the highest comprehensive structural correct pronunciation probability and an allowable range" (for example, assuming that the highest comprehensive structural correct pronunciation probability is 99.94% and the allowable range is 5, the first preset condition can be set as: the comprehensive structural correct pronunciation probability falls within the interval [(99.94-5)%, 99.94%]), and so on.
  • the "second preset condition" is used to further screen the high-quality combinations whose comprehensive structural correct pronunciation probabilities are very close or equal (that is, the possible partition combinations in the "preferred possible partition combination group"), so that the selected optimal partition combination is easier for foreign language learners to remember and to apply to the spelling of other associated words.
  • the second preset condition may be set to any one or more of the following:
  • the possible partition combination with the largest total number of partition units whose structural correct pronunciation probability satisfies the preset requirement is selected as the optimal partition combination. Thereby, it is possible to facilitate the foreign language learner to accumulate more partition units having a higher structural correct pronunciation probability.
  • the possible partition combination whose partition units include a "general prefix" and/or a "general suffix" is selected as the optimal partition combination.
  • the possible partition combination in which the letter string represented by one or more partition units is a complete word, and the pronunciation code of that complete word is the same as the pronunciation code represented by the partition unit, is selected as the optimal partition combination.
  • The "first preset conditions" and "second preset conditions" listed above merely illustrate the invention and are not intended to limit it. In practical applications, other "first preset conditions" or "second preset conditions" may also be set in combination with the language features of the candidate words.
  • Constructing the preferred possible partition combination group for the candidate word with the first preset condition, and then selecting the optimal partition combination of the candidate word from that group according to the second preset condition, ensures that the selected optimal partition combination has a relatively high correct pronunciation probability while making it easier for foreign language learners to memorize by association.
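  • The two-stage screening can be sketched as below. It assumes the arithmetic mean as the comprehensive probability, an allowable range of 5 for the first preset condition (as in the example interval above), and "most partition units at or above 98%" as the second preset condition. The function names and the toy combinations A, B, and C are hypothetical.

```python
from statistics import mean


def preferred_group(combinations: dict[str, list[float]], tolerance: float = 5.0) -> list[str]:
    """First preset condition: keep combinations within `tolerance` of the best mean."""
    means = {name: mean(probs) for name, probs in combinations.items()}
    highest = max(means.values())
    return [name for name, value in means.items() if value >= highest - tolerance]


def select_optimal(combinations: dict[str, list[float]],
                   tolerance: float = 5.0,
                   unit_threshold: float = 98.0) -> str:
    """Second preset condition: most units meeting `unit_threshold` within the group."""
    group = preferred_group(combinations, tolerance)
    return max(group, key=lambda name: sum(p >= unit_threshold for p in combinations[name]))


# Toy probabilities in percent (illustrative, not from the source).
TOY = {
    "A": [100.0, 99.0],               # mean 99.5, two qualifying units
    "B": [100.0, 100.0, 98.5, 97.5],  # mean 99.0, three qualifying units
    "C": [100.0, 60.0],               # mean 80.0, outside the allowable range
}
optimal = select_optimal(TOY)
```

  • Here B is chosen over A even though its mean is slightly lower, which illustrates the stated purpose of the second preset condition: separating combinations whose comprehensive probabilities are close.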
  • Step 130: Record the candidate words and their corresponding optimal partition combinations in the word optimal partition combination database.
  • the candidate word and its corresponding optimal partition combination may be recorded in the word optimal partition combination database.
  • the "word likelihood partition combination database" and the "word optimal partition combination database" may be two independent databases, or may be different parts of the same database (that is, a single database recording the correspondences among candidate words, possible partition combinations, partition units, structural correct pronunciation probabilities, and optimal partition combinations); the embodiments of the present invention do not specifically limit the form of the two databases.
  • When the "word likelihood partition combination database" and the "word optimal partition combination database" are different parts of the same database, the optimal partition combination corresponding to a candidate word may be determined according to the correspondences recorded in the "word likelihood partition combination database" and then marked, thereby obtaining the correspondence between the candidate word and its optimal partition combination.
  • Step 200: Determine the spelling partition of the target word according to the optimal partition combination corresponding to the target word.
  • the "spelling partition” refers to one or more regions formed by dividing a complete letter string writing sequence and/or a complete pronunciation code of a target word.
  • the area may be a partition unit, that is, it includes both a letter string and a pronunciation code; in this case, determining the spelling partition of the target word according to the optimal partition combination corresponding to the target word may be implemented by using each partition unit in the optimal partition combination determined in step 100 as a spelling partition of the target word.
  • the area may also be just the letter string represented by a partition unit (that is, an area formed by dividing only the complete letter string writing sequence of the target word); in this case, determining the spelling partition of the target word according to the optimal partition combination corresponding to the target word may be implemented by inserting syllable marks into the complete letter string writing sequence of the target word according to the optimal partition combination, thereby obtaining each spelling partition of the target word.
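  • Inserting the marks can be sketched as joining the letter strings of the partition units, after checking that they really spell the target word. The function name and the choice of "-" as the mark are assumptions for illustration.

```python
def mark_spelling_partitions(word: str, letter_strings: list[str], mark: str = "-") -> str:
    """Insert a mark between the letter strings of an optimal partition combination.

    Raises ValueError if the letter strings do not concatenate to the word.
    """
    if "".join(letter_strings) != word:
        raise ValueError("partition letter strings do not spell the target word")
    return mark.join(letter_strings)


marked = mark_spelling_partitions("different", ["dif", "ferent"])
```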
  • compared with the conventional manner of determining the spelling partition of a word by syllable splitting, the embodiment of the present invention can improve the structural correct pronunciation probability of the spelling partitions of the target word, and thus substantially improve the correct pronunciation probability of the target word at the level of the spelling structure.
  • For example, with the traditional syllable splitting method, the spelling partitions of the target word "different" are "dif", "fe", and "rent"; the structural correct pronunciation probabilities of these three partition units are 100%, 2.41%, and 98.66%, respectively, so the comprehensive structural correct pronunciation probability of "dif-fe-rent" is only 67.02%. In this embodiment, the spelling partitions of the target word "different" are "dif" and "ferent"; the structural correct pronunciation probabilities of these two partition units are 100% and 99.88%, respectively, so the comprehensive structural correct pronunciation probability of "dif-ferent" reaches 99.94%. The spelling structure "dif-ferent" determined by the method provided by the embodiment of the present invention therefore has a much higher correct pronunciation probability than the spelling structure "dif-fe-rent" determined by the traditional syllable splitting method.
  • each of the determined spelling partitions generally has a relatively high structural correct pronunciation probability, which reduces, at the source of the spelling partition structure, the probability that a foreign language learner misreads a word because of how it is partitioned. Therefore, once the learner has memorized the spelling partitions of a learned target word, the learner knows the most likely pronunciation of the letter string represented by each spelling partition in every associated word containing the same letter string. For example, the spelling partitions of the English word "different" determined by the method provided in this embodiment are "dif" and "ferent", so the pronunciation learned for "dif" is its most likely pronunciation in every associated word containing the letter string "dif", and likewise for "ferent". The spelling partitions can thus be applied, with a larger borrowable proportion, to other associated words containing the same letter strings, achieving the highest memorization benefit and application benefit.
  • By learning the spelling partitions provided by the present invention, a foreign language learner who encounters an unfamiliar word can naturally work out its spelling partitions, and can spell out the unfamiliar word with a relatively high probability of correctness simply by borrowing the pronunciation of the associated partitions in previously learned associated words.
  • The above-mentioned "word likelihood partition combination database" can be obtained by any suitable means. The second embodiment of the present invention also provides a method of creating the word likelihood partition combination database. The method may include, but is not limited to, the following steps:
  • Step 310: Create a partition unit database of the target vocabulary.
  • The "target vocabulary" may be any type of vocabulary, for example a dictionary, an academic-level vocabulary, a scientific literature vocabulary, and the like. In particular, all of the candidate words recorded in the "word likelihood partition combination database" are included in the target vocabulary.
  • The created "partition unit database" can serve as one of the databases in the "spelling total library" described in the first embodiment. It records at least all the partition units corresponding to each word in the target vocabulary and, for each partition unit, the corresponding letter string and pronunciation code as well as the words that include the partition unit. As shown in Table 3, in some embodiments each partition unit may be assigned a unique spelling code to facilitate recording and querying. From this database, the correspondence among a partition unit (spelling code), a letter string, a pronunciation code, and each word containing the same spelling code can be obtained.
  • In practice, for each word in the target vocabulary, the corresponding possible partition combinations, the partition units included in each possible partition combination, and the letter string and pronunciation code corresponding to each partition unit may first be determined; the "partition unit database" of the target vocabulary is then created according to these correspondences.
  • Creating the "partition unit database" of the target vocabulary makes it convenient to count words containing the same partition unit or letter string, and to determine the structural correct pronunciation probability of each partition unit.
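  • One way to picture such a database is sketched below. It is only an illustration under stated assumptions: the class and field names, the "U0001"-style spelling codes and the placeholder pronunciation codes ("/dIf/" and so on) are hypothetical and do not reproduce Table 3.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PartitionUnit:
    spelling_code: str        # hypothetical unique identifier
    letter_string: str
    pronunciation_code: str   # placeholder notation, not the source's codes
    words: set = field(default_factory=set)


class PartitionUnitDatabase:
    def __init__(self):
        self._units = {}                       # (letters, code) -> PartitionUnit
        self._by_letters = defaultdict(list)   # letters -> [PartitionUnit, ...]

    def add(self, word, letter_string, pronunciation_code):
        """Record that `word` contains the given letter string / pronunciation pair."""
        key = (letter_string, pronunciation_code)
        unit = self._units.get(key)
        if unit is None:
            code = f"U{len(self._units) + 1:04d}"
            unit = PartitionUnit(code, letter_string, pronunciation_code)
            self._units[key] = unit
            self._by_letters[letter_string].append(unit)
        unit.words.add(word)
        return unit

    def units_with_letters(self, letter_string):
        """All partition units that share the same letter string."""
        return list(self._by_letters.get(letter_string, []))


db = PartitionUnitDatabase()
db.add("different", "dif", "/dIf/")
db.add("difficult", "dif", "/dIf/")
db.add("different", "ferent", "/fr@nt/")
for unit in db.units_with_letters("dif"):
    print(unit.spelling_code, unit.letter_string, sorted(unit.words))
```

  • Indexing by letter string as well as by the (letter string, pronunciation code) pair keeps both kinds of counting mentioned above cheap: words sharing a partition unit, and words sharing only a letter string.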
  • In some embodiments, the specific implementation of "determining a plurality of possible partition combinations corresponding to a word, a plurality of partition units included in each of the possible partition combinations, and a letter string and a pronunciation code corresponding to each partition unit" may be: each word in the target vocabulary is divided by linguistic experts to obtain the possible partition combinations corresponding to the word; the partition units included in each possible partition combination are extracted at the same time; and the results are recorded in a correspondence table as shown in Table 4.
  • In other embodiments, the specific implementation of "determining a plurality of possible partition combinations corresponding to a word, a plurality of partition units included in each of the possible partition combinations, and a letter string and a pronunciation code corresponding to each partition unit" may also be: for each word in the target vocabulary, its basic partition units are determined first, and the corresponding possible partition combinations are then determined according to the basic partition units.
  • The preset "basic letter string and basic pronunciation code correspondence library" may be one of the databases in the "spelling total library" described in the first embodiment. It records the correspondence between all basic letter strings and basic pronunciation codes (Table 5 shows part of the "basic letter string - basic pronunciation code" correspondences). Therefore, when "basic letter string - basic pronunciation code" matching is performed for each word, the matching result can be obtained quickly, which improves the efficiency of building the database.
  • Specifically, the syllable mark of the word may first be extracted from the target vocabulary (for example, the "-" in the word "dif-fe-rent" is its syllable mark), and it is determined whether the word is a monosyllabic word (a word that contains no syllable mark) or a multi-syllable word (a word that contains at least one syllable mark).
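  • The classification by syllable mark can be sketched as below, assuming the vocabulary stores words with "-" as the syllable mark; the function name and the returned labels are hypothetical.

```python
def split_by_syllable_mark(marked_word, mark="-"):
    """Return (kind, syllables) for a word written with syllable marks."""
    syllables = [part for part in marked_word.split(mark) if part]
    kind = "monosyllabic" if len(syllables) <= 1 else "multi-syllable"
    return kind, syllables


print(split_by_syllable_mark("dif-fe-rent"))
print(split_by_syllable_mark("their"))
```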
  • If the word is a multi-syllable word, the letters of its letter strings are matched from left to right against "basic letter string - basic pronunciation code" pairs to obtain the pronunciation code corresponding to each letter string.
  • Then, according to the characteristics of the target foreign language, it is detected whether a shared phoneme exists in the word. Taking English as an example, for two adjacent partition units, if the last letter of the previous partition unit is a consonant letter and the first letter of the next partition unit is a vowel letter, the consonant can be a shared phoneme. If a shared phoneme exists, it is copied to the first position of the next partition unit, so that this partition unit becomes an auditorily complete sound zone. The basic partition units of the word, and the basic partition combination composed of these basic partition units, are thereby obtained.
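  • The shared-phoneme rule described for English can be sketched on letter strings alone. This is a simplified illustration: it ignores pronunciation codes, treats only a, e, i, o, u as vowel letters (an assumption), and uses "rent-al" as a made-up example that does not come from the embodiment.

```python
VOWEL_LETTERS = set("aeiou")  # assumption: "y" is treated as a consonant letter


def is_consonant(letter):
    return letter.isalpha() and letter.lower() not in VOWEL_LETTERS


def copy_shared_phonemes(unit_letter_strings):
    """Copy a unit-final consonant to the head of the next vowel-initial unit."""
    units = list(unit_letter_strings)
    result = units[:1]
    for previous, current in zip(units, units[1:]):
        shared = (
            previous
            and current
            and is_consonant(previous[-1])
            and current[0].lower() in VOWEL_LETTERS
        )
        if shared:
            # The consonant stays in the previous unit and is copied forward.
            current = previous[-1] + current
        result.append(current)
    return result


print(copy_shared_phonemes(["rent", "al"]))
print(copy_shared_phonemes(["dif", "fe", "rent"]))
```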
  • If the word is a monosyllabic word, its complete letter string writing sequence is not split according to a syllable mark, and the pair "complete letter string writing sequence - complete pronunciation code" is its basic partition unit, which is also one of its possible partition combinations.
  • Further, based on the complete pronunciation code of the word and the "basic letter string and basic pronunciation code correspondence library", the letters in the letter string can be matched from left to right against "basic letter string - basic pronunciation code" pairs to obtain the pronunciation code corresponding to each letter string.
  • In this embodiment, the various possible partition combinations of a word are acquired based on its basic partition units, which avoids the problem of pronunciation disorder caused by crossing sound zones.
  • In still other embodiments, the "basic letter string and basic pronunciation code correspondence library" may be preset with only a part of the most basic and most common "basic letter string - basic pronunciation code" pairs, and then be gradually improved in the process of creating the correspondence shown in Table 4.
  • Specifically, the complete letter string writing sequence and the complete phonetic symbol string (i.e., the complete pronunciation code) corresponding to each word in the target vocabulary can be extracted first, and the words in the target vocabulary are sorted according to the rules "number of letters and number of phonetic symbols from few to many" and "difference between the number of letters and the number of phonetic symbols from small to large". For example, the order may be: "1 letter - 1 phonetic symbol", "2 letters - 2 phonetic symbols", "2 letters - 1 phonetic symbol", "3 letters - 3 phonetic symbols", "3 letters - 2 phonetic symbols", "3 letters - 1 phonetic symbol", and so on.
  • Following this order, the "basic letter string - basic pronunciation code" pairs included in each word are determined in turn, together with the possible partition combinations corresponding to each word, the partition units included in each possible partition combination, and the letter string and pronunciation code corresponding to each partition unit.
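  • The ordering rule can be expressed as a two-part sort key. In the sketch below each word is reduced to a hypothetical (number of letters, number of phonetic symbols) pair; sorting by letter count first and by the letter/symbol difference second reproduces the order listed above.

```python
def ordering_key(counts):
    """Sort by letter count, then by (letters - phonetic symbols)."""
    letters, phonetic_symbols = counts
    return (letters, letters - phonetic_symbols)


# (letters, phonetic symbols) pairs in arbitrary input order.
pairs = [(3, 1), (2, 1), (1, 1), (3, 3), (2, 2), (3, 2)]
ordered = sorted(pairs, key=ordering_key)
print(ordered)  # prints [(1, 1), (2, 2), (2, 1), (3, 3), (3, 2), (3, 1)]
```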
  • For a monosyllabic word, the "basic letter string - basic pronunciation code" pairs it contains can be determined according to the following steps:
  • The complete letter string writing sequence of the monosyllabic word may be segmented according to a preset "comparison arrangement template" to generate a plurality of suspected likelihood comparison combinations, in which each segment represents a basic letter string. For example, the comparison arrangement template for a 5-letter string (12345) is:
  • first, the arrangement containing up to 5 letters: 12345;
  • Taking the word "their" as an example, the following suspected likelihood comparison combinations can be obtained: their, thei+r, t+heir, the+ir, the+i+r, th+eir, t+hei+r, t+h+eir, th+ei+r, th+e+ir, th+e+i+r, t+he+ir, t+he+i+r, t+h+ei+r, t+h+e+ir, and t+h+e+i+r.
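  • The full set of segmentations of a letter string can be enumerated by deciding, for each gap between adjacent letters, whether to cut there. The sketch below does this with `itertools.product`; the function name is hypothetical, and the enumeration order differs from the template order used in the embodiment.

```python
from itertools import product


def all_segmentations(letters):
    """Every way of cutting `letters` into contiguous, non-empty segments."""
    if not letters:
        return []
    results = []
    # One True/False decision per gap between adjacent letters.
    for cuts in product([False, True], repeat=len(letters) - 1):
        parts, start = [], 0
        for index, cut in enumerate(cuts, start=1):
            if cut:
                parts.append(letters[start:index])
                start = index
        parts.append(letters[start:])
        results.append("+".join(parts))
    return results


combinations = all_segmentations("their")
print(len(combinations))  # prints 16
print(combinations[:4])
```

  • A string of n letters has n - 1 gaps and therefore 2^(n-1) segmentations, which gives the 16 combinations for the 5-letter word "their".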
  • Several matching rules for "basic letter string - basic pronunciation code" pairs may be set according to the features of the language, and the suspected likelihood comparison combinations that do not conform to the phoneme segmentation rules are deleted. Taking English as an example, the matching rules can include, but are not limited to, the following.
  • The total number of basic letter strings should equal the total number of basic pronunciation codes. For example, the pronunciation code corresponding to "their" consists of 2 basic pronunciation codes, so the total number of basic letter strings should also be 2. The suspected likelihood comparison combinations that do not meet this rule can be deleted, and the remaining comparison combinations only include "thei+r", "t+heir", "the+ir" and "th+eir".
  • Some letter combinations that must not be split (for example, ch, th, etc.) may be given a no-split flag in the "basic letter string and basic pronunciation code correspondence library". For example, "*" in Table 5 means that when "th" is present, "t" and "h" are not separated. Thus, based on this matching rule, the combination "t+heir" can be deleted.
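  • The two rules above (equal counts, and no splitting of flagged letter combinations) can be sketched as a filter over "+"-separated combinations. The set `NO_SPLIT` and the function names are hypothetical, and applying the "th"/"ch" flag unconditionally is a simplification of what Table 5 may encode.

```python
NO_SPLIT = ("ch", "th")  # hypothetical stand-in for the "*" flags in Table 5


def splits_flagged_pair(parts):
    """True if a segment boundary falls inside a flagged letter pair."""
    return any(
        left[-1] + right[0] in NO_SPLIT for left, right in zip(parts, parts[1:])
    )


def filter_combinations(combinations, basic_code_count):
    kept = []
    for combination in combinations:
        parts = combination.split("+")
        if len(parts) != basic_code_count:
            continue  # count of letter strings must equal count of codes
        if splits_flagged_pair(parts):
            continue  # e.g. "t+heir" separates "t" from "h"
        kept.append(combination)
    return kept


candidates = ["their", "thei+r", "t+heir", "the+ir", "th+eir", "t+h+eir", "th+e+ir"]
print(filter_combinations(candidates, basic_code_count=2))
# prints ['thei+r', 'the+ir', 'th+eir']
```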
  • The syllable classification rule performs syllable classification according to the linguistic rules of the target foreign language, such as the 6-syllable-type classification of English, the long and short vowel classification of German, and the like.
  • For example, the structure of the word "their" is a typical "consonant-vowel-consonant (CVC)" closed syllable, and its final letter is "r", so it needs to comply with the r-controlled syllable type in the English 6-syllable-type classification. Therefore, the vowels need to be merged into one group with the following "r".
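  • A rough sketch of the r-controlled check follows. It assumes the rule can be approximated as "the last segment must contain the final r together with the whole run of vowel letters immediately before it"; that approximation, the vowel set and the function name are assumptions of this sketch.

```python
VOWELS = set("aeiou")


def respects_r_control(parts):
    """Check that the vowels before a word-final 'r' stay grouped with it."""
    word = "".join(parts)
    if not word.endswith("r"):
        return True  # the rule only concerns words ending in "r"
    # Walk left from the final "r" over the run of vowel letters.
    group_start = len(word) - 1
    while group_start > 0 and word[group_start - 1] in VOWELS:
        group_start -= 1
    last_segment_start = len(word) - len(parts[-1])
    return last_segment_start <= group_start


remaining = ["thei+r", "the+ir", "th+eir"]
kept = [c for c in remaining if respects_r_control(c.split("+"))]
print(kept)  # prints ['th+eir']
```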
  • The segmentation rules of some foreign languages are more concise than the English open and closed syllable segmentation principle. In German, for example, when there is only one consonant between two vowels, the consonant and the following vowel constitute a syllable.
  • In step (A3), following the ordering in step (A1), the remaining comparison combinations are matched against the "basic letter string and basic pronunciation code correspondence library" to determine the "basic letter string - basic pronunciation code" pairs included in the word.
  • For example, after step (A2), only "th+eir" is left among the comparison combinations of "their". By querying the "basic letter string and basic pronunciation code correspondence library", the correspondence for one of its basic letter strings can be found but the correspondence for the other cannot, so the new correspondence is added to the "basic letter string and basic pronunciation code correspondence library".
  • In addition, the suspected likelihood comparison combinations are arranged in reverse order so that they are not confused with the "basic letter string - basic pronunciation code" pairs of the previous level: matching is attempted first among the pairs with the largest number of letters, and only if no match is found there is the next lower level searched.
  • For a multi-syllable word, the word is first split according to its syllable marks. For example, the syllable mark of the word "early" is located between "ear" and "ly", so the word can be split into the two syllables "ear" and "ly".
  • Each of the split syllables is then processed one by one in the manner of a monosyllabic word (i.e., steps (A1) to (A3) above) to determine the corresponding "basic letter string - basic pronunciation code" pairs. The specific implementation of this step is substantially the same as steps (A1) to (A3) above.
  • Step 320: Obtain the plurality of possible partition combinations corresponding to each candidate word, the plurality of partition units included in each possible partition combination, and the letter string and pronunciation code corresponding to each partition unit, and record them in the "word likelihood partition combination database".
  • For the specific implementation of determining the possible partition combinations corresponding to a candidate word, the partition units included in each possible partition combination, and the letter string and pronunciation code corresponding to each partition unit, reference may be made to the corresponding description in step 310 above; details are not repeated here.
  • Step 330: For each partition unit in each of the possible partition combinations corresponding to each candidate word, extract from the partition unit database the first type of words, which include the partition unit, and the second type of words, which include the letter string corresponding to the partition unit, and determine the structural correct pronunciation probability corresponding to the partition unit based on the first type of words and the second type of words.
  • In this embodiment, a word that includes both the letter string and the pronunciation code corresponding to a certain partition unit is referred to as a "first type of word" corresponding to the partition unit, and a word that includes the letter string corresponding to the partition unit is referred to as a "second type of word" corresponding to the partition unit (that is, the "associated word" described in the first embodiment).
  • For example, for a partition unit whose letter string is "if", the corresponding first type of words are words that include the letter string "if" in which "if" is pronounced with the pronunciation code of the partition unit, such as "different" and "gift"; the corresponding second type of words are words that include the letter string "if", such as "different", "gift", "life", "rifle", "uniform" and "modify".
  • In this embodiment, the first type of words and second type of words corresponding to each partition unit in each possible partition combination of each candidate word may be determined by querying the foregoing "partition unit database".
  • For example, as shown in Table 2, one of the possible partition combinations of the word "different" is "dif-ferent", which includes two partition units. By querying the "partition unit database" (as shown in Table 3), it can be determined that the first type and second type of words corresponding to the partition unit with the letter string "dif" include "difficult", "difficulty", etc.; the first type of words corresponding to the partition unit with the letter string "ferent" include "different", etc., and its second type of words include "different", "differential", etc.
  • After the first type of words and second type of words are obtained, the structural correct pronunciation probability can be determined for each partition unit based on its corresponding first type of words (which contain both the letter string and the pronunciation code) and second type of words (which need only contain the letter string).
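  • A minimal sketch of such a calculation is given below, assuming the probability is the share of second type words that are also first type words. The ratio itself is an assumption of this sketch, and the word lists are the small illustrative examples for "if", not real vocabulary statistics.

```python
def structural_correct_pronunciation_probability(first_type_words, second_type_words):
    """Share of words containing the letter string that also use the unit's pronunciation."""
    second = set(second_type_words)
    if not second:
        raise ValueError("no word contains the letter string")
    # First type words are by definition a subset of second type words.
    first = set(first_type_words) & second
    return len(first) / len(second)


first_type = ["different", "gift"]
second_type = ["different", "gift", "life", "rifle", "uniform", "modify"]
probability = structural_correct_pronunciation_probability(first_type, second_type)
print(f"{probability:.2%}")  # prints 33.33%
```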
  • In practice, the language level or the main vocabulary category encountered may differ between foreign language learners, so different learners may have different vocabularies. With different vocabularies, even the same method (for example, obtaining the structural correct pronunciation probability of a partition unit from the number of associated words as described above) may yield different structural correct pronunciation probabilities for the same partition unit. For example, for a learner with a small vocabulary, the pronunciation of a letter string X in the words encountered may only be /x1/, so the structural correct pronunciation probability of X/x1/ is 100%. For a learner with a relatively large vocabulary, the possible pronunciations of the letter string X may include /x2/ in addition to /x1/, in which case the structural correct pronunciation probability of X/x1/ is no longer 100% and may even be lower than 50%. If the structural correct pronunciation probability of a partition unit were calculated directly from all the words in the target vocabulary, a spelling partition scheme suited to the learner's characteristics could not be provided.
  • Therefore, in some embodiments, the first type of words, the second type of words and the candidate words belong to the same vocabulary category.
  • The vocabulary categories may include, but are not limited to, primary school vocabulary, secondary school vocabulary, professional vocabulary, everyday language, travel terminology, and the like.
  • In addition, a foreign language learner generally does not merely memorize words when learning a foreign language, but learns the language in practical application scenarios (for example, reading materials, listening materials, news and entertainment web pages, and the like), and different words occur a different number of times in the corresponding statistical scene. Therefore, in still other embodiments, the specific implementation of "determining, for each partition unit, the structural correct pronunciation probability based on its corresponding first type of words and second type of words" may also be: acquiring the number of occurrences of the first type of words in a statistical scene corresponding to the vocabulary category, recorded as the first number of occurrences; acquiring the number of occurrences of the second type of words in the statistical scene, recorded as the second number of occurrences; and determining the structural correct pronunciation probability corresponding to the partition unit according to the first number of occurrences and the second number of occurrences.
  • The "statistical scene corresponding to the vocabulary category" may specifically be reading materials, listening materials and the like corresponding to the vocabulary category.
  • In this way, the result is more consistent with the learner's learning scenario, and the influence of vocabulary that is rare in actual application on the structural correct pronunciation probability of the partition unit is excluded, so the reliability of the structural correct pronunciation probabilities in the "word likelihood partition combination database" can be further improved.
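  • The occurrence-based variant can be sketched as a ratio of occurrence totals. Both the ratio and the occurrence counts below are assumptions made for illustration; the embodiment only states that the probability is determined from the first and second numbers of occurrences.

```python
def occurrence_weighted_probability(first_type_words, second_type_words, occurrences):
    """Ratio of first-type occurrences to second-type occurrences in a statistical scene."""
    first_total = sum(occurrences.get(word, 0) for word in set(first_type_words))
    second_total = sum(occurrences.get(word, 0) for word in set(second_type_words))
    if second_total == 0:
        return None  # the letter string never occurs in this scene
    return first_total / second_total


# Hypothetical occurrence counts in some reading material.
scene_counts = {
    "different": 40, "gift": 10, "life": 45,
    "rifle": 1, "uniform": 3, "modify": 1,
}
first_type = ["different", "gift"]
second_type = ["different", "gift", "life", "rifle", "uniform", "modify"]
weighted = occurrence_weighted_probability(first_type, second_type, scene_counts)
print(weighted)  # prints 0.5
```

  • With these made-up counts the frequent word "life" pulls the probability to 0.5, whereas rare words such as "rifle" barely affect it, which is the effect the occurrence-based variant is meant to capture.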
  • Step 340: Record the structural correct pronunciation probability corresponding to each of the partition units in the word likelihood partition combination database.
  • In this embodiment, the correspondence is recorded in the "word likelihood partition combination database"; specifically, after the structural correct pronunciation probability of a certain partition unit is determined, it is recorded at the position corresponding to that partition unit in the "word likelihood partition combination database".
  • The method for creating a word likelihood partition combination database provided by this embodiment can efficiently obtain the possible partition combinations of each word in the target vocabulary, the partition units included in each possible partition combination, the letter string and pronunciation code corresponding to each partition unit, and the structural correct pronunciation probability corresponding to each partition unit.
  • It should be noted that determining the structural correct pronunciation probability of a partition unit based on its first type of words and second type of words is only one preferred implementation. In practical applications, the structural correct pronunciation probability of a partition unit may also be determined in other ways, which should also fall within the scope of the claimed invention.
  • For example, the structural correct pronunciation probability of a partition unit may be determined based only on the number of its associated partition units, without considering associated words. Here, an "associated partition unit" is a partition unit having the same letter string as the partition unit in question. In this case, the partition units of all the words in the target vocabulary are determined first, and the partition units having the same letter string are treated as associated partition units of one another.
  • FIG. 4 is a schematic structural diagram of an apparatus for determining a spelling partition of a target word according to an embodiment of the present invention.
  • The apparatus 40 includes an optimal partition combination determining unit 41 and a spelling partition determining unit 42.
  • The optimal partition combination determining unit 41 is configured to determine, based on a preset spelling total library, the optimal partition combination corresponding to a received target word, wherein the spelling total library includes a word optimal partition combination database that records candidate words and the optimal partition combination corresponding to each candidate word, and the target word is one of the candidate words. The spelling partition determining unit 42 is configured to determine the spelling partition of the target word according to the optimal partition combination corresponding to the target word.
  • In this embodiment, the optimal partition combination determining unit 41 first determines the optimal partition combination corresponding to the received target word based on the preset spelling total library, and the spelling partition determining unit 42 then determines the spelling partition of the target word according to the optimal partition combination corresponding to the target word.
  • The spelling total library includes a word optimal partition combination database, which records candidate words and the optimal partition combination corresponding to each candidate word; the target word is one of the candidate words, and each partition unit in an optimal partition combination has a high structural correct pronunciation probability.
  • In some embodiments, the spelling total library further includes a word likelihood partition combination database, which records the candidate words, the plurality of possible partition combinations corresponding to each candidate word, the plurality of partition units included in each possible partition combination, and the structural correct pronunciation probability corresponding to each partition unit, wherein each partition unit represents a correspondence between a letter string and a pronunciation code. The optimal partition combination corresponding to each candidate word in the word optimal partition combination database can be obtained by screening the word likelihood partition combination database.
  • In this case, the apparatus 40 further includes a structural correct pronunciation probability extraction unit 43, a screening unit 44, and a word optimal partition combination database creation unit 45.
  • The structural correct pronunciation probability extraction unit 43 is configured to determine, according to the word likelihood partition combination database, the plurality of possible partition combinations corresponding to each candidate word, the plurality of partition units included in each possible partition combination, and the structural correct pronunciation probability corresponding to each partition unit.
  • The screening unit 44 is configured to screen out, for each candidate word, the corresponding optimal partition combination based on the structural correct pronunciation probability of each partition unit in each of the possible partition combinations corresponding to the candidate word.
  • The word optimal partition combination database creation unit 45 is configured to record the candidate words and their corresponding optimal partition combinations in the word optimal partition combination database.
  • The screening unit 44 includes a comprehensive structural correct pronunciation probability determination module 441 and a screening module 442.
  • The comprehensive structural correct pronunciation probability determination module 441 is configured to determine, for each candidate word, the comprehensive structural correct pronunciation probability of each possible partition combination corresponding to the candidate word, based on the structural correct pronunciation probability of each partition unit in that possible partition combination.
  • The screening module 442 is configured to screen out the optimal partition combination corresponding to the candidate word based on the comprehensive structural correct pronunciation probability of each possible partition combination corresponding to the candidate word.
  • The screening module 442 is specifically configured to: screen out the possible partition combinations whose comprehensive structural correct pronunciation probability meets a first preset condition, to form a preferred likelihood partition combination group corresponding to the candidate word; and select, according to a second preset condition, the optimal partition combination corresponding to the candidate word from the preferred likelihood partition combination group.
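  • The two-stage screening can be sketched as follows. Everything specific here is an assumption made for illustration: the first preset condition is taken to be a minimum comprehensive probability (`threshold`), the second preset condition is taken to be "fewest partition units, then highest comprehensive probability", and the comprehensive probability is taken to be the mean of the per-unit probabilities.

```python
def comprehensive_probability(unit_probabilities):
    return sum(unit_probabilities) / len(unit_probabilities)


def select_optimal_combination(candidates, threshold=90.0):
    """candidates maps a partition combination to its per-unit probabilities (percent)."""
    scores = {name: comprehensive_probability(p) for name, p in candidates.items()}
    # First preset condition (assumed): comprehensive probability >= threshold.
    preferred = [name for name, score in scores.items() if score >= threshold]
    if not preferred:
        preferred = list(scores)  # fall back to all combinations
    # Second preset condition (assumed): fewest units, then highest score.
    return min(preferred, key=lambda name: (len(candidates[name]), -scores[name]))


candidates = {
    "dif-fe-rent": [100.0, 2.41, 98.66],
    "dif-ferent": [100.0, 99.88],
}
print(select_optimal_combination(candidates))  # prints dif-ferent
```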
  • In some embodiments, the spelling total library further includes a partition unit database, which records all the partition units corresponding to each word in the target vocabulary, the letter string and pronunciation code corresponding to each partition unit, and the words that include the partition unit, wherein the candidate words are included in the target vocabulary. The structural correct pronunciation probability of each partition unit in the word likelihood partition combination database can be calculated based on the partition unit database.
  • In this case, the apparatus 40 further includes a likelihood partition combination acquisition unit 46, a structural correct pronunciation probability calculation unit 47, and a word likelihood partition combination database creation unit 48.
  • The likelihood partition combination acquisition unit 46 is configured to acquire the plurality of possible partition combinations corresponding to each candidate word and the plurality of partition units included in each possible partition combination.
  • The structural correct pronunciation probability calculation unit 47 is configured to extract from the partition unit database, for each partition unit in each of the possible partition combinations corresponding to each candidate word, the first type of words that include the partition unit and the second type of words that include the letter string corresponding to the partition unit, and to determine the structural correct pronunciation probability corresponding to the partition unit based on the first type of words and the second type of words.
  • The word likelihood partition combination database creation unit 48 is configured to record the structural correct pronunciation probability corresponding to each partition unit in the word likelihood partition combination database.
  • In some embodiments, the first type of words, the second type of words and the candidate words belong to the same vocabulary category.
  • In this case, the structural correct pronunciation probability calculation unit 47 is specifically configured to: for each partition unit in each of the possible partition combinations corresponding to each candidate word, extract from the partition unit database the first type of words that include the partition unit and the second type of words that include the letter string corresponding to the partition unit; acquire the number of occurrences of the first type of words in a statistical scene corresponding to the vocabulary category, recorded as the first number of occurrences, and the number of occurrences of the second type of words in the statistical scene, recorded as the second number of occurrences; and determine the structural correct pronunciation probability corresponding to the partition unit according to the first number of occurrences and the second number of occurrences.
  • In some embodiments, the spelling total library further includes a basic letter string and basic pronunciation code correspondence library, which records all the basic letter strings and their corresponding basic pronunciation codes. The partition unit database may be obtained based on the target vocabulary and the basic letter string and basic pronunciation code correspondence library.
  • The advantage of the apparatus for determining the spelling partition of a target word provided by this embodiment is as follows: the optimal partition combination determining unit 41 determines, based on the word optimal partition combination database in the preset spelling total library, the optimal partition combination corresponding to the received target word, and the spelling partition determining unit 42 then determines the spelling partition of the target word according to that optimal partition combination. Because each partition unit in the determined optimal partition combination has a high structural correct pronunciation probability, the correct pronunciation probability of the target word can be substantially improved at the level of spelling structure.
  • FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
  • The electronic device 500 can be any type of electronic device, such as a learning machine, a smartphone, a robot, a personal computer, or a central or cloud server, that is capable of executing the method for determining the spelling partition of the target word provided by the above method embodiments, or of running the apparatus for determining the spelling partition of the target word provided by the above apparatus embodiment.
  • Specifically, referring to FIG. 5, the electronic device 500 includes one or more processors 501 and a memory 502; one processor 501 is taken as an example in FIG. 5.
  • The processor 501 and the memory 502 may be connected by a bus or by other means; a bus connection is taken as an example in FIG. 5.
  • The memory 502 is a non-transitory computer readable storage medium and can be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the method for determining a spelling partition of a target word in an embodiment of the present invention (for example, the optimal partition combination determining unit 41, the spelling partition determining unit 42, the structural correct pronunciation probability extraction unit 43, the screening unit 44, and the word optimal partition combination database creation unit 45 shown in FIG. 4).
  • The processor 501 executes the various functional applications and data processing of the apparatus 40 for determining the spelling partition of the target word by running the non-transitory software programs, instructions, and modules stored in the memory 502, that is, it implements the method of determining the spelling partition of the target word in any of the above method embodiments.
  • The memory 502 can include a program storage area and a data storage area, wherein the program storage area can store an operating system and an application required for at least one function, and the data storage area can store data created according to the use of the apparatus 40 for determining the spelling partition of the target word, and the like.
  • In addition, the memory 502 can include high speed random access memory, and can also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device.
  • In some embodiments, the memory 502 can optionally include memory remotely disposed relative to the processor 501, and such remote memory can be connected to the electronic device 500 over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The one or more modules are stored in the memory 502 and, when executed by the one or more processors 501, perform the method for determining the spelling partition of a target word in any of the above method embodiments, for example, performing method steps 100 to 200 of FIG. 1, method steps 110 to 130 of FIG. 2, and method steps 310 to 340 of FIG. 3 described above, and implementing the functions of units 41-48 of FIG. 4.
  • Embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer-executable instructions. When executed by one or more processors, for example by the processor 501 in FIG. 5, the instructions cause the one or more processors to perform the method for determining the spelling partition of a target word in any of the above method embodiments, for example, performing method steps 100 to 200 of FIG. 1, method steps 110 to 130 of FIG. 2, and method steps 310 to 340 of FIG. 3 described above, and implementing the functions of units 41-48 of FIG. 4.
  • The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • From the description of the above embodiments, a person of ordinary skill in the art can clearly understand that the embodiments can be implemented by software plus a general-purpose hardware platform, and of course also by hardware.
  • A person of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be completed by a computer program in a computer program product instructing the relevant hardware, and that the computer program can be stored in a non-transitory computer-readable storage medium.
  • The computer program includes program instructions that, when executed by an electronic device, cause the electronic device to perform the processes of the method embodiments described above.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
  • The foregoing products (including the electronic device, the non-transitory computer-readable storage medium, and the computer program product) can perform the method for determining the spelling partition of a target word provided by the embodiments of the present invention, and have the functional modules and beneficial effects corresponding to performing that method.
  • For technical details not described in detail in this embodiment, refer to the method for determining the spelling partition of a target word provided by the embodiments of the present invention.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

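Embodiments of the present invention relate to the field of electronically assisted teaching, and specifically disclose a method and an electronic device for determining the spelling partitions of a target word. The method includes: determining, based on a preset master spelling library, the optimal partition combination corresponding to a received target word; and determining the spelling partitions of the target word according to that optimal partition combination. The master spelling library includes a word optimal partition combination database, which records candidate words and the optimal partition combination corresponding to each candidate word; the target word is one of the candidate words. With this technical solution, embodiments of the present invention can raise, at the source of the spelling structure, the probability that each spelling partition within the target word is pronounced correctly.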

Description

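Method and electronic device for determining the spelling partitions of a target word
Technical Field
Embodiments of the present invention relate to the field of electronically assisted teaching, and in particular to a method and an electronic device for determining the spelling partitions of a target word.
Background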
一般地,以英语、德语等“字母拼写文字”类语言为母语的人都能依靠直观方式观察单词本身的字母串写序列,懂得首先把所述序列按音节方式分割成分区字母串,然后懂得在没有音标提示的协助下,仅凭语言字母拼音规则逐一观察分割后各组音节的字母串直接进行发音,或进行音节内的音素分割拼读。
这种分割成分区字母串的方式是为了把词汇中包含相同分区字母串的单词关联起来成为关联单词,以方便把其中的关联分区字母串统一按语言字母拼音规则进行发音分类,并按对应的发音类型进行拼读。理论上,这种拼读方式能有效地把关联单词各式各样的整体读音简化成有限音节类型的分区读音,属于解决“字母拼写文字”类外语的读音问题的上佳途径。但实际上在关联单词之间的关联分区字母串除了包含关联同类发音,也常会出现超过一种或多种关联异类发音的对应关系,而这些不同的对应关系的出现规律有不少是包含一些非规律性的特例或存在复杂难以记忆的应用规律,这使得外语学习者、甚至老师在面对这些疑似关联单词时,要么因当前分区字母串本身包含一种或者多种异音关联字母串而把其中一种关联异类发音,在不确定的情况下误读出来;要么误认定另一关联单词中的关联异类发音为目标单词关联分区字母串的关联同类发音,而产生过份自信的误读;这两种 误读问题均是由于拼读分区出现的关联异类发音造成的,而这些关联异类发音是目标语言拼读分区的结构性产物,所以所述两种误读问题都是属于分区结构性的误读问题,容易成为外语学习者、甚至老师不由自主地误读的诱因。
因此,如何改善单词的拼读分区的结构以从结构源头上提升单词的正确发音概率是当前亟待解决的问题。
发明内容
本发明实施例提供了一种确定目标单词的拼读分区的方法和电子设备,能够针对该目标单词提供有助提升结构性正确发音概率的拼读分区。
为解决上述技术问题,本发明实施例提供了如下技术方案:
第一方面,本发明实施例提供一种确定目标单词的拼读分区的方法,包括:
基于预设的拼读总库,确定接收到的目标单词对应的最佳分区组合,其中,所述拼读总库包括单词最佳分区组合数据库,所述单词最佳分区组合数据库中记载有备选单词以及每一个所述备选单词对应的最佳分区组合,所述目标单词为所述备选单词中的一个;
根据所述目标单词对应的最佳分区组合确定所述目标单词的拼读分区。
可选地,所述拼读总库还包括:单词可能性分区组合数据库,所述单词可能性分区组合数据库中记载有所述备选单词,每一个所述备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括的若干个分区单元,以及,每一个所述分区单元对应的结构性正 确发音概率;其中,每一个所述分区单元代表一种字母串和一种发音代码的对应关系;
所述单词最佳分区组合数据库中的每一个所述备选单词所对应的最佳分区组合可从所述单词可能性分区组合数据库中筛选得到。
可选地,所述方法还包括:
基于所述单词可能性分区组合数据库,确定每一个所述备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括的若干个分区单元,以及,每一个所述分区单元对应的结构性正确发音概率;
分别针对每一个所述备选单词,根据所述备选单词对应的每种可能性分区组合中每个分区单元的结构性正确发音概率,筛选出所述备选单词对应的最佳分区组合;
将所述备选单词及其对应的最佳分区组合记录于所述单词最佳分区组合数据库。
可选地,所述分别针对每一个所述备选单词,根据所述备选单词对应的每种可能性分区组合中每个分区单元的结构性正确发音概率,筛选出所述备选单词对应的最佳分区组合,包括:
分别针对每一个所述备选单词,根据所述备选单词对应的每种可能性分区组合中每个分区单元的结构性正确发音概率,确定所述备选单词对应的每种可能性分区组合的综合结构性正确发音概率;
基于所述备选单词对应的每种可能性分区组合的综合结构性正确发音概率,筛选出所述备选单词对应的最佳分区组合。
可选地,所述基于所述备选单词对应的每种可能性分区组合的综 合结构性正确发音概率,筛选出所述备选单词对应的最佳分区组合,包括:
筛选出综合结构性正确发音概率满足第一预设条件的可能性分区组合,构成所述备选单词对应的优选可能性分区组合群;
根据第二预设条件从所述优选可能性分区组合群中筛选出所述备选单词对应的最佳分区组合。
可选地,所述拼读总库还包括:分区单元数据库,所述分区单元数据库中记载有目标词汇库中每一个单词对应的全部分区单元,每一个所述分区单元对应的字母串和发音代码以及包括所述分区单元的单词,其中,所述目标词汇库中包括所述备选单词;
所述单词可能性分区组合数据库中的每一个所述分区单元的结构性正确发音概率可基于所述分区单元数据库计算得到。
可选地,所述方法还包括:
分别获取每一个所述备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括的若干个分区单元;
分别针对每一个所述备选单词对应的每种可能性分区组合中的每个分区单元,从所述分区单元数据库中提取出包括所述分区单元的第一类单词和包括所述分区单元对应的字母串的第二类单词,并基于所述第一类单词和所述第二类单词确定所述分区单元对应的结构性正确发音概率;
将每一个所述分区单元对应的结构性正确发音概率对应记录于所述单词可能性分区组合数据库。
可选地,所述第一类单词、所述第二类单词与所述备选单词属于 同一词汇类别。
可选地,所述基于所述第一类单词和所述第二类单词确定所述分区单元对应的结构性正确发音概率,包括:
获取所述第一类单词在与所述词汇类别对应的统计场景中出现的次数,记为第一出现次数;
获取所述第二类单词在所述统计场景中出现的次数,记为第二出现次数;
根据所述第一出现次数和所述第二出现次数确定所述分区单元对应的结构性正确发音概率。
可选地,所述拼读总库还包括:基础字母串与基础发音代码对应关系库,所述基础字母串与基础发音代码对应关系库中记载有所有基础字母串及其对应的基础发音代码;
所述分区单元数据库可基于所述目标词汇库和所述基础字母串与基础发音代码对应关系库计算得到。
第二方面,本发明实施例提供一种电子设备,包括:
至少一个处理器;以及,
与所述至少一个处理器通信连接的存储器;其中,所述存储器存储有可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行,以使所述至少一个处理器能够执行如上所述的确定目标单词的拼读分区的方法。
第三方面,本发明实施例还提供了一种非暂态计算机可读存储介质,所述非暂态计算机可读存储介质存储有计算机可执行指令,所述计算机可执行指令用于使电子设备执行如上所述的确定目标单词的 拼读分区的方法。
本发明实施例的有益效果是:区别于现有技术的情况,本发明实施例提供的确定目标单词的拼读分区的方法和电子设备,通过基于预设的拼读总库中的单词最佳分区组合数据库,确定接收到的目标单词对应的最佳分区组合,进而根据所述目标单词对应的最佳分区组合确定所述目标单词的拼读分区,其中,所确定的最佳分区组合中的分区单元具有较高的结构性正确发音概率,从而能够从拼读结构本质上提升目标单词的正确发音概率。
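Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for determining the spelling partitions of a target word provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for creating the word optimal partition combination database based on the word possible partition combination database, provided by an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a method for creating the word possible partition combination database provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for determining the spelling partitions of a target word provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present invention.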
具体实施方式
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其它实施例,都属于本发明保护的范围。
需要说明的是,如果不冲突,本发明实施例中的各个特征可以相互结合,均在本发明的保护范围之内。另外,虽然在装置示意图中进行了功能模块划分,在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于装置中的模块划分,或流程图中的顺序执行所示出或描述的步骤。
当前,为了辅助“字母拼写文字”类语言的自然拼读教学,在一些词典(比如,英语词典)或者外语辅助教学设备中,除了记载有单词的完整字母串写序列、完整发音代码(比如,英语的完整音标串序列)以及其含义解析之外,还根据该外语语言学的音节分类方法在其完整字母串写序列中插入音节标记以显示该单词的拼读分区。以英语为例:根据英语语言学的“6音节”分类方法,在单词“different”中插入音节标记“-”后变成“dif-fe-rent”,以便于学习者根据“dif”、“fe”和“rent”三个拼读分区对该单词“different”进行自然拼读。
然而,基于所述传统的音节分类方法所确定的拼读分区中有不少会存在“同样的字母串,当在其前/后加载了不同的字母串组合时,其发音常会产生不同变异”的问题(即,相同字母串在不同的单词中所对应的发音有可能不同的问题),使音节性拼读分区在结构上容易出现 一些在关联字母串与正确发音的对应关系上难以察觉的变化规律,从而使得单词在音节分区拼读上出现结构性正确发音概率较低的现象,也在音节分区的结构上更增加外语学习者拼读错误的可能性。以英语为例,在“dif-fe-rent”中,字母串“rent”原本属于一个完整独立的单词,其发音为/rent/,但字母串“rent”在“dif-fe-rent”中的发音为
Figure PCTCN2019081628-appb-000001
若学习者直接将以前学过的单词“rent”的发音/rent/套用在“dif-fe-rent”中,就会导致拼读错误。而分区“fe”在英语中更包含超过5种不同发音的可能性。
基于此,本发明实施例提供了一种确定目标单词的拼读分区的方法、一种确定目标单词的拼读分区的装置、一种电子设备、一种非暂态计算机可读存储介质以及一种计算机程序产品。
其中,本发明实施例提供的确定目标单词的拼读分区的方法是一种寻找目标单词对应的最佳分区组合以综合提升目标单词各拼读分区的结构性正确发音概率的方法,具体为:基于预设拼读总库,确定接收到的目标单词的最佳分区组合,然后根据所述目标单词对应的最佳分区组合确定所述目标单词的拼读分区;其中,所述拼读总库包括单词最佳分区组合数据库,所述单词最佳分区组合数据库中记载有备选单词以及每一个所述备选单词对应的最佳分区组合,所述目标单词为所述备选单词中的一个,并且,基于该拼读总库所确定的最佳分区组合中的分区单元具有相对较高的结构性正确发音概率,有助于降低外语学习者在拼读分区的结构性漏洞上,产生过份自信地误认另一已知关联单词(即,词汇中包含相同分区字母串的单词)包含的同字母异音类关联拼读分区发音为目标单词关联拼读分区的正确发音之可能 性。
其中,本发明实施例提供的确定目标单词的拼读分区的装置是由软件程序构成的能够实现本发明实施例提供的确定目标单词的拼读分区的方法的虚拟装置,其与本发明实施例提供的确定目标单词的拼读分区的方法基于相同的发明构思,具有相同的技术特征以及有益效果。
其中,本发明实施例提供的电子设备可以是任意类型的电子设备,比如:学习机、智能手机、个人电脑、平板电脑、机器人、云端服务器等等。该电子设备能够执行本发明实施例提供的确定目标单词的拼读分区的方法,或者,运行本发明实施例提供的确定目标单词的拼读分区的装置。
此外,应当理解的是,本发明实施例提供的确定目标单词的拼读分区的方法、装置、电子设备、非暂态计算机可读存储介质以及计算机程序产品能够适用于任意能够实现自然拼读方法的“字母拼写文字”类语言,比如:英语、德语、法语、希腊语、意大利语、葡萄牙语等等。从而,在本发明实施例中,所述“目标单词”可以是上述任意一种语言的单词。其中,为了方便解释说明本发明的发明构思,在本发明实施例中,主要以所述目标单词为英语单词为例进行详细说明。
下面结合附图,对本发明实施例作进一步阐述。
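Embodiment 1
FIG. 1 is a schematic flowchart of a method for determining the spelling partitions of a target word provided by an embodiment of the present invention; the method can be performed by any type of electronic device.
Specifically, referring to FIG. 1, the method may include, but is not limited to, the following steps:
Step 100: Based on a preset master spelling library, determine the optimal partition combination corresponding to the received target word.
In this embodiment, the "master spelling library" is used to determine the optimal partition combination corresponding to the received target word. It may be a master data library that is fully set up in advance, or one that is updated and refined step by step. Specifically, the master spelling library may include one or more databases that record multiple kinds of correspondences. In particular, in this embodiment the master spelling library includes a "word optimal partition combination database". As shown in Table 1, this database records at least the candidate words and the optimal partition combination corresponding to each candidate word.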
表1
Figure PCTCN2019081628-appb-000002
其中,所述“备选单词”可以是任意一种“字母拼写文字”类语言的单词,比如,英语单词、德语单词、法语单词、希腊语单词等等,在本实施例中,以所述备选单词为英语单词为例进行说明,但其并不用于限定本发明。具体地,所述“备选单词”可以是任意已经确定好其对应的最佳分区组合的单词,其中,每一单词的“最佳分区组合”由若干个分区单元组成,每一个分区单元代表一种字母串和一种发音代码的对应关系(其中,所述“发音代码”是指能够表征某种发音的代码,具体地,其可以为音标串,或者,与相关发音对应的特定代码),并且,最佳分区组合中的分区单元的结构性正确发音概率相对较高 (即,最佳分区组合中的分区单元所代表的字母串在各关联单词中根据与其对应的发音代码进行发音的概率相对较高)。举例来说,如表1所示,单词“different”的最佳分区组合为“dif-ferent:
Figure PCTCN2019081628-appb-000003
其由两个分区单元(即,
Figure PCTCN2019081628-appb-000004
Figure PCTCN2019081628-appb-000005
组成,其中,分区单元
Figure PCTCN2019081628-appb-000006
Figure PCTCN2019081628-appb-000007
均具有相对较高的结构性正确发音概率,即,字母串“dif”在各包含“dif”的关联单词中都发
Figure PCTCN2019081628-appb-000008
音,而字母串“ferent”在绝大多数包含“ferent”的关联单词中都发
Figure PCTCN2019081628-appb-000009
音,从而,基于该最佳分区组合“dif-ferent:
Figure PCTCN2019081628-appb-000010
”确定单词“different”的拼读分区以进行自然拼读,可以从拼读结构的源头上提升单词“different”各分区的正确发音概率,有助于从拼读结构的源头上提升学员正确拼读“different”的概率。
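In this embodiment, the "target word" is a word received by the electronic device whose spelling partitions are to be determined, and it is one of the candidate words described above. Thus, when the electronic device receives a target word, it can obtain the optimal partition combination corresponding to that target word simply by querying the word optimal partition combination database in the master spelling library.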
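The lookup in step 100 can be sketched as follows, assuming the word optimal partition combination database is held in memory as a dictionary. The names `BEST_COMBINATIONS` and `find_best_combination` are hypothetical, and the pronunciation codes are placeholders because the actual phonetic symbols appear only as images in the original.

```python
from typing import Dict, List, Tuple

# A partition unit pairs a letter string with a pronunciation code.
PartitionUnit = Tuple[str, str]

# Hypothetical stand-in for the "word optimal partition combination database".
# The pronunciation codes are placeholders, not the patent's phonetic symbols.
BEST_COMBINATIONS: Dict[str, List[PartitionUnit]] = {
    "different": [("dif", "<code-dif>"), ("ferent", "<code-ferent>")],
}


def find_best_combination(target_word: str) -> List[PartitionUnit]:
    """Return the stored optimal partition combination for a candidate word."""
    key = target_word.lower()
    if key not in BEST_COMBINATIONS:
        # The method assumes the target word is one of the candidate words.
        raise KeyError(f"{target_word!r} is not a candidate word in the database")
    return BEST_COMBINATIONS[key]


units = find_best_combination("Different")
```

A real deployment would more likely query a persistent database; the dictionary only illustrates the word-to-combination mapping.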
其中,在一些实施例中,所述“拼读总库”中还可以包括“单词可能性分区组合数据库”,所述“单词最佳分区组合数据库”中的每一个所述备选单词所对应的最佳分区组合均可从该“单词可能性分区组合数据库”中筛选得到。
具体地,该“单词可能性分区组合数据库”中至少记载有上述“单词最佳分区组合数据库”中所记载的所有备选单词,每一个所述备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括的若干个分区单元,以及,每一个所述分区单元对应的结构性正确 发音概率(也就是说,在该“单词可能性分区组合数据库”中记载有:备选单词、可能性分区组合、分区单元及其结构性正确发音概率的层级对应关系)。其中,每一种可能性分区组合代表一种分割拼读分区的方式,备选单词对应的“最佳分区组合”为其对应的若干种“可能性分区组合”中的一种。再者,每种可能性分区组合中的每一个分区单元代表一种字母串和一种发音代码的对应关系,该分区单元的“结构性正确发音概率”即该分区单元所代表的字母串在各关联单词中根据与其对应的发音代码进行发音的概率(亦即,仅凭任何一个关联单词中该分区单元字母串的对应发音,借用到目标单词中该分区单元对应的字母串里进行发音,其刚好对上正确发音的概率),其可以基于目标词汇库中包括该分区单元的第一类单词和包括该分区单元对应的字母串的第二类单词来确定,也可以基于该分区单元的关联分区单元的数量来确定(具体地,可参见下述实施例二所提供的创建单词可能性分区组合数据库的方法)。
举例来说,表2列举了“单词可能性分区组合数据库”中的单词“different”所对应的“部分”可能性分区组合,每一种可能性分区组合内包括的分区单元,以及,每一个分区单元对应的结构性正确发音概率。其中,由表2可见,分区单元
Figure PCTCN2019081628-appb-000011
的结构性正确发音概率为100%,则说明字母串在任意一个包含字母串“dif”的关联单词中的发音均为
Figure PCTCN2019081628-appb-000012
分区单元
Figure PCTCN2019081628-appb-000013
的结构性正确发音概率为99.88%,则说明字母串“ferent”在各包含字母串“ferent”的关联单词中发
Figure PCTCN2019081628-appb-000014
音的概率为99.88%。
表2
Figure PCTCN2019081628-appb-000015
在实际应用中,该“单词可能性分区组合数据库”可以是预先设置好的,其内所记载的单词还可以包括所述备选单词之外的单词(比如,“单词最佳分区组合数据库”中记载的备选单词包括:A、B和C,而“单词可能性分区组合数据库”中记载的单词可以包括A、B、C、D和E);或者,该“单词可能性分区组合数据库”也可以是与“单词最佳分区组合数据库”同时逐步形成的(比如,在“单词可能性分区组 合数据库”中确定单词A对应的可能性分区组合、分区单元及其结构性正确发音概率后,筛选出该单词A对应的最佳分区组合,并将其更新于“单词最佳分区组合数据库”中,从而“单词可能性分区组合数据库”中仅包括所述备选单词),本发明实施例对此不作具体限定。
具体地,请参阅图2,为本发明实施例提供的一种基于单词可能性分区组合数据库创建单词最佳分区组合数据库的方法,该方法可以包括但不限于如下步骤:
步骤110:基于所述单词可能性分区组合数据库,确定每一个所述备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括的若干个分区单元,以及,每一个所述分区单元对应的结构性正确发音概率。
在本实施例中,可以认为“单词可能性分区组合数据库”是已经设置好的,因此,通过查询该“单词可能性分区组合数据库”即可获取到每一个备选单词对应的若干种可能性分区组合,每一种可能性分区组合包括的若干个分区单元,以及,每一个分区单元对应的结构性正确发音概率。
步骤120:分别针对每一个所述备选单词,根据所述备选单词对应的每种可能性分区组合中每个分区单元的结构性正确发音概率,筛选出所述备选单词对应的最佳分区组合。
由上可知,分区单元的结构性正确发音概率用于表征其所代表的字母串在各包含该字母串的关联单词中根据与其对应的发音代码进行发音的概率;结构性正确发音概率越高,说明其所代表的字母串在各所述关联单词中根据该发音代码进行发音的概率越高。从而,在本实施例中,可以根据备选单词所对应的每种可能性分区组合中每个分区 单元的结构性正确发音概率,筛选出一个最优的可能性分区组合作为其最佳分区组合。
其中,“根据备选单词所对应的每种可能性分区组合中每个分区单元的结构性正确发音概率,筛选出其最佳分区组合”的具体实施方式可以结合实际应用场景或者语言特性来确定。
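For example, in some embodiments, for each candidate word, the possible partition combination that contains the largest number of partition units whose structural correct pronunciation probability meets a preset requirement can be selected from its possible partition combinations as the optimal partition combination of that candidate word.
The "preset requirement" may be that the structural correct pronunciation probability is 100%, or that it exceeds a preset threshold (for example, 98%); it suffices that the optimal partition combination so determined contains multiple partition units with a high structural correct pronunciation probability, and the embodiments of the present invention do not specifically limit this.
In this embodiment, using the number of partition units whose structural correct pronunciation probability meets the preset requirement as the screening criterion ensures that most partition units in the selected optimal partition combination have a relatively high structural correct pronunciation probability. This helps, at the source of the partition structure, to improve the accuracy with which foreign-language learners sound out the candidate word. It also makes it easier for learners to apply the letter string-pronunciation code correspondences represented by these partition units fully, effectively, and naturally to unfamiliar words containing the same letter strings, achieving the highest possible memorization benefit and applicability benefit (that is, applying the correspondence to more associated words containing that letter string).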
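A minimal sketch of this count-based screening, using the per-unit probabilities that the description quotes for "different". The function name and the 99% threshold are assumptions chosen so that the two combinations differ; with the 98% example threshold they would tie, and a real implementation would need an explicit tie-break.

```python
from typing import Dict, List, Tuple

# (letter string, structural correct pronunciation probability in percent)
Unit = Tuple[str, float]


def pick_by_qualified_count(combos: Dict[str, List[Unit]], threshold: float) -> str:
    """Pick the combination with the most units at or above the threshold."""
    def qualified(units: List[Unit]) -> int:
        return sum(1 for _, prob in units if prob >= threshold)

    # max() keeps the first combination on ties; that tie-break is an assumption.
    return max(combos, key=lambda name: qualified(combos[name]))


combos = {
    "dif-fe-rent": [("dif", 100.0), ("fe", 2.41), ("rent", 98.66)],
    "dif-ferent": [("dif", 100.0), ("ferent", 99.88)],
}
best = pick_by_qualified_count(combos, threshold=99.0)  # -> "dif-ferent"
```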
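As another example, in other embodiments, for each candidate word, the comprehensive structural correct pronunciation probability of each of its possible partition combinations can first be determined from the structural correct pronunciation probability of each partition unit in that combination; the optimal partition combination of the candidate word is then selected based on these comprehensive probabilities.
In this embodiment, the "comprehensive structural correct pronunciation probability" characterizes the overall performance of the structural correct pronunciation probabilities of the partition units in a possible partition combination, and can be calculated from them. So that it reflects the probability of each partition unit as fully as possible, in practice a mathematical averaging algorithm can be applied to the per-unit probabilities of each possible partition combination to obtain its comprehensive probability. The averaging algorithm may include, but is not limited to, the arithmetic mean, the median, the mode, the RMS mean, and so on.
Further, once the comprehensive structural correct pronunciation probability of each possible partition combination of a candidate word has been determined, the possible partition combination with the highest comprehensive probability can be selected directly as the optimal partition combination of that candidate word.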
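As an illustration, the sketch below uses the arithmetic mean, which is only one of the averaging options named above. It is chosen here because it reproduces the 67.02% and 99.94% figures that the description gives for the two partitionings of "different"; the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, List


def comprehensive_probability(unit_probs: List[float]) -> float:
    """Arithmetic mean of the per-unit probabilities (in percent)."""
    return mean(unit_probs)


combos: Dict[str, List[float]] = {
    "dif-fe-rent": [100.0, 2.41, 98.66],
    "dif-ferent": [100.0, 99.88],
}

# Round to two decimals, as in the figures quoted in the description.
scores = {name: round(comprehensive_probability(p), 2) for name, p in combos.items()}
best = max(scores, key=scores.get)
print(scores, best)  # {'dif-fe-rent': 67.02, 'dif-ferent': 99.94} dif-ferent
```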
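Alternatively, in some embodiments, after the comprehensive structural correct pronunciation probability of each possible partition combination of a candidate word has been determined, the possible partition combinations whose comprehensive probability satisfies a first preset condition can first be selected to form the preferred possible partition combination group of the candidate word; the optimal partition combination is then selected from this group according to a second preset condition.
The "first preset condition" is used to select, from the possible partition combinations, one or more high-quality combinations whose comprehensive probabilities are high and very close to one another, forming the preferred possible partition combination group. For example, the first preset condition can be set to "the comprehensive structural correct pronunciation probability is greater than a threshold", or "the comprehensive structural correct pronunciation probability falls within the numerical interval formed from the highest comprehensive probability and an allowed tolerance" (for example, if the highest comprehensive probability is 99.94% and the allowed tolerance is 5, the first preset condition can be: the comprehensive probability falls within the interval [(99.94-5)%, 99.94%]).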
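The interval form of the first preset condition can be sketched as below. The function name is hypothetical, and the third entry in the sample data is invented purely to show a combination that falls inside the tolerance window.

```python
from typing import Dict, List


def preferred_group(scores: Dict[str, float], tolerance: float = 5.0) -> List[str]:
    """Keep combinations whose comprehensive probability lies in [top - tolerance, top]."""
    top = max(scores.values())
    return [name for name, score in scores.items() if top - tolerance <= score <= top]


scores = {
    "dif-ferent": 99.94,
    "dif-fe-rent": 67.02,
    "hypothetical-combo": 96.10,  # invented value, for illustration only
}
group = preferred_group(scores)  # -> ["dif-ferent", "hypothetical-combo"]
```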
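The "second preset condition" is used for further auxiliary screening among the high-quality combinations whose comprehensive probabilities are very close or equal (that is, the combinations in the preferred possible partition combination group), so that the selected optimal partition combination is easier for learners to memorize and to generalize to the spelling of other associated words. For example, the second preset condition can be set to any one or more of the following:
(1) From the preferred possible partition combination group, select as the optimal partition combination the possible partition combination with the largest total number of partition units whose structural correct pronunciation probability meets the preset requirement. This helps learners accumulate more partition units with a high structural correct pronunciation probability.
(2) From the preferred possible partition combination group, select as the optimal partition combination the possible partition combination whose partition units include a "general prefix" and/or a "general suffix". This makes it easier for learners to apply the general prefix and/or general suffix to the spelling of more words.
(3) From the preferred possible partition combination group, select as the optimal partition combination the possible partition combination in which the letter string represented by one or more partition units is a complete word whose pronunciation code is the same as the pronunciation code represented by that partition unit. A learner who has learned the partition unit has thereby learned the pronunciation of the complete word, which aids association and memory.
It should be understood that the specific forms of the "first preset condition" and "second preset condition" listed above are only intended to explain the present invention, not to limit it. In practice, other first or second preset conditions can also be set according to the linguistic features of the candidate words.
In this embodiment, by first building the preferred possible partition combination group with the first preset condition and then selecting the optimal partition combination from it with the second preset condition, the selected optimal partition combination is ensured to have a high correct pronunciation probability while also making associative memorization more effective for learners.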
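One possible way to combine criteria (1) and (2) is a lexicographic ranking, sketched below. The priority order, the affix list, the threshold, and all probabilities in the sample data are assumptions made for illustration; the text only says that any one or more of the criteria may be used.

```python
from typing import Dict, List, Tuple

# (letter string, structural correct pronunciation probability in percent)
Unit = Tuple[str, float]

# Hypothetical sample of "general" prefixes and suffixes.
GENERAL_AFFIXES = {"un", "re", "ing", "ly"}


def second_stage_key(units: List[Unit], threshold: float = 98.0) -> Tuple[int, int]:
    """Rank by qualifying-unit count first, then by number of general affixes."""
    qualified = sum(1 for _, prob in units if prob >= threshold)
    affixes = sum(1 for letters, _ in units if letters in GENERAL_AFFIXES)
    return (qualified, affixes)


def pick_from_preferred_group(group: Dict[str, List[Unit]]) -> str:
    return max(group, key=lambda name: second_stage_key(group[name]))


# Invented probabilities; only the word "early" comes from the description.
group = {
    "early": [("early", 99.2)],
    "ear-ly": [("ear", 99.0), ("ly", 99.5)],
}
choice = pick_from_preferred_group(group)  # -> "ear-ly"
```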
步骤130:将所述备选单词及其对应的最佳分区组合记录于所述单词最佳分区组合数据库。
在本实施例中,在确定了某一备选单词的最佳分区组合后,可以将该备选单词及其对应的最佳分区组合记录于单词最佳分区组合数据库。
其中,可以理解的是,在本实施例中,所述“单词可能性分区组合数据库”与所述“单词最佳分区组合数据库”可以是两个相互独立的数据库,也可以是同一数据库中的不同部分(即,在同一数据库中,同时记载有备选单词、可能性分区组合、分区单元、结构性正确发音概率以及最佳分区组合的对应关系),本发明实施例对这两个数据库的表现形式不作具体限定。而当所述“单词可能性分区组合数据库”与所述“单词最佳分区组合数据库”为同一数据库中的不同部分时,可以在基于“单词可能性分区组合数据库”中所记载的对应关系确定了某一备选单词对应的最佳分区组合后,对该最佳分区组合进行标记, 从而得到备选单词-最佳分区组合的对应关系。
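Step 200: Determine the spelling partitions of the target word according to the optimal partition combination corresponding to the target word.
In this embodiment, a "spelling partition" refers to one of the one or more regions formed by segmenting the complete letter sequence and/or the complete pronunciation code of the target word. A region can be a partition unit, that is, include both a letter string and a pronunciation code; in that case, determining the spelling partitions of the target word according to its optimal partition combination is implemented by using each partition unit in the optimal partition combination determined in step 100 as a spelling partition of the target word.
Alternatively, in some embodiments, a region can be just the letter string represented by a partition unit (that is, a region formed by segmenting only the complete letter sequence of the target word); in that case, determining the spelling partitions can be implemented by inserting syllable markers into the complete letter sequence of the target word according to the optimal partition combination, thereby obtaining the spelling partitions of the target word.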
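The marker-insertion variant can be sketched as follows. The function name is hypothetical, and the sketch assumes the letter strings concatenate exactly to the word, so it does not cover combinations that duplicate a shared letter.

```python
from typing import List


def mark_partitions(word: str, letter_strings: List[str], marker: str = "-") -> str:
    """Insert a marker between the letter strings of the optimal partition combination."""
    if "".join(letter_strings) != word:
        raise ValueError("letter strings must concatenate to the original word")
    return marker.join(letter_strings)


marked = mark_partitions("different", ["dif", "ferent"])  # -> "dif-ferent"
```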
通过上述技术方案可知,与传统的通过音节拆分确定单词的拼读分区的方式相比,本发明实施例能够提升该目标单词的拼读分区的结构性正确发音概率,从而在拼读结构的本质上提升目标单词的正确发音概率。比如,在一般的英语词典中,目标单词“different”的拼读分区为:
Figure PCTCN2019081628-appb-000016
“fe/f/”和
Figure PCTCN2019081628-appb-000017
这三个分区单元的结构性正确发音概率分别为100%、2.41%和98.66%,从而,“dif-fe-rent:
Figure PCTCN2019081628-appb-000018
”的综合结构性正确发音概率只有67.02%;而在本实施例中,目标单词“different”的拼读分区为:
Figure PCTCN2019081628-appb-000019
Figure PCTCN2019081628-appb-000020
这两个分区单元的结构性正确发 音概率分别为100%和99.88%,从而,“dif-ferent:
Figure PCTCN2019081628-appb-000021
”的综合结构性正确发音概率却能达到99.94%;由此可见,基于本发明实施例提供的方法所确定的拼读结构“dif-ferent:
Figure PCTCN2019081628-appb-000022
”的正确发音概率远远高于基于传统音节拆分方法所确定的拼读结构“dif-fe-rent:
Figure PCTCN2019081628-appb-000023
”。
进一步地,由于在本实施例中,所确定的各个拼读分区一般都具有较高的结构性正确发音概率,在拼读分区结构的源头上降低了外语学习者因拼读分区结构而产生误读的概率。从而,外语学习者在牢记所学过的目标单词的拼读分区后,即可确定该目标单词中每个拼读分区所代表的字母串在各包含相同字母串的关联单词中的最有可能的发音(比如,通过本实施例提供的方法确定英语单词“different”的拼读分区为:
Figure PCTCN2019081628-appb-000024
Figure PCTCN2019081628-appb-000025
即可确定“dif”在各包含字母串“dif”的关联单词中最有可能的发音为
Figure PCTCN2019081628-appb-000026
“ferent”在各包含字母串“ferent”的关联单词中最有可能的发音为
Figure PCTCN2019081628-appb-000027
),进而可将该拼读分区以更大的可借用占比应用到其它包含相同字母串的关联单词中,从而达到最高牢记效益和可应用效益。
再者,外语学习者透过认识本发明提供相关拼读分区,在面对其它陌生单词的情况下,也能够自然而然地懂得如何识别出该陌生单词的拼读分区,并只需借用之前学习过的关联单词中的关联分区的发音,就能以较高的正确机率拼读该陌生单词。
实施例二
在实际应用中,上述的“单词可能性分区组合数据库”可以通过任意合适的方式获得。其中,为了提升所述“单词可能性分区组合数据库”的创建效率,以及,提升该“单词可能性分区组合数据库”中 的分区单元的结构性正确发音概率的可靠性,本发明第二个实施例还提供了一种创建所述单词可能性分区组合数据库的方法。
具体地,请参阅图3,该方法可以包括但不限于如下步骤:
步骤310:创建目标词汇库的分区单元数据库。
在本实施例中,所述“目标词汇库”可以是任意类型的词汇库,比如,其可以为某词典、某学术级别的词汇库、科技文献词汇库等等。特别地,该目标词汇库中包括“单词可能性分区组合数据库”中所记载的所有备选单词。
在本实施例中,所创建的“分区单元数据库”可以作为实施例一所述的“拼读总库”中的其中一个数据库,其内至少记载有目标词汇库中每一个单词对应的全部分区单元(其中,如表3所示,在一些实施例中,为了方便记录和查询分区单元,可以为每一个分区单元配置唯一的拼读代号),每一个所述分区单元对应的字母串和发音代码,以及,包括所述分区单元的单词。从而,根据所述“分区单元数据库”,可以得到分区单元(拼读代号)、字母串、发音代码和包含相同拼读代号的各单词之间的对应关系。
表3
Figure PCTCN2019081628-appb-000028
其中,由于该目标词汇库中包括所述备选单词,因此,这些备选 单词对应的所有分区单元都能够在所创建的“分区单元数据库”中查询到相关的对应关系。所述“单词可能性分区组合数据库”中的每一个分区单元的结构性正确发音概率可基于该“分区单元数据库”计算得到。
在本实施例中,可以首先分别针对目标词汇库中的每一个单词(包括备选单词),确定其对应的若干种可能性分区组合、每种所述可能性分区组合中包括的若干个分区单元以及每个分区单元对应的字母串和发音代码,进而根据这些对应关系创建出该目标词汇库的“分区单元数据库”。其中,在本实施例中,创建目标词汇库的“分区单元数据库”能够便于统计包含相同分区单元或字母串的单词,方便确定每一个分区单元的结构性正确发音概率。
其中,所述“确定单词对应的若干种可能性分区组合、每种所述可能性分区组合中包括的若干个分区单元以及每个分区单元对应的字母串和发音代码”的具体实施方式可以是:
由语言专家根据其经验分别对目标词汇库中的每一个单词进行分割,获得该单词对应的若干种可能性分区组合,同时,提取出每种可能性分区组合中包括的若干个分区单元,并将其记录于如表4所示的对应关系表中。
表4
Figure PCTCN2019081628-appb-000029
或者,在另一些实施例中,为了提升获取这些对应关系的效率,所述“确定单词对应的若干种可能性分区组合、每种所述可能性分区组合中包括的若干个分区单元以及每个分区单元对应的字母串和发音代码”的具体实施方式也可以是:
基于预设好的“基础字母串与基础发音代码对应关系库”,分别针对目标词汇库中的每一个单词,确定其基础分区单元;进而根据其基础分区单元确定其对应的若干种可能性分区组合,每种所述可能性分区组合中包括的若干个分区单元,以及,每个分区单元对应的字母串和发音代码。
其中,该预设好的“基础字母串与基础发音代码对应关系库”可以是实施例一所述的“拼读总库”中的其中一个数据库,其内记载有 所有基础字母串与基础发音代码的对应关系(其中,表5示出了部分“基础字母串-基础发音代码”的对应关系)。从而,在针对每一单词进行“基础字母串-基础发音代码”的匹配时,能够快速得到匹配结果,提升建库效率。
表5
Figure PCTCN2019081628-appb-000030
具体地,在分别针对每一单词,确定如上所述的各种对应关系时,可以首先从目标词汇库中提取出该单词的音节标记(比如,单词“dif-fe-rent”中的“-”即其音节标记),并确定该单词为单音节型单词(不含有音节标记的单词)或者多音节型单词(含有至少一个音 节标记的单词)。
如果该单词为多音节型单词,则需要首先根据单词的音节标记,对该单词的完整字母串写序列进行拆分,得到若干个字母串(比如,对“mon-i-tor”进行拆分,得到字母串“mon”、“i”和“tor”)。然后,基于该单词的完整发音代码以及“基础字母串与基础发音代码对应关系库”,从左到右依次对这些字母串中的字母进行“基础字母串-基础发音代码”的匹配,得到每个字母串对应的发音代码(比如,匹配后得到:
Figure PCTCN2019081628-appb-000031
Figure PCTCN2019081628-appb-000032
)。接着按目标外语的语言学规则和特例情况,检测该单词中是否存在共享音素(以英语为例,在两个相邻的分区单元中,如果前一个分区单元的最后一个字母为辅音字母,而后一个分区单元的首字母为元音字母,则,该辅音字母可以是共享音素),如果存在,则把共享音素复制到其后一个分区单元的首位,以使该分区单元成为一个在听觉上完整的音区,从而得到该单词的基础分区单元以及由这些基础分区单元组成的基础分区组合。(比如,对于“monitor”,“n”即
Figure PCTCN2019081628-appb-000033
Figure PCTCN2019081628-appb-000034
的共享音素,将共享音素“n/n/”复制到
Figure PCTCN2019081628-appb-000035
的首位后,可以得到一个完整音区
Figure PCTCN2019081628-appb-000036
进而可以确定单词“monitor”的基础分区单元包括:
Figure PCTCN2019081628-appb-000037
Figure PCTCN2019081628-appb-000038
这些基础分区单元组成的基础分区组合即“mon-ni-tor:
Figure PCTCN2019081628-appb-000039
”,其中,“n-n”用于表示其为共享音素)。最后,对这些基础分区单元进行合并和拆分,得到该单词的其它可能性分区组合,同时,确定每一种可能性分区组合中包括的分区单元,以及,每一分区单元对应的字母串和发音代码。比如,对
Figure PCTCN2019081628-appb-000040
Figure PCTCN2019081628-appb-000041
合并后可以得到新的分区单元
Figure PCTCN2019081628-appb-000042
其与另一个基础分区单元
Figure PCTCN2019081628-appb-000043
可 以组成新的可能性分区组合“moni-tor:
Figure PCTCN2019081628-appb-000044
”;又如,对
Figure PCTCN2019081628-appb-000045
进行拆分后可以得到新的分区单元“m/m/”和
Figure PCTCN2019081628-appb-000046
这两个分区单元与剩下的基础分区单元
Figure PCTCN2019081628-appb-000047
Figure PCTCN2019081628-appb-000048
也可以组成新的可能性分区组合“m-on-ni-tor:
Figure PCTCN2019081628-appb-000049
”。
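The shared-phoneme step described above can be roughly sketched for English as follows. This is a deliberately simplified assumption: it treats only a, e, i, o, u as vowel letters and always copies the consonant, whereas the description says the consonant may be shared and that language-specific rules and exceptions apply. The function name is hypothetical.

```python
from typing import List

VOWEL_LETTERS = set("aeiou")  # simplification: "y" is treated as a consonant


def add_shared_consonants(syllables: List[str]) -> List[str]:
    """Copy a trailing consonant letter to the front of a following vowel-initial syllable."""
    if not syllables:
        return []
    result = [syllables[0]]
    for previous, current in zip(syllables, syllables[1:]):
        ends_in_consonant = previous[-1].lower() not in VOWEL_LETTERS
        starts_with_vowel = current[0].lower() in VOWEL_LETTERS
        if ends_in_consonant and starts_with_vowel:
            current = previous[-1] + current
        result.append(current)
    return result


base_strings = add_shared_consonants(["mon", "i", "tor"])  # -> ["mon", "ni", "tor"]
```

Only the letter strings are handled here; pairing each string with its pronunciation code would additionally require the base letter string and base pronunciation code correspondence library.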
如果该单词为传统语言学上定义的单音节型单词,则不需根据单词的音节标记,对该单词的完整字母串写序列进行拆分,其完整字母串写序列-完整发音代码即为其基础分区单元,也为其的其中一个可能性分区组合。在对该单词的基础分区单元进行拆分时,可以直接基于该单词的完整发音代码以及“基础字母串与基础发音代码对应关系库”,从左到右依次对这些字母串中的字母进行“基础字母串-基础发音代码”的匹配,得到每个字母串对应的发音代码。例如单词“they”,其基础分区单元即:
Figure PCTCN2019081628-appb-000050
对该基础分区单元进行拆分后,可以得到新的分区单元:
Figure PCTCN2019081628-appb-000051
Figure PCTCN2019081628-appb-000052
这些新的分区单元可以组成新的可能性分区组合:“th-ey:
Figure PCTCN2019081628-appb-000053
”。由此,单词“they”的各可能性分区组合包括:
Figure PCTCN2019081628-appb-000054
和“th-ey:
Figure PCTCN2019081628-appb-000055
”。
在本实施例中,基于基础分区单元获取单词的各种可能性分区组合,能够避免因为跨音区而造成发音混乱的问题。
此外,在又一些实施例中,所述“基础字母串与基础发音代码对应关系库”也可以仅预设好一部分最基本、最常见的“基础字母串-基础发音代码”,在创建如表4所示的对应关系的过程中,逐步完善该“基础字母串与基础发音代码对应关系库”。
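In this embodiment, the process of creating the correspondences shown in Table 4 is largely the same as described in the embodiment above, with the following differences:
(1) To identify "base letter string-base pronunciation code" pairs from simple to complex, in this embodiment the complete letter sequence and complete phonetic symbol string (that is, the complete pronunciation code) of each word in the target vocabulary can first be extracted, and the words in the target vocabulary sorted by rules such as "number of letters and number of phonetic symbols from fewest to most" and "difference between the number of letters and the number of phonetic symbols from smallest to largest". For example, the order can be: "1 letter-1 symbol", "2 letters-2 symbols", "2 letters-1 symbol", "3 letters-3 symbols", "3 letters-2 symbols", "3 letters-1 symbol", and so on. Then, following this order, determine in turn the "base letter string-base pronunciation code" pairs contained in each word, the possible partition combinations of each word, the partition units in each possible partition combination, and the letter string and pronunciation code of each partition unit.
(2) The "base letter string-base pronunciation code" pairs contained in each word must first be identified based on the correspondences recorded in the "base letter string and base pronunciation code correspondence library", and any newly found pairs are synchronized into that library.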
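The ordering in item (1) can be expressed as a two-part sort key. The sketch below works on (letter count, symbol count) pairs rather than real words, to avoid inventing phonetic transcriptions; the function name is hypothetical.

```python
from typing import List, Tuple


def processing_key(letter_count: int, symbol_count: int) -> Tuple[int, int]:
    """Fewer letters first; within a letter count, smaller letter/symbol gap first."""
    return (letter_count, letter_count - symbol_count)


pairs: List[Tuple[int, int]] = [(3, 1), (2, 1), (3, 3), (1, 1), (3, 2), (2, 2)]
ordered = sorted(pairs, key=lambda pair: processing_key(*pair))
print(ordered)  # [(1, 1), (2, 2), (2, 1), (3, 3), (3, 2), (3, 1)]
```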
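Specifically, for a monosyllabic word such as "their", the "base letter string-base pronunciation code" pairs it contains can be determined by the following steps:
(A1) Generate the tentative comparison combinations of the word. In this embodiment, the complete letter sequence of a monosyllabic word can be segmented according to a preset "comparison arrangement template" to generate a number of tentative comparison combinations; each partition of a tentative comparison combination represents one base letter string. For example, the comparison arrangement template for a five-letter sequence (12345) is:
first, combinations whose longest part has 5 letters: 12345;
then, combinations whose longest part has 4 letters: 1234+5, 1+2345;
next, combinations whose longest part has 3 letters: 123+45, 123+4+5, 12+345, 1+234+5, 1+2+345;
next, combinations whose longest part has 2 letters: 12+34+5, 12+3+45, 12+3+4+5, 1+23+45, 1+23+4+5, 1+2+34+5, 1+2+3+45;
finally, the combination whose longest part has 1 letter: 1+2+3+4+5.
For example, if the word is "their", then according to the template above its tentative comparison combinations are: their, thei+r, t+heir, the+ir, the+i+r, th+eir, t+hei+r, t+h+eir, th+ei+r, th+e+ir, th+e+i+r, t+he+ir, t+he+i+r, t+h+ei+r, t+h+e+ir, and t+h+e+i+r.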
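The template can be generated programmatically instead of being stored per length. The sketch below enumerates every way of cutting the word and sorts by longest part, then by the sequence of part lengths, both descending; this ordering is inferred from the listed template rather than stated in the text, and the function name is hypothetical.

```python
from itertools import combinations
from typing import List


def tentative_combinations(word: str) -> List[str]:
    """Enumerate all contiguous splits of a word, longest-part-first as in the template."""
    n = len(word)
    splits: List[List[str]] = []
    for cut_count in range(n):
        for cuts in combinations(range(1, n), cut_count):
            bounds = (0, *cuts, n)
            splits.append([word[a:b] for a, b in zip(bounds, bounds[1:])])
    # Sort by (longest part, tuple of part lengths), descending.
    splits.sort(
        key=lambda parts: (max(map(len, parts)), tuple(map(len, parts))),
        reverse=True,
    )
    return ["+".join(parts) for parts in splits]


candidates = tentative_combinations("their")
print(len(candidates), candidates[:6])
# 16 ['their', 'thei+r', 't+heir', 'the+ir', 'the+i+r', 'th+eir']
```

A word of n letters yields 2^(n-1) splits, so this exhaustive enumeration is only practical for short syllables.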
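(A2) Delete the combinations that do not conform to the phoneme segmentation rules to obtain the comparison combinations of the word.
In this embodiment, a number of base letter string-base pronunciation code matching rules can be set according to the characteristics of the language and used to delete the tentative comparison combinations that do not conform to the phoneme segmentation rules. The matching rules may include, but are not limited to:
a) Unless a special case applies, the total number of base letter strings equals the total number of base pronunciation codes.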
比如“their”对应的发音代码为
Figure PCTCN2019081628-appb-000056
其基础发音代码为
Figure PCTCN2019081628-appb-000057
Figure PCTCN2019081628-appb-000058
总数为2个,那么,其基础字母串总数也应该为2个,从而可以删除不符合该规则的疑似可能性对比组合后,剩下的对比组合仅包括:“thei+r”,“t+heir”,“the+ir”和“th+eir”。
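b) Do not split base letter strings that carry a preset marker.
In this embodiment, a do-not-split marker can be set for some of the base letter strings in the "base letter string and base pronunciation code correspondence library" (for example, ch and th). The "*" in Table 5, for instance, indicates that when "th" is present, "t" and "h" are not split apart. Based on this matching rule, the combination "t+heir" can be deleted.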
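Rules a) and b) can be sketched as a filter over the sixteen tentative combinations of "their". The function names are hypothetical, the count of two base pronunciation codes is taken from the description, and the protected set holds only the two strings named in the text.

```python
from typing import List

# Base letter strings flagged "do not split"; ch and th are the examples in the text.
PROTECTED_STRINGS = {"ch", "th"}


def splits_protected_string(parts: List[str]) -> bool:
    """Return True if a cut falls inside a protected base letter string."""
    for left, right in zip(parts, parts[1:]):
        for protected in PROTECTED_STRINGS:
            for i in range(1, len(protected)):
                if left.endswith(protected[:i]) and right.startswith(protected[i:]):
                    return True
    return False


def apply_rules(candidates: List[str], code_count: int) -> List[str]:
    kept = []
    for candidate in candidates:
        parts = candidate.split("+")
        if len(parts) != code_count:       # rule a): counts must match
            continue
        if splits_protected_string(parts):  # rule b): protected strings stay whole
            continue
        kept.append(candidate)
    return kept


candidates = [
    "their", "thei+r", "t+heir", "the+ir", "the+i+r", "th+eir",
    "t+hei+r", "t+h+eir", "th+ei+r", "th+e+ir", "th+e+i+r",
    "t+he+ir", "t+he+i+r", "t+h+ei+r", "t+h+e+ir", "t+h+e+i+r",
]
remaining = apply_rules(candidates, code_count=2)
print(remaining)  # ['thei+r', 'the+ir', 'th+eir']
```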
c)、满足基础音节分类规则。
其中,所述音节分类规则是根据目标外语的语言学规则进行音节分类,例如英语的6音节分类法、德语的长短元音分类法等等。比如, 在英语单词“their”中,单词结构是典型的“辅音-元音-辅音(C-V-C)”闭音节,而且刚好字尾是字母“r”,需要顺从英语6音节分类中的“r-控制型元音(R-controlled Syllable Types)”。所以,元音需与后面的“r”合并为一组。在3个余下对比组合中,就只有“th+eir”满足该规则。而有些外语的分割规则比英语的开闭音节分割原则更加简洁,以德语为例,两个元音之间只有一个辅音,辅音跟后面元音构成音节。
(A3)、按照步骤(A1)中的排序,基于剩下的对比组合与“基础字母串与基础发音代码对应关系库”中记载的对应关系进行匹配,确定该单词中包含的“基础字母串-基础发音代码”。
比如,经过步骤(A2)后,“their”的对比组合只剩下“th+eir”,而通过查询“基础字母串与基础发音代码对应关系库”,可以找到
Figure PCTCN2019081628-appb-000059
的对应关系,但找不到
Figure PCTCN2019081628-appb-000060
的对应关系,从而可以将
Figure PCTCN2019081628-appb-000061
作为新增的对应关系更新到“基础字母串与基础发音代码对应关系库”。
其中,在本实施例中,倒序排列各疑似可能性对比组合,是为了不与前级的“基础字母串-基础发音代码”混淆在一起,先保证在最多字母数量的“基础字母串-基础发音代码”中找不到匹配的对应关系,才一级级地往前一级里找。
而针对多音节型单词,比如“early”,可以根据如下步骤确定该单词中的“基础字母串-基础发音代码”的对应关系:
(B1)、根据单词的音节标记,对该单词进行音节拆分。
比如,单词“early”对应的完整发音代码为
Figure PCTCN2019081628-appb-000062
字母类的音节标记位于“ear”和“ly”之间,从而,可以对该单词拆分为“ear”和“ly”两个音节。
(B2)、把每个拆分后的音节,逐一以单音节型单词的方式(即上述步骤(A1)至(A3))确定其包含的“基础字母串-基础发音代码”对应关系。
其中,本步骤的具体实施方式与上述步骤(A1)至(A3)大致相同。比如,通过查询“基础字母串与基础发音代码对应关系库”,可以得到
Figure PCTCN2019081628-appb-000063
的对应关系,以及“l-/l/”“y-/i/”的对应关系,从而可以确定单词“early”的基础分区单元包括
Figure PCTCN2019081628-appb-000064
和“ly-/li/”。
而本步骤与上述步骤(A1)至(A3)的不同之处在于:在本步骤中,不会马上将新增的“基础字母串-基础发音代码”更新到“基础字母串与基础发音代码对应关系库”,而是需要等全部的音节都匹配完成后才能验证该新增的“基础字母串-基础发音代码”是否正确,如果正确才将该新增的“基础字母串-基础发音代码”更新到“基础字母串与基础发音代码对应关系库”。
其中,在一些实施例中,如果在进行匹配时,超过一个对比组合,或者,在一个对比组合中存在两个或以上的分区(基础字母串)对应的基础发音代码无法确定(或者说,存在两个或以上新增“基础字母串-基础发音代码”),那么,可以将这些无法确定的“基础字母串-基础发音代码”暂存于“待决定库”,以便专家介入手动选择。
步骤320:分别获取每一个备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括的若干个分区单元以及每个分区单元对应的字母串和发音代码,记录于“单词可能性分区组合数据库”中。
在本实施例中,确定备选单词对应的若干种可能性分区组合,每 一种所述可能性分区组合中包括的若干个分区单元以及每个分区单元对应的字母串和发音代码的具体实施方式可以参考上述步骤310中相应的描述,此处便不再赘述。
步骤330:分别针对每一个所述备选单词对应的每种可能性分区组合中的每个分区单元,从所述分区单元数据库中提取出包括所述分区单元的第一类单词和包括所述分区单元对应的字母串的第二类单词,并基于所述第一类单词和所述第二类单词确定所述分区单元对应的结构性正确发音概率。
在本实施例中,将同时包括某一分区单元对应的字母串和发音代码的单词称为与该分区单元对应的“第一类单词”,而包括该分区单元对应的字母串的单词称为与该分区单元对应的“第二类单词”(亦即,实施例一中所描述的“关联单词”)。比如,如表3所示,对于分区单元
Figure PCTCN2019081628-appb-000065
其对应的第一类单词为:包括字母串“if”并且该字母串“if”的发音为
Figure PCTCN2019081628-appb-000066
的单词,比如:d ifferent、g ift等;而其对应的第二类单词则为:包括字母串“if”的单词,比如:d ifferent、g ift、l ife、r ifle、un iform、mod ify等。
其中,在本实施例中,可以通过查询上述“分区单元数据库”,确定每个备选单词对应的每种可能性分区组合中的每个分区单元对应的第一类单词和第二类单词。比如,如表2所示,单词“different”的其中一个可能性分区组合“dif-ferent:
Figure PCTCN2019081628-appb-000067
”中包括分区单元
Figure PCTCN2019081628-appb-000068
Figure PCTCN2019081628-appb-000069
通过查询“分区单元数据库”(如表3所示),即可确定分区单元
Figure PCTCN2019081628-appb-000070
对应的第一类单词和第二类单词均包括 difficult、 difficulty等;分区单元
Figure PCTCN2019081628-appb-000071
对应的第一类单词包括dif ferent等,第二类单 词包括dif ferent、dif ferential等。
又,由于本实施例通过分区单元对应的结构性正确发音概率来表征该分区单元中的字母串在各关联单词中根据该发音代码进行发音的概率。因此,在本实施例中,可以分别针对每一个分区单元,基于其对应的第一类单词(同时包括字母串和发音代码)和第二类单词(仅包括字母串)确定其结构性正确发音概率。
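Specifically, "determining, for each partition unit, its structural correct pronunciation probability based on its corresponding first-class words and second-class words" can be implemented as follows: for each partition unit, count the number N1 of first-class words and the number N2 of second-class words corresponding to that unit in the target vocabulary; then determine the structural correct pronunciation probability of the unit from N1 and N2 (for example, structural correct pronunciation probability = N1/N2, or structural correct pronunciation probability = N1/(N2-N1), and so on).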
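A minimal sketch of the N1/N2 variant over a toy partition unit database. The word list follows the "if" example given earlier in the description, but the pronunciation codes (`code-A`, `code-B`, `code-C`) and their assignment to words are placeholders, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

PartitionUnit = Tuple[str, str]  # (letter string, pronunciation code)

# Toy partition unit database: word -> partition units it can contain.
UNIT_DB: Dict[str, List[PartitionUnit]] = {
    "different": [("if", "code-A")],
    "gift": [("if", "code-A")],
    "life": [("if", "code-B")],
    "rifle": [("if", "code-B")],
    "uniform": [("if", "code-C")],
    "modify": [("if", "code-C")],
}


def structural_probability(letters: str, code: str, unit_db: Dict[str, List[PartitionUnit]]) -> float:
    """N1 / N2: words containing the unit, over words containing its letter string."""
    second_class = [w for w, units in unit_db.items() if any(s == letters for s, _ in units)]
    first_class = [w for w, units in unit_db.items() if (letters, code) in units]
    if not second_class:
        raise ValueError(f"no word contains the letter string {letters!r}")
    return len(first_class) / len(second_class)


probability = structural_probability("if", "code-A", UNIT_DB)  # 2 / 6
```

The alternative N1/(N2-N1) is left out of the sketch because it is undefined when every second-class word is also a first-class word.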
此外,在实际应用中,外语学习者的语言水平或者主要接触的词汇类别有可能会有所差异,不同外语学习者有可能具有不同的词汇量,而在不同的词汇量的情况下,即便是根据相同的方法(比如,如上所述的根据相关单词的个数得到分区单元的结构性正确发音概率)统计得到的分区单元的结构性正确发音概率也有可能会存有差异。比如,对于小学生来说,其词汇量较少,在其所接触的单词中,某字母串X的发音有可能均为/x1/,从而,X/x1/的结构性正确发音概率为100%;然而,对于年级较高的学生来说,其词汇量相对较大,在其所接触的单词中,该字母串X的可能性发音除了/x1/之外还有可能包括/x2/,那么,此时,X/x1/的结构性正确发音概率就不再是100%,甚至有可能低于50%。若直接基于目标词汇库中的所有单词进行分区单元的结构性正确发音概率的计算,就无法针对学生的特性提供更加合适的拼 读分区方案。
基于此,在一些实施例中,为了提升该“单词可能性分区组合数据库”中的分区单元的结构性正确发音概率的可靠性,所述第一类单词、所述第二类单词与所述备选单词属于同一词汇类别。该词汇类别可以包括但不限于:小学词汇、中学词汇、专业词汇、日常用语、旅游用语等等。
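Further, foreign-language learners generally do not just memorize words but learn the language mostly through actual usage scenarios (for example, the counted venues in which language and text appear, such as reading materials, listening materials, and news and entertainment web pages), and different words appear a different number of times in the corresponding statistical scenario. Therefore, in still other embodiments, "determining, for each partition unit, its structural correct pronunciation probability based on its corresponding first-class words and second-class words" can also be implemented as follows:
First, obtain the number of times the first-class words appear in the statistical scenario corresponding to the vocabulary category (denoted the first occurrence count M1) and the number of times the second-class words appear in that statistical scenario (denoted the second occurrence count M2); then determine the structural correct pronunciation probability of the partition unit from M1 and M2 (for example, structural correct pronunciation probability = M1/M2, or structural correct pronunciation probability = M1/(M2-M1), and so on).
The "statistical scenario corresponding to the vocabulary category" can specifically be reading materials, listening materials, and the like corresponding to that vocabulary category.
Thus, in this embodiment, determining the structural correct pronunciation probability of the partition unit from the first occurrence count M1 and the second occurrence count M2 better matches the learner's learning scenario, excludes the influence of words that are rare in practice, and can further improve the reliability of the structural correct pronunciation probabilities in the "word possible partition combination database".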
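The occurrence-weighted variant M1/M2 can be sketched as below. The occurrence counts are invented for illustration and the function name is hypothetical; in practice the counts would come from a corpus matching the vocabulary category.

```python
from typing import Dict, Iterable


def weighted_probability(
    first_class: Iterable[str],
    second_class: Iterable[str],
    occurrences: Dict[str, int],
) -> float:
    """M1 / M2: corpus occurrences of first-class words over those of second-class words."""
    m1 = sum(occurrences.get(word, 0) for word in first_class)
    m2 = sum(occurrences.get(word, 0) for word in second_class)
    if m2 == 0:
        raise ValueError("second-class words never occur in the statistical scenario")
    return m1 / m2


# Invented occurrence counts for a hypothetical reading corpus.
occurrences = {"different": 50, "gift": 10, "life": 30, "rifle": 1, "uniform": 5, "modify": 4}
first_class = ["different", "gift"]
second_class = ["different", "gift", "life", "rifle", "uniform", "modify"]

probability = weighted_probability(first_class, second_class, occurrences)  # 60 / 100
```

With these counts the rare word "rifle" contributes almost nothing, which is the effect the paragraph above describes.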
步骤340:将每一个所述分区单元对应的结构性正确发音概率对应记录于所述单词可能性分区组合数据库。
在本实施例中,在确定了每个分区单元对应的结构性正确发音概率之后,将其对应记录于所述“单词可能性分区组合数据库”,具体为,在确定了某分区单元的结构性正确发音概率之后,将该结构性正确发音概率记录在所述“单词可能性分区组合数据库”中与该分区单元对应的位置处。
通过上述技术方案可知,本实施例提供的创建单词可能性分区组合数据库的方法能够高效地得到目标词汇库中每个单词的若干种可能性分区组合、每种可能性分区组合中包括的分区单元,每个分区单元对应的字母串和发音代码,以及,每个分区单元对应的结构性正确发音概率。
另外,应当理解的是,在上述实施例中,基于分区单元对应的第一类单词和第二类单词确定其对应的结构性正确发音概率仅为其中一种较优的实施方式,在实际应用中,也可以采用其它方式确定分区单元的结构性正确发音概率,其均应落入本发明要求保护的范围。
比如,在一些实施例中,可以无需考虑关联单词,仅根据分区单元的关联分区单元的数量确定该分区单元的结构性正确发音概率。其中,所述“关联分区单元”是指与该分区单元具有相同字母串的分区单元。
具体为,首先确定目标词汇库中所有单词的所有分区单元,并将具有相同字母串的分区单元作为彼此的关联分区单元。当需要确定某一分区单元的结构性正确发音概率时,可以首先确定该分区单元的关 联分区单元的数量Q,然后根据该数量Q确定该分区单元的结构性正确发音概率。比如,其结构性正确发音概率=1/Q。Q越大,说明与该分区单元同字母异音的关联分区单元越多,该分区单元的结构性正确发音概率越低。举例来说,在确定分区单元
Figure PCTCN2019081628-appb-000072
的结构性正确发音概率时,可以确定该分区单元的关联分区单元包括:
Figure PCTCN2019081628-appb-000073
Figure PCTCN2019081628-appb-000074
Figure PCTCN2019081628-appb-000075
数量Q=3,从而,分区单元
Figure PCTCN2019081628-appb-000076
的结构性正确发音概率为1/3;而在确定分区单元
Figure PCTCN2019081628-appb-000077
的结构性正确发音概率时,由于其关联分区单元的数量Q=1,从而,分区单元
Figure PCTCN2019081628-appb-000078
的结构性正确发音概率为1(100%)。
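The 1/Q rule can be sketched as follows, counting the distinct pronunciation codes recorded for a letter string (the unit itself included, which matches the Q=3 and Q=1 examples above). The letter strings and codes in the sample data are placeholders, since the originals appear only as images, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

PartitionUnit = Tuple[str, str]  # (letter string, pronunciation code)


def probability_from_associated_units(letters: str, all_units: Iterable[PartitionUnit]) -> float:
    """1 / Q, where Q is the number of distinct units sharing the letter string."""
    codes = {code for unit_letters, code in all_units if unit_letters == letters}
    if not codes:
        raise ValueError(f"no partition unit uses the letter string {letters!r}")
    return 1 / len(codes)


# Placeholder units: "xx" has three pronunciations, "yy" has one.
all_units = [
    ("xx", "code-1"), ("xx", "code-2"), ("xx", "code-3"),
    ("xx", "code-1"),  # the same unit seen again in another word
    ("yy", "code-4"),
]
p_xx = probability_from_associated_units("xx", all_units)  # 1 / 3
p_yy = probability_from_associated_units("yy", all_units)  # 1.0
```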
实施例三
图4是本发明实施例提供的一种确定目标单词的拼读分区的装置的结构示意图,请参阅图4,该装置40包括:最佳分区组合确定单元41和拼读分区确定单元42。
其中,最佳分区组合确定单元41用于基于预设的拼读总库,确定接收到的目标单词对应的最佳分区组合,其中,所述拼读总库包括单词最佳分区组合数据库,所述单词最佳分区组合数据库中记载有备选单词以及每一个所述备选单词对应的最佳分区组合,所述目标单词为所述备选单词中的一个;拼读分区确定单元42用于根据所述目标单词对应的最佳分区组合确定所述目标单词的拼读分区。
在实际应用中,当接收到目标单词时,可以首先通过最佳分区组合确定单元41基于预设的拼读总库,确定该接收到的目标单词对应的最佳分区组合,然后再利用拼读分区确定单元42根据所述目标单词对应的最佳分区组合确定所述目标单词的拼读分区。其中,所述拼读总库包括单词最佳分区组合数据库,所述单词最佳分区组合数据库中记 载有备选单词以及每一个所述备选单词对应的最佳分区组合,所述目标单词为所述备选单词中的一个,并且所述最佳分区组合中的各分区单元的结构性正确发音概率较高。
其中,在一些实施例中,所述拼读总库还包括:单词可能性分区组合数据库,所述单词可能性分区组合数据库中记载有所述备选单词,每一个所述备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括的若干个分区单元,以及,每一个所述分区单元对应的结构性正确发音概率;其中,每一个所述分区单元代表一种字母串和一种发音代码的对应关系;所述单词最佳分区组合数据库中的每一个所述备选单词所对应的最佳分区组合可从所述单词可能性分区组合数据库中筛选得到。
基于此,在一些实施例中,该装置40还包括:结构性正确发音概率提取单元43、筛选单元44以及单词最佳分区组合数据库创建单元45。
其中,结构性正确发音概率提取单元43用于基于所述单词可能性分区组合数据库,确定每一所述备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括的若干个分区单元,以及,每一个所述分区单元对应的结构性正确发音概率;
筛选单元44用于分别针对每一个所述备选单词,根据所述备选单词对应的每种可能性分区组合中每个分区单元的结构性正确发音概率,筛选出所述备选单词对应的最佳分区组合;
单词最佳分区组合数据库创建单元45用于将所述备选单词及其对应的最佳分区组合记录于所述单词最佳分区组合数据库。
具体地,在一些实施例中,所述筛选单元44包括:综合结构性正 确发音概率确定模块441和筛选模块442。
综合结构性正确发音概率确定模块441用于分别针对每一个所述备选单词,根据所述备选单词对应的每种可能性分区组合中每个分区单元的结构性正确发音概率,确定所述备选单词对应的每种可能性分区组合的综合结构性正确发音概率;
筛选模块442用于基于所述备选单词对应的每种可能性分区组合的综合结构性正确发音概率,筛选出所述备选单词对应的最佳分区组合。
其中,在一些实施例中,所述筛选模块442具体用于:筛选出综合结构性正确发音概率满足第一预设条件的可能性分区组合,构成所述备选单词对应的优选可能性分区组合群;根据第二预设条件从所述优选可能性分区组合群中筛选出所述备选单词对应的最佳分区组合。
其中,在又一些实施例中,所述拼读总库还包括:分区单元数据库,所述分区单元数据库中记载有目标词汇库中每一个单词对应的全部分区单元,每一个所述分区单元对应的字母串和发音代码以及包括所述分区单元的单词,其中,所述目标词汇库中包括所述备选单词;所述单词可能性分区组合数据库中的每一个所述分区单元的结构性正确发音概率可基于所述分区单元数据库计算得到。
基于此,在一些实施例中,该装置40还包括:可能性分区组合获取单元46、结构性正确发音概率计算单元47以及单词可能性分区组合数据库创建单元48。
可能性分区组合获取单元46用于分别获取每一个所述备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括若干个分区单元;
结构性正确发音概率计算单元47用于分别针对每一个所述备选单词对应的每种可能性分区组合中的每个分区单元,从所述分区单元数据库中提取出包括所述分区单元的第一类单词和包括所述分区单元对应的字母串的第二类单词,并基于所述第一类单词和所述第二类单词确定所述分区单元对应的结构性正确发音概率;
单词可能性分区组合数据库创建单元48用于将每一个所述分区单元对应的结构性正确发音概率对应记录于所述单词可能性分区组合数据库。
其中,在一些实施例中,所述第一类单词、所述第二类单词与所述备选单词属于同一词汇类别。
其中,在一些实施例中,结构性正确发音概率计算单元47具体用于:分别针对每一个所述备选单词对应的每种可能性分区组合中的每个分区单元,从所述分区单元数据库中提取出包括所述分区单元的第一类单词和包括所述分区单元对应的字母串的第二类单词;获取所述第一类单词在与所述词汇类别对应的统计场景中出现的次数,记为第一出现次数;获取所述第二类单词在所述统计场景中出现的次数,记为第二出现次数;根据所述第一出现次数和所述第二出现次数确定所述分区单元对应的结构性正确发音概率。
再者,在有一些实施例中,所述拼读总库还包括:基础字母串与基础发音代码对应关系库,所述基础字母串与基础发音代码对应关系库中记载有所有基础字母串及其对应的基础发音代码;所述分区单元数据库可基于所述目标词汇库和所述基础字母串与基础发音代码对应关系库计算得到。
需要说明的是,由于所述确定目标单词的拼读分区的装置与上述 实施例提供的方法基于相同的发明构思,因此,方法实施例一和二中相应的内容及其有益效果同样适用于装置实施例,此处不再详述。
通过上述技术方案可知,本发明实施例的有益效果在于:本发明实施例提供的确定目标单词的拼读分区的装置通过最佳分区组合确定单元41基于预设的拼读总库中的单词最佳分区组合数据库,确定接收到的目标单词对应的最佳分区组合,进而利用拼读分区确定单元42根据所述目标单词对应的最佳分区组合确定所述目标单词的拼读分区,其中,所确定的最佳分区组合中的分区单元具有较高的结构性正确发音概率,从而能够从拼读结构本质上提升目标单词的正确发音概率。
实施例四
图5是本发明实施例提供的一种电子设备的结构示意图,该电子设备500可以是任意类型的电子设备,如:学习机、智能手机、机器人、个人电脑、中央或云服务器等,能够执行上述方法实施例提供的确定目标单词的拼读分区的方法,或者,运行上述装置实施例提供的确定目标单词的拼读分区的装置。
具体地,请参阅图5,该电子设备500包括:
一个或多个处理器501以及存储器502,图5中以一个处理器501为例。处理器501和存储器502可以通过总线或者其它方式连接,图5中以通过总线连接为例。存储器502作为一种非暂态计算机可读存储介质,可用于存储非暂态软件程序、非暂态性计算机可执行程序以及模块,如本发明实施例中的确定目标单词的拼读分区的方法对应的程序指令/模块(例如,附图4所示的最佳分区组合确定单元41、拼读分区确定单元42、结构性正确发音概率提取单元43、筛选单元44、 单词最佳分区组合数据库创建单元45、可能性分区组合获取单元46、结构性正确发音概率计算单元47以及单词可能性分区组合数据库创建单元48)。处理器501通过运行存储在存储器502中的非暂态软件程序、指令以及模块,从而执行确定目标单词的拼读分区的装置40的各种功能应用以及数据处理,即实现上述任一方法实施例的确定目标单词的拼读分区的方法。
存储器502可以包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需要的应用程序;存储数据区可存储根据确定目标单词的拼读分区的装置40的使用所创建的数据等。此外,存储器502可以包括高速随机存取存储器,还可以包括非暂态存储器,例如至少一个磁盘存储器件、闪存器件、或其它非暂态固态存储器件。在一些实施例中,存储器502可选包括相对于处理器501远程设置的存储器,这些远程存储器可以通过网络连接至电子设备500。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
所述一个或者多个模块存储在所述存储器502中,当被所述一个或者多个处理器501执行时,执行上述任意方法实施例中的确定目标单词的拼读分区的方法,例如,执行以上描述的图1中的方法步骤100至200,图2中的方法步骤110至130,图3中的方法步骤310至340,实现图4中的单元41-48的功能。
本发明实施例还提供了一种非暂态计算机可读存储介质,所述非暂态计算机可读存储介质存储有计算机可执行指令,该计算机可执行指令被一个或多个处理器执行,例如,被图5中的一个处理器501执行,可使得上述一个或多个处理器执行上述任意方法实施例中的确定 目标单词的拼读分区的方法,例如,执行以上描述的图1中的方法步骤100至200,图2中的方法步骤110至130,图3中的方法步骤310至340,实现图4中的单元41-48的功能。
以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。
通过以上的实施方式的描述,本领域普通技术人员可以清楚地了解到各实施方式可借助软件加通用硬件平台的方式来实现,当然也可以通过硬件。本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程是可以通过计算机程序产品中的计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非暂态计算机可读取存储介质中,该计算机程序包括程序指令,当所述程序指令被电子设备执行时,可使所述电子设备执行上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。
上述产品(包括:电子设备、非暂态计算机可读存储介质以及计算机程序产品)可执行本发明实施例所提供的确定目标单词的拼读分区的方法,具备执行确定目标单词的拼读分区的方法相应的功能模块和有益效果。未在本实施例中详尽描述的技术细节,可参见本发明实施例所提供的确定目标单词的拼读分区的方法。
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;在本发明的思路下,以上实施例或者不同实施例中的技 术特征之间也可以进行组合,步骤可以以任意顺序实现,并存在如上所述的本发明的不同方面的许多其它变化,为了简明,它们没有在细节中提供;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。

Claims (12)

  1. 一种确定目标单词的拼读分区的方法,其特征在于,包括:
    基于预设的拼读总库,确定接收到的目标单词对应的最佳分区组合,其中,所述拼读总库包括单词最佳分区组合数据库,所述单词最佳分区组合数据库中记载有备选单词以及每一个所述备选单词对应的最佳分区组合,所述目标单词为所述备选单词中的一个;
    根据所述目标单词对应的最佳分区组合确定所述目标单词的拼读分区。
  2. 根据权利要求1所述确定目标单词的拼读分区的方法,其特征在于,所述拼读总库还包括:单词可能性分区组合数据库,所述单词可能性分区组合数据库中记载有所述备选单词,每一个所述备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括的若干个分区单元,以及,每一个所述分区单元对应的结构性正确发音概率;其中,每一个所述分区单元代表一种字母串和一种发音代码的对应关系;
    所述单词最佳分区组合数据库中的每一个所述备选单词所对应的最佳分区组合可从所述单词可能性分区组合数据库中筛选得到。
  3. 根据权利要求2所述确定目标单词的拼读分区的方法,其特征在于,所述方法还包括:基于所述单词可能性分区组合数据库,确定每一个所述备选单词对应的若干种可能性分区组合,每一种所述可能性分区组合中包括的若干个分区单元,以及,每一个所述分区单元对应的结构性正确发音概率;
    分别针对每一个所述备选单词,根据所述备选单词对应的每种可 能性分区组合中每个分区单元的结构性正确发音概率,筛选出所述备选单词对应的最佳分区组合;
    将所述备选单词及其对应的最佳分区组合记录于所述单词最佳分区组合数据库。
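  4. The method for determining a spelling partition of a target word according to claim 3, characterized in that the screening out, for each of the candidate words, of the best partition combination corresponding to the candidate word according to the structural correct pronunciation probability of each partition unit in each possible partition combination corresponding to the candidate word comprises:
    for each of the candidate words, determining a composite structural correct pronunciation probability of each possible partition combination corresponding to the candidate word according to the structural correct pronunciation probability of each partition unit in each possible partition combination corresponding to the candidate word; and
    screening out the best partition combination corresponding to the candidate word based on the composite structural correct pronunciation probability of each possible partition combination corresponding to the candidate word.
  5. The method for determining a spelling partition of a target word according to claim 4, characterized in that the screening out of the best partition combination corresponding to the candidate word based on the composite structural correct pronunciation probability of each possible partition combination corresponding to the candidate word comprises:
    screening out the possible partition combinations whose composite structural correct pronunciation probability satisfies a first preset condition, to form a preferred possible-partition-combination group corresponding to the candidate word; and
    screening out the best partition combination corresponding to the candidate word from the preferred possible-partition-combination group according to a second preset condition.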
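  6. The method for determining a spelling partition of a target word according to any one of claims 2-5, characterized in that the master spelling library further comprises a partition unit database, the partition unit database recording all partition units corresponding to each word in a target vocabulary, the letter string and pronunciation code corresponding to each of the partition units, and the words that include the partition unit, wherein the target vocabulary includes the candidate words;
    the structural correct pronunciation probability of each of the partition units in the word possible-partition-combination database can be calculated based on the partition unit database.
  7. The method according to claim 6, characterized in that the method further comprises:
    obtaining the plurality of possible partition combinations corresponding to each of the candidate words and the plurality of partition units included in each of the possible partition combinations; for each partition unit in each possible partition combination corresponding to each of the candidate words, extracting from the partition unit database first-class words that include the partition unit and second-class words that include the letter string corresponding to the partition unit, and determining the structural correct pronunciation probability corresponding to the partition unit based on the first-class words and the second-class words; and
    recording the structural correct pronunciation probability corresponding to each of the partition units in the word possible-partition-combination database.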
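  8. The method for determining a spelling partition of a target word according to claim 7, characterized in that the first-class words, the second-class words, and the candidate word belong to the same vocabulary category.
  9. The method for determining a spelling partition of a target word according to claim 8, characterized in that the determining of the structural correct pronunciation probability corresponding to the partition unit based on the first-class words and the second-class words comprises:
    obtaining the number of times the first-class words occur in a statistical scenario corresponding to the vocabulary category, recorded as a first occurrence count;
    obtaining the number of times the second-class words occur in the statistical scenario, recorded as a second occurrence count; and
    determining the structural correct pronunciation probability corresponding to the partition unit according to the first occurrence count and the second occurrence count.
  10. The method for determining a spelling partition of a target word according to claim 6, characterized in that the master spelling library further comprises a basic letter string and basic pronunciation code correspondence library, which records all basic letter strings and their corresponding basic pronunciation codes;
    the partition unit database can be calculated based on the target vocabulary and the basic letter string and basic pronunciation code correspondence library.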
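  11. An electronic device, characterized by comprising:
    at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-10.
  12. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer-executable instructions, the computer-executable instructions being used to cause an electronic device to perform the method according to any one of claims 1-10.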
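
The selection flow recited in claims 1, 4, 5 and 9 can be illustrated with a minimal Python sketch. The claims do not fix any formulas, so everything concrete here is an assumption made for illustration: the unit probability is taken as the ratio of the first occurrence count to the second, the composite probability as the product of the unit probabilities, the first preset condition as "within `tolerance` of the highest composite probability", and the second preset condition as "fewest partition units". All function names, the toy counts, and the pronunciation codes are hypothetical.

```python
from math import prod

# A partition unit pairs a letter string with a pronunciation code,
# e.g. ("ph", "f"). A partition combination is a tuple of such units.


def structural_probability(first_class_words, second_class_words, counts):
    """Probability that a unit's letter string carries the unit's sound.

    Assumption: the ratio of the first occurrence count to the second.
    """
    first = sum(counts.get(w, 0) for w in first_class_words)
    second = sum(counts.get(w, 0) for w in second_class_words)
    return first / second if second else 0.0


def best_combination(candidates, tolerance=0.0):
    """Pick the best combination from {combination: [unit probabilities]}."""
    # Composite probability (assumed here to be the product of unit values).
    scored = {combo: prod(probs) for combo, probs in candidates.items()}
    top = max(scored.values())
    # Hypothetical first preset condition: within `tolerance` of the top score.
    preferred = [c for c, s in scored.items() if s >= top - tolerance]
    # Hypothetical second preset condition: fewest partition units.
    return min(preferred, key=len)


def spelling_partition(word, best_db):
    """Look up the stored best combination and return its letter strings."""
    return [letters for letters, _sound in best_db[word]]


# Toy occurrence counts in a made-up statistical scenario.
counts = {"phone": 50, "graph": 30, "uphill": 5}
p_ph = structural_probability(
    ["phone", "graph"], ["phone", "graph", "uphill"], counts
)

candidates = {
    (("ph", "f"), ("one", "oun")): [p_ph, 0.8],
    (("p", "p"), ("h", "h"), ("one", "oun")): [0.1, 0.1, 0.8],
}
best_db = {"phone": best_combination(candidates)}
print(spelling_partition("phone", best_db))  # prints ['ph', 'one']
```

Precomputing `best_db` mirrors the split in the claims between building the word best-partition-combination database offline and a simple lookup when a target word is received.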
PCT/CN2019/081628 2018-04-28 2019-04-05 Method and electronic device for determining spelling partition of target word WO2019205917A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810402172.XA CN109002454B (zh) 2018-04-28 2018-04-28 Method and electronic device for determining spelling partition of target word
CN201810402172.X 2018-04-28

Publications (1)

Publication Number Publication Date
WO2019205917A1 (zh) 2019-10-31

Family

ID=64573212

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/081628 WO2019205917A1 (zh) 2018-04-28 2019-04-05 Method and electronic device for determining spelling partition of target word

Country Status (2)

Country Link
CN (1) CN109002454B (zh)
WO (1) WO2019205917A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
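CN109002454B (zh) * 2018-04-28 2022-05-27 Chen Yitian Method and electronic device for determining spelling partition of target word
CN109376358B (zh) * 2018-10-25 2021-07-16 Chen Yitian Word learning method, apparatus, and electronic device drawing on historical spelling experience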

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867812A (en) * 1992-08-14 1999-02-02 Fujitsu Limited Registration apparatus for compound-word dictionary
CN101211559A (zh) * 2006-12-26 2008-07-02 国际商业机器公司 用于拆分语音的方法和设备
CN101630457A (zh) * 2009-02-25 2010-01-20 范海涛 一种英语单词切片及教学拼写记忆系列卡
CN101706797A (zh) * 2009-11-24 2010-05-12 无敌科技(西安)有限公司 通过语音查询单词的系统及其方法
CN104239289A (zh) * 2013-06-24 2014-12-24 富士通株式会社 音节划分方法和音节划分设备
CN109002454A (zh) * 2018-04-28 2018-12-14 陈逸天 一种确定目标单词的拼读分区的方法和电子设备

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
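CN1308908C (zh) * 2003-09-29 2007-04-04 Motorola Inc. Method for text-to-speech synthesis
CN1883959A (zh) * 2005-06-21 2006-12-27 Rong Yi Method for compressing words and phonetic symbols in English electronic dictionary data
CN102176310B (zh) * 2005-12-08 2013-08-21 Nuance Communications Austria GmbH Speech recognition system with huge vocabulary
CN104252800B (zh) * 2014-09-12 2017-10-10 Guangdong Xiaotiancai Technology Co., Ltd. Method and device for word broadcast scoring
JP6641680B2 (ja) * 2014-09-22 2020-02-05 Casio Computer Co., Ltd. Audio output device, audio output program, and audio output method
CN105760356B (zh) * 2016-03-17 2018-10-19 Guangdong Xiaotiancai Technology Co., Ltd. Method and system for automatically generating candidate options for English word dictation questions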

Also Published As

Publication number Publication date
CN109002454B (zh) 2022-05-27
CN109002454A (zh) 2018-12-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19792014

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19792014

Country of ref document: EP

Kind code of ref document: A1