CN114675750A - Input method, input device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114675750A
Authority
CN
China
Prior art keywords
target
spelling
sequence
character string
input
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210308651.1A
Other languages
Chinese (zh)
Inventor
费腾
唐维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210308651.1A
Publication of CN114675750A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G06F3/0237 Character input methods using prediction or retrieval techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses an input method, an input device, electronic equipment and a storage medium. The input method comprises: acquiring an input character string; if the input character string is a full spelling character string, extracting a target initial sequence and a target final sequence from the input character string; matching the target initial sequence in a simple spelling tree and determining a first target path matched with the target initial sequence, where each child node in the simple spelling tree corresponds to one initial, each child node is associated with a final index, and the final index indicates a mapping relation between final sequences and candidate word sets; determining, in the final index corresponding to a target child node in the first target path, a target candidate word set corresponding to the target final sequence; and displaying the candidate words in the target candidate word set. Because only the initials are matched in the tree, the scheme of the application can shorten the time spent on matching and retrieval and improve input efficiency.

Description

Input method, input device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an input method, an input device, an electronic device, and a storage medium.
Background
An input method is an encoding scheme used to enter characters into an electronic device. Input method applications generally construct a dictionary tree in advance, in which each child node corresponds to one character and is associated with a candidate word set. After obtaining the input character string entered by the user, the application matches the characters of the string in the dictionary tree one by one and determines a target path matched with the string; concatenating the characters of the child nodes along the target path yields the input character string. In the related art, some users prefer the full spelling input mode, in which the input character string is a full spelling character string. Full spelling character strings are long, so matching every character of the string in the dictionary tree takes a long time and lowers the efficiency of entering content.
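For contrast with the scheme of the application, the related-art matching described above can be illustrated with a minimal Python sketch of a character-level dictionary tree. This is not code from the application; `CharTrieNode`, `insert` and `match` are hypothetical names. The point it shows is that each character of the input string costs one node transition, so a 12-character full spelling string needs 12 transitions.

```python
class CharTrieNode:
    """One node per character; candidate words are stored where a spelling ends."""

    def __init__(self):
        self.children = {}    # character -> CharTrieNode
        self.candidates = []  # candidate words whose spelling ends at this node


def insert(root, spelling, word):
    node = root
    for ch in spelling:
        node = node.children.setdefault(ch, CharTrieNode())
    node.candidates.append(word)


def match(root, input_string):
    """Walk one character at a time; return (candidates, number of node transitions)."""
    node = root
    steps = 0
    for ch in input_string:
        node = node.children.get(ch)
        steps += 1
        if node is None:
            return [], steps
    return list(node.candidates), steps


root = CharTrieNode()
insert(root, "nihao", "你好")
insert(root, "nichifanlema", "你吃饭了吗")
print(match(root, "nichifanlema"))  # (['你吃饭了吗'], 12)
```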
Disclosure of Invention
In view of the foregoing problems, embodiments of the present application provide an input method, an input apparatus, an electronic device, and a storage medium to address the foregoing problems.
According to an aspect of an embodiment of the present application, there is provided an input method, including: acquiring an input character string; if the input character string is a full spelling character string, extracting a target initial sequence and a target final sequence from the input character string; matching in a simple spelling tree according to the target initial sequence, and determining a first target path matched with the target initial sequence, where one child node in the simple spelling tree corresponds to one initial, a final index associated with each child node is arranged in the simple spelling tree, and the final index indicates a mapping relation between final sequences and candidate word sets; determining a target candidate word set corresponding to the target final sequence in the final index corresponding to a first target child node in the first target path, the first target child node being the child node in the first target path that is farthest from the root node of the simple spelling tree; and displaying the candidate words in the target candidate word set.
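The retrieval flow of this aspect can be illustrated with a minimal Python sketch. It is not the applicant's implementation: `SimpleSpellingNode`, `add_word` and `lookup_full_spelling` are hypothetical names, the final index is assumed to be a dictionary keyed by a tuple of finals, and the target initial and final sequences are assumed to have been extracted already.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleSpellingNode:
    # Assumed layout: one child per initial, plus a final index mapping a
    # tuple of finals to the candidate words stored at this node.
    children: dict = field(default_factory=dict)
    final_index: dict = field(default_factory=dict)


def add_word(root, initials, finals, word):
    """Store `word` under its initial path; its finals become the final-index key."""
    node = root
    for initial in initials:
        node = node.children.setdefault(initial, SimpleSpellingNode())
    node.final_index.setdefault(tuple(finals), []).append(word)


def lookup_full_spelling(root, target_initials, target_finals):
    """Match the target initial sequence, then query the final index of the last
    node on the path (the node farthest from the root)."""
    node = root
    for initial in target_initials:
        node = node.children.get(initial)
        if node is None:
            return []  # no first target path exists
    return list(node.final_index.get(tuple(target_finals), []))


root = SimpleSpellingNode()
add_word(root, ["n", "h"], ["i", "ao"], "你好")
add_word(root, ["n", "h"], ["ei", "an"], "内涵")
print(lookup_full_spelling(root, ["n", "h"], ["i", "ao"]))  # ['你好']
```

For "nihao" this needs two node transitions (n, h) and one index lookup, instead of five character transitions in a character-level dictionary tree. The section does not specify the concrete data structure of the final index; a dictionary is used here only for brevity.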
According to an aspect of an embodiment of the present application, there is provided an input apparatus including: an acquisition module configured to acquire an input character string; an extraction module configured to extract a target initial sequence and a target final sequence from the input character string if the input character string is a full spelling character string; a matching module configured to match in the simple spelling tree according to the target initial sequence and determine a first target path matched with the target initial sequence, where one child node in the simple spelling tree corresponds to one initial, a final index associated with each child node is arranged in the simple spelling tree, and the final index indicates a mapping relation between final sequences and candidate word sets; a target candidate word set determining module configured to determine a target candidate word set corresponding to the target final sequence in the final index corresponding to a first target child node in the first target path, the first target child node being the child node in the first target path that is farthest from the root node of the simple spelling tree; and a display module configured to display the candidate words in the target candidate word set.
In some embodiments, the input device further comprises: the second matching module is used for matching in the simple spelling tree if the input character string is a simple spelling character string, and determining a second target path matched with the input character string; the first candidate word set determining module is used for determining a first candidate word set corresponding to a second target child node in the second target path; the second target child node is a child node in the second target path that is farthest from the root node of the simple spelling tree; and the second display module is used for displaying the candidate words in the first candidate word set.
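A minimal Python sketch of the simple spelling case follows. The section does not say how the first candidate word set of the second target child node is stored; this sketch assumes it is the union of all words in that node's final index, and it assumes each abbreviated character is already a whole initial. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleSpellingNode:
    children: dict = field(default_factory=dict)     # initial -> child node
    final_index: dict = field(default_factory=dict)  # tuple of finals -> words


def add_word(root, initials, finals, word):
    node = root
    for initial in initials:
        node = node.children.setdefault(initial, SimpleSpellingNode())
    node.final_index.setdefault(tuple(finals), []).append(word)


def lookup_simple_spelling(root, abbreviations):
    """Walk the tree with the abbreviated characters and return every word
    stored in the final index of the deepest node on the path."""
    node = root
    for initial in abbreviations:
        node = node.children.get(initial)
        if node is None:
            return []  # no second target path exists
    words = []
    for finals in sorted(node.final_index):  # sorted only to make the order deterministic
        words.extend(node.final_index[finals])
    return words


root = SimpleSpellingNode()
add_word(root, ["n", "h"], ["i", "ao"], "你好")
add_word(root, ["n", "h"], ["ei", "an"], "内涵")
print(lookup_simple_spelling(root, ["n", "h"]))  # ['内涵', '你好']
```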
In some embodiments, the input device further comprises a first construction module configured to construct the simple spelling tree. In this embodiment, the first construction module includes: a first history input word acquiring unit configured to acquire a plurality of first history input words; a first full spelling character string determining unit configured to determine a first full spelling character string corresponding to each first history input word; a first history final sequence determining unit configured to extract the finals of the first full spelling character string corresponding to each first history input word to obtain a corresponding first history final sequence; a first history initial sequence determining unit configured to extract the initials of the first full spelling character string corresponding to each first history input word to obtain a corresponding first history initial sequence; a first determining unit configured to determine, according to each first history initial sequence, the initial corresponding to each child node in the simple spelling tree; a final index determining unit configured to determine the final index corresponding to each child node in the simple spelling tree according to each first history input word and its first history final sequence; and a first association storage unit configured to store, in association, the initial corresponding to each child node in the simple spelling tree and the final index corresponding to that child node.
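The construction units above can be sketched in Python as follows. Assumptions, none of which come from the application: `pinyin_of` is a hypothetical word-to-syllables lookup standing in for the first full spelling character string, y and w are treated as initials (as many input methods do), and a syllable without an initial is given the empty initial "".

```python
from dataclasses import dataclass, field

# Two-letter initials come first so that "zhong" splits as "zh" + "ong".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


@dataclass
class SimpleSpellingNode:
    children: dict = field(default_factory=dict)     # initial -> child node
    final_index: dict = field(default_factory=dict)  # tuple of finals -> words


def split_syllable(syllable):
    """Split one pinyin syllable into (initial, final); zero-initial syllables get ''."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def build_simple_spelling_tree(history_words, pinyin_of):
    root = SimpleSpellingNode()
    for word in history_words:
        pairs = [split_syllable(s) for s in pinyin_of[word]]
        initials = [initial for initial, _ in pairs]   # first history initial sequence
        finals = tuple(final for _, final in pairs)     # first history final sequence
        node = root
        for initial in initials:
            node = node.children.setdefault(initial, SimpleSpellingNode())
        words = node.final_index.setdefault(finals, [])
        if word not in words:  # store each history word once
            words.append(word)
    return root


pinyin_of = {"你好": ["ni", "hao"], "内涵": ["nei", "han"], "中国": ["zhong", "guo"]}
root = build_simple_spelling_tree(["你好", "中国", "内涵", "你好"], pinyin_of)
print(root.children["zh"].children["g"].final_index)  # {('ong', 'uo'): ['中国']}
```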
In some embodiments, the input device further comprises: a history input character string acquisition unit configured to acquire a plurality of history input character strings entered by a user within a set time length; and a preference input mode determining unit configured to determine the user's preferred input mode from the plurality of history input character strings. If the user's preferred input mode is the simple spelling input mode, the first construction module is invoked.
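The application does not state how the preferred input mode is derived from the history strings. One possible rule, shown here only as an assumption, is a majority vote, with ties going to the simple spelling mode. The classifier `is_full_spelling` is passed in, and the set used in the example is a toy stand-in.

```python
def preferred_input_mode(history_strings, is_full_spelling):
    """Return "full" or "simple" by majority vote over the history strings
    (assumed rule; ties and an empty history fall back to "simple")."""
    full = sum(1 for s in history_strings if is_full_spelling(s))
    simple = len(history_strings) - full
    return "full" if full > simple else "simple"


def tree_to_build(history_strings, is_full_spelling):
    """Simple spelling preference -> simple spelling tree; otherwise full spelling tree."""
    if preferred_input_mode(history_strings, is_full_spelling) == "simple":
        return "simple spelling tree"
    return "full spelling tree"


# Toy classifier used only for this example.
KNOWN_FULL = {"nihao", "zhongguo", "nichifanlema"}
history = ["nh", "zg", "nihao", "nchflm"]
print(tree_to_build(history, lambda s: s in KNOWN_FULL))  # simple spelling tree
```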
In some embodiments, the input device further comprises: the second construction module is used for constructing a full spelling tree if the preference input mode of the user is a full spelling input mode; and one child node in the full spelling tree corresponds to one syllable, and a candidate word set is associated with each child node.
In some embodiments, the input device further comprises: the syllable sequence acquisition module is used for acquiring a syllable sequence if the input character string is a full spelling character string and the loaded dictionary tree is the full spelling tree, wherein the syllable sequence is obtained by segmenting the input character string; a third target path determining module, configured to perform syllable matching in the full-spelling tree, and determine a third target path matching the syllable sequence; a second candidate word set determining module, configured to determine a second candidate word set associated with a third target child node in the third target path, where the third target child node is a child node in the third target path that is farthest from a root node of the full-spelling tree; and the third display module is used for displaying the candidate words in the second candidate word set.
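A minimal Python sketch of the full spelling tree described here, with one syllable per child node and a candidate word set associated with each node, might look as follows. The names are hypothetical and the syllable sequence is assumed to be already segmented.

```python
class FullSpellingNode:
    def __init__(self):
        self.children = {}    # syllable -> FullSpellingNode
        self.candidates = []  # candidate word set associated with this node


def add_word(root, syllables, word):
    node = root
    for syllable in syllables:
        node = node.children.setdefault(syllable, FullSpellingNode())
    node.candidates.append(word)


def lookup_syllables(root, syllables):
    """Match one syllable per level; return the candidate set of the last node
    on the third target path (the node farthest from the root)."""
    node = root
    for syllable in syllables:
        node = node.children.get(syllable)
        if node is None:
            return []
    return list(node.candidates)


root = FullSpellingNode()
add_word(root, ["ni", "hao"], "你好")
add_word(root, ["ni", "hao"], "拟好")
print(lookup_syllables(root, ["ni", "hao"]))  # ['你好', '拟好']
```

Here "nihao" needs two syllable transitions instead of five character transitions, but every syllable must still be matched in the tree.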
In some embodiments, the input device further comprises: a candidate full-spelling character string set determining module, configured to determine a candidate full-spelling character string set corresponding to the input character string if the input character string is a simple spelling character string and the loaded dictionary tree is the full-spelling tree; a fourth target path determining module, configured to perform syllable matching on each candidate full-spelling character string in the candidate full-spelling character string set in the full-spelling tree, and determine a fourth target path matched with each candidate full-spelling character string; a third candidate word set determining module, configured to determine a third candidate word set corresponding to a fourth target child node in each fourth target path, where the fourth target child node is a child node in the corresponding fourth target path that is farthest from the root node of the full-spelling tree; a fourth candidate word set determining module, configured to determine a fourth candidate word set according to a third candidate word set corresponding to a fourth target child node in each fourth target path; and the fourth display module is used for displaying the candidate words in the fourth candidate word set.
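The simple spelling case on a full spelling tree can be sketched in Python under several assumptions not stated in the application. Candidate full spelling strings are formed by expanding each abbreviated character into every syllable that starts with it, drawn from a small hypothetical syllable inventory. The fourth candidate word set is taken to be the de-duplicated union of the third candidate word sets, in first-seen order.

```python
from itertools import product


class FullSpellingNode:
    def __init__(self):
        self.children = {}    # syllable -> FullSpellingNode
        self.candidates = []  # candidate word set of this node


def add_word(root, syllables, word):
    node = root
    for syllable in syllables:
        node = node.children.setdefault(syllable, FullSpellingNode())
    node.candidates.append(word)


def lookup_syllables(root, syllables):
    node = root
    for syllable in syllables:
        node = node.children.get(syllable)
        if node is None:
            return []
    return list(node.candidates)


# Hypothetical syllable inventory; a real system would use the complete pinyin table.
SYLLABLES = ("ni", "nei", "na", "hao", "han", "hui", "zhong", "guo")


def candidate_full_spellings(abbreviations):
    """Expand every abbreviated character into all syllables that start with it."""
    options = [[s for s in SYLLABLES if s.startswith(a)] for a in abbreviations]
    return [list(combo) for combo in product(*options)]


def lookup_abbreviated(root, abbreviations):
    """Merge the candidate sets of all expansions, dropping duplicates."""
    merged = []
    for syllables in candidate_full_spellings(abbreviations):
        for word in lookup_syllables(root, syllables):
            if word not in merged:
                merged.append(word)
    return merged


root = FullSpellingNode()
add_word(root, ["ni", "hao"], "你好")
add_word(root, ["nei", "han"], "内涵")
add_word(root, ["na", "hui"], "哪会")
print(lookup_abbreviated(root, ["n", "h"]))  # ['你好', '内涵', '哪会']
```

The number of candidate full spelling strings grows multiplicatively with the input length, which is one reason matching abbreviated input against a syllable-level tree is costly.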
In some embodiments, the second construction module comprises: a second history input word acquiring unit configured to acquire a plurality of second history input words; a second full spelling character string determining unit configured to determine a second full spelling character string corresponding to each second history input word; a second determining unit configured to determine, according to the second full spelling character strings, the syllable corresponding to each child node in the full spelling tree; a third determining unit configured to determine, according to the second full spelling character string corresponding to each second history input word, the candidate word set corresponding to each child node in the full spelling tree; and a second association storage unit configured to store, in association, the syllable corresponding to each child node in the full spelling tree and the corresponding candidate word set.
In some embodiments, the extraction module comprises: a syllable sequence obtaining unit, configured to obtain a syllable sequence obtained by segmenting the input character string; a target initial sequence determining unit, configured to extract initial consonants in each syllable in the syllable sequence, and combine the extracted initial consonants according to an arrangement order of the syllables in the syllable sequence to obtain the target initial sequence; and the target final sequence determining unit is used for extracting the final in each syllable in the syllable sequence and combining the extracted final according to the arrangement sequence of the syllables in the syllable sequence to obtain the target final sequence.
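The segmentation and extraction performed by these units can be sketched in Python. Assumptions: a small hypothetical syllable inventory, a longest-syllable-first segmentation with backtracking (the application's actual segmentation method may differ, for example the model-based one described later), and y and w treated as initials. Ambiguous strings such as "xian" (xian vs. xi'an) are resolved toward the longer syllable by this rule.

```python
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")
# Hypothetical syllable inventory used only for this sketch.
SYLLABLES = {"ni", "hao", "chi", "fan", "le", "ma", "xi", "an", "xian", "zhong", "guo"}


def segment(text):
    """Split a full spelling string into syllables, trying longer syllables first
    (pinyin syllables are at most 6 letters) and backtracking on failure."""
    if not text:
        return []
    for end in range(min(6, len(text)), 0, -1):
        head = text[:end]
        if head in SYLLABLES:
            rest = segment(text[end:])
            if rest is not None:
                return [head] + rest
    return None  # no segmentation into known syllables


def split_syllable(syllable):
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def extract_sequences(text):
    """Return (target initial sequence, target final sequence) in syllable order."""
    syllables = segment(text)
    if syllables is None:
        raise ValueError(f"not a full spelling string: {text!r}")
    pairs = [split_syllable(s) for s in syllables]
    return [initial for initial, _ in pairs], [final for _, final in pairs]


print(extract_sequences("nichifanlema"))
# (['n', 'ch', 'f', 'l', 'm'], ['i', 'i', 'an', 'e', 'a'])
```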
In some embodiments, the input device further comprises a type identification module configured to identify the type of the input character string to obtain a type identification result, the type identification result indicating whether the input character string is a full spelling character string or a simple spelling character string.
In some embodiments, the display module comprises: a usage frequency acquisition unit configured to acquire the user's usage frequency of each candidate word in the target candidate word set; a sorting unit configured to sort the candidate words in the target candidate word set in descending order of usage frequency to obtain a target ranking; and a ranked display unit configured to display the candidate words in the target candidate word set according to the target ranking.
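The sorting unit can be sketched in one Python function. The dictionary of usage frequencies is a hypothetical input, words with no recorded use are assumed to have frequency 0, and ties keep their original order because Python's sort is stable even with `reverse=True` (the tie-break rule is an assumption; the application does not specify one).

```python
def rank_candidates(candidates, usage_frequency):
    """Sort candidate words by the user's usage frequency, highest first."""
    return sorted(candidates, key=lambda word: usage_frequency.get(word, 0), reverse=True)


print(rank_candidates(["拟好", "你好", "泥蒿"], {"你好": 42, "拟好": 3}))
# ['你好', '拟好', '泥蒿']
```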
According to an aspect of an embodiment of the present application, there is provided an electronic device including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the input method as described above.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, implement an input method as described above.
According to an aspect of embodiments herein, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement the input method as described above.
In the above scheme, a simple spelling tree is constructed in which each child node corresponds to one initial and is associated with a final index, and the final index indicates a mapping relation between final sequences and candidate word sets. On this basis, if the input character string is a full spelling character string, a target initial sequence and a target final sequence are extracted from it, and the target initial sequence is matched in the simple spelling tree to determine a first target path. Then the target candidate word set corresponding to the target final sequence is determined in the final index of the first target child node in the first target path, and the candidate words in that set are displayed. In other words, only the initials in the target initial sequence are matched in the simple spelling tree, and candidate words are retrieved by looking up the target final sequence in the final index of a child node, rather than by matching every character of the full spelling character string in the tree. Therefore, compared with the related art, in which every character of the input character string is matched in a dictionary tree, the scheme of the application can greatly shorten the time spent on matching and retrieval, improve the efficiency of retrieving candidate words and of entering phrases for the input character string, and further improve the user experience.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1A is a schematic diagram illustrating an application scenario of the present application according to an embodiment of the present application.
FIG. 1B is a schematic diagram of an input interface shown according to an embodiment of the application.
FIG. 2 is a flow diagram illustrating an input method according to one embodiment of the present application.
Fig. 3 is a schematic diagram of a simple spelling tree according to an embodiment of the present application.
FIG. 4 is a flowchart illustrating steps subsequent to step 210 according to one embodiment of the present application.
Fig. 5 is a flowchart illustrating a process of constructing a simple spelling tree according to an embodiment of the present application.
FIG. 6 is a schematic diagram of a full spelling tree according to an embodiment of the present application.
Fig. 7 is a flowchart illustrating steps subsequent to step 210 according to another embodiment of the present application.
Fig. 8 is a flowchart illustrating steps subsequent to step 210 according to yet another embodiment of the present application.
FIG. 9 is a flowchart illustrating the construction of a full spelling tree according to an embodiment of the present application.
Fig. 10 is a flow chart illustrating an input method according to another embodiment of the present application.
FIG. 11 is a block diagram illustrating an input device according to an embodiment of the present application.
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should be noted that "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that both A and B exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the preceding and following objects.
Before making the detailed description, the terms referred to in the present application are explained as follows:
Syllable: the smallest structural unit of speech. For a Chinese character, the syllable is its pinyin, which is generally formed by combining an initial and a final; a final alone can also form a syllable, such as "a".
Full spelling character string: a string of one or more complete syllables, such as "nihao".
Simple spelling character string: a character string containing at least one incomplete syllable, such as "nh".
A dictionary tree, also called a trie, is a multi-branch tree structure for fast retrieval, in which each node holds a character and each path from the root represents a character string.
Fig. 1A is a schematic diagram illustrating an application scenario of the present application according to an embodiment of the present application. As shown in fig. 1A, the application scenario includes a terminal 101 and a server 102, and the terminal 101 and the server 102 are communicatively connected through a wired network or a wireless network. The terminal 101 may be a smartphone, tablet, laptop, desktop, in-vehicle terminal, payment terminal, and other device that can run an input method application, or run other applications that can invoke an input method application (e.g., an instant messaging application, a shopping application, a search application, a gaming application, a reading application, a mapping application, etc.).
The server 102 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms. The server 102 may be used to provide services for the input method application run by the terminal 101.
The terminal 101 may report a plurality of first history input words previously entered by the user to the server 102, so that the server 102 can construct the simple spelling tree from these first history input words and synchronize the constructed simple spelling tree to the input method application run by the terminal 101.
While running the input method application, the terminal 101 responds to the user's character input operation to obtain an input character string, searches the simple spelling tree for the candidate word set corresponding to the input character string according to the scheme of the present application, and displays the determined candidate word set in its input interface, so that the user can select a candidate word from the set as the input content corresponding to the input character string.
Fig. 1B is a schematic diagram of an input interface according to an embodiment of the present application, and as shown in fig. 1B, the input interface 103 includes a character input area 1031, a character display area 1032, and a candidate word display area 1033, where the character input area 1031 is used to display a virtual keyboard, and the character display area 1032 is used to display a splitting sequence obtained by splitting an input character string, for example, in the case shown in fig. 1B, the input character string is "nihao", and the splitting sequence displayed in the character display area 1032 is "ni' hao". The candidate word display area 1033 is used to display candidate words in the candidate word set determined for the input character string, for example, the candidate word corresponding to the input character string "nihao" may be "hello", "do-it-yourself", and the like in fig. 1B.
Of course, in other embodiments, the simple spelling tree may also be constructed on the terminal, and the terminal may implement the method of the present application based on the constructed simple spelling tree.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
Fig. 2 is a flowchart illustrating an input method according to an embodiment of the present application. The method may be executed by a computer device with processing capability, such as a terminal device (for example, a smartphone, a tablet computer, a notebook computer or a kiosk), which is not specifically limited herein. Referring to Fig. 2, the method includes at least steps 210 to 240, described in detail as follows:
Step 210, obtaining an input character string.
The user can enter characters through a keyboard or a virtual keyboard to produce the input character string. The method can be applied to an input method application, or to another application that invokes the input method application (such as an instant messaging application or a search application). Through the input method application, the user enters encoded characters on the keyboard (or virtual keyboard) to form the input character string, and the candidate words corresponding to the input character string are then determined by matching. It will be appreciated that the characters in the input character string are encoded characters on a keyboard or virtual keyboard.
Step 220, if the input character string is a full spelling character string, extracting a target initial sequence and a target final sequence from the input character string.
Before step 220, the method further comprises: performing type recognition on the input character string to obtain a type recognition result, which indicates whether the input character string is a full spelling character string or a simple spelling character string.
The method can be applied to Chinese pinyin input methods. As described above, a full spelling character string is a string of one or more complete syllables. For example, for the text "do you have a meal" (你吃饭了吗), the corresponding full spelling character string is "nichifanlema".
A simple spelling character string is a string containing incomplete syllables; it may consist entirely of incomplete syllables or mix complete and incomplete syllables. An incomplete syllable may be an abbreviated character of the syllable, which may be the initial of the syllable or the first letter of that initial. Chinese pinyin has 21 initials, most of which are single letters, such as b, p, m, f, d, t and n. When the initial has more than one letter, either the initial itself or its first letter can serve as the abbreviated character of the syllable. For example, for the syllable "zhuang", the abbreviated character can be the initial "zh" or its first letter "z".
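The rule for abbreviated characters can be sketched as a small Python function. It is an illustration only; treating y and w as initials is an assumption, and zero-initial syllables such as "a" are given no abbreviation here because the text does not cover them.

```python
TWO_LETTER_INITIALS = ("zh", "ch", "sh")
ONE_LETTER_INITIALS = set("bpmfdtnlgkhjqxrzcsyw")  # y and w included as an assumption


def abbreviated_characters(syllable):
    """Return the abbreviated characters of a syllable: its initial and, for a
    two-letter initial, also the first letter of that initial."""
    for initial in TWO_LETTER_INITIALS:
        if syllable.startswith(initial):
            return [initial, initial[0]]
    if syllable[:1] in ONE_LETTER_INITIALS:
        return [syllable[0]]
    return []  # zero-initial syllable: no abbreviation defined in this sketch


print(abbreviated_characters("zhuang"))  # ['zh', 'z']
```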
In some embodiments, the input character string may be segmented into character segment units, each representing one word, to obtain a segmentation sequence of sequentially arranged character segment units. A character segment unit may be an abbreviated character of a syllable (the initial of the word's syllable, or the first letter of that initial) or a complete syllable. After the segmentation sequence corresponding to the input character string is obtained, the total number of character segment units, the number of units that are abbreviated characters and the number of units that are syllables are counted. A first ratio of the number of syllable units to the total number of units in the segmentation sequence is then calculated. If the first ratio is greater than a first threshold, the input character string is determined to be a full spelling character string; otherwise, it is determined to be a simple spelling character string.
Alternatively, in other embodiments, a second ratio, namely the number of character segment units that are abbreviation characters divided by the total number of units in the segmentation sequence, may be computed in the same way, and the input character string is classified as an abbreviated spelling string or a full-spelling character string according to the second ratio. Either ratio therefore allows the type of the input character string to be recognized.
In still other embodiments, based on the segmentation sequence, the input character string is determined to be an abbreviated spelling string if the segmentation sequence contains at least one character segment unit that is an abbreviation character; if all character segment units in the segmentation sequence are syllables, it is determined to be a full-spelling character string.
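The two classification rules above (the first-ratio rule and the "any abbreviation character" rule) can be sketched as follows. The `Segment` type, the function names and the default threshold of 0.5 are hypothetical; the text does not give a value for the first threshold.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    is_syllable: bool  # True for a complete syllable, False for an abbreviation character


def classify_by_first_ratio(segments: list[Segment], first_threshold: float = 0.5) -> str:
    """First-ratio rule: share of syllable units compared with a threshold."""
    if not segments:
        raise ValueError("empty segmentation sequence")
    syllable_count = sum(1 for s in segments if s.is_syllable)
    first_ratio = syllable_count / len(segments)
    return "full" if first_ratio > first_threshold else "abbreviated"


def classify_strict(segments: list[Segment]) -> str:
    """Stricter rule: any abbreviation character makes the string abbreviated."""
    return "abbreviated" if any(not s.is_syllable for s in segments) else "full"


mixed = [Segment("n", False), Segment("ch", False), Segment("fan", True),
         Segment("l", False), Segment("m", False)]
print(classify_by_first_ratio(mixed), classify_strict(mixed))  # abbreviated abbreviated
```

The rules can disagree: with three syllables out of five units, the first ratio (0.6) exceeds a threshold of 0.5, so the ratio rule says "full" while the strict rule says "abbreviated".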
In some embodiments, the input character string may be segmented by a character segmentation model, which takes the input character string and outputs its segmentation sequence. The character segmentation model may further identify the type of each character segment unit in the segmentation sequence. The types are an initial type, an initial-first-letter type and a syllable type: a unit of the initial type is an initial, a unit of the initial-first-letter type is the first letter of an initial, and a unit of the syllable type is a complete syllable.
The character segmentation model may be built from one or more neural networks, such as a convolutional neural network, a fully connected neural network, a recurrent neural network, or a (bidirectional or unidirectional) long short-term memory (LSTM) network, which is not specifically limited herein.
To ensure that the segmentation sequence output by the character segmentation model for an input character string is accurate, the model must be trained in advance. Specifically, the character segmentation model may be trained as follows: obtain a plurality of sample character strings and the labeling information of each sample character string, where the labeling information indicates the segmentation positions of that string; input each sample character string into the character segmentation model, which segments it and outputs a predicted segmentation sequence indicating the predicted segmentation positions; calculate a first model loss from the labeling information and the predicted segmentation positions; and adjust the parameters of the character segmentation model by backpropagation according to the first model loss until a training end condition is reached.
A loss function may be set for the character segmentation model in advance. Its value, computed from the labeling information of the sample character string and the predicted segmentation positions indicated by the predicted segmentation sequence, is the first model loss. The loss function may be a cross-entropy loss, a mean squared error loss, an absolute-value (L1) loss, or the like, which is not specifically limited herein.
Each sample character string may be segmented manually in units of character segment units to obtain its actual segmentation sequence, and its labeling information is then determined from the segmentation positions indicated by that actual segmentation sequence. To ensure a good training effect, the sample character strings may include both abbreviated sample strings and full-spelling sample strings.
Further, if the character segmentation model is also required to identify the type of each character segment unit it produces, the labeling information of a sample character string additionally includes the type of each character segment unit in its actual segmentation sequence. After the sample character string is input, the model segments it, performs type recognition on each character segment unit in the predicted segmentation sequence, and outputs a type recognition result for each unit indicating its type (initial type, initial-first-letter type or syllable type).
In this case, a second model loss may be calculated from the types of the character segment units in the actual segmentation sequence, as given by the labeling information, and the type recognition results for the units in the predicted segmentation sequence. A total loss is then calculated from the first model loss and the second model loss, and the parameters of the character segmentation model are adjusted by backpropagation according to the total loss until the training end condition is reached. The training end condition may be that the number of training iterations reaches a count threshold, that the prediction accuracy of the model reaches an accuracy threshold, or that the loss function converges.
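As a rough illustration of the two losses, the sketch below assumes (the text does not specify this) that the labeling information is converted into per-character boundary labels, with 1 where a character segment unit ends, and that the model outputs one boundary probability per character plus a probability distribution over the three unit types per unit. It computes the first model loss as binary cross-entropy, the second as categorical cross-entropy, and the total loss as a weighted sum; the weight and all names are hypothetical, and the backpropagation step itself is omitted.

```python
from __future__ import annotations
import math


def boundary_labels(units: list[str]) -> list[int]:
    """Per-character labels derived from an actual segmentation sequence."""
    labels: list[int] = []
    for unit in units:
        labels.extend([0] * (len(unit) - 1) + [1])
    return labels


def first_model_loss(labels: list[int], probs: list[float], eps: float = 1e-12) -> float:
    """Binary cross-entropy between true boundaries and predicted boundary probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def second_model_loss(true_types: list[str], predicted: list[dict[str, float]],
                      eps: float = 1e-12) -> float:
    """Categorical cross-entropy over the types "initial", "initial_first_letter", "syllable"."""
    total = 0.0
    for t, dist in zip(true_types, predicted):
        total -= math.log(max(dist.get(t, 0.0), eps))
    return total / len(true_types)


def total_loss(first: float, second: float, weight: float = 1.0) -> float:
    """Assumed combination: weighted sum of the two losses."""
    return first + weight * second


print(boundary_labels(["n", "ch", "fan"]))  # [1, 0, 1, 0, 0, 1]
```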
In the case where the input character string is a full-spelling character string, the input character string is composed of complete syllables, and the target initial sequence is a sequence obtained by combining initial consonants extracted from each syllable in the input character string in the order of arrangement of the syllables in the input character string. Similarly, the target final sequence is a sequence obtained by combining the finals extracted from the syllables in the input character string according to the arrangement sequence of the syllables in the input character string.
In some embodiments, step 220 includes: obtaining a syllable sequence by segmenting the input character string; extracting the initial of each syllable in the syllable sequence and combining the extracted initials in the order of the syllables to obtain the target initial sequence; and extracting the final of each syllable and combining the extracted finals in the same order to obtain the target final sequence.
If the input character string is a full-spelling character string, each character segment unit in its segmentation sequence is a syllable, so the segmentation sequence is itself the syllable sequence.
A syllable generally consists of an initial and a final. The initials of the syllables in the syllable sequence are therefore extracted in order and combined to obtain the target initial sequence, and the finals are extracted in order and combined to obtain the target final sequence. For example, if the segmentation sequence is ni/chi/fan/le/ma, the target initial sequence extracted from it is n/ch/f/l/m and the target final sequence is i/i/an/e/a.
It is worth noting that the syllable sequence indicates the position of each syllable. The position of each initial in the target initial sequence is kept the same as the position, in the syllable sequence, of the syllable it comes from; this also prevents a multi-letter initial from being mistaken for a single-letter initial. Likewise, the position of each final in the target final sequence is the same as the position, in the syllable sequence, of the syllable it comes from.
Some syllables consist of a final only, with no initial. In that case, the initial of the syllable is represented by a null character, and all characters of the syllable are taken as its final. For example, if the input character string is "anquandiyi", its segmentation sequence is an/quan/di/yi; the target initial sequence extracted from it is ∅/q/d/y (where ∅ denotes the null character), and the target final sequence is an/uan/i/i. Adding the null character to the target initial sequence keeps each initial at the same position as the final of the same syllable.
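Step 220 for a full-spelling input can be sketched as below. The empty string stands in for the null character, and, as in the "yi" syllable of the example, "y" and "w" are assumed to count as initials; the function names are hypothetical.

```python
from __future__ import annotations

NULL_INITIAL = ""  # stands in for the null character of a zero-initial syllable

# Multi-letter initials come first so "chi" splits as ch + i, not c + hi.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")  # y, w: assumption


def split_syllable(syllable: str) -> tuple[str, str]:
    """Split a complete syllable into (initial, final)."""
    for ini in INITIALS:
        if syllable.startswith(ini) and len(syllable) > len(ini):
            return ini, syllable[len(ini):]
    return NULL_INITIAL, syllable  # no initial: the whole syllable is the final


def extract_sequences(syllables: list[str]) -> tuple[list[str], list[str]]:
    """Target initial and final sequences, aligned position by position."""
    initials, finals = [], []
    for syllable in syllables:
        ini, fin = split_syllable(syllable)
        initials.append(ini)
        finals.append(fin)
    return initials, finals


print(extract_sequences(["ni", "chi", "fan", "le", "ma"]))
# (['n', 'ch', 'f', 'l', 'm'], ['i', 'i', 'an', 'e', 'a'])
print(extract_sequences(["an", "quan", "di", "yi"]))
# (['', 'q', 'd', 'y'], ['an', 'uan', 'i', 'i'])
```

Because both lists are filled in the same loop, the i-th initial and the i-th final always come from the same syllable, which is the alignment property described above.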
Step 230: match the target initial sequence in the simple spelling tree and determine a first target path that matches it. Each child node in the simple spelling tree corresponds to one initial, and each child node has an associated final index that indicates the mapping between final sequences and candidate word sets.
The simple spelling tree is an ordered tree comprising a root node and a plurality of child nodes extending from it; the root node corresponds to the empty string and each child node corresponds to one initial. In this application, each child node is also associated with a final index. The final index of a child node may contain one or more final sequences, and each final sequence is mapped to a candidate word set.
In the simple spelling tree, the character sequence formed by combining, in order, the initials of the child nodes on the path from the root node to a given child node (say child node N) is called the initial sequence of child node N.
In this application, consider a child node N of the simple spelling tree and a final sequence a1 in the final index of N. The syllable sequence of every candidate word in the candidate word set mapped to a1 is the full-spelling string obtained by combining the initial sequence of N with a1 position by position.
For example, if the initial sequence of child node N is n/h and the final index of N contains the final sequence i/ao, combining n/h with i/ao position by position gives the full-spelling string ni/hao. The candidate word set mapped to i/ao in the final index of N then contains words whose syllable sequence is ni/hao, such as the words for "hello", "your number" and "fitting".
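A node of the simple spelling tree can be modelled as below. This is a minimal sketch: the `Node` layout (a children map keyed by initial and a final index keyed by a tuple of finals) is a hypothetical in-memory representation, and the candidate words are the English glosses used in the example.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the simple spelling tree (hypothetical layout)."""
    initial: str = ""  # the root corresponds to the empty string
    children: dict[str, "Node"] = field(default_factory=dict)  # initial -> child node
    # Final index: final sequence -> candidate word set (kept as an ordered list).
    final_index: dict[tuple[str, ...], list[str]] = field(default_factory=dict)


def full_spelling(initial_seq: tuple[str, ...], final_seq: tuple[str, ...]) -> str:
    """Combine an initial sequence and a final sequence position by position."""
    return "/".join(i + f for i, f in zip(initial_seq, final_seq))


# Child node N with initial sequence n/h, as in the example above.
root = Node()
n_node = root.children.setdefault("n", Node("n"))
h_node = n_node.children.setdefault("h", Node("h"))
h_node.final_index[("i", "ao")] = ["hello", "your number", "fitting"]  # English glosses

for finals, words in h_node.final_index.items():
    print(full_spelling(("n", "h"), finals), words)
# ni/hao ['hello', 'your number', 'fitting']
```

The initial sequence of a node is not stored on the node itself; it is implied by the path from the root, which keeps shared prefixes such as n in a single node.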
Fig. 3 is a schematic structural diagram of a simple spelling tree according to an embodiment of this application. As shown in Fig. 3, the tree contains several levels of child nodes. The first node 310 is the root node; every other node is a child node, including the first-level nodes extending from the first node 310, the second-level nodes extending from the first-level nodes, and so on.
In Fig. 3, the first-level nodes extending from the first node include a child node for the initial x and a child node for the initial n. The second-level nodes under the x node include a child node for the initial x, and those under the n node include a child node for the initial h. The initial sequence of the second-level child node for h is therefore n/h.
Fig. 3 also shows the first final index 320 associated with the second-level child node for h. The first final index 320 contains final sequences (a/a, i/ao, i/e, ...) and the candidate word set mapped to each. Candidate words whose syllable sequence is na/ha, such as "Naha", belong to the set mapped to a/a; words whose syllable sequence is ni/hao, such as those glossed above as "hello", "your number" and "fitting", belong to the set mapped to i/ao; and "you and" and "you drink" belong to the set mapped to i/e. Combining the initial sequence n/h with the final sequence a/a position by position gives the syllable sequence na/ha, which is exactly the syllable sequence of the candidate words in the set mapped to a/a.
In some embodiments, for each child node in the simple spelling tree, the candidate word sets of all final sequences in its final index are stored in contiguous memory. Only the target physical storage location of the first candidate word needs to be recorded; the location of every later candidate word is expressed as an offset from that target physical storage location. Because all candidate words mapped to the same final sequence are also stored contiguously, the candidate word set of each final sequence in the final index can be recovered from the offsets. Specifically, each final sequence in the final index is associated with the offset of the first candidate word in its candidate word set, so the candidate words stored from the offset associated with one final sequence (say T1) up to, but not including, the offset associated with the next final sequence form the candidate word set of T1.
Continuing the example of Fig. 3, for the second-level child node for h, the first candidate word in its final index is stored at the target physical storage location, so its offset is 0, and the following candidate words are assigned consecutive offsets; for example, "you and" and "you drink" have offsets 13 and 14. The final sequence a/a is associated with offset 0, i/ao with offset 10, and i/e with offset 13. It follows that the candidate words at offsets 0 to 9 form the candidate word set of a/a, and those at offsets 10 to 12 form the candidate word set of i/ao.
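The offset scheme can be sketched in Python as follows. A Python list stands in for the contiguous memory block, so a list position plays the role of the offset from the target physical storage location; the class and method names are hypothetical. The example reproduces the offsets 0, 10, 13 and 14 from the discussion of Fig. 3, using placeholder strings for the ten a/a candidate words that the text does not list.

```python
from __future__ import annotations
from bisect import bisect_right


class FinalIndex:
    """Final index whose candidate words share one contiguous block (hypothetical)."""

    def __init__(self, groups: list[tuple[tuple[str, ...], list[str]]]) -> None:
        self.storage: list[str] = []  # position in this list == offset from the first word
        self.finals: list[tuple[str, ...]] = []
        self.starts: list[int] = []  # offset associated with each final sequence
        for finals, words in groups:
            self.finals.append(finals)
            self.starts.append(len(self.storage))
            self.storage.extend(words)

    def candidates(self, finals: tuple[str, ...]) -> list[str]:
        """Words from this sequence's offset up to the next sequence's offset."""
        i = self.finals.index(finals)  # raises ValueError if the sequence is absent
        end = self.starts[i + 1] if i + 1 < len(self.starts) else len(self.storage)
        return self.storage[self.starts[i]:end]

    def final_of_offset(self, offset: int) -> tuple[str, ...]:
        """Which final sequence owns the word stored at a given offset."""
        return self.finals[bisect_right(self.starts, offset) - 1]


index = FinalIndex([
    (("a", "a"), [f"a/a word {k}" for k in range(10)]),  # offsets 0-9
    (("i", "ao"), ["hello", "your number", "fitting"]),   # offsets 10-12
    (("i", "e"), ["you and", "you drink"]),               # offsets 13-14
])
print(index.starts)                  # [0, 10, 13]
print(index.candidates(("i", "e")))  # ['you and', 'you drink']
print(index.final_of_offset(12))     # ('i', 'ao')
```

Storing one start offset per final sequence is enough because the end of each set is the start of the next; the last set ends at the end of the block.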
In this application, for ease of distinction, the path in the simple spelling tree that matches the target initial sequence is called the first target path.
When matching the target initial sequence in the simple spelling tree, the search starts from the root node and looks among the first-level child nodes for the child node corresponding to the first initial in the target initial sequence (say child node N1). It then looks among the second-level nodes extending from N1 (the subtree of N1) for the child node corresponding to the second initial (say child node N2), then among the third-level nodes extending from N2 for the child node corresponding to the third initial, and so on, until the child node corresponding to the last initial in the target initial sequence (say child node Nk) is found. The path formed by the root node, N1, N2, ..., Nk in sequence is the path that matches the target initial sequence.
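The level-by-level search above corresponds to a standard trie walk, sketched below with the same hypothetical `Node` layout used for illustration elsewhere. Returning `None` when some initial has no matching child is an assumption; the text does not describe the no-match case.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    initial: str = ""
    children: dict[str, "Node"] = field(default_factory=dict)
    final_index: dict[tuple[str, ...], list[str]] = field(default_factory=dict)


def add_path(root: Node, initials: list[str]) -> Node:
    """Helper for building a test tree; returns the deepest node."""
    node = root
    for ini in initials:
        node = node.children.setdefault(ini, Node(ini))
    return node


def match_path(root: Node, initials: list[str]) -> list[Node] | None:
    """Return [root, N1, ..., Nk] for the target initial sequence, or None on a miss."""
    path = [root]
    node = root
    for ini in initials:
        child = node.children.get(ini)  # search only the subtree of the previous node
        if child is None:
            return None
        path.append(child)
        node = child
    return path


root = Node()
add_path(root, ["n", "h"])
add_path(root, ["x", "x"])
print([node.initial for node in match_path(root, ["n", "h"])])  # ['', 'n', 'h']
print(match_path(root, ["n", "x"]))                            # None
```

Each step is a single dictionary lookup, so matching costs one lookup per initial regardless of how many syllables share the same prefix.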
Step 240: determine the target candidate word set mapped to the target final sequence in the final index of the first target child node in the first target path, where the first target child node is the child node in the first target path farthest from the root node of the simple spelling tree.
As described above, the first target path consists of the child nodes that match the initials of the target initial sequence, ordered from nearest to farthest from the root node of the simple spelling tree, so the first target child node is the last child node on the path. Once the first target path is determined, its first target child node and the final index of that node are determined as well. Because the final index of each child node maps final sequences to candidate word sets, the target final sequence is matched against the final sequences in the final index of the first target child node to find the matching final sequence and, with it, the corresponding candidate word set. For ease of distinction, this candidate word set, mapped to the final sequence that matches the target final sequence in the final index of the first target child node, is called the target candidate word set of the target final sequence.
Specifically, the final sequence that matches the target final sequence may be the final sequence identical to it. In step 240, the target final sequence is therefore located in the final index of the first target child node in the first target path, and the candidate word set mapped to it is taken as the target candidate word set. The target candidate word set contains one or more candidate words, each of which may consist of one or more Chinese characters; this is not specifically limited herein.
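Steps 230 and 240 together can be sketched as a walk over the initials followed by one exact lookup of the finals at the last node. The `Node` layout and function name are hypothetical, the words are the glosses used in the text, and returning an empty list when no path or final sequence matches is an assumption.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    initial: str = ""
    children: dict[str, "Node"] = field(default_factory=dict)
    final_index: dict[tuple[str, ...], list[str]] = field(default_factory=dict)


def target_candidates(root: Node, initials: list[str], finals: list[str]) -> list[str]:
    """Walk the target initial sequence, then look up the target final sequence."""
    node = root
    for ini in initials:
        child = node.children.get(ini)
        if child is None:
            return []  # no first target path (assumed behavior)
        node = child
    # node is now the first target child node, the one farthest from the root.
    return node.final_index.get(tuple(finals), [])


root = Node()
h_node = root.children.setdefault("n", Node("n")).children.setdefault("h", Node("h"))
h_node.final_index[("a", "a")] = ["Naha"]
h_node.final_index[("i", "ao")] = ["hello", "your number", "fitting"]

print(target_candidates(root, ["n", "h"], ["i", "ao"]))  # ['hello', 'your number', 'fitting']
print(target_candidates(root, ["n", "h"], ["i", "e"]))   # []
```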
Step 250, displaying the candidate words in the target candidate word set.
In some embodiments, a display order may be set for the candidate words in the target candidate word set, and the candidate words are then displayed in that order. The display order may be random, or it may be determined by the usage frequency of each candidate word across all users (or for the currently logged-in user).
In some embodiments, step 250 includes: obtaining the user's usage frequency for each candidate word in the target candidate word set; sorting the candidate words by usage frequency in descending order to obtain a target ordering; and displaying the candidate words in the target ordering. The user may then select one of the displayed candidate words as input content, for example entering it into a chat interface, a document or a search box.
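The target ordering of step 250 amounts to a descending sort on usage frequency, as in this small sketch. The function name is hypothetical, and treating words the user has never used as frequency 0 is an assumption.

```python
from __future__ import annotations


def rank_candidates(candidates: list[str], usage_frequency: dict[str, int]) -> list[str]:
    """Target ordering: descending usage frequency; ties keep their stored order."""
    # Words the user has never used count as frequency 0 (an assumption).
    return sorted(candidates, key=lambda w: usage_frequency.get(w, 0), reverse=True)


print(rank_candidates(["hello", "your number", "fitting"], {"fitting": 7, "hello": 3}))
# ['fitting', 'hello', 'your number']
```

Python's `sorted` stays stable even with `reverse=True`, so candidates with equal frequency keep the order in which they were stored in the final index.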
In the above scheme, a simple spelling tree is constructed in which each child node corresponds to one initial and is associated with a final index that maps final sequences to candidate word sets. When the input character string is a full-spelling character string, a target initial sequence and a target final sequence are extracted from it; the target initial sequence is matched in the simple spelling tree to determine the first target path; the target candidate word set mapped to the target final sequence is then determined in the final index of the first target child node of that path; and the candidate words in this set are displayed. Thus only the initials of the input character string are matched node by node in the simple spelling tree, and the candidate words are retrieved by looking up the target final sequence in the final index of a child node, rather than by matching every character of the full-spelling input string in the tree. Compared with the related-art approach of matching all characters of the input character string in a dictionary tree (trie), the scheme of this application can greatly shorten the matching and retrieval time, improve the efficiency of retrieving candidate words and of entering phrases for the input character string, and thereby improve the user experience. The time savings are especially pronounced when the input character string is long.
In some embodiments, after step 210, as shown in fig. 4, the method further comprises:
Step 410: if the input character string is an abbreviated spelling string, match it in the simple spelling tree and determine a second target path that matches the input character string.
In some embodiments, the input character string is used directly as the matching object. Since each child node in the simple spelling tree corresponds to one initial, the path starting from the root node whose child nodes' initials, concatenated in order, form the input character string is taken as the path of the input character string and determined as the second target path.
Before the input character string is matched, its segmentation sequence may be obtained. When the input character string is an abbreviated spelling string, each character segment unit in its segmentation sequence may be an initial, in which case the segmentation sequence may be called the initial sequence. For example, for the input character string "ncflm", the initial sequence may be n/c/f/l/m. During matching, the initials are taken from the initial sequence in order and the corresponding child nodes are searched for in the simple spelling tree.
In some embodiments, a character segment unit in the segmentation sequence may be only the first letter of an initial. In this case the sequence may be completed, that is, each unit that may be the first letter of an initial is expanded to that initial; for ease of description, the sequence obtained by this initial completion is called the first initial sequence. For example, in the initial sequence n/c/f/l/m, the letter "c" may be the first letter of the initial "ch", so completing it gives the first initial sequence n/ch/f/l/m. Since the input character string may contain several characters that could be first letters of initials, completion may yield several first initial sequences.
On this basis, both the initial sequence of the input character string and each first initial sequence obtained by completing it can be used as matching objects in the simple spelling tree. The path from the root node whose child nodes' initials form the initial sequence, and each path whose child nodes' initials form a first initial sequence, are taken as second target paths matching the input character string.
In some embodiments, as described above, an input character string determined to be an abbreviated spelling string may still contain some complete syllables. In this case, each character segment unit in its segmentation sequence that is a syllable may be replaced with the initial of that syllable; for ease of description, the resulting sequence is called the third initial sequence. For example, if the segmentation sequence of the input character string is n/ch/fan/l/m, the third initial sequence obtained by replacement is n/ch/f/l/m. The third initial sequence is then used as the matching object in the simple spelling tree, and the path from the root node whose child nodes' initials form the third initial sequence is determined as the second target path.
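The matching objects for an abbreviated spelling string can be generated as sketched below. The sketch combines, in one pass, the initial completion that yields the first initial sequences and the syllable-to-initial replacement that yields the third initial sequence; merging the two is an assumption, since the text presents them as separate embodiments. The completion table and function names are hypothetical, and "y" and "w" are again assumed to be initials.

```python
from __future__ import annotations
from itertools import product

INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")  # y, w: assumption

# A lone letter that may also be the first letter of a multi-letter initial.
COMPLETIONS = {"z": ("z", "zh"), "c": ("c", "ch"), "s": ("s", "sh")}


def initial_of(unit: str) -> str:
    """The unit itself if it is an initial, otherwise the initial of its syllable."""
    for ini in INITIALS:
        if unit.startswith(ini):
            return ini
    return ""  # zero-initial syllable


def matching_objects(units: list[str]) -> list[list[str]]:
    """All initial sequences to match in the simple spelling tree."""
    options = []
    for unit in units:
        if unit in COMPLETIONS:
            options.append(COMPLETIONS[unit])  # keep as-is, or complete to zh/ch/sh
        else:
            options.append((initial_of(unit),))  # syllables are reduced to their initials
    return [list(combo) for combo in product(*options)]


print(matching_objects(["n", "c", "f", "l", "m"]))
# [['n', 'c', 'f', 'l', 'm'], ['n', 'ch', 'f', 'l', 'm']]
print(matching_objects(["n", "ch", "fan", "l", "m"]))
# [['n', 'ch', 'f', 'l', 'm']]
```

Only a lone letter is completed; a full syllable such as "ci" is reduced to "c" and not expanded to "ch", because its spelling already fixes the initial. The number of sequences doubles with each ambiguous letter, which is why completion can yield several first initial sequences.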
Step 420, determining a first candidate word set corresponding to a second target child node in a second target path, where the second target child node is a child node in the second target path that is farthest from the root node of the simple spelling tree.
Similarly, the second target child node is the last child node in the second target path. As described above, the final index associated with the second target child node may contain one or more final sequences, each mapped to one candidate word set; the union of the candidate word sets of all final sequences in that final index is taken as the first candidate word set.
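Collecting the first candidate word set is a union over every final sequence in the final index of the second target child node, as in this sketch. The `Node` layout and function name are hypothetical, and removing duplicate words while keeping first-seen order is an assumption about how the union is formed.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    initial: str = ""
    children: dict[str, "Node"] = field(default_factory=dict)
    final_index: dict[tuple[str, ...], list[str]] = field(default_factory=dict)


def first_candidate_set(second_target_child: Node) -> list[str]:
    """Union of the candidate word sets of every final sequence in the node's final index."""
    words: list[str] = []
    for candidate_set in second_target_child.final_index.values():
        words.extend(candidate_set)
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(words))


h_node = Node("h")
h_node.final_index[("a", "a")] = ["Naha"]
h_node.final_index[("i", "ao")] = ["hello", "your number"]
h_node.final_index[("i", "e")] = ["you and", "you drink"]
print(first_candidate_set(h_node))
# ['Naha', 'hello', 'your number', 'you and', 'you drink']
```

Unlike step 240, no final sequence is available for an abbreviated input, so every entry of the final index is returned and the ranking of step 430 decides what the user sees first.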
Step 430, displaying the candidate words in the first candidate word set.
Similarly, in step 430, the candidate words in the first candidate word set may be sorted according to the usage frequency of each candidate word in the first candidate word set from high to low, and the candidate words in the first candidate word set may be displayed according to the determined sorting.
In some embodiments, if the determined second target paths matching the input character string are multiple, the candidate words in the first candidate word set corresponding to the second target child nodes in each of the second target paths may be sorted respectively.
As the above embodiments show, the constructed simple spelling tree supports phrase retrieval and matching both when the input character string is a full-spelling character string and when it is an abbreviated spelling string. The scheme of this application is therefore applicable to a variety of input modes while maintaining matching efficiency and, in turn, input efficiency.
In some embodiments, the method further includes constructing the simple spelling tree. Specifically, as shown in Fig. 5, the step of constructing the simple spelling tree includes:
Step 510: obtain a plurality of first historical input words.
In this application, the historical input words used to construct the simple spelling tree are called first historical input words. The first historical input words may be words previously input by many users, which broadens their sources and thus the coverage of the constructed simple spelling tree.
In other embodiments, the first historical input words may all be phrases previously input by the same user, so that a simple spelling tree tailored to that user can be constructed from that user's input history, ensuring that the tree matches the user's input habits.
Step 520, determining a first full spelling character string corresponding to each first historical input word.
For ease of distinction, the syllable sequence of a first historical input word is called its first full-spelling string. Specifically, the syllable of each character in the first historical input word is determined, and the syllables are combined in the order of the characters in the word to obtain the full-spelling string of the word, which is also the syllable sequence of the word.
Step 530: perform final extraction on the first full-spelling string of each first historical input word to obtain the corresponding first historical final sequence.
Specifically, the first full spelling character string contains the syllables of all characters of the first historical input word. The final is extracted from each of these syllables, and the extracted finals are combined in the order of their syllables in the first full spelling character string, yielding the first historical final sequence of the first historical input word.
Step 540: extract the initials from the first full spelling character string of each first historical input word to obtain the corresponding first historical initial sequence.
Similarly, the initial is extracted from the syllable of each character in the first full spelling character string, and the extracted initials are combined in the order of their syllables, yielding the first historical initial sequence of the first historical input word.
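Steps 530 and 540 can be illustrated by splitting each syllable into an initial and a final. The sketch below is an assumption-laden illustration, not the application's own segmentation: it treats "y" and "w" as initials (common input-method practice) and, for zero-initial syllables such as "ai", uses the first letter as the abbreviated initial and the whole syllable as the final. `split_syllable` and `extract_sequences` are hypothetical names.

```python
# Pinyin initials, two-letter ones first so that "zh" wins over "z".
# Treating "y" and "w" as initials is an assumption borrowed from input-method practice.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> tuple[str, str]:
    """Split one pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    # Zero-initial syllable such as "ai" or "er": assume its first letter serves
    # as the abbreviated initial and the whole syllable is the final.
    return syllable[0], syllable


def extract_sequences(syllables: list[str]) -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Steps 530-540: return (initial sequence, final sequence) in syllable order."""
    pairs = [split_syllable(s) for s in syllables]
    initials = tuple(p[0] for p in pairs)
    finals = tuple(p[1] for p in pairs)
    return initials, finals
```

For example, `extract_sequences(["ni", "hao"])` returns `(("n", "h"), ("i", "ao"))`.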
Step 550: determine the initial corresponding to each child node in the simple spelling tree according to the first historical initial sequence of each first historical input word.
Specifically, the root node of the simple spelling tree may be created in advance. Then, for the first historical initial sequence of each first historical input word, a child node (say, child node M1) is created as a next-level child node of the root node, and the first initial of the sequence is assigned to M1. Next, a child node M2 is created as a next-level child node of M1 and assigned the second initial; then a child node M3 is created under M2 and assigned the third initial; and so on, until the child node for the last initial of the sequence has been created. The process is then repeated for the remaining first historical initial sequences, which creates the other child nodes of the simple spelling tree and determines their initials.
Step 560: determine the final index corresponding to each child node in the simple spelling tree according to each first historical input word and its first historical final sequence.
Step 550 identifies, for each first historical input word, the child node in the simple spelling tree that corresponds to each initial of its first historical initial sequence. On this basis, at the child node corresponding to the last initial of that sequence, the word's first historical final sequence is used as a final sequence in the node's final index, and the word is added to the candidate word set of that final sequence. Repeating this process builds the candidate word set of every first historical final sequence; each first historical final sequence, together with its candidate word set, is then added to the final index of the corresponding child node. In this way, the final index of every child node in the simple spelling tree is constructed.
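Steps 550 and 560 can be sketched in Python as below. Unlike the literal wording of step 550, the sketch reuses an existing child node when an initial is already present at that level (ordinary trie behavior); that reuse, and the names `SimpleNode` and `insert_word`, are assumptions for illustration. Initials and finals are passed in already extracted.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleNode:
    initial: str = ""                                # the root corresponds to the empty string
    children: dict = field(default_factory=dict)     # initial -> SimpleNode
    final_index: dict = field(default_factory=dict)  # final sequence (tuple) -> list of words


def insert_word(root: SimpleNode, word: str, initials, finals) -> None:
    """Add one historical input word to the simple spelling tree."""
    node = root
    # Step 550: walk (or create) one child node per initial, in order.
    for initial in initials:
        child = node.children.get(initial)
        if child is None:
            child = node.children[initial] = SimpleNode(initial)
        node = child
    # Step 560: at the node of the last initial, file the word under its final sequence.
    bucket = node.final_index.setdefault(tuple(finals), [])
    if word not in bucket:
        bucket.append(word)


root = SimpleNode()
insert_word(root, "你好", ["n", "h"], ["i", "ao"])
insert_word(root, "南海", ["n", "h"], ["an", "ai"])
```

Both words share the path n -> h and are separated only by their final sequences. Calling `insert_word` again with words the user inputs later is one way the tree update described below could be realized.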
Step 570: store the initial and the final index of each child node in the simple spelling tree in association with each other.
Specifically, the node identifier of each child node, the initial corresponding to that child node, and its final index may be stored in association.
Steps 510 to 570 above complete the construction of the simple spelling tree. The simple spelling tree may then be preloaded before step 220, so that the matching in step 220 can be performed against it.
In some embodiments, after the simple spelling tree is constructed through steps 510 to 570, it may be updated with words the user subsequently inputs, for example by adding new child nodes, adding final sequences to the final indexes of the new child nodes, and updating the candidate word sets of those final sequences. Updating the simple spelling tree from the user's input records keeps it adapted to the user's input habits.
In a specific embodiment, the simple spelling tree may be constructed and updated either on the terminal or on a server. If it is constructed on the server, the server delivers the constructed tree to the terminal that runs the input method application, so that the terminal can preload it before running the application. Likewise, after the simple spelling tree is updated on the server, the update is synchronized to that terminal.
In some embodiments, before the simple spelling tree is constructed, the method further comprises: acquiring a plurality of historical input character strings entered by the user within a set time period; determining the user's preferred input mode from these historical input character strings; and, if the preferred input mode is the simple spelling input mode, performing the step of constructing the simple spelling tree.
Specifically, type recognition may be performed on each historical input character string to obtain a type recognition result indicating whether that string is a simple spelling character string or a full spelling character string. The user's preferred input mode is then determined by counting, among the historical input character strings, the number of simple spelling strings (the first number) and the number of full spelling strings (the second number).
In some embodiments, a first ratio of the first number to the total number of historical input character strings may be calculated; if the first ratio exceeds a set first ratio threshold, the user's preferred input mode is determined to be the simple spelling input mode. Likewise, a second ratio of the second number to the total number may be calculated; if the second ratio exceeds a set second ratio threshold, the preferred input mode is determined to be the full spelling input mode.
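The ratio test above can be sketched as follows. The 0.6 thresholds are hypothetical placeholders, the per-string type labels are taken as given because the text does not specify how type recognition works, and returning `None` when neither ratio exceeds its threshold is an assumption, since the text leaves that case open.

```python
from typing import Optional


def preferred_mode(labels: list[str],
                   first_threshold: float = 0.6,
                   second_threshold: float = 0.6) -> Optional[str]:
    """labels: type recognition result per historical string, "simple" or "full"."""
    total = len(labels)
    if total == 0:
        return None
    first_ratio = labels.count("simple") / total   # first number / total
    second_ratio = labels.count("full") / total    # second number / total
    # The simple spelling test is checked first; with thresholds above 0.5
    # at most one of the two tests can pass anyway.
    if first_ratio > first_threshold:
        return "simple"
    if second_ratio > second_threshold:
        return "full"
    return None  # undecided (assumed behavior)
```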
In other words, the dictionary tree is constructed according to the user's preferred input mode: if it is the simple spelling input mode, a simple spelling tree is constructed; conversely, if it is the full spelling input mode, a full spelling tree is constructed. Each child node of the full spelling tree corresponds to one syllable and is associated with a candidate word set.
FIG. 6 is a schematic diagram of a full spelling tree according to an embodiment of the present application. The full spelling tree is also an ordered tree, comprising a root node and multiple levels of child nodes extending from it. The root node corresponds to the empty string, each child node corresponds to one syllable, and each child node is associated with a candidate word set. Converting any candidate word in a child node's candidate word set into syllables yields exactly the syllable sequence formed by combining the syllables of all child nodes on the path from the root node to that child node.
Node 610 in FIG. 6 is the root node of the full spelling tree. As shown in FIG. 6, the first-level child nodes of the full spelling tree include child nodes corresponding to the syllables "a", "ni", "zuo", and "ba".
FIG. 6 also shows the child node for the syllable "ai" that extends from the node for "ni", together with its candidate word set, which contains words pronounced "ni ai" (rendered in translation as "love" and "being close to you"). The syllable sequence of every candidate word in this set is "ni/ai", which is the sequence obtained by combining the syllables of all child nodes on the path from the root node to the "ai" node.
FIG. 6 further shows the child node for the syllable "hao" that extends from the node for "ni", together with its candidate word set, which contains phrases such as "hello" and "your number". The syllable sequence of every candidate word in this set is "ni/hao", which is obtained by combining the syllables of all child nodes on the path from the root node to the "hao" node.
In some embodiments, after step 210, as shown in FIG. 7, the method further comprises:
Step 710: if the input character string is a full spelling character string and the loaded dictionary tree is a full spelling tree, obtain the syllable sequence produced by segmenting the input character string.
When the input character string is a full spelling character string, each segmentation unit of the string is a syllable, so the segmentation sequence of the input character string is its syllable sequence.
Step 720: perform syllable matching in the full spelling tree to determine a third target path that matches the syllable sequence.
In this application, for the sake of distinction, the path in the full spelling tree that matches the syllable sequence is referred to as the third target path.
During matching, starting from the root node of the full spelling tree, the child node corresponding to the first syllable of the syllable sequence (say, child node P1) is searched for among the first-level child nodes. Next, the child node corresponding to the second syllable (say, P2) is searched for among the second-level nodes extending from P1 (that is, in the subtree of P1); then the child node corresponding to the third syllable is searched for among the third-level nodes extending from P2 (the subtree of P2); and so on, until the child node corresponding to the last syllable (say, Px) is found. The path formed by the root node, P1, P2, ..., Px in order is the path that matches the syllable sequence.
Step 730: determine the second candidate word set associated with the third target child node of the third target path, where the third target child node is the child node of the third target path farthest from the root node of the full spelling tree.
As described above, each child node of the full spelling tree is associated with a candidate word set. Once the third target path is determined, the candidate word set associated with its third target child node, which is the last child node on the path, is taken as the second candidate word set.
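Steps 720 and 730 amount to a standard trie lookup. The sketch below, with hypothetical names `FullNode`, `add_word`, and `match_syllables`, walks root -> P1 -> ... -> Px and returns the candidate word set of the last node; returning an empty list when the path breaks is an assumption, because the text does not address that case.

```python
from dataclasses import dataclass, field


@dataclass
class FullNode:
    syllable: str = ""                              # the root holds the empty string
    children: dict = field(default_factory=dict)    # syllable -> FullNode
    candidates: list = field(default_factory=list)  # candidate word set of this node


def add_word(root: FullNode, word: str, syllables: list[str]) -> None:
    node = root
    for syllable in syllables:
        node = node.children.setdefault(syllable, FullNode(syllable))
    node.candidates.append(word)


def match_syllables(root: FullNode, syllables: list[str]) -> list[str]:
    """Follow the third target path and return the second candidate word set."""
    node = root
    for syllable in syllables:
        node = node.children.get(syllable)
        if node is None:  # no matching path in the tree
            return []
    return node.candidates


root = FullNode()
add_word(root, "你好", ["ni", "hao"])
add_word(root, "溺爱", ["ni", "ai"])
```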
Step 740: display the candidate words in the second candidate word set.
Similarly, in step 740, the usage frequency of each candidate word in the second candidate word set is determined, the candidate words are sorted in descending order of usage frequency, and they are displayed in that order.
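The ordering used in step 740 reduces to a single sort. In this sketch `usage_frequency` is a hypothetical word-to-count mapping, since the text does not say where frequencies are stored; treating unseen words as frequency 0 is an assumption.

```python
def display_order(candidates: list[str], usage_frequency: dict[str, int]) -> list[str]:
    """Sort candidates by descending usage frequency. Unknown words count as 0.
    Python's sort is stable even with reverse=True, so ties keep their original order."""
    return sorted(candidates, key=lambda w: usage_frequency.get(w, 0), reverse=True)
```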
Through steps 710 to 740 above, when the input character string is a full spelling character string and the full spelling tree is loaded, matching is performed in the full spelling tree to determine the second candidate word set of the input character string.
In some embodiments, after step 210, as shown in FIG. 8, the method further comprises:
Step 810: if the input character string is a simple spelling character string and the loaded dictionary tree is a full spelling tree, determine the candidate full spelling character string set corresponding to the input character string.
In some embodiments, an optional syllable set may be constructed in advance for each initial, such that the initial of every syllable in an initial's optional syllable set is that initial. On this basis, if the input character string is a simple spelling character string, its segmentation sequence is an initial sequence, so the optional syllable set of each initial in that sequence can be looked up. One syllable is then selected from the optional syllable set of each initial, and the selected syllables are combined in the order of the initials in the initial sequence; the resulting syllable sequence is one candidate full spelling character string of the input character string. Selecting other syllables from the optional syllable sets and combining them in the same way yields further candidate full spelling character strings, and the set of all of them is the candidate full spelling character string set of the input character string.
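The exhaustive expansion described above is a Cartesian product over the optional syllable sets. In the sketch below, `OPTIONAL_SYLLABLES` is a deliberately tiny hypothetical table, and the input is assumed to be segmented into initials already (for example "nh" into `["n", "h"]`).

```python
from itertools import product

# Hypothetical, deliberately tiny optional syllable sets; a real table would list
# every pinyin syllable whose initial is the key.
OPTIONAL_SYLLABLES = {
    "n": ["ni", "na", "nan"],
    "h": ["hao", "hai"],
}


def candidate_full_strings(initial_sequence: list[str]) -> list[list[str]]:
    """Step 810 (exhaustive variant): pick one syllable per initial, in order,
    for every possible combination."""
    pools = [OPTIONAL_SYLLABLES.get(initial, []) for initial in initial_sequence]
    return [list(combo) for combo in product(*pools)]
```

The number of candidates is the product of the set sizes (here 3 x 2 = 6), which grows quickly with input length; this cost is one reason to consider the history-based variant described in the following embodiment.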
In other embodiments, based on a plurality of historically input simple spelling character strings and the historical phrase selected under each of them, each historical phrase is converted into syllables to obtain its historical syllable sequence, and initials are extracted from that syllable sequence to obtain a historical initial sequence. Once the historical syllable sequences and initial sequences of the historical phrases are obtained, the initial sequences are added to an initial sequence set, and a syllable sequence set is constructed for each initial sequence in that set, such that extracting the initials from any syllable sequence in the syllable sequence set of an initial sequence yields that initial sequence.
On this basis, if the input character string is determined to be a simple spelling character string, its initial sequence can be located in the initial sequence set, and the syllable sequence set of the located initial sequence is used as the candidate full spelling character string set of the input character string.
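The history-based variant can be sketched as a dictionary from initial sequences to sets of syllable sequences. The initial-extraction rule (two-letter initials first, first letter for zero-initial syllables) and the names `initial_of`, `build_initial_sequence_set`, and `lookup_full_strings` are assumptions for illustration; the historical phrases are assumed to be converted to syllables already.

```python
from collections import defaultdict

INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def initial_of(syllable: str) -> str:
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial
    return syllable[0]  # zero-initial syllable: assume its first letter


def build_initial_sequence_set(history: list[list[str]]) -> dict:
    """Map each historical initial sequence to the set of historical syllable
    sequences whose initials yield exactly that initial sequence."""
    mapping = defaultdict(set)
    for syllables in history:
        key = tuple(initial_of(s) for s in syllables)
        mapping[key].add(tuple(syllables))
    return dict(mapping)


def lookup_full_strings(mapping: dict, initial_sequence: list[str]) -> set:
    """Locate the input's initial sequence; its syllable sequence set is the
    candidate full spelling character string set."""
    return mapping.get(tuple(initial_sequence), set())


m = build_initial_sequence_set([["ni", "hao"], ["nan", "hai"], ["zhong", "guo"]])
```

Unlike the exhaustive product, this only proposes syllable sequences that actually occurred in history.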
Step 820: perform syllable matching in the full spelling tree for each candidate full spelling character string in the candidate full spelling character string set, and determine the fourth target path that matches each candidate full spelling character string.
In this application, for the sake of distinction, a path in the full spelling tree that matches a candidate full spelling character string is referred to as a fourth target path. Because a candidate full spelling character string is a candidate syllable sequence, syllable matching proceeds through the full spelling tree in the order of its syllables.
During matching, for each candidate full spelling character string in the set, starting from the root node of the full spelling tree, the child node corresponding to the first syllable of the candidate string (say, child node Q1) is searched for among the first-level child nodes. Next, the child node corresponding to the second syllable (say, Q2) is searched for among the second-level nodes extending from Q1 (that is, in the subtree of Q1); then the child node corresponding to the third syllable is searched for among the third-level nodes extending from Q2 (the subtree of Q2); and so on, until the child node corresponding to the last syllable of the candidate string (say, Qy) is found. The path formed by the root node, Q1, Q2, ..., Qy in order is the path that matches the candidate full spelling character string.
Step 830: determine the third candidate word set corresponding to the fourth target child node of each fourth target path, where the fourth target child node is the child node of the corresponding fourth target path farthest from the root node of the full spelling tree.
For each candidate full spelling character string, a matching fourth target path can be determined in the full spelling tree. Because each child node of the full spelling tree is associated with a candidate word set, once the fourth target path of each candidate full spelling character string is determined, the candidate word set of the fourth target child node on that path is taken as a third candidate word set. As before, the fourth target child node of a fourth target path is the last child node on that path.
Step 840: determine a fourth candidate word set from the third candidate word sets of the fourth target child nodes of all fourth target paths.
If there is only one fourth target path, the third candidate word set of its last child node is used as the fourth candidate word set. If there are multiple fourth target paths, the union of the third candidate word sets of the last child nodes of all fourth target paths is used as the fourth candidate word set.
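Steps 820 to 840 can be sketched as one trie walk per candidate full spelling character string followed by a union of the resulting sets. Here the full spelling tree is a nested dict of hypothetical shape `{"children": ..., "words": ...}`; skipping candidates with no matching path and keeping first-seen order in the union are assumptions.

```python
def new_node() -> dict:
    return {"children": {}, "words": []}


def add_word(root: dict, word: str, syllables: list[str]) -> None:
    node = root
    for syllable in syllables:
        node = node["children"].setdefault(syllable, new_node())
    node["words"].append(word)


def fourth_candidate_set(root: dict, candidate_strings) -> list[str]:
    """Match every candidate full spelling string and merge the third candidate
    word sets of the last nodes of the matched paths."""
    merged: list[str] = []
    for syllables in candidate_strings:
        node = root
        for syllable in syllables:  # step 820: follow one fourth target path
            node = node["children"].get(syllable)
            if node is None:
                break
        if node is None or node is root:  # no (non-empty) matching path
            continue
        for word in node["words"]:  # steps 830-840: union, first-seen order
            if word not in merged:
                merged.append(word)
    return merged


root = new_node()
add_word(root, "你好", ["ni", "hao"])
add_word(root, "南海", ["nan", "hai"])
```

With one matching path the result is simply that path's third candidate word set; with several it is their union.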
Step 850: display the candidate words in the fourth candidate word set.
Similarly, in step 850, a display order may be determined for the candidate words in the fourth candidate word set, and the candidate words are then displayed in that order. The display order may be determined randomly, or the candidate words may be sorted in descending order of usage frequency and the sorted result used as the display order.
Through steps 810 to 850 above, when the full spelling tree is loaded and the input character string is a simple spelling character string, the candidate word set of the input character string is determined from the simple spelling character string and the full spelling tree.
In some embodiments, as shown in FIG. 9, constructing the full spelling tree comprises:
Step 910: obtain a plurality of second historical input words.
In the present application, a historical input word used to construct the full spelling tree is referred to as a second historical input word. The historical input words used to construct the full spelling tree and those used to construct the simple spelling tree may be identical, entirely different, or partially overlapping; this is not specifically limited herein. As with the first historical input words, the second historical input words may be collected from a single user or from multiple users.
Step 920: determine the second full spelling character string corresponding to each second historical input word.
For each second historical input word, the syllable of each character in the word is determined, and the syllables are combined in the order in which the characters appear in the word, yielding the syllable sequence (the second full spelling character string) of the second historical input word.
Step 930: determine the syllable corresponding to each child node in the full spelling tree according to the second full spelling character strings.
Specifically, the root node of the full spelling tree may be created in advance. Then, for each second full spelling character string, a child node (say, child node T1) is created as a next-level child node of the root node and assigned the first syllable of the string. Next, a child node T2 is created as a next-level child node of T1 and assigned the second syllable; then a child node T3 is created under T2 and assigned the third syllable; and so on, until the child node for the last syllable of the string has been created. The process is then repeated for the remaining second full spelling character strings, which creates the other child nodes of the full spelling tree and determines their syllables.
Step 940: determine the candidate word set corresponding to each child node in the full spelling tree according to the second full spelling character string of each second historical input word.
Step 930 identifies, for each second historical input word, the child node in the full spelling tree that corresponds to each syllable of its second full spelling character string. On this basis, each second historical input word is added to the candidate word set of the child node corresponding to its last syllable. Repeating this process for all second historical input words constructs the candidate word set of every child node.
Step 950: store the syllable and the candidate word set of each child node in the full spelling tree in association with each other.
Specifically, the node identifier of each child node, the syllable corresponding to that child node, and its candidate word set may be stored in association.
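Steps 930 to 950 can be sketched as building a trie and then flattening it into rows that associate a node identifier with its syllable and candidate word set. The breadth-first numbering, the `parent_id` column, and the names `build_full_tree` and `to_records` are assumptions for illustration; the text does not prescribe a storage layout. Unlike the literal wording of step 930, an existing child node is reused when a syllable is already present at that level.

```python
from collections import deque


def new_node(syllable: str = "") -> dict:
    return {"syllable": syllable, "children": {}, "words": []}


def build_full_tree(entries) -> dict:
    """Steps 930-940: entries are (second historical input word, syllables) pairs."""
    root = new_node()
    for word, syllables in entries:
        node = root
        for syllable in syllables:  # step 930: one child node per syllable
            if syllable not in node["children"]:
                node["children"][syllable] = new_node(syllable)
            node = node["children"][syllable]
        if word not in node["words"]:  # step 940: file the word at its last syllable's node
            node["words"].append(word)
    return root


def to_records(root: dict) -> list[tuple]:
    """Step 950 (one possible layout): breadth-first rows of
    (node_id, parent_id, syllable, candidate words)."""
    records = []
    queue = deque([(root, None)])
    next_id = 0
    while queue:
        node, parent_id = queue.popleft()
        node_id = next_id
        next_id += 1
        records.append((node_id, parent_id, node["syllable"], list(node["words"])))
        for child in node["children"].values():
            queue.append((child, node_id))
    return records


rows = to_records(build_full_tree([("你好", ["ni", "hao"]), ("溺爱", ["ni", "ai"])]))
```

The `parent_id` column is what lets the tree be rebuilt when it is loaded later or synchronized to a terminal.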
Steps 910 to 950 above complete the construction of the full spelling tree. The full spelling tree can then be loaded later, and candidate words for the input character string can be queried from it.
In a specific embodiment, the full spelling tree may be constructed and updated either on the terminal or on a server. If it is constructed on the server, the server delivers the constructed tree to the terminal that runs the input method application, so that the terminal can preload it before running the application. Likewise, after the full spelling tree is updated on the server, the update is synchronized to that terminal.
FIG. 10 is a flowchart of an input method according to an embodiment of the present application. As shown in FIG. 10, the method includes the following steps:
Step 1010: initialize. Initialization mainly initializes the kernel of the input method application so that it loads the application's lexicons, such as the system lexicon and the user lexicon. The user lexicon may include phrases the user has historically input.
Step 1020: identify the preferred input mode of the logged-in user. If it is the simple spelling input mode, go to step 1031; if it is the full spelling input mode, go to step 1032. The process of identifying the preferred input mode is described above and is not repeated here.
Step 1031: construct the simple spelling tree.
Step 1040: obtain the input character string.
Step 1051: identify whether the input character string is a simple spelling character string. If it is, match it in the simple spelling tree to determine its candidate word set; if it is a full spelling character string, go to step 1061.
Step 1061: match in the simple spelling tree according to the target initial sequence of the input character string to determine the first target path. The target initial sequence is obtained by extracting the initial from each syllable of the input character string and combining the extracted initials in order. Matching in the simple spelling tree based on the target initial sequence is described above and is not repeated here.
Step 1071: according to the target final sequence of the input character string, perform a secondary query in the final index of the last child node on the first target path. The secondary query locates the target final sequence in that final index and takes the candidate word set mapped to it as the candidate word set of the input character string.
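Steps 1061 and 1071 together form a two-stage lookup: walk the simple spelling tree by initials, then filter by finals in the final index of the last node. The sketch assumes the full spelling input has already been segmented into syllables; the initial/final split rule, the nested-dict node shape, and the names `add_word` and `query_full_spelling` are hypothetical.

```python
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> tuple[str, str]:
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return syllable[0], syllable  # zero-initial syllable (assumed rule)


def new_node() -> dict:
    return {"children": {}, "final_index": {}}


def add_word(root: dict, word: str, syllables: list[str]) -> None:
    initials, finals = zip(*(split_syllable(s) for s in syllables))
    node = root
    for initial in initials:
        node = node["children"].setdefault(initial, new_node())
    node["final_index"].setdefault(finals, []).append(word)


def query_full_spelling(root: dict, syllables: list[str]) -> list[str]:
    if not syllables:
        return []
    initials, finals = zip(*(split_syllable(s) for s in syllables))
    node = root
    for initial in initials:  # step 1061: follow the first target path
        node = node["children"].get(initial)
        if node is None:
            return []
    # Step 1071: secondary query in the final index of the last node on the path.
    return node["final_index"].get(finals, [])


root = new_node()
for word, syllables in [("你好", ["ni", "hao"]), ("南海", ["nan", "hai"]), ("年后", ["nian", "hou"])]:
    add_word(root, word, syllables)
```

All three words share the initial path n -> h; the final index then separates them, which is why a full spelling input can still be answered from the simple spelling tree.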
Step 1032: construct the full spelling tree, and then perform steps 1040 and 1052.
Step 1052: identify whether the input character string is a full spelling character string. If it is, match it in the full spelling tree to determine its candidate word set; if it is a simple spelling character string, go to step 1062.
Step 1062: determine the candidate full spelling syllable sequence set corresponding to the input character string.
Step 1072: traverse the candidate full spelling syllable sequence set and match each candidate full spelling syllable sequence in the full spelling tree.
In this scheme, the dictionary tree is constructed according to the user's preferred input mode: a simple spelling tree if the user prefers the simple spelling input mode, and a full spelling tree if the user prefers the full spelling input mode. If the user prefers the simple spelling input mode, the input character string is likely to be a simple spelling character string and can be matched directly in the simple spelling tree, so its candidate word set is determined quickly; this greatly speeds up candidate word retrieval and improves the user experience of the input method application. Even when such a user enters a full spelling character string, each child node of the simple spelling tree carries an associated final index, so the full spelling character string can be matched in the simple spelling tree and then filtered by finals, which keeps candidate word queries fast.
Similarly, if the user prefers the full spelling input mode, the input character string is likely to be a full spelling character string and can be matched directly in the full spelling tree constructed for the user, so its candidate word set is determined quickly; this greatly speeds up candidate word retrieval and improves the user experience. Even when such a user enters a simple spelling character string, the candidate full spelling character string set of the input is determined first and each candidate full spelling character string is then matched in the full spelling tree, so candidate words can still be queried from a simple spelling character string.
Therefore, the scheme of the present application supports candidate word retrieval in both the full spelling and the simple spelling input modes, and adapts to both full spelling and simple spelling input scenarios.
In some embodiments of the present application, both a simple spelling tree and a full spelling tree may be constructed and loaded while the input method application runs. A simple spelling input character string is then matched in the simple spelling tree, and a full spelling input character string in the full spelling tree, to determine the candidate word set. Matching each input character string against the dictionary tree of its own type further improves the efficiency of candidate word retrieval. However, because two dictionary trees must be loaded simultaneously, memory usage is higher, so this embodiment suits scenarios where memory usage is not a concern, for example terminals with large running memory.
Embodiments of the apparatus of the present application are described below, which may be used to perform the methods of the above-described embodiments of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the above-described embodiments of the method of the present application.
Fig. 11 is a block diagram of an input device according to an embodiment of the application. As shown in Fig. 11, the input device includes: an obtaining module 1110, configured to obtain an input character string; an extracting module 1120, configured to extract a target initial sequence and a target final sequence from the input character string if the input character string is a full-spelling character string; a matching module 1130, configured to perform matching in the simple spelling tree according to the target initial sequence and determine a first target path matching the target initial sequence, where each child node in the simple spelling tree corresponds to one initial, the simple spelling tree is provided with a final index associated with each child node, and the final index indicates a mapping relation between final sequences and candidate word sets; a target candidate word set determining module 1140, configured to determine, in the final index corresponding to a first target child node in the first target path, a target candidate word set corresponding to the target final sequence, where the first target child node is the child node in the first target path farthest from the root node of the simple spelling tree; and a display module 1150, configured to display the candidate words in the target candidate word set.
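To make the interplay of modules 1130 and 1140 concrete, here is a minimal Python sketch of a simple spelling tree whose child nodes are keyed by initials and whose deepest node carries a final index. It is one possible reading of the text, not the applicant's implementation: the initial inventory (including treating `y` and `w` as initials), the `Node` layout, and the functions `split_syllable`, `insert`, and `lookup_full_spelling` are all assumptions, and the input is taken to be already segmented into syllables.

```python
from dataclasses import dataclass, field

# Assumed initial inventory; "zh"/"ch"/"sh" are listed before "z"/"c"/"s" so that
# "zhong" splits as "zh" + "ong" rather than "z" + "hong".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> tuple[str, str]:
    """Split one pinyin syllable into (initial, final); zero-initial syllables get ''."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


@dataclass
class Node:
    children: dict = field(default_factory=dict)     # initial -> child Node
    final_index: dict = field(default_factory=dict)  # tuple of finals -> candidate words


def insert(root: Node, syllables: list[str], word: str) -> None:
    """Create the path of initials, then file the word under its final sequence."""
    node = root
    for syllable in syllables:
        node = node.children.setdefault(split_syllable(syllable)[0], Node())
    finals = tuple(split_syllable(s)[1] for s in syllables)
    node.final_index.setdefault(finals, []).append(word)


def lookup_full_spelling(root: Node, syllables: list[str]) -> list[str]:
    """Match the target initial sequence, then query the deepest node's final index."""
    node = root
    for syllable in syllables:
        node = node.children.get(split_syllable(syllable)[0])
        if node is None:  # no first target path for this initial sequence
            return []
    # `node` is the first target child node: the end of the path, farthest from the root.
    target_finals = tuple(split_syllable(s)[1] for s in syllables)
    return list(node.final_index.get(target_finals, []))


root = Node()
insert(root, ["zhong", "guo"], "中国")
insert(root, ["zhan", "guo"], "战国")
result = lookup_full_spelling(root, ["zhan", "guo"])  # ['战国']
```

Because 中国 (zhong guo) and 战国 (zhan guo) share the initial path zh → g, they end at the same node and are told apart only by their final sequences (`ong, uo` versus `an, uo`).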
In some embodiments, the input device further includes: a second matching module, configured to perform matching in the simple spelling tree if the input character string is a simple spelling character string and determine a second target path matching the input character string; a first candidate word set determining module, configured to determine a first candidate word set corresponding to a second target child node in the second target path, where the second target child node is the child node in the second target path farthest from the root node of the simple spelling tree; and a second display module, configured to display the candidate words in the first candidate word set.
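For simple spelling input the same tree is walked with the initials alone, and every final sequence stored at the second target child node yields a candidate. A minimal sketch under stated assumptions: the abbreviation is cut greedily into initials (so `zhg` becomes `zh`, `g`, while `zg` does not fuzzily match `zh`), the tree layout mirrors a hypothetical `Node` with a `final_index` dict, and all function names are illustrative.

```python
from dataclasses import dataclass, field

# Assumed initial inventory, two-letter initials first for greedy parsing.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


@dataclass
class Node:
    children: dict = field(default_factory=dict)     # initial -> child Node
    final_index: dict = field(default_factory=dict)  # tuple of finals -> candidate words


def add_word(root: Node, initials: list[str], finals: list[str], word: str) -> None:
    """Create the path of initials and file the word under its final sequence."""
    node = root
    for initial in initials:
        node = node.children.setdefault(initial, Node())
    node.final_index.setdefault(tuple(finals), []).append(word)


def parse_initials(abbrev: str) -> list[str]:
    """Greedily cut a simple spelling string into initials, preferring zh/ch/sh."""
    initials, i = [], 0
    while i < len(abbrev):
        for initial in INITIALS:
            if abbrev.startswith(initial, i):
                initials.append(initial)
                i += len(initial)
                break
        else:
            raise ValueError(f"no initial at position {i} of {abbrev!r}")
    return initials


def lookup_simple_spelling(root: Node, abbrev: str) -> list[str]:
    """Match the initials, then return every word filed at the deepest node."""
    node = root
    for initial in parse_initials(abbrev):
        node = node.children.get(initial)
        if node is None:
            return []
    # Every final sequence at the second target child node completes the abbreviation.
    return [word for words in node.final_index.values() for word in words]


root = Node()
add_word(root, ["zh", "g"], ["ong", "uo"], "中国")
add_word(root, ["zh", "g"], ["an", "uo"], "战国")
candidates = lookup_simple_spelling(root, "zhg")  # ['中国', '战国']
```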
In some embodiments, the input device further includes a first construction module, configured to construct the simple spelling tree. In this embodiment, the first construction module includes: a first history input word acquisition unit, configured to acquire a plurality of first history input words; a first full-spelling character string determining unit, configured to determine the first full-spelling character string corresponding to each first history input word; a first history final sequence determining unit, configured to extract the finals of the first full-spelling character string corresponding to each first history input word to obtain a corresponding first history final sequence; a first history initial sequence determining unit, configured to extract the initials of the first full-spelling character string corresponding to each first history input word to obtain a corresponding first history initial sequence; a first determining unit, configured to determine the initials corresponding to the child nodes in the simple spelling tree according to the first history initial sequences; a final index determining unit, configured to determine the final index corresponding to each child node in the simple spelling tree according to each first history input word and its corresponding first history final sequence; and a first association storage unit, configured to store, in association, the initial corresponding to each child node in the simple spelling tree and the final index corresponding to that child node.
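The construction units above can be sketched as a single pass over the history input words. This Python version is hypothetical: the `PINYIN` lexicon stands in for whatever source supplies each word's full spelling, nodes are plain dicts with `children` and `final_index` entries, and skipping unknown words and de-duplicating repeated entries are choices of the sketch rather than requirements of the text.

```python
# Hypothetical word-to-syllable lexicon standing in for "determine the full spelling
# of each history input word"; a real system would consult a pinyin dictionary.
PINYIN = {"中国": ["zhong", "guo"], "战国": ["zhan", "guo"], "你好": ["ni", "hao"]}

INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> tuple[str, str]:
    """Split one pinyin syllable into (initial, final); zero-initial syllables get ''."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def new_node() -> dict:
    return {"children": {}, "final_index": {}}


def build_simple_spelling_tree(history_words: list[str]) -> dict:
    root = new_node()
    for word in history_words:
        syllables = PINYIN.get(word)
        if syllables is None:  # word without a known full spelling: skipped (assumption)
            continue
        initials = [split_syllable(s)[0] for s in syllables]     # history initial sequence
        finals = tuple(split_syllable(s)[1] for s in syllables)  # history final sequence
        node = root
        for initial in initials:  # one child node per initial
            node = node["children"].setdefault(initial, new_node())
        bucket = node["final_index"].setdefault(finals, [])
        if word not in bucket:  # a word typed many times is stored once
            bucket.append(word)
    return root


tree = build_simple_spelling_tree(["中国", "战国", "中国", "你好"])
```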
In some embodiments, the input device further includes: a history input character string acquisition unit, configured to acquire a plurality of history input character strings entered by the user within a set time period; and a preferred input mode determining unit, configured to determine the user's preferred input mode based on the plurality of history input character strings and, if the user's preferred input mode is the simple-spelling input mode, to invoke the first construction module.
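The text does not say how the preferred input mode is derived from the history; one simple possibility is a majority vote over the strings entered within the set time period. In the sketch below the seven-day window, the tie-breaking rule, and the assumption that each history entry already carries a `'simple'` or `'full'` label are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def preferred_input_mode(history, now: datetime, window: timedelta = timedelta(days=7)) -> str:
    """Pick 'simple' or 'full' by majority vote over recent history.

    history: iterable of (timestamp, input_string, mode) tuples, where mode is
    'simple' or 'full' and is assumed to come from an earlier type-recognition step.
    """
    recent = [mode for ts, _text, mode in history if timedelta(0) <= now - ts <= window]
    counts = Counter(recent)
    # Ties and an empty history fall back to 'full' (an arbitrary assumption).
    return "simple" if counts["simple"] > counts["full"] else "full"


now = datetime(2022, 3, 24, 12, 0)
history = [
    (now - timedelta(hours=1), "zhg", "simple"),
    (now - timedelta(hours=2), "nh", "simple"),
    (now - timedelta(days=1), "zhongguo", "full"),
    (now - timedelta(days=30), "nihao", "full"),  # outside the 7-day window
    (now - timedelta(days=31), "nihao", "full"),  # outside the 7-day window
]
mode = preferred_input_mode(history, now)  # 'simple'
```

A caller would then build the simple spelling tree when the result is `'simple'` and the full spelling tree otherwise; widening the window to 60 days in this example flips the vote to `'full'`.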
In some embodiments, the input device further includes a second construction module, configured to construct a full spelling tree if the user's preferred input mode is the full-spelling input mode, where each child node in the full spelling tree corresponds to one syllable and each child node has an associated candidate word set.
In some embodiments, the input device further includes: a syllable sequence acquisition module, configured to acquire a syllable sequence obtained by segmenting the input character string if the input character string is a full-spelling character string and the loaded dictionary tree is the full spelling tree; a third target path determining module, configured to perform syllable matching in the full spelling tree and determine a third target path matching the syllable sequence; a second candidate word set determining module, configured to determine a second candidate word set associated with a third target child node in the third target path, where the third target child node is the child node in the third target path farthest from the root node of the full spelling tree; and a third display module, configured to display the candidate words in the second candidate word set.
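Matching a syllable sequence in the full spelling tree is an ordinary trie walk keyed by syllables. A minimal sketch, assuming a hand-built nested-dict tree in which a hypothetical `_words` key holds the candidate word set associated with each child node; `match_syllables` is an illustrative name.

```python
# A tiny hand-built full spelling tree. Each key is one syllable (one child node);
# the hypothetical "_words" key holds the candidate word set associated with that node.
FULL_TREE = {
    "zhong": {"_words": ["中", "钟"], "guo": {"_words": ["中国"]}},
    "zhan": {"_words": ["站"], "guo": {"_words": ["战国"]}},
}


def match_syllables(tree: dict, syllables: list[str]) -> list[str]:
    """Follow the syllable sequence; return the words of the deepest matched node."""
    node = tree
    for syllable in syllables:
        node = node.get(syllable)
        if node is None:  # no third target path for this syllable sequence
            return []
    return list(node.get("_words", []))


words = match_syllables(FULL_TREE, ["zhong", "guo"])  # ['中国']
```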
In some embodiments, the input device further includes: a candidate full-spelling character string set determining module, configured to determine a candidate full-spelling character string set corresponding to the input character string if the input character string is a simple spelling character string and the loaded dictionary tree is the full spelling tree; a fourth target path determining module, configured to perform syllable matching in the full spelling tree for each candidate full-spelling character string in the set and determine a fourth target path matching each candidate full-spelling character string; a third candidate word set determining module, configured to determine a third candidate word set corresponding to a fourth target child node in each fourth target path, where the fourth target child node is the child node in the corresponding fourth target path farthest from the root node of the full spelling tree; a fourth candidate word set determining module, configured to determine a fourth candidate word set from the third candidate word sets corresponding to the fourth target child nodes of all the fourth target paths; and a fourth display module, configured to display the candidate words in the fourth candidate word set.
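When a simple spelling string meets a full spelling tree, the candidate full-spelling strings can be enumerated as every syllable combination whose initials spell the input. The sketch below represents each candidate full-spelling string as a list of syllables and uses a deliberately tiny, hypothetical `SYLLABLES` inventory; greedy initial parsing, the nested-dict tree with a `_words` key, and the merge order of the fourth candidate word set are likewise assumptions.

```python
from itertools import product

INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")
# Hypothetical subset of the pinyin syllable inventory.
SYLLABLES = ["zhong", "zhan", "zhi", "guo", "ge", "gao", "ni", "hao"]


def initial_of(syllable: str) -> str:
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial
    return ""


def parse_initials(abbrev: str) -> list[str]:
    """Greedily cut the simple spelling string into initials (zh/ch/sh preferred)."""
    out, i = [], 0
    while i < len(abbrev):
        match = next((x for x in INITIALS if abbrev.startswith(x, i)), None)
        if match is None:
            raise ValueError(f"no initial at position {i} of {abbrev!r}")
        out.append(match)
        i += len(match)
    return out


def candidate_full_spellings(abbrev: str) -> list[list[str]]:
    """Every syllable sequence whose initials spell the abbreviation."""
    options = [[s for s in SYLLABLES if initial_of(s) == ini] for ini in parse_initials(abbrev)]
    return [list(combo) for combo in product(*options)]


def match_syllables(tree: dict, syllables: list[str]) -> list[str]:
    node = tree
    for syllable in syllables:
        node = node.get(syllable)
        if node is None:
            return []
    return list(node.get("_words", []))


def lookup_abbrev_in_full_tree(tree: dict, abbrev: str) -> list[str]:
    """Merge the candidate sets of all fourth target paths, dropping duplicates."""
    merged = []
    for syllables in candidate_full_spellings(abbrev):
        for word in match_syllables(tree, syllables):
            if word not in merged:
                merged.append(word)
    return merged


FULL_TREE = {
    "zhong": {"guo": {"_words": ["中国"]}},
    "zhan": {"guo": {"_words": ["战国"]}},
    "ni": {"hao": {"_words": ["你好"]}},
}
words = lookup_abbrev_in_full_tree(FULL_TREE, "zhg")  # ['中国', '战国']
```

With a full syllable inventory the Cartesian product grows quickly with the length of the abbreviation, so a practical implementation would more likely prune candidates while walking the tree; the exhaustive version is shown only because it follows the module description step by step.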
In some embodiments, the second construction module includes: a second history input word acquisition unit, configured to acquire a plurality of second history input words; a second full-spelling character string determining unit, configured to determine the second full-spelling character string corresponding to each second history input word; a second determining unit, configured to determine the syllables corresponding to the child nodes in the full spelling tree according to the plurality of second full-spelling character strings; a third determining unit, configured to determine the candidate word set corresponding to each child node in the full spelling tree according to the second full-spelling character string corresponding to each second history input word; and a second association storage unit, configured to store, in association, the syllable corresponding to each child node in the full spelling tree and its candidate word set.
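The second construction module can be sketched as a trie insertion keyed by syllables. Here the `PINYIN` lexicon, the nested-dict layout, the `_words` key, and `build_full_spelling_tree` are hypothetical stand-ins. Note that a single-syllable word such as 中 is associated with an intermediate node, so one node can carry candidates while also having children.

```python
# Hypothetical word-to-syllable lexicon for the "determine the second full-spelling
# character string" step; a production system would consult a pinyin dictionary.
PINYIN = {"中国": ["zhong", "guo"], "中": ["zhong"], "战国": ["zhan", "guo"]}


def build_full_spelling_tree(history_words: list[str]) -> dict:
    """One nested-dict level per syllable; '_words' holds each node's candidate set."""
    tree: dict = {}
    for word in history_words:
        syllables = PINYIN.get(word)
        if not syllables:  # no known full spelling: skipped (assumption)
            continue
        node = tree
        for syllable in syllables:  # child node for each syllable
            node = node.setdefault(syllable, {})
        words = node.setdefault("_words", [])
        if word not in words:  # associate the word with its deepest node once
            words.append(word)
    return tree


tree = build_full_spelling_tree(["中国", "中", "战国", "中国"])
```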
In some embodiments, the extracting module 1120 includes: a syllable sequence acquisition unit, configured to acquire a syllable sequence obtained by segmenting the input character string; a target initial sequence determining unit, configured to extract the initial of each syllable in the syllable sequence and combine the extracted initials in the order of the syllables to obtain the target initial sequence; and a target final sequence determining unit, configured to extract the final of each syllable in the syllable sequence and combine the extracted finals in the order of the syllables to obtain the target final sequence.
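The three units of the extracting module can be illustrated with a short Python sketch. It assumes a longest-first segmentation with backtracking over a small, hypothetical syllable set; real input methods typically weigh several possible segmentations (for example `xian` versus `xi'an`), which this sketch does not attempt. `segment` and `extract_sequences` are illustrative names.

```python
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")
# Hypothetical subset of valid pinyin syllables used for segmentation.
SYLLABLES = {"zhong", "guo", "zhan", "ni", "hao", "xi", "an", "xian"}
MAX_SYLLABLE_LEN = 6  # longest pinyin syllables, e.g. "zhuang"


def segment(text: str):
    """Longest-first backtracking split into syllables; None if impossible."""
    if not text:
        return []
    for length in range(min(MAX_SYLLABLE_LEN, len(text)), 0, -1):
        head = text[:length]
        if head in SYLLABLES:
            rest = segment(text[length:])
            if rest is not None:
                return [head] + rest
    return None


def split_syllable(syllable: str) -> tuple[str, str]:
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def extract_sequences(text: str):
    """Return (syllables, target initial sequence, target final sequence)."""
    syllables = segment(text)
    if syllables is None:
        raise ValueError(f"{text!r} is not a full spelling string")
    pairs = [split_syllable(s) for s in syllables]
    # Initials and finals keep the order of the syllables they came from.
    return syllables, [p[0] for p in pairs], [p[1] for p in pairs]


result = extract_sequences("zhongguo")  # (['zhong', 'guo'], ['zh', 'g'], ['ong', 'uo'])
```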
In some embodiments, the input device further includes a type identification module, configured to perform type identification on the input character string to obtain a type identification result indicating whether the input character string is a full-spelling character string or a simple spelling character string.
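The text leaves the type identification method open. One plausible heuristic, shown here as an assumption rather than the applicant's method, treats a string as full spelling if it can be segmented entirely into valid syllables and as simple spelling if it consists only of letters that occur in initials; the syllable subset, the `recognize_type` name, and the precedence of the two tests are hypothetical.

```python
# Hypothetical subset of valid pinyin syllables; a real recognizer would use all of them.
SYLLABLES = {"zhong", "guo", "zhan", "ni", "hao", "xi", "an", "xian"}
# Letters that can appear in initials (zh/ch/sh are made of these letters too).
INITIAL_LETTERS = set("bpmfdtnlgkhjqxrzcsyw")


def is_full_spelling(text: str) -> bool:
    """Dynamic programming: ok[i] is True when text[:i] splits into syllables."""
    ok = [True] + [False] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - 6), end):
            if ok[start] and text[start:end] in SYLLABLES:
                ok[end] = True
                break
    return bool(text) and ok[-1]


def recognize_type(text: str) -> str:
    """Return 'full', 'simple', or 'unknown'; the full-spelling test takes precedence."""
    text = text.lower()
    if is_full_spelling(text):
        return "full"
    if text and set(text) <= INITIAL_LETTERS:
        return "simple"
    return "unknown"


kind = recognize_type("zhg")  # 'simple'
```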
In some embodiments, the display module 1150 includes: a usage frequency acquisition unit, configured to acquire the user's usage frequency of each candidate word in the target candidate word set; a sorting unit, configured to sort the candidate words in the target candidate word set in descending order of usage frequency to obtain a target ordering; and an ordered display unit, configured to display the candidate words in the target candidate word set according to the target ordering.
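The display module's ordering step reduces to a stable sort on usage frequency. In this sketch the `usage_counts` mapping is a hypothetical stand-in for the per-user frequency store, unseen words are assumed to have a count of zero, and `rank_by_usage` is an illustrative name.

```python
def rank_by_usage(candidates: list[str], usage_counts: dict[str, int]) -> list[str]:
    """Sort candidates by the user's usage frequency, highest first.

    Words the user has never chosen count as 0; sorted() stays stable even with
    reverse=True, so ties keep their original (e.g. dictionary) order.
    """
    return sorted(candidates, key=lambda word: usage_counts.get(word, 0), reverse=True)


ranked = rank_by_usage(["中国", "战国", "整改"], {"战国": 5, "中国": 2})
# ranked == ['战国', '中国', '整改']
```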
FIG. 12 is a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present application. The electronic device may be, for example, the terminal in Fig. 1A. It should be noted that the computer system 1200 shown in Fig. 12 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 12, the computer system 1200 includes a Central Processing Unit (CPU) 1201, which can perform various appropriate actions and processes, such as executing the methods in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. The RAM 1203 also stores various programs and data required for system operation. The CPU 1201, the ROM 1202, and the RAM 1203 are connected to each other through a bus 1204. An Input/Output (I/O) interface 1205 is also connected to the bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as needed, so that a computer program read from it can be installed into the storage section 1208 as needed.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211. The computer program executes various functions defined in the system of the present application when executed by a Central Processing Unit (CPU) 1201.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be disposed in a processor. The names of these units do not, under any circumstances, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable storage medium carries computer readable instructions which, when executed by a processor, implement the method of any of the embodiments described above.
According to an aspect of the present application, there is also provided an electronic device, including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of any of the above embodiments.
According to an aspect of an embodiment of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the method in any of the above embodiments.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and includes several instructions that enable a computing device (such as a personal computer, a server, a touch terminal, or a network device) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. An input method, comprising:
acquiring an input character string;
if the input character string is a full spelling character string, extracting a target initial sequence and a target final sequence from the input character string;
matching in a simple spelling tree according to the target initial sequence, and determining a first target path matched with the target initial sequence; one child node in the simple spelling tree corresponds to one initial consonant; a final index associated with each child node is arranged in the simple spelling tree, and the final index indicates a mapping relation between a final sequence and a candidate word set;
determining a target candidate word set corresponding to the target final sequence in a final index corresponding to a first target child node in the first target path, wherein the first target child node is a child node in the first target path, which is farthest from a root node of the simple spelling tree;
and displaying the candidate words in the target candidate word set.
2. The method of claim 1, wherein after the obtaining the input string, the method further comprises:
if the input character string is a simple spelling character string, matching in the simple spelling tree, and determining a second target path matched with the input character string;
determining a first candidate word set corresponding to a second target child node in the second target path, wherein the second target child node is a child node in the second target path, which is farthest from a root node of the simple spelling tree;
and displaying the candidate words in the first candidate word set.
3. The method according to claim 1 or 2, characterized in that the method further comprises: constructing the simple spelling tree;
the constructing the simple spelling tree comprises the following steps:
acquiring a plurality of first historical input words;
determining a first full spelling character string corresponding to each first historical input word;
performing final extraction on the first full spelling character string corresponding to each first historical input word to obtain a corresponding first historical final sequence;
performing initial consonant extraction on the first full spelling character string corresponding to each first historical input word to obtain a corresponding first historical initial sequence;
determining initial consonants corresponding to the child nodes in the simple spelling tree according to the first historical initial sequences;
determining a final index corresponding to each child node in the simple spelling tree according to each first historical input word and the first historical final sequence corresponding to that word;
and performing associated storage on the initial consonants corresponding to the child nodes in the simple spelling tree and the final index corresponding to the child nodes.
4. The method of claim 3, wherein before the constructing the simple spelling tree, the method further comprises:
acquiring a plurality of historical input character strings of a user within a set time period;
determining a preferred input mode of the user according to the plurality of historical input character strings;
and if the preferred input mode of the user is a simple spelling input mode, executing the step of constructing the simple spelling tree.
5. The method of claim 4, wherein after determining the preferred input mode of the user according to the plurality of historical input character strings, the method further comprises:
if the preferred input mode of the user is a full spelling input mode, constructing a full spelling tree, wherein one child node in the full spelling tree corresponds to one syllable, and a candidate word set is associated with each child node.
6. The method of claim 5, wherein after obtaining the input string, the method further comprises:
if the input character string is a full spelling character string and the loaded dictionary tree is the full spelling tree, obtaining a syllable sequence which is obtained by segmenting the input character string;
performing syllable matching in the full spelling tree, and determining a third target path matched with the syllable sequence;
determining a second candidate word set associated with a third target child node in the third target path; the third target child node is a child node of the third target path that is farthest from a root node of the full-spelling tree;
and displaying the candidate words in the second candidate word set.
7. The method of claim 5, wherein after the obtaining the input string, the method further comprises:
if the input character string is a simple spelling character string and the loaded dictionary tree is the full spelling tree, determining a candidate full spelling character string set corresponding to the input character string;
performing syllable matching on each candidate full-spelling character string in the candidate full-spelling character string set in the full-spelling tree, and determining a fourth target path matched with each candidate full-spelling character string;
determining a third candidate word set corresponding to a fourth target child node in each fourth target path; the fourth target child node is a child node which is farthest from the root node of the full-spelling tree in the corresponding fourth target path;
determining a fourth candidate word set according to a third candidate word set corresponding to a fourth target child node in each fourth target path;
and displaying the candidate words in the fourth candidate word set.
8. The method according to any one of claims 5-7, wherein the constructing a full spelling tree comprises:
acquiring a plurality of second historical input words;
determining a second full spelling character string corresponding to each second historical input word;
determining syllables corresponding to the child nodes in the full spelling tree according to the second full spelling character strings;
determining a candidate word set corresponding to each child node in the full spelling tree according to a second full spelling character string corresponding to each second historical input word;
and performing associated storage on the syllables corresponding to the child nodes in the full spelling tree and the corresponding candidate word sets.
9. The method of claim 1, wherein extracting a target initial sequence and a target final sequence from the input string comprises:
obtaining a syllable sequence, wherein the syllable sequence is obtained by segmenting the input character string;
extracting initial consonants in each syllable in the syllable sequence, and combining the extracted initial consonants according to the arrangement sequence of the syllables in the syllable sequence to obtain the target initial sequence;
extracting the final in each syllable in the syllable sequence, and combining the extracted final according to the arrangement sequence of the syllables in the syllable sequence to obtain the target final sequence.
10. The method of claim 1, wherein before the extracting the target initial sequence and the target final sequence from the input character string if the input character string is a full spelling character string, the method further comprises:
performing type recognition on the input character string to obtain a type recognition result, wherein the type recognition result indicates whether the input character string is a full spelling character string or a simple spelling character string.
11. The method of claim 1, wherein the displaying candidate words in the target set of candidate words comprises:
acquiring the usage frequency of the user for each candidate word in the target candidate word set;
sorting the candidate words in the target candidate word set in descending order of usage frequency to obtain a target ordering;
and displaying the candidate words in the target candidate word set according to the target ordering.
12. An input device, comprising:
the acquisition module is used for acquiring an input character string;
the extraction module is used for extracting a target initial consonant sequence and a target final sequence from the input character string if the input character string is a full spelling character string;
the matching module is used for matching in the simple spelling tree according to the target initial consonant sequence and determining a first target path matched with the target initial consonant sequence; one child node in the simple spelling tree corresponds to one initial consonant; a final index associated with each child node is arranged in the simple spelling tree, and the final index indicates a mapping relation between a final sequence and a candidate word set;
a target candidate word set determining module, configured to determine a target candidate word set corresponding to the target final sequence in a final index corresponding to a target child node in the first target path; the target child node is a child node in the first target path which is farthest from a root node of the simple spelling tree;
and the display module is used for displaying the candidate words in the target candidate word set.
13. An electronic device, comprising:
a processor;
a memory having computer-readable instructions stored thereon which, when executed by the processor, implement the method of any of claims 1-11.
14. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1-11.
15. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method according to any of claims 1-11.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210308651.1A | 2022-03-24 | 2022-03-24 | Input method, input device, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN114675750A | 2022-06-28

Family ID: 82075813
Legal status: Pending (application CN202210308651.1A, published as CN114675750A)
Country status: CN (CN114675750A)


Similar Documents

Publication Publication Date Title
CN111460083B (en) Method and device for constructing document title tree, electronic equipment and storage medium
US20180336193A1 (en) Artificial Intelligence Based Method and Apparatus for Generating Article
CN110019732B (en) Intelligent question answering method and related device
CN111967262A (en) Method and device for determining entity tag
CN110826335B (en) Named entity identification method and device
CN108304373B (en) Semantic dictionary construction method and device, storage medium and electronic device
CN112256860A (en) Semantic retrieval method, system, equipment and storage medium for customer service conversation content
CN112084381A (en) Event extraction method, system, storage medium and equipment
CN110457672A (en) Keyword determination method and apparatus, electronic equipment and storage medium
US10915756B2 (en) Method and apparatus for determining (raw) video materials for news
CN110008473B (en) Medical text named entity identification and labeling method based on iteration method
CN115062134B (en) Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
US11468346B2 (en) Identifying sequence headings in a document
CN111753029A (en) Entity relationship extraction method and device
CN113408273B (en) Training method and device of text entity recognition model and text entity recognition method and device
CN115858773A (en) Keyword mining method, device and medium suitable for long document
CN112560425B (en) Template generation method and device, electronic equipment and storage medium
CN113919424A (en) Training of text processing model, text processing method, device, equipment and medium
CN110874408B (en) Model training method, text recognition device and computing equipment
CN116049370A (en) Information query method and training method and device of information generation model
CN110941713A (en) Self-optimization financial information plate classification method based on topic model
CN115168537A (en) Training method and device of semantic retrieval model, electronic equipment and storage medium
CN114675750A (en) Input method, input device, electronic equipment and storage medium
JP2022055334A (en) Text processing method, apparatus, device and computer-readable storage medium
CN114647727A (en) Model training method, device and equipment applied to entity information recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination