CN109426358B - Information input method and device - Google Patents

Information input method and device

Info

Publication number
CN109426358B
Authority
CN
China
Prior art keywords
word
character string
sub
words
substring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710781155.7A
Other languages
Chinese (zh)
Other versions
CN109426358A (en)
Inventor
李阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710781155.7A
Publication of CN109426358A
Application granted
Publication of CN109426358B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application discloses an information input method and device. One embodiment of the method comprises: acquiring a character string input by a user, and dividing the character string into a plurality of sub-character strings; determining the word corresponding to each sub-character string based on the string number of each of the plurality of sub-character strings and on word combination information; and generating a sentence corresponding to the character string based on the word corresponding to each sub-character string. During sentence generation, starting from a word that has already been determined, the words having a combination relation with it are queried using word numbers alone. It is therefore unnecessary to judge whether a combination relation exists among all the words corresponding to each sub-character string, which increases the speed of sentence generation.

Description

Information input method and device
Technical Field
The application relates to the field of computers, in particular to the field of input methods, and particularly relates to an information input method and device.
Background
Currently, some input methods provide a function of converting a character string input by a user into a sentence, that is, generating a sentence corresponding to the character string. The usual whole-sentence conversion method is as follows: the character string is divided into a plurality of sub-character strings; all words corresponding to each sub-character string are queried in turn; whether a combination relation exists among all the words corresponding to each sub-character string is judged; and, according to the judgment result, the word corresponding to each sub-character string is finally determined and the sentence corresponding to the character string is generated.
However, this whole-sentence conversion method has to judge whether a combination relation exists among all the words corresponding to each sub-character string. Even words unrelated to the sentence to be generated must be checked for a combination relation with other words, which makes sentence generation expensive.
Disclosure of Invention
The application provides an information input method and an information input device, which are used for solving the technical problems existing in the background technology part.
In a first aspect, the present application provides an information input method, comprising: acquiring a character string input by a user, and dividing the character string into a plurality of sub-character strings; determining the word corresponding to each sub-character string based on the string number of each of the plurality of sub-character strings and on word combination information, wherein the word combination information comprises the word numbers of words having a combination relation, and the word number of a word is determined based on the string number corresponding to the word set to which the word belongs and the order of the word in that word set; and generating a sentence corresponding to the character string based on the word corresponding to each sub-character string.
In a second aspect, the present application provides an information input device, comprising: an acquisition unit configured to acquire a character string input by a user and divide the character string into a plurality of sub-character strings; a query unit configured to determine the word corresponding to each sub-character string based on the string number of each of the plurality of sub-character strings and on word combination information, wherein the word combination information comprises the word numbers of words having a combination relation, and the word number of a word is determined based on the string number corresponding to the word set to which the word belongs and the order of the word in that word set; and a generating unit configured to generate a sentence corresponding to the character string based on the word corresponding to each sub-character string.
According to the information input method and device provided by the application, a character string input by a user is acquired and divided into a plurality of sub-character strings; the word corresponding to each sub-character string is determined based on the string number of each sub-character string and on word combination information, wherein the word combination information comprises the word numbers of words having a combination relation, and the word number of a word is determined based on the string number corresponding to the word set to which the word belongs and the order of the word in that word set; and a sentence corresponding to the character string is generated based on the word corresponding to each sub-character string. During sentence generation, starting from a word that has already been determined, the words having a combination relation with it are queried using word numbers alone. It is therefore unnecessary to judge whether a combination relation exists among all the words corresponding to each sub-character string, which increases the speed of sentence generation.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 illustrates an exemplary system architecture to which embodiments of the information input method or apparatus of the present application may be applied;
FIG. 2 shows a flow diagram of one embodiment of an information input method according to the present application;
FIG. 3 is a schematic diagram of the correspondence between string numbers and word numbers;
FIG. 4 is a schematic diagram of a binary relation table;
FIG. 5 is a schematic diagram of querying the string number of a character string from a dictionary tree;
FIG. 6 shows a schematic structural diagram of one embodiment of an information input device according to the present application;
FIG. 7 is a block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture to which embodiments of the information input method or apparatus of the present application may be applied.
As shown in fig. 1, the system architecture may include terminals 101, 102, 103, a network 104 and a server 105. The network 104 is used to provide the medium of transmission links between the terminals 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless transmission links, or fiber optic cables, among others.
The user may use the terminals 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminals 101, 102, 103 may have an input method application and a browser application installed thereon.
The terminals 101, 102, 103 may be various electronic devices having a display screen and supporting network communications, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides support for input method applications on the terminals 101, 102, 103. The server 105 may receive, from a terminal 101, 102, 103, an input request containing the character string input by the user of that terminal, query the sentence corresponding to the character string, and transmit the queried sentence back to the terminal 101, 102, 103.
It should be understood that the number of terminals, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminals, networks, and servers, as desired for an implementation.
Referring to fig. 2, a flow diagram of one embodiment of an information input method according to the present application is shown. It should be noted that the information input method provided by the present embodiment may be executed by a terminal or a server (for example, the terminals 101, 102, 103 or the server 105 in fig. 1). The method comprises the following steps:
step 201, acquiring a character string input by a user, and dividing the character string into a plurality of sub-character strings.
In this embodiment, the length of the character string input by the user may be greater than the length threshold, and after the character string input by the user is received, a sentence corresponding to the character string input by the user may be queried.
In this embodiment, the character string input by the user may be acquired by the terminal or the server, and divided into a plurality of sub-character strings. When the sentence corresponding to the character string input by the user is generated by the terminal, the input operation of the user can be detected on the terminal, and the character string input by the user can be acquired on the terminal. When the server generates a sentence corresponding to a character string input by a user of the terminal, the input operation of the user may be detected at the terminal, and after the character string input by the user is acquired, an input request including the character string input by the user of the terminal may be sent to the server by the terminal, so that the server may acquire the character string input by the user of the terminal.
After the character string input by the user is acquired, the character string input by the user may be divided into a plurality of sub-character strings. For example, the character string input by the user is "jinjitongzhi", and the character string may be divided into a sub-character string "jinji" and a sub-character string "tongzhi".
Step 202, determining words corresponding to each sub-character string respectively based on the character string serial number and word combination information of each sub-character string.
In this embodiment, the character string input by the user may be pinyin of a plurality of words, and the terminal or the server may determine, based on the character string number and word combination information of each of a plurality of sub-character strings obtained by dividing the character string input by the user, a word corresponding to each sub-character string.
In this embodiment, each character string may have a unique string number, which is an integer. The word set corresponding to the string number of a character string may include all words whose pinyin is that character string. For example, if the string number of the character string "tongzhi" is 2, the word set corresponding to string number 2 includes all words whose pinyin is "tongzhi", such as "notification", "dominance" and "co-treatment".
In this embodiment, each word in a word set may have a unique word number, which is an integer. The word number of a word may be determined based on the string number corresponding to the word set to which the word belongs and the order of the word in that word set. Within a word set, the word number of a word may be 1 greater than the word number of the word immediately preceding it. The word number of the first word in a word set may be 1 greater than the word number of the last word in the word set corresponding to the preceding string number.
For example, the string number of the character string "jinji" is 1, and the string number of the character string "tongzhi" is 2. The word set corresponding to string number 1 includes all words whose pinyin is "jinji", for example "urgent", "advance" and "keep in mind". The word set corresponding to string number 2 includes all words whose pinyin is "tongzhi", for example "notification", "dominance" and "co-treatment". The word numbers of "urgent", "advance" and "keep in mind" in the word set corresponding to string number 1 may be 1, 2 and 3, respectively. The word numbers in the word set corresponding to string number 2 may then start from 4, so that the word numbers of "notification", "dominance" and "co-treatment" may be 4, 5 and 6, respectively.
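The numbering rule above can be illustrated with a minimal Python sketch. The function name `assign_word_numbers` and the list-of-lists layout are assumptions made for illustration and do not come from the application; only the example words and the resulting numbers are taken from the text.

```python
def assign_word_numbers(word_sets):
    """Assign consecutive word numbers across word sets.

    word_sets[i] is the ordered word set for string number i + 1
    (hypothetical layout, chosen for this sketch only).
    """
    numbers = {}
    next_number = 1
    for string_number, words in enumerate(word_sets, start=1):
        for word in words:
            # Each word is 1 greater than the previous word, and the first
            # word of a set continues from the last word of the previous set.
            numbers[(string_number, word)] = next_number
            next_number += 1
    return numbers


word_sets = [
    ["urgent", "advance", "keep in mind"],          # string number 1, "jinji"
    ["notification", "dominance", "co-treatment"],  # string number 2, "tongzhi"
]
numbers = assign_word_numbers(word_sets)
print(numbers[(1, "urgent")], numbers[(2, "notification")])  # 1 4
```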
In some optional implementations of this embodiment, the order of each word in a word set may be determined according to the number of times the word has been entered into an input area. For example, the string number of the character string "tongzhi" is 2, the word set corresponding to string number 2 includes "notification", "dominance" and "co-treatment", and the word numbers in this word set start from 4. If the input counts, from high to low, are those of "notification", "dominance" and "co-treatment", then the word numbers of "notification", "dominance" and "co-treatment" in the word set corresponding to string number 2 may be 4, 5 and 6, respectively.
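As a rough illustration of this optional ordering, the sketch below sorts a word set by input count before numbering it. The counts are invented for the example and the function name `number_by_frequency` is hypothetical; the application does not specify how counts are stored or how ties are handled.

```python
def number_by_frequency(input_counts, first_number):
    """Order words by descending input count, then number them consecutively."""
    ordered = sorted(input_counts, key=input_counts.get, reverse=True)
    return {word: first_number + offset for offset, word in enumerate(ordered)}


# Hypothetical input counts for the words whose pinyin is "tongzhi".
counts = {"dominance": 40, "notification": 120, "co-treatment": 3}
tongzhi_numbers = number_by_frequency(counts, first_number=4)
print(tongzhi_numbers)
```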
In this embodiment, the word combination information includes the word numbers of words having a combination relation. The combination relation may be a binary relation. The word combination information may contain a plurality of information items; for example, when the combination relation is a binary relation, each information item may contain the word numbers of two words having the combination relation. For example, the words "urgent" and "notification" have a combination relation and may correspond to one information item, which contains their word numbers 1 and 4.
Please refer to fig. 3, which shows a schematic diagram of the correspondence between string numbers and word numbers.
In this embodiment, a correspondence table may be established in advance that relates the string number of a character string to the word set corresponding to that string number and to the word number of each word in that word set. If the word set corresponding to a character string contains a plurality of words, one string number in the correspondence table corresponds to the word numbers of a plurality of words. Each entry in the correspondence table may include the string number of a character string, the word set corresponding to the string number, and the word number of each word in that word set.
In this embodiment, when determining the word corresponding to each sub-character string based on the string number of each sub-character string and the word combination information, the string number of each sub-character string of the character string input by the user may first be queried from a pre-established correspondence table between character strings and string numbers. Then, for each sub-character string, the word set corresponding to its string number and the word number of each word in that word set may be looked up in the correspondence table described above.
In this embodiment, each character string has a unique string number. For two adjacent sub-character strings among the plurality of sub-character strings obtained by dividing the character string input by the user, a pair formed by the word number of a word in the word set of the first sub-character string and the word number of a word in the word set of the second sub-character string may be referred to as adjacent word numbers.
After the word set corresponding to the string number of each sub-character string and the word number of each word in that word set have been queried, the word numbers satisfying the following condition can be queried: every pair of adjacent word numbers corresponds to one information item in the word combination information.
For example, the combination relation may be a binary relation, and each information item in the word combination information may include the word numbers of two words having a binary relation. Finding adjacent word numbers that satisfy the above condition is then equivalent to finding words, one per sub-character string, in which every two adjacent words have a binary relation. The words corresponding to the word numbers that satisfy the condition can therefore be taken as the words corresponding to the respective sub-character strings, and the sentence corresponding to the character string input by the user can be generated.
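The condition on adjacent word numbers can be expressed as a small check. The sketch below assumes, purely for illustration, that the word combination information is held as a Python set of ordered pairs of word numbers; the pairs (1, 4) and (9, 11) are the two examples given for Fig. 4, and the function name is hypothetical.

```python
def adjacent_pairs_combine(word_numbers, combination_items):
    """Return True if every pair of adjacent word numbers is an information item."""
    return all(
        (left, right) in combination_items
        for left, right in zip(word_numbers, word_numbers[1:])
    )


# Assumed representation: one tuple of two word numbers per information item.
combination_items = {(1, 4), (9, 11)}

print(adjacent_pairs_combine([1, 4], combination_items))  # "urgent" + "notification"
print(adjacent_pairs_combine([2, 4], combination_items))  # "advance" + "notification"
```

Storing the items in a set makes each pair lookup a constant-time membership test on two integers, which matches the idea of querying by word number only.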
In some optional implementations of this embodiment, when determining the word corresponding to each sub-character string based on the string number of each sub-character string and the word combination information, a query operation may be executed multiple times until the word corresponding to each sub-character string has been queried. Each execution of the query operation queries the word corresponding to the sub-character string that follows the sub-character string of the most recently queried word. The query operation includes: determining the string number of the sub-character string following the sub-character string that corresponds to the most recently queried word, where, when the query operation is executed for the first time, the most recently queried word is a word selected from the word set corresponding to the first of the plurality of sub-character strings; determining the word set corresponding to that string number and the word number of each word in that word set; querying, from a binary relation table whose entries each contain the word numbers of two words having a combination relation, an entry that contains both the word number of the most recently queried word and the word number of one word in that word set; taking the word in that word set whose word number appears in the queried entry as the word corresponding to the following sub-character string; and judging whether the following sub-character string is the last sub-character string. If it is the last sub-character string, execution of the query operation stops. If it is not, the word corresponding to the following sub-character string is taken as the most recently queried word and the query operation is executed again.
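The repeated query operation can be sketched as a loop that carries the most recently queried word forward. This is a minimal sketch under stated assumptions: the binary relation table is modelled as a set of ordered pairs, each word set as a list of (word number, word) tuples, and the function returns `None` when no entry is found, a case the text does not address. All names are hypothetical.

```python
def query_sentence(first_word, following_sets, relation_table):
    """Chain words by looking up (latest number, candidate number) pairs.

    first_word: (word_number, word) selected for the first sub-string.
    following_sets: one list of (word_number, word) per later sub-string.
    relation_table: set of (word_number, word_number) pairs (assumed layout).
    """
    latest_number, latest_word = first_word
    words = [latest_word]
    for candidates in following_sets:
        # One query operation: find a candidate whose number shares an
        # entry with the most recently queried word.
        match = next(
            ((n, w) for n, w in candidates if (latest_number, n) in relation_table),
            None,
        )
        if match is None:
            return None  # behaviour here is an assumption of this sketch
        latest_number, latest_word = match
        words.append(latest_word)
    return words


relation_table = {(1, 4), (9, 11)}
tongzhi_set = [(4, "notification"), (5, "dominance"), (6, "co-treatment")]
result = query_sentence((1, "urgent"), [tongzhi_set], relation_table)
print(result)
```

Only the candidates of the next sub-string are checked against the current word, so words that cannot follow the current word are never compared with each other.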
In this embodiment, the process of querying, one after another, the word corresponding to each sub-character string may be referred to as a query process. Before the query operation is executed for the first time in a query process, if the word set corresponding to the string number of the first sub-character string contains only one word, that word can be used as the most recently queried word, and the first query operation of the query process is then executed.
Before the first query operation of a query process is executed, if the word set corresponding to the string number of the first sub-character string contains a plurality of words, one of those words may be selected as the most recently queried word, and the first query operation of the query process is then executed. After the word corresponding to each sub-character string has been queried by executing the query operation multiple times, another word can be selected from the word set corresponding to the string number of the first sub-character string as the most recently queried word. The first query operation of the next query process is then executed, and the word corresponding to each sub-character string is again queried by executing the query operation multiple times.
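Running one query process per candidate word of the first sub-character string might look like the sketch below. It assumes the same illustrative data layout as before (a set of ordered pairs and lists of (word number, word) tuples) and simply discards a query process when no matching entry is found, which the text does not specify. The function name is hypothetical.

```python
def run_query_processes(word_sets, relation_table):
    """Start one query process from each word of the first word set."""
    sentences = []
    for first_number, first_word in word_sets[0]:
        latest, words = first_number, [first_word]
        for candidates in word_sets[1:]:
            found = next(
                ((n, w) for n, w in candidates if (latest, n) in relation_table),
                None,
            )
            if found is None:
                break  # assumption: abandon this query process
            latest = found[0]
            words.append(found[1])
        else:
            # Reached only when every sub-string received a word.
            sentences.append(words)
    return sentences


word_sets = [
    [(1, "urgent"), (2, "advance"), (3, "keep in mind")],
    [(4, "notification"), (5, "dominance"), (6, "co-treatment")],
]
sentences = run_query_processes(word_sets, {(1, 4), (9, 11)})
print(sentences)
```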
In this embodiment, starting from the first of the plurality of sub-character strings obtained by dividing the character string input by the user, each execution of the query operation uses the most recently queried word, which corresponds to one sub-character string, to query the word corresponding to the next sub-character string, until the word corresponding to each sub-character string of the character string input by the user has been queried in sequence.
Please refer to fig. 4, which shows a schematic structural diagram of the binary relation table.
Fig. 4 shows two entries of the binary relation table: one containing the word numbers 1 and 4 of "urgent" and "notification", which have a binary relation, and one containing the word numbers 9 and 11 of "river" and "drama", which have a binary relation.
As an example, suppose the adjacent sub-character strings in the character string input by the user are "jinji" and "tongzhi", the most recently queried word is "urgent" from the word set of the sub-character string "jinji", and the query operation is executed to determine the word corresponding to the next sub-character string "tongzhi". The string number of "jinji" is 1, and the string number of "tongzhi" is 2. The word set corresponding to string number 1 includes all words whose pinyin is "jinji", for example "urgent", "advance" and "keep in mind", whose word numbers may be 1, 2 and 3, respectively. The word set corresponding to string number 2 includes all words whose pinyin is "tongzhi", for example "notification", "dominance" and "co-treatment", whose word numbers may be 4, 5 and 6, respectively. The binary relation table contains the word numbers of words having a binary relation. "Urgent" and "notification" correspond to one entry in the binary relation table, which may contain their word numbers, i.e., 1 and 4.
When the query operation is executed, the string number of the sub-character string "tongzhi", which follows the sub-character string "jinji" of the most recently queried word "urgent", is first determined to be 2. Then the word set corresponding to string number 2 and the word numbers of its words "notification", "dominance" and "co-treatment", that is, 4, 5 and 6, may be obtained. The binary relation table may then be queried for an entry that contains the word number 1 of the most recently queried word "urgent" together with any one of the word numbers 4, 5 and 6. When an entry containing word number 1 and word number 4 is found, it can be determined that the words corresponding to word numbers 1 and 4 have a binary relation, and the word "notification" corresponding to word number 4 can be taken as the word corresponding to the sub-character string "tongzhi". "Notification" may then be used as the most recently queried word, and the query operation is executed again to query the word corresponding to the sub-character string that follows "tongzhi".
In some optional implementations of this embodiment, a dictionary tree may be constructed in advance. Each non-leaf node in the dictionary tree corresponds to one syllable element, and each leaf node corresponds to one character string, namely the character string composed of the syllable elements of the non-leaf nodes on the path between the root node and that leaf node. A syllable element can be an initial or a final. All leaf nodes in the dictionary tree may be sorted according to their positions; after sorting, each leaf node has a node number, which may also serve as the string number of the character string corresponding to that leaf node. When the string number of a sub-character string is determined in the query operation, the leaf node corresponding to the sub-character string can be found in the dictionary tree and the node number read from the node data of the leaf node, which gives the string number of the sub-character string. For example, to determine the string number of the sub-character string following the sub-character string of the most recently queried word, the leaf node corresponding to that following sub-character string may be found in the dictionary tree and its node number read from its node data.
Referring to fig. 5, a diagram of looking up a string number of a string from a dictionary tree is shown.
Fig. 5 shows a root node, indicated by 0; the non-leaf nodes indicated by "sh", "i", "d" and "e"; and, among the child nodes of the non-leaf node "e", the leaf node corresponding to the character string "shide". The character string corresponding to this leaf node is "shide", that is, the character string composed of the syllable elements "sh", "i", "d" and "e" of the non-leaf nodes on the path between the root node and the leaf node. Accordingly, the leaf node corresponds to the word set, including the words "yes", "similar", "cause", etc., that corresponds to the character string sequence number of the character string "shide". When the character string sequence number of "shide" needs to be queried, the non-leaf nodes corresponding to "sh", "i", "d" and "e" can be visited in turn in the dictionary tree, so that the leaf node corresponding to "shide" is found quickly; the node sequence number in the attribute information of the leaf node is then read, and the character string sequence number of "shide" is thereby determined.
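The dictionary tree lookup can be illustrated with a minimal Python sketch. Several points are assumptions made for illustration: the class and method names are hypothetical, the input is assumed to be already split into syllable elements, the leaf node is represented by a `leaf_number` field on the last non-leaf node rather than by a separate child object, and node sequence numbers are assigned in insertion order rather than by sorting the leaves by position.

```python
class Node:
    def __init__(self):
        self.children = {}       # syllable element -> Node
        self.leaf_number = None  # node sequence number of the leaf under this node


class DictionaryTree:
    def __init__(self):
        self.root = Node()
        self._next_number = 1  # assumption: sequence numbers start at 1

    def insert(self, elements):
        """Add a string given as a list of syllable elements; return its number."""
        node = self.root
        for element in elements:
            node = node.children.setdefault(element, Node())
        if node.leaf_number is None:
            node.leaf_number = self._next_number
            self._next_number += 1
        return node.leaf_number

    def string_number(self, elements):
        """Walk the non-leaf nodes in order and read the leaf's sequence number."""
        node = self.root
        for element in elements:
            node = node.children.get(element)
            if node is None:
                return None  # no such path in the tree
        return node.leaf_number


tree = DictionaryTree()
tree.insert(["j", "in", "j", "i"])
tree.insert(["t", "ong", "zh", "i"])
tree.insert(["sh", "i", "d", "e"])
print(tree.string_number(["sh", "i", "d", "e"]))
```

The lookup cost depends only on the number of syllable elements in the queried string, not on how many strings the tree holds, which is presumably why the text describes the leaf node as being found quickly.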
In some optional implementations of this embodiment, a query array may be constructed in advance, in which the subscript of each array element corresponds to one character string sequence number. Each array element includes the word set corresponding to the character string sequence number that corresponds to the subscript of the array element, and the word sequence number of each word in that word set. When, in the query operation, the word set corresponding to the character string sequence number of the sub-character string following the sub-character string of the most recently queried word is to be obtained, together with the word sequence numbers of the words in that word set, the array element whose subscript is that character string sequence number can be read from the query array; this array element contains the word set and the word sequence number of each word in it.
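A minimal Python sketch of such a query array follows. The names `QUERY_ARRAY` and `word_set_for` are hypothetical. The entry at subscript 2 reuses the "tongzhi" example; the entry at subscript 1 is an assumption, since the text only states that "urgent" has word sequence number 1, not which character string sequence number its word set has.

```python
# Subscript = character string sequence number.
# Element = list of (word sequence number, word) for that string.
QUERY_ARRAY = [
    None,                # subscript 0 unused: sequence numbers assumed to start at 1
    [(1, "urgent")],     # assumed placement of "urgent"; other words omitted
    [(4, "notify"), (5, "govern"), (6, "govern at the same time")],
]


def word_set_for(string_number):
    """Read the array element whose subscript is the given string sequence number."""
    if 0 <= string_number < len(QUERY_ARRAY) and QUERY_ARRAY[string_number] is not None:
        return QUERY_ARRAY[string_number]
    return []


print(word_set_for(2))
```

Because the character string sequence number is used directly as the subscript, obtaining a word set is a single array access with no searching.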
Step 203, generating a sentence corresponding to the character string based on the word corresponding to each sub-character string.
In this embodiment, the terminal or the server may generate the sentence corresponding to the character string input by the user based on the queried word corresponding to each sub-character string. When the terminal generates the sentence, it may present the sentence to the user as a candidate result and, when the user selects that candidate result, input the sentence into the input area. When the server generates the sentence, it may send the sentence to the terminal; after receiving it, the terminal may present the sentence to the user as a candidate result and, when the user selects that candidate result, input the sentence into the input area.
In this embodiment, the query operation may find more than one word for a sub-character string; in that case, each of the words found for the sub-character string has a binary relation with a word found for the preceding sub-character string.
In this embodiment, the word finally used for each sub-character string when generating the sentence may be determined according to the degree of association between the queried words corresponding to the sub-character strings into which the character string input by the user is divided. For example, in a word graph, a weight indicating the degree of association exists between the word found for each sub-character string by the query operation and the words found for the adjacent sub-character strings. The path with the largest corresponding weight can be found in the word graph, and the word corresponding to each sub-character string on that path is taken as the final word for that sub-character string, thereby generating the sentence corresponding to the character string input by the user.
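The search for the path with the largest weight can be sketched as a simple dynamic program over the word graph. This is an illustration under stated assumptions, not the method of the application: the function name `best_path` is hypothetical, the weight of a path is assumed to be the sum of the weights between adjacent words, and the weights in the usage example are invented.

```python
def best_path(candidates, weight):
    """Find the word sequence with the largest total weight.

    candidates: one list of candidate words per sub-character string, in order.
    weight: dict mapping (previous word, word) to an association weight;
            a missing key means the two words are not connected in the graph.
    """
    if not candidates:
        return None
    # best[w] = (total weight of the best path ending at w, that path)
    best = {w: (0.0, [w]) for w in candidates[0]}
    for layer in candidates[1:]:
        new_best = {}
        for w in layer:
            options = [
                (score + weight[(p, w)], path + [w])
                for p, (score, path) in best.items()
                if (p, w) in weight
            ]
            if options:
                new_best[w] = max(options, key=lambda o: o[0])
        best = new_best
    if not best:
        return None  # no connected path through all sub-character strings
    return max(best.values(), key=lambda o: o[0])[1]


# Hypothetical weights for the "jinji" / "tongzhi" example.
example_weights = {("urgent", "notify"): 0.9, ("urgent", "govern"): 0.1}
print(best_path([["urgent"], ["notify", "govern"]], example_weights))
```

Keeping only the best path ending at each candidate word avoids enumerating every combination of words, so the work grows with the number of edges between adjacent sub-character strings rather than with the number of possible sentences.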
Referring to fig. 6, as an implementation of the method shown in the above figures, the present application provides an embodiment of an information input device, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied to various electronic devices.
As shown in fig. 6, the information input device of the present embodiment includes: an acquiring unit 601, a querying unit 602, and a generating unit 603. The acquiring unit 601 is configured to acquire a character string input by a user and divide the character string into a plurality of sub-character strings. The querying unit 602 is configured to determine the word corresponding to each sub-character string based on the character string sequence number of each of the plurality of sub-character strings and on word combination information, where the word combination information includes the word sequence numbers of words having a combination relation, and the word sequence number of a word is determined based on the character string sequence number corresponding to the word set to which the word belongs and the order of the word in that word set. The generating unit 603 is configured to generate a sentence corresponding to the character string input by the user based on the word corresponding to each sub-character string.
In some optional implementations of this embodiment, the querying unit 602 includes a word-by-word query unit configured to perform a query operation: determining the character string sequence number of the sub-character string that follows the sub-character string corresponding to the most recently queried word, where, when the query operation is executed for the first time, the most recently queried word is a word selected from the word set corresponding to the character string sequence number of the first of the plurality of sub-character strings; acquiring the word set corresponding to the character string sequence number of the next sub-character string and the word sequence numbers of the words in that word set; querying, from a binary relation table, an entry that contains the word sequence number of the most recently queried word and the word sequence number of one word in the word set corresponding to the character string sequence number of the next sub-character string, where each entry in the binary relation table contains the word sequence numbers of two words having a combination relation; taking the word in that word set whose word sequence number appears in the queried entry as the word corresponding to the next sub-character string; judging whether the next sub-character string is the last sub-character string; if yes, stopping the query operation; if not, taking the word corresponding to the next sub-character string as the most recently queried word and executing the query operation again.
In some optional implementations of this embodiment, the querying unit 602 is further configured to: search a dictionary tree for the leaf node corresponding to the sub-character string that follows the sub-character string corresponding to the most recently queried word, where each non-leaf node in the dictionary tree corresponds to a syllable element, and the character string corresponding to each leaf node is the character string composed of the syllable elements corresponding to the non-leaf nodes on the path from the leaf node to the root node; and take the node sequence number of that leaf node as the character string sequence number of the sub-character string that follows the sub-character string corresponding to the most recently queried word.
In some optional implementations of this embodiment, the querying unit 602 is further configured to: query, in a query array, the array element corresponding to the character string sequence number of the next sub-character string, where the subscript of each array element in the query array corresponds to one character string sequence number, and each array element in the query array comprises the word set corresponding to the character string sequence number that corresponds to the subscript of the array element, and the word sequence numbers of the words in that word set.
FIG. 7 illustrates a schematic block diagram of a computer system suitable for use to implement a server according to embodiments of the present application. The server shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the computer system includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the computer system are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706; an output section 707; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.
In particular, the processes described in the embodiments of the present application may be implemented as computer programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer readable medium, the computer program comprising instructions for carrying out the method illustrated by the flow chart. The computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by a Central Processing Unit (CPU) 701, performs the above-described functions defined in the method of the present application.
The present application also provides a server, which may be configured with one or more processors; a memory for storing one or more programs, the one or more programs may include instructions for performing the operations described in steps 201-203 of the above embodiments. The one or more programs, when executed by the one or more processors, cause the one or more processors to perform the operations described in steps 201-203 in the embodiments described above.
The present application also provides a computer readable medium, which may be included in a server, or may exist separately without being assembled into the server. The computer readable medium carries one or more programs which, when executed by the server, cause the server to: acquire a character string input by a user, and divide the character string into a plurality of sub-character strings; determine the word corresponding to each sub-character string based on the character string sequence number of each of the plurality of sub-character strings and on word combination information, where the word combination information includes the word sequence numbers of words having a combination relation, and the word sequence number of a word is determined based on the character string sequence number corresponding to the word set to which the word belongs and the order of the word in that word set; and generate a sentence corresponding to the character string based on the word corresponding to each sub-character string.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (11)

1. An information input method, characterized in that the method comprises:
acquiring a character string input by a user, and dividing the character string into a plurality of sub-character strings, wherein each sub-character string has a unique character string serial number, the word set corresponding to the character string serial number of a character string comprises all words whose corresponding Pinyin is that character string, and each word in the word set has a unique word serial number;
determining the word corresponding to each sub-character string, based on the character string serial number of each of the plurality of sub-character strings and on word combination information, by means of a pre-established correspondence table that represents the correspondence between the character string serial number of a character string, the word set corresponding to that character string serial number, and the word serial numbers of the words in that word set, which comprises: iteratively querying, one sub-character string at a time from the plurality of sub-character strings, the words corresponding to the sub-character string following the current sub-character string that have a combination relation with the word corresponding to the current sub-character string; wherein the word combination information comprises the word serial numbers of words having a combination relation, the combination relation comprises a binary relation, each information item of the word combination information comprises the word serial numbers of two words having the combination relation, and the word serial number of a word is determined based on the character string serial number corresponding to the word set to which the word belongs and the order of the word in that word set;
generating a sentence corresponding to the character string based on the word corresponding to each sub-character string,
wherein generating the sentence corresponding to the character string based on the word corresponding to each sub-character string comprises:
in a word graph in which a weight representing a degree of association exists between the word corresponding to each sub-character string found by the query operation and the words corresponding to the adjacent sub-character strings found by the query operation, finding the path with the largest corresponding weight, and taking the word corresponding to each sub-character string on the path as the final word, corresponding to that sub-character string, used for generating the sentence.
2. The method of claim 1, wherein determining the word corresponding to each substring based on the string number and the word combination information of each substring of the plurality of substrings comprises:
and executing a query operation:
determining a character string sequence number of a subsequent sub-character string of the sub-character string corresponding to the latest queried word, wherein the latest queried word is a word selected from a word set corresponding to the character string sequence number of a first sub-character string in the plurality of sub-character strings when a query operation is executed for the first time;
acquiring a word set corresponding to the character string sequence number of the next sub-character string and word sequence numbers of words in the word set;
inquiring a table item containing the word serial number of the latest inquired word and the word serial number of one word in the word set corresponding to the character string serial number of the next substring from a binary relation table, wherein the table item in the binary relation table contains the word serial numbers of two words with a combination relation;
taking the word corresponding to the word serial number of one word in the word set corresponding to the character string serial number of the next sub-character string in the inquired list item as the word corresponding to the next sub-character string;
judging whether the next sub-character string is the last sub-character string or not;
if yes, stopping executing the query operation;
if not, taking the term corresponding to the next sub-character string as the latest queried term, and executing the query operation again.
3. The method of claim 2, wherein determining a string order number of a sub-string subsequent to a sub-string corresponding to the newly queried term comprises:
searching a leaf node corresponding to a subsequent substring of a substring corresponding to a newly searched word in a dictionary tree, wherein each non-leaf node in the dictionary tree corresponds to a syllable element, and the character string corresponding to each leaf node is a character string consisting of syllable elements corresponding to the non-leaf nodes on a path from the leaf node to a root node;
and taking the node serial number of the leaf node corresponding to the next substring of the substring corresponding to the newly inquired term as the character string serial number of the next substring of the substring corresponding to the newly inquired term.
4. The method of claim 3, wherein the obtaining the word set corresponding to the string number of the next substring and the word number of the word in the word set comprises:
querying, in a query array, the array element corresponding to the character string serial number of the next sub-character string, wherein the subscript of each array element in the query array corresponds to one character string serial number, and each array element in the query array comprises: the word set corresponding to the character string serial number that corresponds to the subscript of the array element, and the word serial numbers of the words in that word set.
5. The method of claim 4, further comprising:
determining the order of the words in the word set to which the words belong based on the input times corresponding to the words.
6. An information input apparatus, characterized in that the apparatus comprises:
an acquiring unit configured to acquire a character string input by a user and divide the character string into a plurality of sub-character strings, wherein each sub-character string has a unique character string serial number, the word set corresponding to the character string serial number of a character string comprises all words whose corresponding Pinyin is that character string, and each word in the word set has a unique word serial number;
a query unit configured to determine the word corresponding to each sub-character string, based on the character string serial number of each of the plurality of sub-character strings and on word combination information, by means of a correspondence table that represents the correspondence between the character string serial number of a character string, the word set corresponding to that character string serial number, and the word serial numbers of the words in that word set, which comprises: iteratively querying, one sub-character string at a time from the plurality of sub-character strings, the words corresponding to the sub-character string following the current sub-character string that have a combination relation with the word corresponding to the current sub-character string; wherein the word combination information comprises the word serial numbers of words having a combination relation, the combination relation comprises a binary relation, each information item of the word combination information comprises the word serial numbers of two words having the combination relation, and the word serial number of a word is determined based on the character string serial number corresponding to the word set to which the word belongs and the order of the word in that word set;
a generating unit configured to generate a sentence corresponding to the character string based on the word corresponding to each sub-character string;
wherein the generating unit is further configured to find the path with the largest corresponding weight in a word graph, and take the word corresponding to each sub-character string on the path as the final word, corresponding to that sub-character string, used for generating the sentence, wherein, in the word graph, a weight representing a degree of association exists between the word corresponding to each sub-character string found by the query operation and the words corresponding to the adjacent sub-character strings found by the query operation.
7. The apparatus of claim 6, wherein the query unit comprises:
a term-by-term query unit configured to perform a query operation: determining a character string sequence number of a subsequent sub-character string of the sub-character string corresponding to the latest queried word, wherein the latest queried word is a word selected from a word set corresponding to the character string sequence number of a first sub-character string in the plurality of sub-character strings when a query operation is executed for the first time; acquiring a word set corresponding to the character string sequence number of the next sub-character string and word sequence numbers of words in the word set; inquiring a table item containing the word serial number of the latest inquired word and the word serial number of one word in the word set corresponding to the character string serial number of the next substring from a binary relation table, wherein the table item in the binary relation table contains the word serial numbers of two words with a combination relation; taking the word corresponding to the word serial number of one word in the word set corresponding to the character string serial number of the next sub-character string in the inquired list item as the word corresponding to the next sub-character string; judging whether the next sub-character string is the last sub-character string; if yes, stopping executing the query operation; if not, taking the term corresponding to the next substring as the latest queried term, and executing the query operation again.
8. The apparatus of claim 7, wherein the query unit is further configured to: search a dictionary tree for the leaf node corresponding to the sub-character string that follows the sub-character string corresponding to the most recently queried word, wherein each non-leaf node in the dictionary tree corresponds to a syllable element, and the character string corresponding to each leaf node is the character string composed of the syllable elements corresponding to the non-leaf nodes on the path from the leaf node to the root node; and take the node serial number of that leaf node as the character string serial number of the sub-character string that follows the sub-character string corresponding to the most recently queried word.
9. The apparatus of claim 8, wherein the query unit is further configured to: query, in a query array, the array element corresponding to the character string serial number of the next sub-character string, wherein the subscript of each array element in the query array corresponds to one character string serial number, and each array element in the query array comprises: the word set corresponding to the character string serial number that corresponds to the subscript of the array element, and the word serial numbers of the words in that word set.
10. A server, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-5.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201710781155.7A 2017-09-01 2017-09-01 Information input method and device Active CN109426358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710781155.7A CN109426358B (en) 2017-09-01 2017-09-01 Information input method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710781155.7A CN109426358B (en) 2017-09-01 2017-09-01 Information input method and device

Publications (2)

Publication Number Publication Date
CN109426358A CN109426358A (en) 2019-03-05
CN109426358B true CN109426358B (en) 2023-04-07

Family

ID=65513079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710781155.7A Active CN109426358B (en) 2017-09-01 2017-09-01 Information input method and device

Country Status (1)

Country Link
CN (1) CN109426358B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112035597B (en) * 2020-09-04 2023-11-21 常州新途软件有限公司 Vehicle-mounted input method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5754736A (en) * 1994-09-14 1998-05-19 U.S. Philips Corporation System and method for outputting spoken information in response to input speech signals
CN103823814A (en) * 2012-11-19 2014-05-28 腾讯科技(深圳)有限公司 Information processing method and information processing device
CN106843520A (en) * 2017-02-27 2017-06-13 百度在线网络技术(北京)有限公司 Method and apparatus for exporting whole sentence

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4769031B2 (en) * 2005-06-24 2011-09-07 マイクロソフト コーポレーション Method for creating language model, kana-kanji conversion method, apparatus, computer program, and computer-readable storage medium
CN100444167C (en) * 2005-12-21 2008-12-17 中国科学院计算技术研究所 Method for managing and searching dictionary with perfect even numbers group TRIE Tree
CN100458795C (en) * 2007-02-13 2009-02-04 北京搜狗科技发展有限公司 Intelligent word input method and input method system and updating method thereof
CN101290632B (en) * 2008-05-30 2011-09-14 北京搜狗科技发展有限公司 Input method for user words participating in intelligent word-making and input method system
CN101644961A (en) * 2009-08-14 2010-02-10 北京搜狗科技发展有限公司 Encoded string sequencing method, device and character input method and device
CN102147796B (en) * 2010-02-05 2014-10-15 阿里巴巴集团控股有限公司 Vocabulary searching method and device
CN103198149B (en) * 2013-04-23 2017-02-08 中国科学院计算技术研究所 Method and system for query error correction
CN103927299A (en) * 2014-04-25 2014-07-16 百度在线网络技术(北京)有限公司 Method for providing candidate sentences in input method and method and device for recommending input content

Also Published As

Publication number Publication date
CN109426358A (en) 2019-03-05

Similar Documents

Publication Publication Date Title
US10795939B2 (en) Query method and apparatus
US11062089B2 (en) Method and apparatus for generating information
JP6905098B2 (en) Sentence extraction method and system
US10546002B2 (en) Multiple sub-string searching
CN107241260B (en) News pushing method and device based on artificial intelligence
CN108121814B (en) Search result ranking model generation method and device
CN111247528B (en) Query processing
CN112988753B (en) Data searching method and device
CN110874532A (en) Method and device for extracting keywords of feedback information
CN110245357B (en) Main entity identification method and device
CN109426358B (en) Information input method and device
CN111435406A (en) Method and device for correcting database statement spelling errors
CN107656627B (en) Information input method and device
CN113792232B (en) Page feature calculation method, page feature calculation device, electronic equipment, page feature calculation medium and page feature calculation program product
CN113468529B (en) Data searching method and device
CN110555204A (en) Emotion judgment method and device
CN111680508B (en) Text processing method and device
CN110209829B (en) Information processing method and device
CN109308299B (en) Method and apparatus for searching information
CN112148865B (en) Information pushing method and device
CN110765271B (en) Combined processing method and device for entity discovery and entity link
CN110647623B (en) Method and device for updating information
CN109426357B (en) Information input method and device
CN109426356B (en) Information input method and device
CN108664535B (en) Information output method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant