CN105677809A - Chinese word entry index compression method based on mobile terminal and mobile terminal - Google Patents

Chinese word entry index compression method based on mobile terminal and mobile terminal

Info

Publication number
CN105677809A
Authority
CN
China
Prior art keywords
phrase
mobile terminal
index
keyword
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201511032929.3A
Other languages
Chinese (zh)
Other versions
CN105677809B (en
Inventor
郭金林
覃炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huaduo Network Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd
Priority to CN201511032929.3A
Publication of CN105677809A
Application granted
Publication of CN105677809B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

Embodiments of the invention disclose a Chinese entry index compression method based on a mobile terminal, and a mobile terminal. The amount of data stored in the ROM or SD card of the mobile terminal is greatly reduced, which speeds up subsequent index retrieval and solves the technical problem that traditional database indexing schemes, not being optimized for Chinese entries, are inefficient when retrieving entries that contain a given character or word. The method comprises: associating all classified phrases according to keywords and establishing a corresponding associated phrase list; encoding the associated phrase list based on the position of the keyword within each phrase to generate a corresponding compressed string; and storing the compressed string.

Description

A Chinese entry index compression method based on a mobile terminal, and a mobile terminal
Technical field
The present invention relates to the technical field of data mining, and in particular to a Chinese entry index compression method based on a mobile terminal, and a mobile terminal.
Background
An index is a structure that sorts the values of one or more columns of a database table (for example, the name column of an employee table). To look up a specific employee by surname, an index retrieves the information faster than searching every row of the table. A database index is comparable to the table of contents at the front of a book and speeds up database queries. Indexes fall into two kinds, clustered and nonclustered: a clustered index is ordered by the physical location in which the data is stored, whereas a nonclustered index is not. A clustered index speeds up multi-row retrieval, while a nonclustered index is fast for single-row retrieval. Depending on the capabilities of the database, three kinds of index can be created in a database designer: unique indexes, primary key indexes, and clustered indexes.
On mobile terminals such as phones and tablets running Android or iOS, fast and efficient offline queries require a large amount of source data to be stored in ROM or on an SD card. However, the ROM and SD card space and the computing power of such devices are limited, so large amounts of data cannot be stored and retrieval performance is poor. Moreover, traditional database indexing schemes are not optimized for Chinese entries, which makes it inefficient to retrieve entries that contain a given character or word.
Summary of the invention
Embodiments of the present invention provide a Chinese entry index compression method based on a mobile terminal, and a mobile terminal. They greatly reduce the amount of data stored in the ROM or SD card of the mobile terminal, which speeds up subsequent index retrieval, and they solve the technical problem that traditional database indexing schemes, not being optimized for Chinese entries, are inefficient when retrieving entries that contain a given character or word.
A Chinese entry index compression method based on a mobile terminal, provided by an embodiment of the present invention, includes:
associating all classified phrases according to keywords, and establishing a corresponding associated phrase list;
encoding the associated phrase list based on the position of the keyword within each phrase, and generating a corresponding compressed string;
storing the compressed string.
Preferably, before associating all classified phrases according to keywords and establishing the corresponding associated phrase list, the method further includes:
scanning all the phrases, and classifying the phrases that are associated with one another.
Preferably, before encoding the associated phrase list based on the position of the keyword within each phrase and generating the corresponding compressed string, the method further includes:
using special characters to mark, for each keyword of the associated phrase list, its position in the corresponding phrase.
Preferably, encoding the associated phrase list based on the position of the keyword within each phrase and generating the corresponding compressed string specifically includes:
encoding the associated phrase list based on the keyword according to the special characters, and generating the corresponding compressed string.
Preferably, the positions that the special characters indicate in the corresponding phrase are the middle of the phrase, the front of the phrase, and the end of the phrase.
Preferably, the special characters further include a marker indicating that, in the corresponding phrase, the preceding phrase stands in for the current characters.
Preferably, the special characters further include a marker for the part-of-speech type of the corresponding phrase.
Preferably, after storing the compressed string, the method further includes:
according to the first Chinese character extracted from the input search phrase, performing index and retrieval processing on the position relationship set for the first Chinese character as the keyword, using the associated phrase list and the corresponding stored compressed string.
Preferably, performing index and retrieval processing on the position relationship set for the first Chinese character as the keyword, according to the first Chinese character extracted from the input search phrase and using the associated phrase list and the corresponding stored compressed string, specifically includes:
extracting the first Chinese character of the input search phrase;
displaying visual position-selection options according to the first Chinese character;
according to the position selected through the position-selection options, performing index and retrieval processing on the position relationship set for the first Chinese character as the keyword, using the associated phrase list and the corresponding stored compressed string;
displaying the phrases obtained by the index and retrieval processing.
An embodiment of the present invention provides a mobile terminal for implementing any of the Chinese entry index compression methods based on a mobile terminal described in the embodiments of the present invention, including:
an establishing unit, configured to associate all classified phrases according to keywords and establish a corresponding associated phrase list;
a compression encoding unit, configured to encode the associated phrase list based on the position of the keyword within each phrase and generate a corresponding compressed string;
a storage unit, configured to store the compressed string.
Preferably, the mobile terminal further includes:
a classification unit, configured to scan all the phrases and classify the phrases that are associated with one another;
a setting unit, configured to use special characters to mark, for each keyword of the associated phrase list, its position in the corresponding phrase.
Preferably, the compression encoding unit is specifically configured to encode the associated phrase list based on the keyword according to the special characters and generate the corresponding compressed string;
wherein the positions that the special characters indicate in the corresponding phrase are the middle of the phrase, the front of the phrase, and the end of the phrase;
the special characters further include a marker indicating that, in the corresponding phrase, the preceding phrase stands in for the current characters;
the special characters further include a marker for the part-of-speech type of the corresponding phrase.
Preferably, the mobile terminal further includes:
an index retrieval unit, configured to perform, according to the first Chinese character extracted from the input search phrase, index and retrieval processing on the position relationship set for the first Chinese character as the keyword, using the associated phrase list and the corresponding stored compressed string.
Preferably, the index retrieval unit specifically includes:
an extraction subunit, configured to extract the first Chinese character of the input search phrase;
a display subunit, configured to display visual position-selection options according to the first Chinese character;
an index and retrieval processing subunit, configured to perform, according to the position selected through the position-selection options, index and retrieval processing on the position relationship set for the first Chinese character as the keyword, using the associated phrase list and the corresponding stored compressed string;
a presentation subunit, configured to display the phrases obtained by the index and retrieval processing.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
Embodiments of the present invention provide a Chinese entry index compression method based on a mobile terminal, and a mobile terminal. The method includes: associating all classified phrases according to keywords and establishing a corresponding associated phrase list; encoding the associated phrase list based on the position of the keyword within each phrase and generating a corresponding compressed string; and storing the compressed string. Associating the phrases by keyword and then encoding the resulting list by keyword position greatly reduces the amount of data stored in the ROM or SD card of the mobile terminal and thereby speeds up subsequent index retrieval. It solves the technical problem that traditional database indexing schemes, not being optimized for Chinese entries, are inefficient when retrieving entries that contain a given character or word.
Further, retrieval efficiency is high and little storage space is occupied, which saves hardware cost; and when the method is used on a mobile phone or similar device, loading the source data package consumes less communication traffic, which saves communication cost.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of one embodiment of the Chinese entry index compression method based on a mobile terminal provided in the embodiments of the present invention;
Fig. 2 is a schematic flowchart of another embodiment of the Chinese entry index compression method based on a mobile terminal provided in the embodiments of the present invention;
Fig. 3 is a schematic flowchart of another embodiment of the Chinese entry index compression method based on a mobile terminal provided in the embodiments of the present invention;
Fig. 4 is a schematic structural diagram of one embodiment of the mobile terminal provided in the embodiments of the present invention;
Fig. 5 is a schematic structural diagram of another embodiment of the mobile terminal provided in the embodiments of the present invention;
Fig. 6 is a schematic structural diagram of another embodiment of the mobile terminal provided in the embodiments of the present invention;
Fig. 7 is a schematic diagram of an application example of the Fig. 2 embodiment;
Figs. 8(a) to (d) are schematic diagrams of the mobile terminal interface during index retrieval.
Detailed description of the invention
Embodiments of the present invention provide a Chinese entry index compression method based on a mobile terminal, and a mobile terminal. They greatly reduce the amount of data stored in the ROM or SD card of the mobile terminal, which speeds up subsequent index retrieval, and they solve the technical problem that traditional database indexing schemes, not being optimized for Chinese entries, are inefficient when retrieving entries that contain a given character or word.
ROM: short for read-only memory (Read-Only Memory), a solid-state semiconductor memory from which only previously stored data can be read. Its defining characteristic is that data, once stored, cannot be changed or deleted. It is generally used in electronic or computer systems whose data does not need to change often, and the data does not disappear when power is lost. Mobile terminal devices such as mobile phones and tablet computers store data in this kind of memory.
Index: an index gives fast access to specific information in a database table. An index is a structure that sorts the values of one or more columns of a database table.
Presented here is a retrieval scheme that both retrieves efficiently and saves storage space. It supports part-of-speech distinctions (phrase, idiom) and solves the difficulty of retrieving data efficiently while occupying minimal storage on a mobile device. The indexing scheme is equally applicable to other electronic devices such as PCs and servers.
To make the objects, features, and advantages of the present invention more apparent and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, one embodiment of the Chinese entry index compression method based on a mobile terminal provided in the embodiments of the present invention includes:
101. Associate all classified phrases according to keywords, and establish a corresponding associated phrase list;
In this embodiment, to reduce the demands on the ROM and SD card space and on the computing power of the mobile terminal, all classified phrases are first associated according to keywords, and a corresponding associated phrase list is established.
102. Encode the associated phrase list based on the position of the keyword within each phrase, and generate a corresponding compressed string;
After all classified phrases have been associated according to keywords and the corresponding associated phrase list has been established, the list is encoded based on the position of the keyword within each phrase, and a corresponding compressed string is generated.
103. Store the compressed string.
After the associated phrase list has been encoded and the corresponding compressed string generated, the compressed string is stored in the ROM or SD card of the mobile terminal.
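Step 101 can be pictured as building a map from each character (the keyword) to the entries that contain it. The following is a minimal sketch under stated assumptions, not the patented implementation: the function name `build_associated_lists`, the three sample entries, and the choice of code-point sort order are all illustrative.

```python
from collections import defaultdict


def build_associated_lists(entries):
    """Map each character (keyword) to the sorted entries that contain it."""
    table = defaultdict(set)
    for entry in entries:
        for ch in set(entry):      # every distinct character acts as a keyword
            table[ch].add(entry)
    # Assumption: plain code-point order; any stable ordering would work here.
    return {ch: sorted(words) for ch, words in table.items()}


lists = build_associated_lists(["大学", "大学生", "学生"])
print(lists["学"])   # the entries that contain 学
```

Each value of the resulting map is one associated phrase list, which steps 102 and 103 would then encode and store.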
In this embodiment, all classified phrases are associated according to keywords and a corresponding associated phrase list is established; the list is then encoded based on the position of the keyword within each phrase to generate a corresponding compressed string. This greatly reduces the amount of data stored in the ROM or SD card of the mobile terminal and thereby speeds up subsequent index retrieval. It solves the technical problem that traditional database indexing schemes, not being optimized for Chinese entries, are inefficient when retrieving entries that contain a given character or word.
Further, retrieval efficiency is high and little storage space is occupied, which saves hardware cost; and when the method is used on a mobile phone or similar device, loading the source data package consumes less communication traffic, which saves communication cost.
The above describes the process of the Chinese entry index compression method based on a mobile terminal. The additional steps are described in detail below. Referring to Fig. 2, another embodiment of the Chinese entry index compression method based on a mobile terminal provided in the embodiments of the present invention includes:
201. Scan all phrases, and classify the phrases that are associated with one another;
In this embodiment, to reduce the demands on the ROM and SD card space and on the computing power of the mobile terminal, all phrases are first scanned, and the phrases that are associated with one another are classified.
202. Associate all classified phrases according to keywords, and establish a corresponding associated phrase list;
In this embodiment, after all phrases have been scanned and the mutually associated phrases classified, all classified phrases are associated according to keywords, and a corresponding associated phrase list is established.
203. Use special characters to mark, for each keyword of the associated phrase list, its position in the corresponding phrase;
After all classified phrases have been associated according to keywords and the corresponding associated phrase list has been established, special characters are used to mark the position of each keyword of the list in the corresponding phrase.
204. Encode the associated phrase list based on the keyword according to the special characters, and generate a corresponding compressed string;
After the positions of the keywords in the corresponding phrases have been marked with special characters, the associated phrase list is encoded based on the keyword according to the special characters, and a corresponding compressed string is generated.
It should be noted that the positions that the special characters indicate in the corresponding phrase are the middle of the phrase, the front of the phrase, and the end of the phrase;
the special characters further include a marker indicating that, in the corresponding phrase, the preceding phrase stands in for the current characters;
the special characters further include a marker for the part-of-speech type of the corresponding phrase.
205. Store the compressed string.
After the associated phrase list has been encoded and the corresponding compressed string generated, the compressed string is stored in the ROM or SD card of the mobile terminal.
A concrete application scenario is described below. As shown in Fig. 7, the application example includes:
All phrases are scanned in advance, and related phrases are classified and associated by keyword. A search can then go straight to the region of valid data to do its work, which saves considerable computation time;
The associated list is encoded with special characters into a single compressed string. Each special character serves both as a delimiter between strings (needed for restoring the list) and as a carrier of implied meaning, so that, for example, parts of speech can be distinguished. The index thus carries more information than other indexes while occupying less space.
For example, code points from the Unicode Private Use Area, E000-F8FF (6,400 user-definable characters), can serve as the special codes. To simplify the description of the logic, the Arabic numerals 1, 2, 3, 4 stand in for these special characters here.
For example, 1 indicates that the keyword appears at the front, 2 that it appears at the end, and 3 that the preceding phrase is substituted.
With a part-of-speech distinction added, for example between idioms and ordinary phrases, the semantics can be extended further, as in Table 1:
Code   Meaning                                                  Entry type
0      keyword appears in the middle                            phrase
1      keyword appears at the front                             phrase
2      keyword appears at the end                               phrase
3      preceding phrase stands in for the current characters    phrase
4      keyword appears in the middle                            idiom
5      keyword appears at the front                             idiom
6      keyword appears at the end                               idiom
7      preceding phrase stands in for the current characters    idiom
Table 1
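One way to read Table 1 is that the low two bits of a code give the keyword position and the third bit gives the entry type. The sketch below makes that reading explicit; it is an observation about the table rather than something the source states, and the mapping of code n to the n-th Private Use Area character in `marker_char` is a hypothetical choice.

```python
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF   # Unicode Private Use Area (BMP)
POSITIONS = ("middle", "front", "end", "replace-with-preceding")


def describe(code):
    """Split a Table 1 code (0-7) into (keyword position, entry type)."""
    if not 0 <= code <= 7:
        raise ValueError("Table 1 defines codes 0-7 only")
    return POSITIONS[code & 3], "idiom" if code & 4 else "phrase"


def marker_char(code):
    """Hypothetical mapping: code n -> the n-th Private Use Area character."""
    return chr(PUA_FIRST + code)


print(PUA_LAST - PUA_FIRST + 1)   # size of the range E000-F8FF
print(describe(5))
```

Using Private Use Area code points rather than digits keeps the markers from colliding with any character that could occur in an entry.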
For large bodies of Chinese text the compression ratio is considerable, because phrases are often built from repeated components, as in "university" (大学), "university student" (大学生) and "student" (学生). All entries, more than 400,000 words, are scanned; each character is extracted and associated with the phrases it forms, giving a mapping "keyword --> associated phrase list". The associated list is sorted in dictionary order or by Chinese pinyin and then encoded with the special characters. For example, 1 indicates that the keyword appears at the front of the word, 2 that it appears at the end, and 3 that the preceding word is substituted, which greatly reduces the number of characters. Take the Chinese character 高 ("high"). The index of words containing 高 includes: 高处, 高处不胜寒, 高才, 高才博学, 高才大德, 高低, 高低不就, 高低不平, 高低潮, 高低杠, 高低贵贱. Without index compression, the associated list is saved with a separator (comma), in the form: 高处,高处不胜寒,高才,高才博学,高才大德,高低,高低不就,高低不平,高低潮,高低杠,高低贵贱. After sorting and encoding by the method of the present invention, with the keyword 高 replaced by codes, the stored string becomes: 1处3不胜寒1才3博学3大德1低3不就3不平3潮3杠3贵贱. The index list shrinks from 47 characters to 29 characters, directly reducing the space occupied by 38%.
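The 高 ("high") example can be sketched as a small encoder and decoder. This is a minimal sketch under several assumptions, not the patented implementation: the entry list is a reconstruction chosen to match the stated 47 and 29 characters; code 3 is taken to mean "prefix with the most recent entry that was not itself coded 3", which is inferred from the worked example; code 0 stores a middle-position entry unchanged, since the source does not give that layout; and digits stand in for the special characters, so entries must not contain the digits 0-3.

```python
import re

FRONT, END, PREV, RAW = "1", "2", "3", "0"   # digits stand in for special characters


def encode(keyword, words):
    """Encode an associated phrase list into one compressed string."""
    out, anchor = [], None
    for w in words:
        if anchor and w != anchor and w.startswith(anchor):
            out.append(PREV + w[len(anchor):])   # reuse the anchor entry as prefix
            continue
        if w.startswith(keyword):
            out.append(FRONT + w[len(keyword):])
        elif w.endswith(keyword):
            out.append(END + w[:-len(keyword)])
        else:
            out.append(RAW + w)                  # assumption: stored unchanged
        anchor = w
    return "".join(out)


def decode(keyword, packed):
    """Restore the phrase list from a compressed string."""
    words, anchor = [], None
    for code, text in re.findall(r"([0-3])([^0-3]*)", packed):
        if code == PREV:
            words.append(anchor + text)
            continue
        if code == FRONT:
            w = keyword + text
        elif code == END:
            w = text + keyword
        else:
            w = text
        words.append(w)
        anchor = w
    return words


words = ["高处", "高处不胜寒", "高才", "高才博学", "高才大德", "高低",
         "高低不就", "高低不平", "高低潮", "高低杠", "高低贵贱"]
packed = encode("高", words)
plain = ",".join(words)
print(packed, len(plain), len(packed))
```

With these inputs the comma-separated list has 47 characters and the encoded string 29, a saving of (47 - 29) / 47, about 38%, and `decode("高", packed)` returns the original list.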
In this embodiment, all classified phrases are associated according to keywords and a corresponding associated phrase list is established; the list is then encoded based on the position of the keyword within each phrase to generate a corresponding compressed string. This greatly reduces the amount of data stored in the ROM or SD card of the mobile terminal and thereby speeds up subsequent index retrieval. It solves the technical problem that traditional database indexing schemes, not being optimized for Chinese entries, are inefficient when retrieving entries that contain a given character or word.
Further, retrieval efficiency is high and little storage space is occupied, which saves hardware cost; and when the method is used on a mobile phone or similar device, loading the source data package consumes less communication traffic, which saves communication cost.
In the present invention, the Chinese entries are indexed in advance by the method of the invention and then stored in a database as key-value pairs (mobile terminals commonly use an SQLite database) to provide a query service. That is, the data is first specially processed (indexed) according to the present scheme before being saved to a conventional database, rather than imported into the database directly. The offline data (offline data is generally a finite set, for example 400,000 Chinese phrases) is scanned in full in advance; every character used in any entry is extracted and associated with the entries that use it, yielding, for each character, the list of all words in which it appears. This phrase list is then scanned and its contents replaced (encoded) with custom characters that carry specific semantics, producing a new compressed string. The final result is a set of key-value pairs, each consisting of a character and the encoded list string of the words containing it. These key-value pairs can be stored in any database. The index of the present invention saves a large amount of index storage space; verification shows a saving of 30%-60%. Moreover, at the database layer, retrieval of a key-value pair has time complexity O(1), so searches are very fast.
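The key-value storage described above can be sketched with Python's standard `sqlite3` module. This is an illustrative sketch only: the table name `entry_index`, its column names, and the two sample rows are assumptions, and the encoded strings use digits in place of the special characters. Note that a lookup through SQLite's primary-key index is a B-tree search, so it is fast but logarithmic in the number of keys rather than strictly constant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # a database file path would be used on a device
conn.execute(
    "CREATE TABLE entry_index (keyword TEXT PRIMARY KEY, packed TEXT NOT NULL)"
)
# One row per character: the character and its encoded associated phrase list.
rows = [("高", "1处3不胜寒1才"), ("学", "2大3生1生")]
conn.executemany("INSERT INTO entry_index VALUES (?, ?)", rows)
conn.commit()


def lookup(ch):
    """Return the compressed string stored for one character, or None."""
    row = conn.execute(
        "SELECT packed FROM entry_index WHERE keyword = ?", (ch,)
    ).fetchone()
    return row[0] if row else None


print(lookup("高"))
```

Because each character maps to exactly one row, a query reads a single short string instead of scanning every entry.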
The foregoing describes the additional steps in detail. The following describes in detail the index and retrieval processing in which, according to the first Chinese character extracted from the input index phrase, and in combination with the associated phrase list and the corresponding stored compressed character string, the position relationships set with the first Chinese character as the keyword are indexed and retrieved. Referring to Fig. 3, another embodiment of the Chinese vocabulary entry index compression method based on a mobile terminal provided in the embodiments of the present invention includes:
301. Extract the first Chinese character of the input index phrase;
In this embodiment, the Chinese vocabulary entry index has been compressed as in the Fig. 2 embodiment in order to reduce the demands on the ROM and SD card space and on the computing capability of the mobile terminal. When a search based on that index is then required, the first Chinese character of the input index phrase must first be extracted.
302. Display visualized position location options according to the first Chinese character;
After the first Chinese character of the input index phrase has been extracted, visualized position location options need to be displayed according to that character.
303. According to the position relationship selected through the position location options, and in combination with the associated phrase list and the corresponding stored compressed character string, perform index and retrieval processing on the position relationships set with the first Chinese character as the keyword;
After the visualized position location options have been displayed, index and retrieval processing is performed on the position relationships set with the first Chinese character as the keyword, according to the position relationship selected through the options and in combination with the associated phrase list and the corresponding stored compressed character string.
304. Display the phrases obtained from the index and retrieval processing.
After the index and retrieval processing of step 303 has been completed, the corresponding phrases obtained from it need to be displayed.
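Steps 301 to 304 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the option names, the function names, and the sample Chinese phrases are all assumptions, and the associated phrase list is shown already restored instead of being stored as a compressed character string.

```python
from typing import Dict, List, Tuple

# Illustrative data only: first character -> associated phrase list.
# The patent stores each list as a compressed string; the list is shown
# already restored here so that the four steps stay visible.
ASSOCIATED: Dict[str, List[str]] = {
    "大": ["大学", "大学生", "北京大学", "京师大学堂"],
}

# Assumed names for the position options offered in step 302.
OPTIONS: Tuple[str, ...] = ("beginning", "middle", "end")


def extract_first_character(query: str) -> str:
    """Step 301: extract the first Chinese character of the input phrase."""
    return query[0]


def position_options(first_char: str) -> Tuple[str, ...]:
    """Step 302: the options to display once the first character is known."""
    return OPTIONS if first_char in ASSOCIATED else ()


def retrieve(query: str, option: str) -> List[str]:
    """Step 303: filter the associated list by the selected position."""
    phrases = ASSOCIATED.get(extract_first_character(query), [])
    if option == "beginning":
        return [p for p in phrases if p.startswith(query)]
    if option == "end":
        return [p for p in phrases if p.endswith(query)]
    if option == "middle":
        # The query must lie strictly inside the phrase.
        return [p for p in phrases if query in p[1:-1]]
    raise ValueError(f"unknown option: {option}")


def show(phrases: List[str]) -> str:
    """Step 304: format the matching phrases for display."""
    return "\n".join(phrases)
```

Calling `retrieve("大学", "middle")` on this sample data selects only the phrase in which the query is surrounded by other characters on both sides.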
Index retrieval is described in detail below with a concrete application scenario. As shown in Fig. 8 (a) to (d), another application example includes:
Only the first Chinese character of the search phrase is needed to locate the region of the association table. The content is read in one pass, the association table is restored according to the rules, and the data containing the search term is then selected by ordinary filtering. This both saves storage space for the source data and optimizes retrieval.
The scheme suits occasions with high performance requirements in which dynamic retrieval (search conditions are entered dynamically and results are retrieved quickly) must be carried out over a limited set of Chinese entries.
For example, a search condition is typed into an input box, and the various result cases are retrieved quickly according to the input. For instance, the user inputs "university" and requires that "university" appear at the beginning, in the middle, or at the end of the search results.
During retrieval, only the first Chinese character of the search term is needed to locate the associated compressed character string (this string contains the words in which that character occurs). The compressed character string is restored into a phrase list by reverse derivation, and during restoration the words that satisfy the search condition are selected according to the input conditions.
The number of Chinese entries formed with any one character is relatively small (generally within 2,000). A search therefore avoids scanning all entries (for example 400,000) and goes directly to the complete data set containing that character (the compressed character string). Because little space is occupied when the compressed character string is restored into a phrase list, the whole process can be held in the memory of the mobile terminal, with no I/O operations on a disk device, so the response is very fast.
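The lookup, restoration, and filtering just described can be sketched as follows. The encoding is an assumption made purely for illustration: the keyword character is replaced by a `*` placeholder and phrases are separated by `|`. The special characters actually used by the method are defined elsewhere in the description, and the sample phrases are hypothetical.

```python
from typing import Dict, Iterator, List

MARK = "*"  # assumed placeholder standing for the keyword character
SEP = "|"   # assumed separator between phrases

# Illustrative store: keyword character -> compressed character string.
STORE: Dict[str, str] = {
    "大": "*学|*学生|北京*学|京师*学堂",
}


def restore(keyword: str, compressed: str) -> Iterator[str]:
    """Restore phrases one at a time by putting the keyword back in."""
    for item in compressed.split(SEP):
        yield item.replace(MARK, keyword)


def search(query: str) -> List[str]:
    """Locate the string for the first character and filter while restoring."""
    keyword = query[0]
    compressed = STORE.get(keyword, "")
    if not compressed:
        return []
    # Only the entries for this one character are touched, all in memory.
    return [p for p in restore(keyword, compressed) if query in p]
```

Because `restore` is a generator, each phrase is checked against the query as soon as it is rebuilt, which mirrors the idea of filtering during restoration instead of afterwards.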
Traditional database indexing schemes index on the leading character of a string. When the search condition is that an entry begins with the search term, the database can perform the lookup through its own index: for example, "university" can be located quickly through its first character, retrieving all entries that begin with "university" and thereby achieving fast retrieval. However, to find phrases that merely contain "university" (for example "Capital {University} Hall"), a traditional database must perform a full table scan on every query (for example scanning 400,000 records). The result set that meets the search requirement usually holds only tens or hundreds of entries, yet a large share of the computing resources is wasted on comparisons against irrelevant entries, so search efficiency is low.
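The difference in work between a full scan and a per-character lookup can be illustrated with a small sketch. The entry list and function names are hypothetical, and a plain dictionary stands in for both the database table and the associated phrase lists; the second value returned by each function is the number of entries that had to be compared.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative entries only.
ENTRIES: List[str] = ["大学", "北京大学", "京师大学堂", "中学", "学生", "北京", "电话"]


def full_scan(query: str) -> Tuple[List[str], int]:
    """Substring search that compares the query against every entry."""
    hits = [e for e in ENTRIES if query in e]
    return hits, len(ENTRIES)


def build_char_index(entries: List[str]) -> Dict[str, List[str]]:
    """Map each character to the entries that contain it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        for ch in dict.fromkeys(entry):  # each distinct character once
            index[ch].append(entry)
    return index


CHAR_INDEX = build_char_index(ENTRIES)


def indexed_search(query: str) -> Tuple[List[str], int]:
    """Compare the query only against entries sharing its first character."""
    candidates = CHAR_INDEX.get(query[0], [])
    hits = [e for e in candidates if query in e]
    return hits, len(candidates)
```

Both functions return the same hits for a given query; only the number of comparisons differs, which is the effect the paragraph above attributes to avoiding the full table scan.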
Referring to Fig. 4, an embodiment of a mobile terminal provided in the embodiments of the present invention includes:
The mobile terminal is used to implement the Chinese vocabulary entry index compression method based on a mobile terminal described in the embodiments of Fig. 1 to Fig. 3;
Establishing unit 401, configured to associate all classified phrases according to keywords and to establish the corresponding associated phrase list;
Compression coding unit 402, configured to encode the associated phrase list based on the position relationship of the keyword in the phrase, forming the corresponding compressed character string;
Storage unit 403, configured to store the compressed character string.
In this embodiment, the establishing unit 401 associates all classified phrases according to keywords and establishes the corresponding associated phrase list, and the compression coding unit 402 then encodes the associated phrase list based on the position relationship of the keyword in the phrase, forming the corresponding compressed character string. This greatly reduces the amount of data stored in the ROM or SD card of the mobile terminal and thereby speeds up subsequent index retrieval. It solves the technical problem that traditional database indexing schemes, not being optimized for Chinese entries, are inefficient at retrieving entries that contain a given part of a word or a given character.
Further, retrieval efficiency is high, little storage space is occupied, and hardware cost is saved. When the scheme is used on mobile phones and similar devices, communication traffic is reduced while the source data package is loaded, which saves communication cost.
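The cooperation of the three units can be sketched as three functions. This is an assumption-laden illustration, not the unit design of the patent: every character of a phrase is treated as a keyword, the keyword is replaced by an assumed one-byte `*` placeholder, `|` is an assumed separator, and a dictionary stands in for ROM or SD card storage.

```python
from collections import defaultdict
from typing import Dict, List

MARK = "*"  # assumed placeholder for the keyword character
SEP = "|"   # assumed phrase separator


def establish(phrases: List[str]) -> Dict[str, List[str]]:
    """Establishing unit: group phrases under every character they contain."""
    lists: Dict[str, List[str]] = defaultdict(list)
    for phrase in phrases:
        for keyword in dict.fromkeys(phrase):  # each distinct character once
            lists[keyword].append(phrase)
    return dict(lists)


def encode(keyword: str, phrases: List[str]) -> str:
    """Compression coding unit: replace the keyword by a one-byte marker."""
    return SEP.join(p.replace(keyword, MARK) for p in phrases)


def store(lists: Dict[str, List[str]]) -> Dict[str, bytes]:
    """Storage unit: keep one UTF-8 compressed string per keyword."""
    return {k: encode(k, v).encode("utf-8") for k, v in lists.items()}
```

In UTF-8 a Chinese character typically takes three bytes and the placeholder one, so under these assumptions each stored phrase is slightly shorter than the original; the real saving depends on the encoding rules the method actually defines.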
The foregoing describes each unit of the mobile terminal in detail. The additional units are described in detail below. Referring to Fig. 5, another embodiment of a mobile terminal provided in the embodiments of the present invention includes:
The mobile terminal is used to implement the Chinese vocabulary entry index compression method based on a mobile terminal described in the embodiments of Fig. 1 to Fig. 3;
Classification unit 501, configured to scan all phrases and to classify the phrases that are associated with one another;
Establishing unit 502, configured to associate all classified phrases according to keywords and to establish the corresponding associated phrase list;
Setting unit 503, configured to set, with special characters, the position relationship of the keyword of the associated phrase list in the corresponding phrase;
Compression coding unit 504, configured to encode the associated phrase list based on the position relationship of the keyword in the phrase, forming the corresponding compressed character string;
The compression coding unit 504 is specifically configured to perform keyword-based encoding on the associated phrase list according to the special characters, forming the corresponding compressed character string;
Wherein the position relationship of the special character in the corresponding phrase is a phrase-middle position relationship, a phrase-front position relationship, or a phrase-end position relationship;
The special characters further include a representation that replaces the current characters in the corresponding phrase with the preceding phrase;
The special characters further include a representation of the part-of-speech type of the corresponding phrase.
Storage unit 505, configured to store the compressed character string.
In this embodiment, the establishing unit 502 associates all classified phrases according to keywords and establishes the corresponding associated phrase list, and the compression coding unit 504 then encodes the associated phrase list based on the position relationship of the keyword in the phrase, forming the corresponding compressed character string. This greatly reduces the amount of data stored in the ROM or SD card of the mobile terminal and thereby speeds up subsequent index retrieval. It solves the technical problem that traditional database indexing schemes, not being optimized for Chinese entries, are inefficient at retrieving entries that contain a given part of a word or a given character.
Further, retrieval efficiency is high, little storage space is occupied, and hardware cost is saved. When the scheme is used on mobile phones and similar devices, communication traffic is reduced while the source data package is loaded, which saves communication cost.
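The three roles given to the special characters (keyword position, reuse of the preceding phrase, and part-of-speech type) can be sketched with a toy encoding. Every marker here is an assumption chosen for illustration: `<`, `~`, `>` for front, middle, and end, `=` for "repeat the preceding phrase", and the one-letter tags `p` (phrase) and `i` (idiom). The sketch also assumes that phrases never contain these ASCII markers.

```python
from typing import List, Tuple

FRONT, MIDDLE, END = "<", "~", ">"  # assumed markers for the keyword position
SAME = "="                          # assumed marker: reuse the preceding phrase
SEP = "|"                           # assumed separator between entries
# Assumed one-letter type tags: "p" for an ordinary phrase, "i" for an idiom.


def position(keyword: str, phrase: str) -> str:
    """Classify where the keyword sits in the phrase."""
    if phrase.startswith(keyword):
        return FRONT
    if phrase.endswith(keyword):
        return END
    return MIDDLE


def encode(keyword: str, entries: List[Tuple[str, str]]) -> str:
    """Encode (phrase, tag) pairs as tag + position marker + body."""
    items, prev = [], ""
    for phrase, tag in entries:
        if prev and phrase.startswith(prev):
            # Replace the repeated prefix with a reference to the previous phrase.
            body = SAME + phrase[len(prev):]
        else:
            body = phrase
        items.append(tag + position(keyword, phrase) + body)
        prev = phrase
    return SEP.join(items)


def decode(compressed: str) -> List[Tuple[str, str, str]]:
    """Restore (phrase, tag, position marker) triples from the string."""
    entries: List[Tuple[str, str, str]] = []
    prev = ""
    if not compressed:
        return entries
    for item in compressed.split(SEP):
        tag, pos, body = item[0], item[1], item[2:]
        phrase = prev + body[1:] if body.startswith(SAME) else body
        entries.append((phrase, tag, pos))
        prev = phrase
    return entries
```

With a layout of this kind, a search restricted to one position or one type could skip entries by looking only at the first two characters of each item, although the preceding-phrase marker still requires earlier phrases to be restored in order.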
The foregoing describes the additional units in detail. The index retrieval unit is described in detail below. Referring to Fig. 6, another embodiment of a mobile terminal provided in the embodiments of the present invention includes:
The mobile terminal is used to implement the Chinese vocabulary entry index compression method based on a mobile terminal described in the embodiments of Fig. 1 to Fig. 3;
Classification unit 601, configured to scan all phrases and to classify the phrases that are associated with one another;
Establishing unit 602, configured to associate all classified phrases according to keywords and to establish the corresponding associated phrase list;
Setting unit 603, configured to set, with special characters, the position relationship of the keyword of the associated phrase list in the corresponding phrase;
Compression coding unit 604, configured to encode the associated phrase list based on the position relationship of the keyword in the phrase, forming the corresponding compressed character string;
The compression coding unit 604 is specifically configured to perform keyword-based encoding on the associated phrase list according to the special characters, forming the corresponding compressed character string;
Wherein the position relationship of the special character in the corresponding phrase is a phrase-middle position relationship, a phrase-front position relationship, or a phrase-end position relationship;
The special characters further include a representation that replaces the current characters in the corresponding phrase with the preceding phrase;
The special characters further include a representation of the part-of-speech type of the corresponding phrase.
Storage unit 605, configured to store the compressed character string.
The mobile terminal of this embodiment of the present invention further includes:
Index retrieval unit 606, configured to perform, according to the first Chinese character extracted from the input index phrase and in combination with the associated phrase list and the corresponding stored compressed character string, index and retrieval processing on the position relationships set with the first Chinese character as the keyword.
The index retrieval unit 606 specifically includes:
Extraction subunit 6061, configured to extract the first Chinese character of the input index phrase;
Display subunit 6062, configured to display visualized position location options according to the first Chinese character;
Index and retrieval processing subunit 6063, configured to perform, according to the position relationship selected through the position location options and in combination with the associated phrase list and the corresponding stored compressed character string, index and retrieval processing on the position relationships set with the first Chinese character as the keyword;
Presentation subunit 6064, configured to display the phrases obtained from the index and retrieval processing.
This is a retrieval scheme that is efficient and also saves storage space. It supports part-of-speech classification (phrases, idioms) and solves the difficulty of retrieving data efficiently while occupying minimal storage space on a mobile device.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, device, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a portable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A Chinese vocabulary entry index compression method based on a mobile terminal, characterized by comprising:
associating all classified phrases according to keywords, and establishing a corresponding associated phrase list;
encoding the associated phrase list based on the position relationship of the keyword in the phrase, to form a corresponding compressed character string;
storing the compressed character string.
2. The Chinese vocabulary entry index compression method based on a mobile terminal according to claim 1, characterized in that, before associating all classified phrases according to keywords and establishing the corresponding associated phrase list, the method further comprises:
scanning all the phrases, and classifying the phrases that are associated with one another.
3. The Chinese vocabulary entry index compression method based on a mobile terminal according to claim 2, characterized in that, before encoding the associated phrase list based on the position relationship of the keyword in the phrase to form the corresponding compressed character string, the method further comprises:
setting, with special characters, the position relationship of the keyword of the associated phrase list in the corresponding phrase.
4. The Chinese vocabulary entry index compression method based on a mobile terminal according to claim 3, characterized in that encoding the associated phrase list based on the position relationship of the keyword in the phrase to form the corresponding compressed character string specifically comprises:
performing keyword-based encoding on the associated phrase list according to the special characters, to form the corresponding compressed character string.
5. The Chinese vocabulary entry index compression method based on a mobile terminal according to claim 4, characterized in that the position relationship of the special character in the corresponding phrase is a phrase-middle position relationship, a phrase-front position relationship, or a phrase-end position relationship.
6. The Chinese vocabulary entry index compression method based on a mobile terminal according to claim 5, characterized in that the special characters further include a representation that replaces the current characters in the corresponding phrase with the preceding phrase.
7. The Chinese vocabulary entry index compression method based on a mobile terminal according to claim 6, characterized in that the special characters further include a representation of the part-of-speech type of the corresponding phrase.
8. The Chinese vocabulary entry index compression method based on a mobile terminal according to any one of claims 1 to 7, characterized in that, after storing the compressed character string, the method further comprises:
performing, according to the first Chinese character extracted from the input index phrase and in combination with the associated phrase list and the corresponding stored compressed character string, index and retrieval processing on the position relationships set with the first Chinese character as the keyword.
9. The Chinese vocabulary entry index compression method based on a mobile terminal according to claim 8, characterized in that performing, according to the first Chinese character extracted from the input index phrase and in combination with the associated phrase list and the corresponding stored compressed character string, index and retrieval processing on the position relationships set with the first Chinese character as the keyword specifically comprises:
extracting the first Chinese character of the input index phrase;
displaying visualized position location options according to the first Chinese character;
performing, according to the position relationship selected through the position location options and in combination with the associated phrase list and the corresponding stored compressed character string, index and retrieval processing on the position relationships set with the first Chinese character as the keyword;
displaying the phrases obtained from the index and retrieval processing.
10. A mobile terminal for implementing the Chinese vocabulary entry index compression method based on a mobile terminal according to any one of claims 1 to 9, characterized by comprising:
an establishing unit, configured to associate all classified phrases according to keywords and to establish a corresponding associated phrase list;
a compression coding unit, configured to encode the associated phrase list based on the position relationship of the keyword in the phrase, to form a corresponding compressed character string;
a storage unit, configured to store the compressed character string.
11. The mobile terminal according to claim 10, characterized in that the mobile terminal further comprises:
a classification unit, configured to scan all the phrases and to classify the phrases that are associated with one another;
a setting unit, configured to set, with special characters, the position relationship of the keyword of the associated phrase list in the corresponding phrase.
12. The mobile terminal according to claim 11, characterized in that the compression coding unit is specifically configured to perform keyword-based encoding on the associated phrase list according to the special characters, to form the corresponding compressed character string;
wherein the position relationship of the special character in the corresponding phrase is a phrase-middle position relationship, a phrase-front position relationship, or a phrase-end position relationship;
the special characters further include a representation that replaces the current characters in the corresponding phrase with the preceding phrase;
the special characters further include a representation of the part-of-speech type of the corresponding phrase.
13. The mobile terminal according to claim 12, characterized in that the mobile terminal further comprises:
an index retrieval unit, configured to perform, according to the first Chinese character extracted from the input index phrase and in combination with the associated phrase list and the corresponding stored compressed character string, index and retrieval processing on the position relationships set with the first Chinese character as the keyword.
14. The mobile terminal according to claim 13, characterized in that the index retrieval unit specifically comprises:
an extraction subunit, configured to extract the first Chinese character of the input index phrase;
a display subunit, configured to display visualized position location options according to the first Chinese character;
an index and retrieval processing subunit, configured to perform, according to the position relationship selected through the position location options and in combination with the associated phrase list and the corresponding stored compressed character string, index and retrieval processing on the position relationships set with the first Chinese character as the keyword;
a presentation subunit, configured to display the phrases obtained from the index and retrieval processing.
CN201511032929.3A 2015-12-31 2015-12-31 A kind of Chinese vocabulary entry index compression method and mobile terminal based on mobile terminal Active CN105677809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511032929.3A CN105677809B (en) 2015-12-31 2015-12-31 A kind of Chinese vocabulary entry index compression method and mobile terminal based on mobile terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511032929.3A CN105677809B (en) 2015-12-31 2015-12-31 A kind of Chinese vocabulary entry index compression method and mobile terminal based on mobile terminal

Publications (2)

Publication Number Publication Date
CN105677809A true CN105677809A (en) 2016-06-15
CN105677809B CN105677809B (en) 2019-06-28

Family

ID=56189974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511032929.3A Active CN105677809B (en) 2015-12-31 2015-12-31 A kind of Chinese vocabulary entry index compression method and mobile terminal based on mobile terminal

Country Status (1)

Country Link
CN (1) CN105677809B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258993A (en) * 2020-01-09 2020-06-09 佛山科学技术学院 Method and device for filtering abnormal data of industrial big data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1928850A (en) * 2006-08-11 2007-03-14 白杰 Method and apparatus for compressing data based on digital dictionary picture-representing data
CN101557399A (en) * 2009-05-20 2009-10-14 深圳市汇海科技开发有限公司 Method for compression and decompression of XMPP protocol transmission data
CN104283567A (en) * 2013-07-02 2015-01-14 北京四维图新科技股份有限公司 Method for compressing or decompressing name data, and equipment thereof
CN104408192A (en) * 2014-12-15 2015-03-11 北京国双科技有限公司 Compression processing method and device of character string type column

Also Published As

Publication number Publication date
CN105677809B (en) 2019-06-28

Similar Documents

Publication Publication Date Title
US9710517B2 (en) Data record compression with progressive and/or selective decomposition
KR101972645B1 (en) Clustering storage method and device
US20080010238A1 (en) Index having short-term portion and long-term portion
CN106407360B (en) Data processing method and device
CN101136020A (en) System and method for automatically spreading reference data
CN101963965A (en) Document indexing method, data query method and server based on search engine
US20150112952A1 (en) Method of data sorting
CN102867049A (en) Chinese PINYIN quick word segmentation method based on word search tree
CN111708805A (en) Data query method and device, electronic equipment and storage medium
CN112131218A (en) Hash table look-up method, device and equipment for gene comparison and storage medium
CN111506621A (en) Data statistical method and device
CN111680043B (en) Method for quickly retrieving mass data
CN111782703A (en) Method and system for automatically managing and displaying incidence relation between irrigation area object data
CN103064847A (en) Indexing equipment, indexing method, search device, search method and search system
CN105677809A (en) Chinese word entry index compression method based on mobile terminal and mobile terminal
CN103279506A (en) Method for extracting journal paper unstructured data based on electric power technology
CN107291938A (en) Order Query System and method
CN101436203B (en) Recording index method and apparatus
CN101072252A (en) Method and device for identifying mobile phone number territoriality for mobile communication terminal
CN102915324B (en) Data storage and retrieval device and data storage and retrieval method
CN115328898A (en) Data processing method and device, electronic equipment and medium
CN110659344B (en) Block method based full text search method
CN111737461B (en) Text processing method and device, electronic equipment and computer readable storage medium
CN112395856B (en) Text matching method, text matching device, computer system and readable storage medium
CN102163199A (en) Index construction method and device thereof and query method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 511442 floor 28 floor B1 of Wanda Plaza, Panyu District South Village, Guangzhou City, Guangdong

Applicant after: Guangzhou Huaduo Network Technology Co., Ltd.

Address before: 510665, Guangzhou, Whampoa Avenue, No. 2, creative industrial park, building 3-08,

Applicant before: Guangzhou Huaduo Network Technology Co., Ltd.

GR01 Patent grant